Evaluating Generative Vision Models: Insights into the Fréchet Inception Distance and CLIP

Mayank Khanduja (~mayank0)



Description:

The industry offers plenty of generative AI models for computer vision, along with numerous techniques to evaluate them. However, none of these evaluation techniques yet matches human judgement of image quality. Unlike object detection, segmentation, and classification models, which machines can evaluate more reliably than humans, generative models such as Stable Diffusion, StyleGAN, CycleGAN, and Pix2Pix require both quantitative metrics and manual inspection to assess their performance.
When a user browses hundreds of pretrained generative vision models on online platforms like HuggingFace or Kaggle, how do they evaluate and select the model best suited to their dataset? And if they train their own model, how do they know how robust it is?
Manually inspecting the thousands of images a model generates is challenging and time-consuming, so an evaluation metric is needed to measure quality quantitatively. This is a highly active area of research.

We will cover two metrics:

  • Fréchet Inception Distance (FID), which compares two sets of images by leveraging a pre-trained InceptionNet model.
  • CLIP Score, which measures the compatibility of image-caption pairs by leveraging a pre-trained CLIP model.

These metrics are currently the industry standard for their respective tasks.
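As a preview of the math the talk will derive: FID measures the distance between Gaussian approximations of the real and generated feature distributions (means μ_r, μ_g and covariances Σ_r, Σ_g of InceptionV3 activations), while the CLIP Score expression below follows the common convention of scaling the cosine similarity of CLIP image and caption embeddings by 100 and clipping at zero.

    \mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
        + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

    \mathrm{CLIPScore}(I, C) = \max\!\left( 100 \cdot \cos(E_I, E_C),\; 0 \right)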

Outline of the Talk

  • Challenges faced while evaluating generative vision models.
  • Properties of AI-generated images/text that need to be considered for evaluation.
  • Gain an intuitive understanding of the FID and CLIP scores.
  • Understand the metrics mathematically and derive their formulas.
  • Implement the derived formulas in Python (a minimal sketch appears after this list).
  • Apply the Python code to a use case by comparing two generative models on a particular dataset.
  • Look into a few caveats of these metrics on some uncommon datasets and explore how we can tackle them.
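To give a flavor of the from-scratch implementation, here is a minimal sketch of the FID computation, assuming the InceptionV3 pool3 activations for the real and generated images have already been extracted (the function name and setup here are illustrative, not the talk's exact code):

    import numpy as np
    from scipy import linalg

    def frechet_inception_distance(act_real, act_gen):
        """FID between two sets of InceptionV3 activations of shape (N, 2048)."""
        mu_r, mu_g = act_real.mean(axis=0), act_gen.mean(axis=0)
        sigma_r = np.cov(act_real, rowvar=False)
        sigma_g = np.cov(act_gen, rowvar=False)

        diff = mu_r - mu_g
        # Matrix square root of the covariance product; numerical error can
        # introduce a tiny imaginary component, so keep only the real part.
        covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
        if np.iscomplexobj(covmean):
            covmean = covmean.real

        return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))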

In the code, we will use PyTorch and some basic Python libraries for our task.
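For example, a CLIP Score can be computed in a few lines on top of a pre-trained CLIP model. The sketch below assumes the Hugging Face transformers wrapper and the openai/clip-vit-base-patch32 checkpoint purely for illustration; the talk's own code may build this differently:

    import torch
    from transformers import CLIPModel, CLIPProcessor

    # Assumed checkpoint for illustration; any CLIP variant works the same way.
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    @torch.no_grad()
    def clip_score(images, captions):
        """Mean cosine similarity (scaled to 0-100) over image-caption pairs."""
        inputs = processor(text=captions, images=images,
                           return_tensors="pt", padding=True)
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
        cos = (img_emb * txt_emb).sum(dim=-1)          # per-pair cosine similarity
        return (100 * cos.clamp(min=0)).mean().item()  # common CLIPScore scaling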

Key takeaways from the talk

  • You will learn how closely these metrics agree with human judgement.
  • Researchers will be able to compare their models against existing benchmarks.
  • Developers will learn when to replace their models based on these metrics, without human intervention.
  • You will deeply understand the equations behind these metrics and implement them in Python from scratch.

Prerequisites:

Basic knowledge of Python, Convolutional Neural Networks, and Statistics

Speaker Info:

Mayank Khanduja is a Data Scientist at the Esri R&D Center with five years of industry experience. He has worked on the development of Generative Adversarial Networks at his current organization and has also published informative guides and blogs.

Section: Data Science, AI & ML
Type: Talks
Target Audience: Intermediate
Last Updated: