In machine learning and AI, we typically need to train a model explicitly on labeled examples of a class before it can classify examples of that class.
For example, if you need a model to correctly classify dogs, you typically need to train the model on examples of dogs. But under some conditions, a model can classify examples of classes that were entirely absent from the training data. We call this zero-shot learning, and it's a very useful property of AI/ML models when labeled data is scarce.
Zero-shot learning (ZSL) is a machine learning approach where a model makes predictions on new, previously unseen classes. Importantly, in zero-shot learning, the classes for which the model makes predictions are absent in the original training data.
This is in contrast to traditional machine learning approaches that typically require large amounts of labeled data in order to classify examples. Instead, zero-shot learning commonly relies on knowledge transfer and other techniques to make predictions about new classes. (We'll look at some of the specific techniques that we can use for zero-shot later in this article.)
This makes zero-shot learning valuable in situations where labeled data is scarce. It's particularly useful in certain natural language processing (NLP) and computer vision tasks, where we need the model to generalize to classes that were absent from the training data.
Zero-shot and few-shot learning are similar in that we use them both in situations where labeled data is scarce. However, they are different in terms of how they handle this issue of scarce data.
First, these two techniques differ in the number of "shots," meaning the number of labeled training examples per class. In few-shot learning, there are a few labeled examples for each new class. In zero-shot learning, there are no training examples for the new classes.
Zero-shot and few-shot also take slightly different approaches.
Zero-shot tends to rely on auxiliary knowledge built into pre-trained models, as well as other external information like textual descriptions and semantic attributes.
Few-shot learning employs techniques like meta-learning and data augmentation to help models adapt quickly to a handful of new training examples. These techniques are largely unavailable for zero-shot learning, since by definition zero-shot uses zero new examples.
Similarly, zero-shot and one-shot learning differ in both the number of shots and the approaches they use.
As mentioned previously, the number of "shots" is the number of labeled examples per class: one-shot learning uses exactly one new example per class, while zero-shot learning uses none.
In terms of approaches, zero-shot relies on the knowledge inside pre-trained models and semantic descriptions to generalize to new examples.
But one-shot learning (somewhat like few-shot) can use techniques such as optimization-based and metric-based meta-learning, which are unavailable to zero-shot learning, since they require at least one shot to implement.
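To make the contrast concrete, here is a minimal sketch of metric-based one-shot classification: each new class has exactly one labeled "support" example, and a query is assigned to the class of its nearest support embedding. The class names and 2-d embedding vectors below are invented for illustration; a real system would use embeddings from a trained encoder.

```python
import math

# One labeled support example per new class (illustrative embeddings).
support = {
    "okapi": [0.9, 0.2],
    "tapir": [0.1, 0.8],
}

def one_shot_classify(query):
    # Assign the query to the class of its nearest support embedding.
    return min(support, key=lambda c: math.dist(query, support[c]))

print(one_shot_classify([0.8, 0.3]))  # okapi
```

Zero-shot learning can't use this trick, because there is no support example to measure distance against; it must fall back on semantic descriptions of the class instead.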
There are several broad categories of methods that we use for zero-shot learning, to enable models to recognize previously unseen categories:
Attribute-based techniques depend on pre-defined semantic features that describe the characteristics of every class.
For example, in the case of image recognition tasks, these features can be things like color, shape, size, and texture. During training, the model learns to associate input data with these attributes. Later, when the model encounters a previously unseen class, the model uses the known attributes of that new class to predict its label.
This approach helps to bridge the gap between previously learned classes and new (previously unseen) classes by relying on shared attributes.
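A toy sketch of the attribute-based idea: classes (seen and unseen) are described by binary attribute vectors, and an input is assigned to the class whose attributes best match the attributes predicted for it. The class names, attributes, and the predicted attribute vector are all invented for illustration; in a real system, a trained model maps raw inputs to attribute scores.

```python
# Binary attribute vectors (has_stripes, has_mane, is_domestic) for
# classes that appeared in the training data...
seen_classes = {
    "zebra": [1, 0, 0],
    "horse": [0, 1, 1],
}

# ...and for a class described only by its attributes, never by
# labeled training examples.
unseen_classes = {
    "lion": [0, 1, 0],
}

def classify(predicted_attrs, class_attrs):
    """Pick the class whose attribute vector best matches the
    attributes predicted for an input (sum of absolute differences)."""
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return min(class_attrs, key=lambda c: dist(predicted_attrs, class_attrs[c]))

all_classes = {**seen_classes, **unseen_classes}

# Suppose a trained attribute predictor (not shown) outputs [0, 1, 0]
# for a new image; the best match is "lion", a class with no examples.
print(classify([0, 1, 0], all_classes))  # lion
```

The shared attribute space is what lets the model reach an unseen class: "lion" was never trained on, but its attribute description lives in the same space as the seen classes.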
Embedding-based methods use semantic embedding spaces to help solve the problem of zero-shot learning.
These methods map the input features and the class labels to a shared semantic embedding space. These systems use word embeddings or sentence embeddings (i.e., from a language model) to encode the class labels, and a neural network encodes the input data. The model can then predict new, previously unseen classes by computing the similarity between the embeddings of the input data and the embeddings of the class labels.
Effectively, this method leverages semantic relationships from an embedding space to help the model generalize to previously unseen classes.
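The embedding-based approach can be sketched in a few lines: map class labels and inputs into a shared vector space, then predict the label whose embedding is most similar (by cosine similarity) to the input's embedding. The 3-d "embeddings" below are invented for illustration; real systems use vectors from a pre-trained language or vision model.

```python
import math

# Label embeddings in a shared semantic space (illustrative values).
# "submarine" never appeared in the training data, but its label
# embedding places it in the same space as the seen classes.
label_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "submarine": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def zero_shot_classify(input_embedding, labels):
    # Predict the label whose embedding is closest to the input's.
    return max(labels, key=lambda name: cosine(input_embedding, labels[name]))

# An encoder (not shown) embeds a new image near [0.1, 0.1, 0.8]:
print(zero_shot_classify([0.1, 0.1, 0.8], label_embeddings))  # submarine
```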
With generative methods, we use techniques like Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs).
These techniques take semantic information about the unseen classes and create synthetic examples. We then train a model on real data from the original training set together with the synthetic examples for the new (previously unseen) classes, which improves the model's ability to recognize those classes.
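Here is a heavily simplified sketch of that pipeline. In place of a GAN or VAE, a Gaussian "generator" samples synthetic features around an unseen class's semantic vector; a nearest-centroid classifier is then trained on real plus synthetic data. Every class name, vector, and the generator itself are illustrative stand-ins, not a real generative model.

```python
import random

random.seed(0)

# Real (feature, label) examples for the seen classes.
real_data = [([1.0, 0.1], "cat"), ([0.9, 0.2], "cat"),
             ([0.1, 1.0], "dog"), ([0.2, 0.9], "dog")]

# Semantic description of an unseen class in the same feature space.
unseen_semantic = {"fox": [0.6, 0.6]}

def generate(semantic_vec, n=5, noise=0.05):
    """Stand-in generator: in practice, a trained GAN or VAE maps
    semantic vectors to realistic synthetic features."""
    return [[v + random.gauss(0, noise) for v in semantic_vec]
            for _ in range(n)]

# Augment the training set with synthetic examples for the unseen class.
train = real_data + [(x, "fox") for x in generate(unseen_semantic["fox"])]

# Train a nearest-centroid classifier on the combined data.
centroids = {}
for label in {y for _, y in train}:
    feats = [x for x, y in train if y == label]
    centroids[label] = [sum(col) / len(feats) for col in zip(*feats)]

def classify(x):
    # Pick the class with the nearest centroid (squared distance).
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

print(classify([0.55, 0.6]))  # fox — a class with no real examples
```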
Knowledge graph techniques use structured graph methods to represent relationships between classes and their attributes. We can then incorporate this knowledge graph information into a model, which in turn, can help the model understand how new, unseen classes are related to previously seen classes that were in the training data.
Essentially, the knowledge graph data, which encodes information about relationships between classes, enables a model to infer properties of unseen classes, based on how those unseen classes connect to other classes in the graph.
Methods of this type often use graph neural networks or other techniques that can learn from graph-structured data.
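The core inference step can be illustrated without a graph neural network: estimate an unseen class's attributes from the attributes of its neighbors in the graph. The classes, edges, and attribute values below are invented for illustration; real systems learn these relationships over much larger graphs.

```python
# Known attribute vectors for seen classes
# (e.g. [has_hooves, has_stripes, is_domesticated]).
attributes = {
    "horse": [1.0, 0.0, 1.0],
    "zebra": [1.0, 1.0, 0.0],
}

# Graph edges: the unseen class "quagga" is related to both seen classes.
edges = {"quagga": ["horse", "zebra"]}

def infer_attributes(cls):
    """Estimate an unseen class's attributes by averaging the
    attribute vectors of its neighbors in the knowledge graph."""
    neigh = [attributes[n] for n in edges[cls]]
    return [sum(col) / len(neigh) for col in zip(*neigh)]

print(infer_attributes("quagga"))  # [1.0, 0.5, 0.5]
```

The inferred vector can then feed into an attribute- or embedding-based classifier like the ones sketched earlier, giving the model a usable description of a class it has never seen.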
Zero-shot learning has a variety of advantages in machine learning and AI systems.
The primary benefit of zero-shot learning is the heart of what ZSL does: it enables models to classify data examples from classes that were absent from the training data.
This is particularly useful in dynamic and fast-changing environments where classes are likely to change or where new classes are likely to appear, such as evolving language models and real-time computer vision systems.
In AI and machine learning, collecting datasets and labeling examples for all classes is frequently time-consuming and expensive.
Zero-shot learning can reduce the financial and time costs by reducing the need for large labeled datasets, especially in situations where classes are likely to change. Furthermore, this reduced need for labeled data is even more important for tasks where data is particularly expensive or difficult to acquire.
Zero-shot learning also helps systems scale in tasks where there are potentially numerous classes, or if the classes might change. Traditional AI and machine learning systems often struggle with adapting and scaling for a wide range of new classes.
Zero-shot learning handles this by enabling models to adapt to new classes without training examples for those classes. Again, this benefit is particularly visible in areas like natural language processing (NLP) and real-time image classification, where the range of classes is large and often evolves over time.
As mentioned previously, by leveraging relationships and shared attributes between classes, ZSL enables models to generalize to new, previously unseen classes. This helps models adapt to new tasks without extensive retraining on large datasets.
In turn, this adaptability enhances the flexibility and utility of models in changing environments.
Zero-shot learning has a variety of applications.
In computer vision tasks, ZSL allows models to recognize or classify object classes that were absent in the training data.
For example, in zoology, researchers might encounter new species or rarely seen species.
A model that's trained well for zero-shot learning (i.e., trained on a variety of common species) might be able to recognize these rare species by leveraging descriptive/semantic attributes like fur, color, habitat, and size.
In natural language processing tasks like sentiment analysis and text classification, zero-shot learning enables systems to analyze and classify text without explicit training examples.
For example, a chatbot being used for customer service might be able to understand new types of user requests by leveraging existing semantic relationships learned by the language model. This would enhance the ability of the chatbot to handle a broader range of user issues, and make the chatbot system more flexible.
Zero-shot learning is also useful in healthcare, particularly in medical diagnostics, where ZSL can help models diagnose rare conditions for which training data is limited precisely because the disease is rare.
By transferring and leveraging existing knowledge about symptoms from other conditions, zero-shot learning might be able to suggest diagnoses for conditions that were absent in the original training data.
In tasks like language translation, zero-shot can help models adapt to languages that were absent from the training data by leveraging shared features and attributes as well as semantic embeddings.
In turn, this can help models perform well on tasks like translating text or performing sentiment analysis in low-resource languages with few available training examples.
To sum up, zero-shot learning helps models become more flexible and adaptable in circumstances with limited training data.
In turn, the adaptability afforded by zero-shot learning enhances model utility in fields like NLP, computer vision, and medical diagnostics, and helps us use models in real-world situations where data is expensive, scarce, or continuously changing.