Model training is at the core of machine learning and AI. You train a model to perform a task, and you want it to perform well. That's the point. But sometimes you train a model on a dataset with a particular set of characteristics, and later need the model to perform well on data with related (but different) characteristics.
For example, you might train a model on images captured in good lighting conditions, but then need to use that model on images captured in poor lighting.
How do you solve this problem? You can use domain adaptation.
In this post, I'm going to explain what domain adaptation is, how it works, some of the techniques we use to implement it, and more.
Domain adaptation is a subtype of transfer learning that enables machine learning models that we train on one particular domain – which is commonly called the source domain – to perform well in a different domain, commonly called the target domain.
Several specific techniques and algorithms have emerged for performing domain adaptation. We can roughly group them into three categories: instance-based, feature-based, and parameter-based methods.
But whether we're using instance-based, feature-based, or parameter-based methods, all of these techniques attempt to minimize the differences between the source and target domains.
As for applications, we commonly use domain adaptation for a range of machine learning and AI tasks like natural language processing (NLP), speech recognition, and computer vision. We often apply domain adaptation specifically in tasks where labeled data is difficult or expensive to get for the target domain. In such cases, transferring model knowledge from one domain to another can make model training easier and cheaper.
Domain adaptation and transfer learning are similar – in fact, related – but there are differences.
The first thing to note is that domain adaptation is a type of transfer learning. Transfer learning is a broader family of techniques that involve transferring knowledge from one model (typically a pre-trained model built for one specific domain or task) to another model. In transfer learning, we're attempting to transfer model "knowledge" (often the internal representations learned by the model) to another model in order to improve the target model's performance in some way.
Importantly, we can apply transfer learning in situations where the tasks are quite different. The aim is to leverage the learned information for one task and use it to improve performance on some different task. For example, we might train a model for image classification, but then "transfer" the knowledge of the image classification model to an object detection model.
On the other hand, domain adaptation is slightly different. In domain adaptation, the tasks are typically similar or even exactly the same. However, in domain adaptation, the data distributions for the source and target domains are different. Instead of transferring knowledge in order to perform well on new tasks, domain adaptation transfers knowledge to perform well in a new target domain where the data distribution or the conditions are different. So for example, we might first train a model on clear images that are well-lit, and then adapt that model to perform well on images that are somewhat blurry or poorly-lit. Almost exactly the same task, but different conditions (i.e., different light conditions in the images).
There is also another difference related to the data. Domain adaptation requires access to some target-domain data, but that data may or may not be labeled. In fact, in many cases, we only have unlabeled data for the target domain. This contrasts with many other types of transfer learning, where we often do have labeled examples to fine-tune the new model on the new target task. Because labeled target data is often minimal or absent, domain adaptation frequently relies on special techniques to navigate the shift from the source domain to the target domain.
Domain generalization and domain adaptation are similar in that they both focus on how to improve model performance across differing domains. However, they have different assumptions that underpin how and when they are used, and they address different learning problems.
In domain adaptation, we're attempting to train a model on data from one particular domain, the source domain, and then adapt that model to data for a particular target domain (which may or may not be labeled). In domain adaptation, we're attempting to "adapt" a model to work on new data for the target domain, in spite of any differences between the data distributions for the source and target domains. To accomplish this, we attempt to decrease the gap between the distributions in the source and target domains with techniques like fine-tuning, feature alignment, and instance re-weighting.
But in domain generalization, we're trying to build a model that can generalize well to new, previously unseen target domains, without explicit exposure to those new domains during the training process. So in domain generalization, we're trying to help the model learn in a way that makes that model perform well across a range of domains and environments. This is particularly useful in situations where the target domain might be unknown, or the target domain for the model might change over time.
Domain adaptation assumes that we have access to data for the source domain and also assumes we have access to data for the target domain. However, the data for the target domain might not be labeled. Furthermore, it's important to highlight that typically, for the source domain, we have lots of data, but for the target domain, we might have very limited data.
In contrast, in domain generalization, instead of having data for a particular source domain, we typically have data from multiple source domains. As for the target domain, we may not have data for the target. In fact, for domain generalization, the target domain might even be unknown! Instead of trying to adapt a model for a particular known target domain (where we might have training data), domain generalization attempts to build a model that's robust across a wider range of potential domains.
Ultimately, domain adaptation has a specific target domain, and we need access to data for that target domain. But domain generalization lacks a specific target domain, and instead attempts to build models that generalize well to unseen domains.
There are many different techniques that we can use for domain adaptation, but four of the most important ones are instance reweighting, adversarial domain adaptation, fine-tuning, and domain-adversarial neural networks (DANNs).
Instance reweighting weights examples (i.e., "instances") in the source domain based on how similar they are to examples in the target domain.
Thus, instance reweighting attempts to prioritize examples that are more similar to data that would be found in the target domain. This enables the model trained with instance reweighting to generalize better to the target domain.
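As a rough sketch of how this can work in practice, one common recipe trains a domain classifier to distinguish source from target examples, then uses its predicted probabilities as a density-ratio weight for each source example. Everything below is illustrative – the toy data, and the choice of scikit-learn's `LogisticRegression` for both the domain classifier and the task model, are assumptions, not a canonical implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: the target domain differs from the source by a mean shift.
X_src = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)
X_tgt = rng.normal(loc=1.0, scale=1.0, size=(500, 2))  # unlabeled target data

# Step 1: train a domain classifier to tell source (0) from target (1).
X_dom = np.vstack([X_src, X_tgt])
y_dom = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])
dom_clf = LogisticRegression().fit(X_dom, y_dom)

# Step 2: weight each source example by the estimated density ratio
# p(target | x) / p(source | x) -- source points that "look like" target
# data get larger weights.
p_tgt = dom_clf.predict_proba(X_src)[:, 1]
weights = p_tgt / (1.0 - p_tgt)

# Step 3: train the task model on the (labeled) source data, with weights.
task_clf = LogisticRegression().fit(X_src, y_src, sample_weight=weights)
```

The key design choice is step 2: the ratio `p/(1-p)` recovered from a domain classifier is a standard estimator of how over- or under-represented a source point is relative to the target distribution.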
Adversarial domain adaptation is a technique for domain adaptation that draws inspiration from Generative Adversarial Networks.
This technique uses two elements: a feature generator and a domain classifier.
These two elements work in opposition to each other. The classifier attempts to differentiate the source and target domains, while the generator attempts to learn features that make it impossible to distinguish the two domains. These two competing elements (one trying to differentiate and the other trying to make it impossible to differentiate the domains) generate a feature space that works well for both domains (i.e., it makes a domain-invariant feature space). This allows the model to work well on both the source and target domains.
In fine-tuning, we take a model that we trained with data from the source domain and then adapt that model by continuing to train it on data from the new target domain. So in fine-tuning, we take an existing model, and then further adjust the model weights by training with additional data from the target domain (often with gradient descent).
This technique works well when we have a relatively large amount of data from the source domain (for training the original model), and we also have at least a small amount of data for the new target domain.
Domain-adversarial neural networks introduce a special classifier – a domain classifier – into the training process.
The role of the domain classifier is to predict which domain a data example comes from: the source domain or the target domain. So the domain classifier is like a classification model that predicts "source" or "target." At the same time, DANNs try to generate features that confuse the domain classifier, making the domain classification task more difficult.
This adversarial process between a classifier attempting to distinguish between source and target domains and a feature generation process that's attempting to frustrate that classifier ultimately forces the DANN to learn domain-invariant features. That is, DANN forces the model to learn features that work well in both the source and target domains.
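To make the gradient reversal idea concrete, here is a deliberately tiny linear sketch in NumPy (a real DANN uses a gradient reversal layer inside a neural network; the toy data, the linear "feature extractor," and the alternating update schedule are all simplifying assumptions). The domain classifier descends the domain-classification loss, while the feature extractor takes the same gradient with the sign flipped:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy unlabeled data from two domains that differ by a mean shift.
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),    # source domain
               rng.normal(2.0, 1.0, (200, 2))])   # target domain
d = np.concatenate([np.zeros(200), np.ones(200)]) # domain labels

W = rng.normal(size=(2, 2)) * 0.5  # linear "feature extractor"
w = np.zeros(2)                    # logistic domain classifier
lr, lam, n = 0.05, 1.0, len(d)

for _ in range(500):
    # Domain classifier: a few descent steps to separate the domains.
    for _ in range(5):
        F = X @ W
        p = sigmoid(F @ w)
        w -= lr * F.T @ (p - d) / n
    # Gradient reversal: the feature extractor ASCENDS the same loss,
    # erasing whatever signal the classifier just exploited.
    F = X @ W
    p = sigmoid(F @ w)
    gW = X.T @ np.outer(p - d, w) / n
    W += lr * lam * gW  # sign flipped relative to the classifier's update

# A freshly trained probe should find the domains harder to separate in
# the learned feature space than in the raw inputs.
acc_raw = LogisticRegression().fit(X, d).score(X, d)
acc_feat = LogisticRegression().fit(X @ W, d).score(X @ W, d)
```

The probe at the end is the point of the exercise: if the adversarial game worked, a new domain classifier trained on the learned features does worse than one trained on the raw inputs, i.e., the features have become (more) domain-invariant.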
Domain adaptation has several advantages, but at a high level, the main advantage is that domain adaptation helps us build robust models in situations where we have limited amounts of labeled data for a target domain.
If we have large amounts of labeled data from a related source domain, we can use that labeled data to train a model and then "adapt" the model for the target domain (for which we have scarce training data). This reduces the need for additional data collection and labeling for the target domain, which might be expensive, time-consuming, or very difficult.
Therefore, domain adaptation can be very useful in fields or situations where domain-specific data might be expensive or difficult to acquire, like certain types of natural language processing (NLP) or computer vision.
Related to the reduced need for data resources is increased model generalization.
Essentially, with domain adaptation, we can help models generalize to new domains and environments, even with limited training data and retraining. That is, we can enhance the ability of models to perform well under new conditions with limited retraining. Moreover, this enhanced ability to generalize limits the degradation of model performance when we deploy the model in real-world conditions that might be different from the conditions present in the original training data.
This enhanced adaptability leads to better models that can maintain high performance even as the conditions where we deploy the model might change.
There are other benefits. For example, domain adaptation can decrease the costs of model training, since it helps us adapt existing models instead of training on a new domain from scratch. It can also help us deploy models in new domains faster by reducing the need for extensive data collection, labeling, and model retraining.
We can use domain adaptation for a variety of applications and use-cases in order to improve performance of models when we have data distributions that change or differ between source and target domains.
Having said that, a few high-profile use cases are in:
To train self-driving cars, we often collect data from specific environments or conditions, like sunny weather conditions or urban environments.
After training models on data in these particular domains (e.g., sunny weather), we can use domain adaptation to adapt those models so they can generalize to new environments or conditions, like rainy weather conditions or more rural environments.
This helps make sure that these autonomous vehicles can perform well under a variety of conditions and settings, and this in turn increases safety.
In medical imaging, we often train models using labeled data from a particular institution or particular medical equipment.
For example, we might collect medical imaging data from a particular hospital or set of hospitals that use specific equipment, and then train a model on that specific medical image dataset. Using domain adaptation, we could then adapt the model trained on data from a narrow domain (e.g., one specific hospital) in order to help that model generalize to a new target domain (e.g., a different hospital).
Using domain adaptation this way decreases the need for additional model re-training or fine-tuning, as well as the expensive data collection and example labeling that might accompany those re-training efforts.
Thus, in a medical imaging context, domain adaptation can increase the applicability of medical imaging models to a wider range of circumstances and data domains.
In NLP, we can use domain adaptation to adapt a model trained with text data from one domain for a new domain.
For example, let's say that we've trained a model to perform sentiment analysis using data from product reviews. We could then use domain adaptation to adapt that sentiment analysis model to perform well on a new domain, like social media posts.
So in NLP, we can use domain adaptation to help models perform well across a variety of text datasets, thus increasing the versatility of NLP models.
In robotics, we can use domain adaptation to help robot control models adapt to new environments and conditions (similar to autonomous driving).
For example, we might initially train a robot in a somewhat constrained setting (like a lab setting), with specific lighting and particular types of objects in the environment. After training a robot in this constrained environment, you might use domain adaptation to help the robot adapt to new real-world settings that have different lighting conditions or different objects.
Thus, we can use domain adaptation to take a robot trained in a constrained environment and adapt it to more varied or complex environments, such as a household or industrial setting.
Domain adaptation is a powerful technique that we can use to help models adapt to new environments and circumstances that are different from the environments from which we gathered the training data.
By employing domain adaptation techniques like adversarial domain adaptation, fine-tuning, DANNs, and others, we can help our models become more flexible in tasks like computer vision and NLP, particularly when we have limited training data for new or quickly changing domains.