Review
Sci Adv. 2024 Dec 20;10(51):eadp6040.
doi: 10.1126/sciadv.adp6040. Epub 2024 Dec 20.

Domain adaptation in small-scale and heterogeneous biological datasets

Seyedmehdi Orouji et al. Sci Adv.

Abstract

Machine-learning models are key to modern biology, yet models trained on one dataset often do not generalize to datasets from other cohorts or laboratories because of both technical and biological differences. Domain adaptation, a type of transfer learning, alleviates this problem by aligning different datasets so that models can be applied across them. However, most state-of-the-art domain adaptation methods were designed for large-scale data such as images, whereas biological datasets are smaller, have more features, and are complex and heterogeneous. This Review discusses domain adaptation methods in the context of such biological data to inform biologists and guide future domain adaptation research. We describe the benefits and challenges of domain adaptation in biological research and critically explore some of its objectives, strengths, and weaknesses. We argue for the incorporation of domain adaptation techniques into the computational biologist's toolkit, with further development of customized approaches.

Figures

Fig. 1. Diagrammatic overview of the machine learning pipeline and modifications needed to engage in transfer learning or domain adaptation (DA).
(A) In traditional machine learning, each domain has its own model, trained on domain-specific features. This means that the model can make predictions about data from that domain, but transferring the model to other domains is typically difficult or even impossible (indicated by red Xs). (B) In transfer learning or DA, data from one or more source domains are aligned (denoted by dashed outlines) with those in the target domain to find common feature spaces with similar statistical distributions, such that a single model can be trained on aggregate source domain data and evaluated on the target domain. This process can produce generalizable knowledge that is not domain specific. Of note, in some cases, target data will only be used after the model has been trained and not in the alignment stage (152).
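The contrast between panels (A) and (B) can be illustrated with a toy example. The sketch below is not taken from the Review: the synthetic data, the nearest-centroid classifier, and the helper names (`make_domain`, `center`, `fit_centroids`, `accuracy`) are all assumptions chosen for brevity, and per-domain mean centering stands in for a real alignment method. Centering uses only unlabeled features, and it works here only because every simulated domain has the same class balance.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_domain(offset, n_per_class=100):
    """Two classes separated along the first feature, plus a domain-specific offset."""
    class0 = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(n_per_class, 2))
    class1 = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(n_per_class, 2))
    features = np.vstack([class0, class1]) + np.asarray(offset)
    labels = np.repeat([0, 1], n_per_class)
    return features, labels

def center(features):
    """Remove a domain's own mean; uses features only, never labels."""
    return features - features.mean(axis=0)

def fit_centroids(features, labels):
    """Nearest-centroid 'model': one mean vector per class."""
    return np.stack([features[labels == k].mean(axis=0) for k in (0, 1)])

def accuracy(centroids, features, labels):
    """Fraction of samples whose closest class centroid matches their label."""
    distances = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return float((distances.argmin(axis=1) == labels).mean())

# Two labeled source domains and one target domain, each with its own offset.
sources = [make_domain([0.0, 0.0]), make_domain([5.0, 5.0])]
target_x, target_y = make_domain([10.0, -10.0])
source_labels = np.concatenate([y for _, y in sources])

# No alignment: pool the raw source data and apply the model to the raw target.
raw_model = fit_centroids(np.vstack([x for x, _ in sources]), source_labels)
raw_acc = accuracy(raw_model, target_x, target_y)

# With alignment: center each domain, train on pooled sources, evaluate on target.
aligned_model = fit_centroids(np.vstack([center(x) for x, _ in sources]), source_labels)
aligned_acc = accuracy(aligned_model, center(target_x), target_y)

print(f"target accuracy without alignment: {raw_acc:.2f}")
print(f"target accuracy with alignment:    {aligned_acc:.2f}")
```

In this toy setting the domain offset is larger than the class separation, so the unaligned model mostly reflects which domain a sample came from rather than its class. Real biological data would rarely be fixed by mean centering alone.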
Fig. 2. A cartoon representation of source and target domains before and after alignment.
In this cartoon, features vary in their values along two dimensions, and each domain’s features take on a different mean and covariance. Unless the domains are aligned, these differences could both obscure other meaningful variation that is shared across domains and prevent models trained on one domain from generalizing to another.
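One simple way to make the cartoon concrete is to transform the source data so that its mean and covariance match those of the target. The sketch below is an illustration under stated assumptions, not the Review's method: it whitens the source features and then re-colors them with the target covariance, similar in spirit to correlation alignment (CORAL). The function names, the regularization constant `eps`, and the synthetic two-dimensional data are all hypothetical.

```python
import numpy as np

def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals**power) @ eigvecs.T

def align_mean_and_covariance(source, target, eps=1e-6):
    """Map source samples so their mean and covariance match the target's.

    Rows are samples and columns are features. `eps` is a small ridge term
    (an assumption of this sketch) that keeps the covariances invertible.
    """
    n_features = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(n_features)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(n_features)
    centered = source - source.mean(axis=0)
    whitened = centered @ _sym_matrix_power(cov_s, -0.5)   # covariance ~ identity
    recolored = whitened @ _sym_matrix_power(cov_t, 0.5)   # covariance ~ target's
    return recolored + target.mean(axis=0)

# Synthetic two-dimensional domains with different means and covariances.
rng = np.random.default_rng(0)
source = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=200)
target = rng.multivariate_normal([3.0, -2.0], [[2.0, -0.5], [-0.5, 0.5]], size=150)

aligned = align_mean_and_covariance(source, target)

print("target mean: ", target.mean(axis=0))
print("aligned mean:", aligned.mean(axis=0))
print("largest covariance gap:",
      np.abs(np.cov(aligned, rowvar=False) - np.cov(target, rowvar=False)).max())
```

Matching only the first two moments is a deliberate simplification. With few samples and many features, as in the datasets the abstract describes, the covariance estimates themselves become unreliable, so a larger `eps` or a shrinkage estimator would likely be needed.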

References

    1. Ross L. N., Bassett D. S., Causation in neuroscience: Keeping mechanism meaningful. Nat. Rev. Neurosci. 25, 81–90 (2024).
    2. DeGrave A. J., Janizek J., Lee S.-I., AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 3, 610–619 (2021).
    3. Li X., Gu Y., Dvornek N., Staib L. H., Ventola P., Duncan J. S., Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. Med. Image Anal. 65, 101765 (2020).
    4. Zizienová M., New OSF metadata to support data sharing policy compliance. (2023).
    5. Musen M. A., Bean C. A., Cheung K.-H., Dumontier M., Durante K. A., Gevaert O., Gonzalez-Beltran A., Khatri P., Kleinstein S. H., O’Connor M. J., Pouliot Y., Rocca-Serra P., Sansone S.-A., Wiser J. A., CEDAR team, The center for expanded data annotation and retrieval. J. Am. Med. Inform. Assoc. 22, 1148–1152 (2015).