Review. Nat Rev Nephrol. 2021 Nov;17(11):710-724.
doi: 10.1038/s41581-021-00463-x. Epub 2021 Aug 20.

Multi-omics integration in the age of million single-cell data


Zhen Miao et al. Nat Rev Nephrol. 2021 Nov.

Abstract

An explosion in single-cell technologies has revealed a previously underappreciated heterogeneity of cell types and novel cell-state associations with sex, disease, development and other processes. Starting with transcriptome analyses, single-cell techniques have extended to multi-omics approaches and now enable the simultaneous measurement of data modalities and spatial cellular context. Data are now available for millions of cells, for whole-genome measurements and for multiple modalities. Although analyses of such multimodal datasets have the potential to provide new insights into biological processes that cannot be inferred with a single mode of assay, the integration of very large, complex, multimodal data into biological models and mechanisms represents a considerable challenge. An understanding of the principles of data integration and visualization methods is required to determine what methods are best applied to a particular single-cell dataset. Each class of method has advantages and pitfalls in terms of its ability to achieve various biological goals, including cell-type classification, regulatory network modelling and biological process inference. In choosing a data integration strategy, consideration must be given to whether the multi-omics data are matched (that is, measured on the same cell) or unmatched (that is, measured on different cells) and, more importantly, the overall modelling and visualization goals of the integrated analysis.

Conflict of interest statement

Competing interests

APM is a scientific advisor to Novartis, eGENESIS, TRESTLE Therapeutics and IVIVA Medical. The other authors declare no competing interests.

Figures

Figure 1. Frameworks for the integration of single cell multiomics data
Computational methods enable the integration of measured attributes (that is, features) obtained using multiomics approaches (for example, transcriptome and protein data) from single cells. These methods can be classified into four broad categories. (a) Integration based on quantitative causal models. For example, the rates of RNA synthesis, splicing, translation and degradation might be modelled by differential equations and single cell multiomics data (for example, gene and protein expression data) can be used to estimate parameters (p) in the model. After obtaining the parameters, current and future cell states can be inferred. (b) Statistical modeling between features. A statistical function is used to associate data in one modality to another modality, such that the two sets of features (again, for example, gene or protein expression data) can be harmonized into one modality for downstream analyses. Such models can be calibrated from reference datasets or potentially fit to the dataset of interest. (c) Latent space modeling. Data from different modalities are assumed to be generated from a common latent space, and integrated based on the assumption that specific mapping functions are able to map the common latent space onto different modalities. The latent space can be viewed as an integrated low dimensional embedding of the multiomics or multi-modal data and the mapping functions can be regarded as a model of the abstract latent space to real observations. (d) Consensus of individual inferences (late integration). Analyses (such as clustering or dimension reduction) are performed for each individual data modality after which the results are combined to obtain common consensus outputs or complementary evidence.
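As a toy illustration of category (a), the sketch below assumes a single linear rate equation for protein abundance, dP/dt = k_tl * M - k_deg * P, where M is the RNA level. This equation, the steady-state assumption, the function names and all numbers are illustrative assumptions, not the authors' model. At steady state P = (k_tl / k_deg) * M, so the ratio can be estimated from matched RNA and protein measurements and then used to infer whether protein in a new cell is expected to rise or fall.

```python
import numpy as np

def fit_steady_state_ratio(rna, protein):
    """Least-squares slope through the origin, so that protein ~= ratio * rna.

    Under the assumed model the slope estimates k_tl / k_deg.
    """
    rna = np.asarray(rna, dtype=float)
    protein = np.asarray(protein, dtype=float)
    return float(rna @ protein / (rna @ rna))

def protein_velocity(rna, protein, ratio):
    """dP/dt expressed in units of k_deg: ratio * M - P.

    Positive values predict increasing protein, negative values decreasing.
    """
    return ratio * np.asarray(rna, dtype=float) - np.asarray(protein, dtype=float)

# Hypothetical noise-free cells at steady state (k_tl = 5, k_deg = 2).
k_tl, k_deg = 5.0, 2.0
rna_ss = np.array([1.0, 2.0, 4.0, 8.0])
protein_ss = (k_tl / k_deg) * rna_ss

ratio = fit_steady_state_ratio(rna_ss, protein_ss)

# Two new cells with the same RNA level but different protein levels.
velocity = protein_velocity([4.0, 4.0], [5.0, 12.0], ratio)
print(ratio, velocity)
```

With real data the fit would need to handle measurement noise and cells that are far from steady state; the sketch only shows how a fitted parameter turns paired measurements into a prediction about the direction of change.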
Figure 2. Considerations for choosing an integration method for single cell multiomics analysis.
Various data integration methods can be used depending on the nature of the data and whether they are matched (different modalities were profiled from the same cell) or unmatched (different modalities were profiled from different cells). For unmatched data, analyses can be performed with matched clusters if manual annotations of cell types are available. For example, if we are only interested in the cell-type level relationship between open chromatin and DNA methylation, we can perform clustering and cell type annotation for each modality and integrate at the level of cell type. If manual annotations are not available or a higher resolution of integration is needed, two different strategies are available depending on whether feature conversion is possible. For data with a common feature set or converted features (e.g., open chromatin to gene activity), tools developed for matching with converted features can be used. For data without common features or feature conversion, integration by aligning common spaces can be applied.
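The decision flow described in this caption can be written as a small function. This is only a sketch of the caption's logic: the function name, argument names and returned labels are hypothetical, and the caption does not name specific methods for matched data, so that branch returns a generic label.

```python
def choose_integration_strategy(
    matched,
    annotations_available=False,
    needs_higher_resolution=False,
    features_convertible=False,
):
    """Return a label for the integration strategy suggested by the caption."""
    if matched:
        # Modalities were profiled from the same cells.
        return "methods for matched data"
    if annotations_available and not needs_higher_resolution:
        # Cluster and annotate each modality, then integrate per cell type.
        return "integrate at cell-type level with matched clusters"
    if features_convertible:
        # For example, open chromatin converted to gene activity.
        return "match using common or converted features"
    return "align common spaces"

# Unmatched open chromatin and DNA methylation data with manual annotations.
example = choose_integration_strategy(matched=False, annotations_available=True)
print(example)
```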
Figure 3. Desired properties and functionalities of visualization tools for single cell
Visualization of multiomics data requires additional functionalities given the complex data structure, for example, the ability to switch the view between different modalities. Other desirable features include: a) Multiple layers of data visualization based on data obtained for different modalities, with mapping between each layer. Ideally, the mapping between each observation and its spatial location can also be displayed as another layer of information. b) The addition of knowledge-based visualizations that incorporate downstream analyses or prior knowledge. c) Multi-scale views with multiple resolutions to assist the dissection of very large datasets. d) Integration of prior knowledge such as ontology and anatomy with multiomics data to help anchor biological knowledge to the data. e) Tools that enable on-the-fly or dynamic visualization of data for more flexible exploration.
