NAR Genom Bioinform. 2023 Jul 26;5(3):lqad070.
doi: 10.1093/nargab/lqad070. eCollection 2023 Sep.

Single-cell reference mapping to construct and extend cell-type hierarchies

Lieke Michielsen et al. NAR Genom Bioinform.

Abstract

Single-cell genomics is producing an ever-increasing number of datasets that, when integrated, could provide large-scale reference atlases of tissue in health and disease. Such atlases increase the scale and generalizability of analyses and make it possible to combine knowledge generated by individual studies. However, individual studies often differ in cell annotation terminology and depth, with different groups specializing in different cell-type compartments. Understanding how these distinct sets of annotations relate to and complement each other would mark a major step towards a consensus-based cell-type annotation reflecting the latest knowledge in the field. Recent computational techniques, referred to as 'reference mapping' methods, facilitate the usage and expansion of existing reference atlases by mapping new datasets (i.e. queries) onto an atlas, but a systematic approach towards harmonizing dataset-specific cell-type terminology and annotation depth is still lacking. Here, we present 'treeArches', a framework to automatically build and extend reference atlases while enriching them with an updatable hierarchy of cell-type annotations across different datasets. We demonstrate various use cases for treeArches, from automatically resolving relations between reference and query cell types to identifying unseen cell types absent in the reference, such as disease-associated cell states. We envision treeArches enabling data-driven construction of consensus atlas-level cell-type hierarchies and facilitating efficient usage of reference atlases.

Figures

Figure 1.
A schematic version of treeArches and an example using PBMC and bone marrow datasets. (A) Pre-training of a latent representation using labeled public reference datasets. After integration, a cell-type hierarchy is created by matching the cell types of the different datasets. Here, for instance, cell types (CT) 1 and 2 from study (S) 2 are subtypes of CT1 from S1. (B) (Un)labeled query datasets can be added to the latent representation by applying architectural surgery. After integration, the cell-type hierarchy is updated with labeled query datasets. Unlabeled query datasets can be annotated using the learned hierarchy. (C) UMAP embedding showing the integrated latent space of the three reference datasets. (D) Cell-type hierarchy learned from the three reference datasets. MC derived DC: monocyte-derived dendritic cells, MC: monocytes, pDC: plasmacytoid dendritic cells, HSPC: hematopoietic stem and progenitor cell. (E) Updated hierarchy after the 10X dataset was added. (F) UMAP embedding showing the integrated latent space of the reference and query datasets.
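The matching step in panel (A) can be illustrated with a toy sketch. It assumes, purely for illustration, that cell types are matched by cross-prediction in the shared latent space: a classifier trained on one study labels the cells of the other, and the two directions are compared. The nearest-centroid classifier, the 0.5 majority threshold, the 2D coordinates, and all function and label names below are hypothetical and are not the treeArches or scHPL API.

```python
from collections import Counter, defaultdict
from math import dist


def centroids(points, labels):
    """Mean latent position of each cell type."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(axis) / len(axis) for axis in zip(*pts))
        for label, pts in grouped.items()
    }


def predict(cents, points):
    """Assign each cell to the cell type with the nearest centroid."""
    return [min(cents, key=lambda lab: dist(cents[lab], p)) for p in points]


def majority_map(true_labels, predicted, threshold=0.5):
    """Map a cell type to the label that more than `threshold` of its cells receive."""
    counts = defaultdict(Counter)
    for true, pred in zip(true_labels, predicted):
        counts[true][pred] += 1
    mapping = {}
    for true, counter in counts.items():
        label, n = counter.most_common(1)[0]
        if n / sum(counter.values()) > threshold:
            mapping[true] = label
    return mapping


def match_cell_types(points1, labels1, points2, labels2):
    """Relate study-2 cell types to study-1 cell types by cross-prediction."""
    s2_to_s1 = majority_map(labels2, predict(centroids(points1, labels1), points2))
    s1_to_s2 = majority_map(labels1, predict(centroids(points2, labels2), points1))
    relations = {}
    for ct2, ct1 in s2_to_s1.items():
        # A mutual majority means the two labels describe the same population;
        # a one-way majority means ct2 covers only part of ct1.
        kind = "same as" if s1_to_s2.get(ct1) == ct2 else "subtype of"
        relations[ct2] = (kind, ct1)
    return relations


# Made-up latent coordinates: study 1 annotates one broad population,
# study 2 splits the same region into two finer cell types.
s1_points = [(0.0, 0.0), (0.2, 0.1), (1.8, 0.0), (2.0, 0.1), (10.0, 10.0), (10.2, 9.9)]
s1_labels = ["S1_CT1", "S1_CT1", "S1_CT1", "S1_CT1", "S1_CT2", "S1_CT2"]
s2_points = [(0.1, 0.0), (0.0, 0.2), (1.9, 0.1), (2.1, 0.0), (9.9, 10.0), (10.1, 10.1)]
s2_labels = ["S2_CT1", "S2_CT1", "S2_CT2", "S2_CT2", "S2_CT3", "S2_CT3"]

relations = match_cell_types(s1_points, s1_labels, s2_points, s2_labels)
for cell_type, (kind, parent) in relations.items():
    print(cell_type, kind, parent)
```

On this toy data, S2_CT1 and S2_CT2 are reported as subtypes of S1_CT1, mirroring the example in the caption, while S2_CT3 and S1_CT2 are reported as the same cell type.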
Figure 2.
Updated hierarchy after adding the Meyer dataset to the reference atlas. (A) The cell-type hierarchy corresponding to the reference atlas (only the first two levels are shown). Each node represents a cell type in the reference atlas as a whole rather than a cell type in one of its constituent datasets. The UMAP embedding shows the aligned reference and query datasets. The cells in the reference dataset are colored according to their level 2 annotation. (B, C) Updated hierarchy zoomed in on the blood vessels and the airway epithelium secretory cells, respectively. The UMAP embeddings are colored according to the finest annotation resolution. (D) Expression of marker genes for club and goblet cells in the reference and query cell types. (E) Comparison of the predictions using the original and updated reference on the T cells of the Tata dataset. (F) Expression of marker genes for CD8+ GZMK+ cells.
Figure 3.
Identifying diseased cells in IPF data. (A–C) UMAPs show the HLCA and IPF datasets after alignment. The cells are colored according to their cell type or condition. (D) Heatmap comparing the labels predicted by scHPL with the original labels. The dark boundaries indicate the hierarchy of the reference tree. (E) Sankey diagram showing the new annotations and predictions for the macrophages in the IPF condition. (F) Expression of SPP1 in the different cell types of the reference and query datasets.
Figure 4.
Results for the motor cortex across species. (A) UMAP embedding of the integrated reference datasets. (B) Learned hierarchy when combining mouse and marmoset (step 1) and after adding human (step 2). The color of each node represents the dataset(s) from which the cell type originates. (C) UMAP embedding after architectural surgery with the human dataset.
