PLoS Biol. 2021 May 4;19(5):e3001077.
doi: 10.1371/journal.pbio.3001077. eCollection 2021 May.

Cell-level metadata are indispensable for documenting single-cell sequencing datasets

Sidhant Puntambekar et al. PLoS Biol.

Abstract

Single-cell RNA sequencing (scRNA-seq) provides an unprecedented view of cellular diversity of biological systems. However, across the thousands of publications and datasets generated using this technology, we estimate that only a minority (<25%) of studies provide cell-level metadata information containing identified cell types and related findings of the published dataset. Metadata omission hinders reproduction, exploration, validation, and knowledge transfer and is a common problem across journals, data repositories, and publication dates. We encourage investigators, reviewers, journals, and data repositories to improve their standards and ensure proper documentation of these valuable datasets.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Processed data files necessary for replicating single-cell studies.
(A) Example of a gene-by-cell count matrix containing single-cell measurements and a cell-level metadata table containing annotations inferred from the analysis of the single-cell dataset. (B) Workflow of analysis steps for regenerating cell type or gene-expression signatures from public datasets for comparative analysis of single-cell datasets. * indicates a step requiring an analyst to make subjective decisions; ** indicates a step that often includes a nondeterministic algorithm.
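The two processed files described in panel (A) can be pictured with a minimal sketch. It is not taken from the paper: the barcodes, gene counts, cluster numbers, cell type labels, and both helper functions are hypothetical and exist only to show how a gene-by-cell count matrix and a cell-level metadata table relate through shared cell identifiers.

```python
from collections import defaultdict

# Hypothetical toy data: barcodes, counts, and labels are invented for illustration.
cell_ids = ["AAAC-1", "AAAG-1", "AACT-1", "AAGT-1"]

# Gene-by-cell count matrix: one row per gene, one column per cell (same order as cell_ids).
counts = {
    "CD3E": [5, 4, 0, 0],
    "MS4A1": [0, 0, 7, 3],
}

# Cell-level metadata table: one row per cell, keyed by the same barcodes.
metadata = [
    {"cell_id": "AAAC-1", "cluster": 0, "cell_type": "T cell"},
    {"cell_id": "AAAG-1", "cluster": 0, "cell_type": "T cell"},
    {"cell_id": "AACT-1", "cluster": 1, "cell_type": "B cell"},
    {"cell_id": "AAGT-1", "cluster": 1, "cell_type": "B cell"},
]


def cells_missing_metadata(cell_ids, metadata):
    """Return the cells in the count matrix that have no row in the metadata table."""
    annotated = {row["cell_id"] for row in metadata}
    return [cell_id for cell_id in cell_ids if cell_id not in annotated]


def mean_counts_by_cell_type(gene, counts, cell_ids, metadata):
    """Average the raw counts of one gene within each annotated cell type."""
    cell_type_of = {row["cell_id"]: row["cell_type"] for row in metadata}
    totals = defaultdict(float)
    n_cells = defaultdict(int)
    for cell_id, count in zip(cell_ids, counts[gene]):
        label = cell_type_of.get(cell_id)
        if label is None:
            continue  # unannotated cells cannot contribute to any cell type
        totals[label] += count
        n_cells[label] += 1
    return {label: totals[label] / n_cells[label] for label in totals}
```

With the metadata table present, a per-cell-type summary is a simple lookup and average. Without it, a reader would have to repeat the clustering and annotation steps of panel (B), which include subjective and nondeterministic steps.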
Fig 2. The majority of single-cell sequencing datasets archived on GEO do not have cell-level annotations.
(A) Number of single-cell datasets in GEO annotated with the proportion that contain cell-level metadata per year, either as plain text tables or binary objects. (B) Fraction of studies published in each group of journals compared to the total number of studies published by each group. (C) Comparison of the number of citations for studies containing or lacking cell-level metadata in 2016, 2017, or 2018. (D) Fraction of studies, since 2017, containing cell-level annotations published by authors with a previous publication of a single cell–related software tool. The numerical data underlying plots may be found at https://github.com/rnabioco/someta/tree/master/inst/manuscript and http://doi.org/10.5281/zenodo.4695069.
