Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Aug;21(8):1481-1491.
doi: 10.1038/s41592-024-02305-7. Epub 2024 Jun 6.

Large-scale foundation model on single-cell transcriptomics

Affiliations

Large-scale foundation model on single-cell transcriptomics

Minsheng Hao et al. Nat Methods. 2024 Aug.

Abstract

Large pretrained models have become foundation models leading to breakthroughs in natural language processing and related fields. Developing foundation models for deciphering the 'languages' of cells and facilitating biomedical research is promising yet challenging. Here we developed a large pretrained model scFoundation, also named 'xTrimoscFoundationα', with 100 million parameters covering about 20,000 genes, pretrained on over 50 million human single-cell transcriptomic profiles. scFoundation is a large-scale model in terms of the size of trainable parameters, dimensionality of genes and volume of training data. Its asymmetric transformer-like architecture and pretraining task design empower effectively capturing complex context relations among genes in a variety of cell types and states. Experiments showed its merit as a foundation model that achieved state-of-the-art performances in a diverse array of single-cell analysis tasks such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, single-cell perturbation prediction, cell type annotation and gene module inference.

PubMed Disclaimer

References

    1. Srivastava, A. et al. Beyond the imitation game: quantifying and extrapolating the capabilities of language models. Preprint at arXiv https://doi.org/10.48550/arXiv.2206.04615 (2023).
    1. Jovic, D. et al. Single-cell RNA sequencing technologies and applications: a brief overview. Clin. Transl. Med. 12, e694 (2022). - PubMed - PMC - DOI
    1. Regev, A. et al. The Human Cell Atlas. eLife 6, e27041 (2017). - PubMed - PMC - DOI
    1. Chen, S. et al. hECA: the cell-centric assembly of a cell atlas. iScience 25, 104318 (2022). - PubMed - PMC - DOI
    1. Snyder, M. P. et al. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 574, 187–192 (2019). - DOI

LinkOut - more resources