scGPT: toward building a foundation model for single-cell multi-omics using generative AI
- PMID: 38409223
- DOI: 10.1038/s41592-024-02201-0
Abstract
Generative pretrained models have achieved remarkable success in various domains such as language and computer vision. Specifically, the combination of large-scale diverse datasets and pretrained transformers has emerged as a promising approach for developing foundation models. Drawing parallels between language and cellular biology (in which texts comprise words; similarly, cells are defined by genes), our study probes the applicability of foundation models to advance cellular biology and genetic research. Using burgeoning single-cell sequencing data, we have constructed a foundation model for single-cell biology, scGPT, based on a generative pretrained transformer across a repository of over 33 million cells. Our findings illustrate that scGPT effectively distills critical biological insights concerning genes and cells. Through further adaptation of transfer learning, scGPT can be optimized to achieve superior performance across diverse downstream applications. This includes tasks such as cell type annotation, multi-batch integration, multi-omic integration, perturbation response prediction and gene network inference.
© 2024. The Author(s), under exclusive licence to Springer Nature America, Inc.
Grants and funding
- RGPIN-2020-06189/Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada (NSERC Canadian Network for Research and Innovation in Machining Technology)
- DGECR-2020-00294/Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada (NSERC Canadian Network for Research and Innovation in Machining Technology)
- Doctoral Fellowship/Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada (NSERC Canadian Network for Research and Innovation in Machining Technology)
- Peter Munk Cardiac Centre AI Fund/University Health Network (UHN)