Nat Methods. 2024 Aug;21(8):1462-1465.
doi: 10.1038/s41592-024-02235-4. Epub 2024 Mar 25.

Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis

Wenpin Hou et al. Nat Methods. 2024 Aug.

Abstract

Here we demonstrate that the large language model GPT-4 can accurately annotate cell types using marker gene information in single-cell RNA sequencing analysis. When evaluated across hundreds of tissue and cell types, GPT-4 generates cell type annotations exhibiting strong concordance with manual annotations. This capability can considerably reduce the effort and expertise required for cell type annotation. Additionally, we have developed an R software package GPTCelltype for GPT-4's automated cell type annotation.
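As a rough illustration of annotating from marker gene information, the sketch below turns per-cluster marker lists into a single text prompt that could be sent to a language model. It is a minimal sketch and not the GPTCelltype implementation: the function name `build_annotation_prompt`, the prompt wording, the `top_n` parameter, and the example clusters are all assumptions made for illustration, and no model is actually queried.

```python
from typing import Dict, List


def build_annotation_prompt(
    markers_by_cluster: Dict[str, List[str]],
    tissue: str,
    top_n: int = 10,
) -> str:
    """Assemble a plain-text prompt listing the top marker genes of each cluster.

    Hypothetical helper: the wording is illustrative, not the prompt used in the paper.
    """
    header = (
        f"Identify cell types of {tissue} cells using the following markers. "
        "Give one cell type name per line, in the same order as the input."
    )
    # One row per cluster, keeping only the first top_n genes of each list.
    rows = [
        f"{cluster}: {', '.join(genes[:top_n])}"
        for cluster, genes in markers_by_cluster.items()
    ]
    return "\n".join([header, *rows])


# Hypothetical input: marker genes assumed to be ranked by a differential analysis.
example_markers = {
    "cluster0": ["CD3D", "CD3E", "IL7R"],
    "cluster1": ["MS4A1", "CD79A", "CD79B"],
}
prompt = build_annotation_prompt(example_markers, tissue="human PBMC", top_n=2)
print(prompt)
```

The resulting string lists one cluster per line after the instruction, so a model's line-by-line reply can be matched back to clusters by position.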

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Examples of GPT-4’s cell type annotation and comparisons with other methods.
a, Comparison of cell type annotations by human experts, GPT-4, and other automated methods. b, Example of GPT-4 annotating human prostate cells with increasing granularity. c, Example of GPT-4 annotating single, mixed and new cell types.
Fig. 2. Performance evaluation.
a, Average agreement scores for varying numbers of top differential genes, statistical tests for differential analysis, and prompt strategies. b, Proportion of cell types with varying agreement levels in each study and tissue, most abundant broad cell types, malignant cells, different cell population sizes, and major cell types versus cell subtypes. c, log2-transformed ratio of type I (COL1A1 and COL1A2) and II (COL2A1) collagen gene expression. d,e, Comparison of average agreement scores (d) and running times (e). In e, n = 59 for GPT-4 and GPT-3.5 and n = 36 for ScType and SingleR. Each boxplot shows the distribution (center: median; bounds of box: first and third quartiles; bounds of whiskers: data points within 1.5× interquartile range from the box; minima; maxima) of running time. f, Financial cost of querying GPT-4 API versus cell type numbers. g, GPT-4’s performance in identifying mixed/single cell types and known/unknown cell types, and under different subsampling and noise levels in multiple simulation rounds (dots). h, Reproducibility of GPT-4 annotations. i, Consistency of agreement scores between two versions of GPT-4.
