[Preprint]. 2023 May 2:rs.3.rs-2824971.
doi: 10.21203/rs.3.rs-2824971/v1.

Reference-free and cost-effective automated cell type annotation with GPT-4 in single-cell RNA-seq analysis

Wenpin Hou et al. Res Sq.

Abstract

Cell type annotation is an essential step in single-cell RNA-seq analysis. However, it is a time-consuming process that often requires expertise in collecting canonical marker genes and manually annotating cell types. Automated cell type annotation methods typically require the acquisition of high-quality reference datasets and the development of additional pipelines. We demonstrate that GPT-4, a highly potent large language model, can automatically and accurately annotate cell types by utilizing marker gene information generated from standard single-cell RNA-seq analysis pipelines. Evaluated across hundreds of tissue types and cell types, GPT-4 generates cell type annotations exhibiting strong concordance with manual annotations, and has the potential to considerably reduce the effort and expertise needed in cell type annotation.
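As an illustration of the workflow the abstract describes, the sketch below turns per-cluster marker gene lists (as produced by a standard differential-expression step) into a single text prompt, and maps a line-per-cluster reply back to cluster IDs. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the prompt wording, and the one-answer-per-line reply format are all hypothetical, and no model API is called.

```python
from typing import Dict, List


def build_annotation_prompt(
    tissue: str,
    markers_by_cluster: Dict[str, List[str]],
    top_n: int = 10,
) -> str:
    """Assemble one prompt listing the top marker genes of each cluster.

    The wording of the instruction line is an assumption made for
    illustration; it is not taken from the source.
    """
    header = (
        f"Identify cell types of {tissue} cells using the following markers. "
        "Identify one cell type for each row. Only provide the cell type name."
    )
    # One row per cluster, genes in the order given (assumed ranked).
    rows = [", ".join(genes[:top_n]) for genes in markers_by_cluster.values()]
    return "\n".join([header, *rows])


def parse_annotations(response_text: str, cluster_ids: List[str]) -> Dict[str, str]:
    """Map a reply with one cell type per non-empty line back to cluster IDs.

    Assumes the model answered with exactly one line per cluster, in order.
    """
    lines = [line.strip() for line in response_text.splitlines() if line.strip()]
    if len(lines) != len(cluster_ids):
        raise ValueError(
            f"expected {len(cluster_ids)} annotations, got {len(lines)}"
        )
    return dict(zip(cluster_ids, lines))
```

The prompt string would be sent to the language model through whatever client is available, and the reply passed to `parse_annotations`. Keeping prompt construction and parsing separate from the model call makes both steps easy to check without network access.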

Figures

Figure 1.
a, Diagram comparing cell type annotations by human experts, GPT-4, and other automated methods. b, An example showing GPT-4 prompts and answers for annotating human prostate cells with increasing granularity. c, An example showing GPT-4 prompts and answers for annotating single cell types (first two cell types), mixed cell types (third cell type), and new cell types (fourth cell type).
Figure 2.
Evaluation of cell type annotation by GPT-4. a, Datasets included in this study. b, Agreement between original and GPT-4 annotations in identifying cell types of human prostate cells. c, Averaged agreement score (y-axis) and the number of top differential genes (x-axis) in HCA, HCL, and MCA datasets. d, Proportion of cell types with different levels of agreement in each study and tissue. Averaged agreement scores are shown as black dots. e, Proportion of cell types with different levels of agreement in each cell category. Averaged agreement scores are shown as black dots. f, Proportion of cell types that include the type I collagen gene in their differential gene lists. The cell types are either classified as stromal cells by manual annotations and as fibroblast, osteoblast, or chondrocyte by GPT-4 annotations, or classified as fibroblast, osteoblast, or chondrocyte by manual annotations. g, Proportion of cases where GPT-4 correctly identifies mixed and single cell types. Each dot represents one round of simulation. h, Proportion of cases where GPT-4 correctly identifies known and unknown cell types. Each dot represents one round of simulation. i, Reproducibility of GPT-4 annotations. Each dot represents one cell type.
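
The averaged agreement score shown in the Figure 2 panels can be sketched as the mean of per-cell-type numeric scores. The caption does not define the levels or their values, so the three labels and the mapping below (1, 0.5, 0) are assumptions made only for illustration, as is the function name.

```python
from statistics import mean
from typing import Iterable

# Assumed mapping from agreement level to numeric score; the caption
# only says "different levels of agreement" and gives no values.
AGREEMENT_SCORES = {"full": 1.0, "partial": 0.5, "mismatch": 0.0}


def average_agreement(levels: Iterable[str]) -> float:
    """Average the numeric scores of a collection of agreement levels."""
    scores = [AGREEMENT_SCORES[level] for level in levels]
    if not scores:
        raise ValueError("no agreement levels given")
    return mean(scores)
```

For example, under this assumed mapping, four cell types rated full, partial, mismatch, and full would average to 0.625.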
