Hereditas. 2025 Aug 16;162(1):164.
doi: 10.1186/s41065-025-00528-y.

Global trends in machine learning applications for single-cell transcriptomics research

Xinyu Liu et al. Hereditas.

Abstract

Background: Single-cell RNA sequencing (scRNA-seq) has revolutionized the analysis of cellular heterogeneity by decoding gene expression profiles at the level of individual cells, while machine learning (ML) has emerged as a core computational tool for clustering analysis, dimensionality reduction modeling and developmental trajectory inference in single-cell transcriptomics (SCT). Although 3,307 papers have been published over the past two decades, there remains a lack of bibliometric reviews that comprehensively address methodological evolution, technical challenges and clinical translation pathways. This study aims to fill this research gap through bibliometric and visual analysis, revealing trends in technological evolution and directions for future development.

Methods: Using 3,307 publications from the Web of Science Core Collection (WOSCC), we conducted bibliometric and visualization analysis with CiteSpace and VOSviewer to systematically review research trends, national/institutional contributions, keyword co-occurrence networks and co-citation relationships. Data screening was strictly limited to English-language articles and reviews, excluding irrelevant document types and focusing on core application scenarios of ML in SCT.
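As an illustration of what a keyword co-occurrence network is built from, the following is a minimal sketch that counts how often two keywords appear in the same record. It is not the authors' pipeline and does not reproduce how CiteSpace or VOSviewer work internally; the function name `keyword_cooccurrence` and the toy records are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records):
    """Count how often each unordered keyword pair shares a record.

    `records` is an iterable of keyword lists, one list per publication.
    Keywords are lower-cased and de-duplicated within each record.
    """
    pairs = Counter()
    for keywords in records:
        # Normalize case and whitespace, drop empties and within-record repeats.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        # Every unordered pair in the record counts as one co-occurrence.
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical toy records, not data from the study.
toy_records = [
    ["Machine learning", "Deep learning", "Single-cell transcriptomics"],
    ["Machine learning", "Random forest"],
    ["machine learning", "deep learning"],
]
counts = keyword_cooccurrence(toy_records)
```

In a network view, each keyword would become a node and each pair count an edge weight. Clustering and layout are separate steps that this sketch does not attempt.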

Results: China and the United States dominated research output (combined 65%), with China leading in publication volume (54.8%) while the US demonstrated academic influence through an H-index of 84 and 37,135 total citations. Research hotspots were concentrated on random forest (RF) and deep learning models, showing a transition from algorithm development to clinical applications (e.g., tumor immune microenvironment analysis). The Chinese Academy of Sciences and Harvard University emerged as core collaboration hubs, with the international cooperation network primarily featuring US-China collaboration. Keyword clustering revealed four themes: gene expression, immunotherapy, bioinformatics, and inflammation-related research. Technical bottlenecks included data heterogeneity, insufficient model interpretability and weak cross-dataset generalization.
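The H-index reported above follows the standard definition: the largest h such that h publications each have at least h citations. The sketch below shows that computation on made-up citation counts. The function name `h_index` and the sample numbers are assumptions for illustration only and are not taken from the study's data.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= `rank` citations
        else:
            break  # later papers have even fewer citations
    return h


# Hypothetical citation counts for five papers.
example = [10, 8, 5, 4, 3]
print(h_index(example))  # prints 4
```

Because the index is capped by the number of publications, a country with fewer papers can still reach a high H-index when those papers are heavily cited.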

Conclusion: The integration of ML with scRNA-seq has advanced the analysis of cellular heterogeneity and the development of precision medicine. Future work should optimize deep learning architectures, enhance model generalization, and promote technical translation through the integration of multi-omics and clinical data. Interdisciplinary collaboration is key to overcoming current limitations (e.g., data standardization, algorithm interpretability) and ultimately to achieving deep integration between single-cell technologies and precision medicine.

Keywords: Bibliometric analysis; Deep learning; Machine learning; Random forest; Single-cell transcriptomics.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
(A) Flowchart of Literature Search Strategy and Selection Process for Machine Learning and Single-Cell Transcriptomics. (B) Annual and Cumulative Publication Output in Machine Learning and Single-Cell Transcriptomics Research (1997–2024)
Fig. 2
(A) Geographical distribution map based on the total publications of different countries/regions. (B) Temporal trends in the publication output of the top 5 countries/regions. (C) Visualization map of international collaborations by countries/regions. (D) Citation network visualization map of countries/regions, generated with VOSviewer. The thickness of the lines reflects the citation strength
Fig. 3
(A) The institutions’ collaboration network visualization map generated by VOSviewer software. (B) Visualization analysis of author collaboration networks in VOSviewer. This figure displays authors with three or more publications. Nodes of different colors represent authors from distinct clusters, and the node size corresponds to the frequency of their appearances (i.e., publication count). (C) Visualization analysis of citation-based collaboration networks in VOSviewer. The node size reflects the frequency of their appearances (i.e., citation count)
Fig. 4
The dual-map overlay of journals. The labels on the left represent citing journals, the labels on the right represent cited journals, and colored paths indicate citation relationships
Fig. 5
(A) CiteSpace timeline view of co-cited references. The time evolution is indicated by different colored lines, and the nodes on the lines indicate the references cited. (B) Clustering analysis of article co-citation. The parameters were set as follows: Time slice (1997–2024), year per slice (1), selection criteria (K = 5)
Fig. 6
Top 20 references with the strongest citation bursts. The red areas in the graph represent the period when the number of citations for each article surged
Fig. 7
Analysis of Keyword Co-Occurrence. (A) Clustering and co-occurrence visualization of major keywords in Machine Learning and Single-Cell Transcriptomics research. (B) Domain-Specific Keyword Clustering Analysis
Fig. 8
The timeline graph of keywords in CiteSpace. Each horizontal line represents a cluster. Node size reflects co-citation frequency, and the links between nodes indicate co-citation relationships. A node's occurrence year is the time when it was first co-cited
Fig. 9
The top 40 keywords with the strongest citation bursts. The blue line indicates the time interval, and the red line indicates the period when the keyword burst occurs
