Front Oncol. 2021 Aug 23:11:717616.
doi: 10.3389/fonc.2021.717616. eCollection 2021.

Integrative Analysis of Gene Expression Data by RNA Sequencing for Differential Diagnosis of Acute Leukemia: Potential Application of Machine Learning

Jaewoong Lee et al. Front Oncol.

Abstract

BCR-ABL1-positive acute leukemia can be classified into three disease categories: B-lymphoblastic leukemia (B-ALL), acute myeloid leukemia (AML), and mixed-phenotype acute leukemia (MPAL). We conducted an integrative analysis of RNA sequencing (RNA-seq) data obtained from 12 BCR-ABL1-positive B-ALL, AML, and MPAL samples to evaluate its diagnostic utility. RNA-seq identified all p190 BCR-ABL1 fusions with accurate splicing sites, as well as a new gene fusion involving MAP2K2. Most of the clinically significant mutations were also identified, including single-nucleotide variations, insertions, and deletions. In addition, RNA-seq yielded differential gene expression profiles according to disease category. We therefore selected 368 genes differentially expressed between AML and B-ALL and developed two differential diagnosis models based on the gene expression data, using 1) a scoring algorithm and 2) machine learning. Both models showed excellent diagnostic accuracy not only for our 12 BCR-ABL1-positive cases but also for 427 public gene expression datasets from acute leukemias, regardless of specific genetic aberrations. This is the first trial to develop models of differential diagnosis using RNA-seq and, in particular, to evaluate the potential role of machine learning in identifying the disease category of acute leukemia. The integrative analysis of gene expression data by RNA-seq facilitates the accurate differential diagnosis of acute leukemia, with successful detection of significant gene fusions and/or mutations, which warrants further investigation.
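The abstract does not describe how the scoring algorithm works, so the following is only a minimal sketch of one common way to build a two-signature score, not the authors' method. It assumes (hypothetically) that the differentially expressed genes are split into an AML-high set and a B-ALL-high set, that each score is the mean z-score of its gene set, and that a sample is assigned to whichever score is larger. The function names, the toy matrix, and the gene indices are all illustrative.

```python
import numpy as np

def signature_scores(expr, aml_up, ball_up):
    """Return per-sample (aml_score, ball_score) as mean z-scores.

    expr:    (n_samples, n_genes) matrix of log-scale expression values.
    aml_up:  column indices of genes assumed to be higher in AML (hypothetical).
    ball_up: column indices of genes assumed to be higher in B-ALL (hypothetical).
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=0)
    std = expr.std(axis=0)
    std[std == 0] = 1.0  # constant genes contribute a z-score of 0
    z = (expr - mean) / std  # standardize each gene across samples
    return z[:, aml_up].mean(axis=1), z[:, ball_up].mean(axis=1)

def classify(aml_score, ball_score):
    """Assign each sample to the category with the larger score (assumed rule)."""
    return np.where(aml_score > ball_score, "AML", "B-ALL")

# Toy data: 4 samples x 4 genes; genes 0-1 play the AML-high role,
# genes 2-3 the B-ALL-high role. Values are made up for illustration.
expr = np.array([
    [8.0, 7.5, 1.0, 1.5],
    [7.0, 8.5, 2.0, 1.0],
    [1.0, 1.5, 9.0, 8.0],
    [2.0, 1.0, 8.5, 9.5],
])
aml_score, ball_score = signature_scores(expr, aml_up=[0, 1], ball_up=[2, 3])
labels = classify(aml_score, ball_score)
print(list(labels))
```

A two-score design like this keeps both values available for plotting, which is useful when a third category (such as MPAL) is expected to fall between the two reference groups rather than be forced into one of them.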

Keywords: BCR-ABL1; RNA sequencing; acute leukemia; expression; gene fusion; machine learning; mixed-phenotype acute leukemia; mutation.

Conflict of interest statement

Author SC is the CEO of Delvine Inc. Author SH is employed by Theragen Bio Co. Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
GO analysis of B-ALL, AML, and MPAL. General functional classification of genes highly expressed in B-lymphoblastic leukemia (B-ALL) compared to acute myeloid leukemia (AML) (A), in AML compared to B-ALL (B), and in mixed-phenotype acute leukemia (MPAL) compared to B-ALL and AML (C). Gene Ontology (GO) analysis of the target genes of significantly altered transcripts was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) bioinformatics tool. Enriched GO biological processes were identified and listed according to their enrichment P value (P < 0.05) and false discovery rate (FDR < 0.25). Both P and FDR values were obtained using the DAVID 2.1 statistical function classification tool. Scale: -log10 of P value.
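The thresholds and scale in this caption can be illustrated with a small sketch. It assumes enrichment results have already been exported as records with a P value and an FDR. The record layout, the term names, and the function `filter_go_terms` are hypothetical and are not part of DAVID. P values are assumed to be strictly positive.

```python
import math

def filter_go_terms(terms, p_max=0.05, fdr_max=0.25):
    """Keep terms with P < p_max and FDR < fdr_max, ranked by -log10(P)."""
    kept = [
        {**term, "neg_log10_p": -math.log10(term["p"])}
        for term in terms
        if term["p"] < p_max and term["fdr"] < fdr_max
    ]
    # Larger -log10(P) means stronger enrichment, so sort in descending order.
    return sorted(kept, key=lambda term: term["neg_log10_p"], reverse=True)

# Made-up records for illustration only.
terms = [
    {"name": "term_a", "p": 0.001, "fdr": 0.02},
    {"name": "term_b", "p": 0.04, "fdr": 0.30},  # dropped: FDR too high
    {"name": "term_c", "p": 0.20, "fdr": 0.10},  # dropped: P too high
    {"name": "term_d", "p": 0.01, "fdr": 0.20},
]
ranked = filter_go_terms(terms)
for term in ranked:
    print(term["name"], round(term["neg_log10_p"], 2))
```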
Figure 2
AML score and B-ALL score of public data. (A) Except for the AML score between AML and MPAL, the AML score and B-ALL score differed significantly between leukemia types (p < 0.01). (B) Public AML and B-ALL data form clusters in the scatter plot of AML score and B-ALL score. MPAL samples cluster between the AML and B-ALL samples.
