Multicenter Study
Blood. 2021 Nov 11;138(19):1885-1895.
doi: 10.1182/blood.2020010603.

Machine learning integrates genomic signatures for subclassification beyond primary and secondary acute myeloid leukemia

Hassan Awada et al. Blood.

Erratum in

Abstract

Although genomic alterations drive the pathogenesis of acute myeloid leukemia (AML), traditional classifications are largely based on morphology, and prototypic genetic founder lesions define only a small proportion of AML patients. The historical subdivision into primary/de novo AML and secondary AML has been shown to correlate only variably with genetic patterns. The combinatorial complexity and heterogeneity of the AML genomic architecture may have thus far precluded a genomic-based subclassification that identifies distinct, molecularly defined subtypes more reflective of shared pathogenesis. We integrated cytogenetic and gene sequencing data from a multicenter cohort of 6788 AML patients and analyzed them using standard and machine learning methods to generate a novel molecular subclassification of AML with biologic correlates corresponding to the underlying pathogenesis. Standard supervised analyses achieved only modest cross-validation accuracy when molecular patterns were used to predict traditional pathomorphologic AML classifications. We then performed an unsupervised analysis using a Bayesian latent class method, which identified 4 unique genomic clusters with distinct prognoses. The invariant genomic features driving each cluster were extracted and, when used for genomic subclassification, yielded 97% cross-validation accuracy. The subclasses of AML defined by molecular signatures overlapped with current pathomorphologic and clinically defined AML subtypes. We validated our results internally and externally and share an open-access molecular classification scheme for AML patients. Although the heterogeneity of genomic changes across nearly 7000 AML patients was too vast for traditional prediction methods, machine learning methods allowed us to define novel genomic AML subclasses, indicating that traditional pathomorphologic definitions may be less reflective of overlapping pathogenesis.
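The abstract names a Bayesian latent class method for the unsupervised step. As a hedged illustration only, the sketch below fits a plain maximum-likelihood latent class model (a mixture of independent Bernoulli variables, one per binary mutation or cytogenetic indicator) by expectation-maximization. This is a swapped-in, non-Bayesian technique, not the authors' implementation; the function names `fit_latent_classes` and `hard_assign` and all parameters are hypothetical.

```python
import math
import random


def fit_latent_classes(X, k, n_iter=200, seed=0):
    """Fit a k-class latent class model (mixture of independent Bernoullis) by EM.

    X: list of equal-length 0/1 rows, e.g. one row per patient and one column
    per mutation or cytogenetic lesion. Returns (weights, probs, resp), where
    resp[i][c] is the posterior probability that row i belongs to class c.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    eps = 1e-9  # guards against log(0)
    weights = [1.0 / k] * k
    # Random starting probabilities break the symmetry between classes.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    resp = []
    for _ in range(n_iter):
        # E-step: class posteriors for each row, computed in log space.
        resp = []
        for row in X:
            logs = []
            for c in range(k):
                lp = math.log(max(weights[c], eps))
                for j, x in enumerate(row):
                    p = min(max(probs[c][j], eps), 1 - eps)
                    lp += math.log(p if x else 1 - p)
                logs.append(lp)
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate class weights and per-class feature frequencies.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / n
            probs[c] = [sum(resp[i][c] * X[i][j] for i in range(n)) / max(nc, eps)
                        for j in range(d)]
    return weights, probs, resp


def hard_assign(resp):
    """Assign each row to its most probable class."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]
```

For example, on 10 copies of `[1, 1, 0, 0]` and 10 copies of `[0, 0, 1, 1]`, `hard_assign` places the two row types in different classes. Choosing the number of classes (the paper reports 4 clusters) would need a separate model-selection step that this sketch does not attempt.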


Figures

Graphical abstract
Figure 1.
Survival outcomes and mutational landscape of primary AML (pAML) vs secondary AML (sAML). (A-C) Kaplan-Meier survival curves of pAML vs sAML (A), NK-pAML vs NK-sAML (B), and AK-pAML vs AK-sAML (C). (D) Bar graph showing the frequency (percentage) of somatic mutations in pAML vs sAML. (E-F) Forest plots representing univariate logistic regression (E) and MLR (F) analyses, showing the odds ratio (OR; in log scale) of the association of somatic mutations with pAML vs sAML. (G) Forest plots representing univariate analyses showing the OR (in log scale) of the association of dominant/ancestral and secondary/subclonal somatic mutations, respectively, with pAML vs sAML. Levels of statistical significance, indicated by green, orange, and black (P < .0001, P < .05, and P > .05, respectively), were obtained by Fisher's exact test. (H) Bar graph showing the average predictive performance (∼0.74) of MLR, measured as the cross-validation area under the curve (ie, we correctly predicted the pAML and sAML classification in ∼74% of AML cases in our cohort using the distinct genomic characteristics of each subtype). ns, not significant.
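Figure 1 reports ORs for the association of each mutation with pAML vs sAML and P values from Fisher's exact test. A minimal sketch for a single gene follows, assuming a hypothetical 2×2 layout `[[a, b], [c, d]]` with a = sAML mutated, b = sAML wild-type, c = pAML mutated, d = pAML wild-type (so OR > 1 means enrichment in sAML). The 0.5 zero-cell correction is an assumption, not something the caption states. For a single binary predictor, the univariate logistic regression OR equals this sample odds ratio when no cell is zero.

```python
from math import comb


def _table_prob(a, row1, col1, n):
    """Hypergeometric probability that the top-left cell equals a, margins fixed."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is no
    more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = _table_prob(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p = 0.0
    for x in range(lo, hi + 1):
        px = _table_prob(x, row1, col1, n)
        if px <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += px
    return min(p, 1.0)


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); adds 0.5 to every cell if any cell is 0."""
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)
```

For the table `[[3, 1], [1, 3]]`, `fisher_exact_two_sided(3, 1, 1, 3)` gives 34/70 (about 0.486) and `odds_ratio(3, 1, 1, 3)` gives 9.0.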
Figure 2.
Novel genomic clusters (GCs) of AML identified by unsupervised analyses. (A) Consensus matrix generated by applying latent class analysis to 1000 subsamples, representing the frequency with which 2 observations are clustered in the same group. (B) Kaplan-Meier analysis showing the overall survival (OS; in months) of each GC (GC-1 to GC-4). (C) Pie charts showing the percentage of cases belonging to each GC (GC-1 to GC-4) in pAML (left) and sAML (right). (D) Bar graph showing the frequency of pAML and sAML in the GCs after normalizing the samples by bootstrapping. (E) Hyperparameter selection plot for RF modeling; cross-validation accuracy (CVA) is shown on the y-axis. CVA saturation in this plot indicates that 3 variables suffice to achieve the maximal accuracy of ∼0.97 (ie, this model correctly assigns prognosis for ∼97% of AML cases in our cohort using their corresponding genomic features).
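Panel A describes a consensus matrix: the fraction of subsamples in which two patients fall into the same cluster. The sketch below computes that quantity for any clustering routine passed in as `cluster_fn`, a hypothetical callback standing in for the latent class analysis. The 80% subsampling fraction and all names are assumptions; the caption specifies only the 1000 subsamples. Each pair is normalized by the number of subsamples that contained both items.

```python
import random


def consensus_matrix(n, cluster_fn, n_subsamples=1000, frac=0.8, seed=0):
    """Co-clustering frequency for every pair of items 0..n-1.

    cluster_fn(indices) must return one cluster label per index (hypothetical
    interface). Entry [i][j] is the fraction of subsamples containing both i
    and j in which they received the same label.
    """
    rng = random.Random(seed)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    m = min(n, max(2, int(frac * n)))
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), m)  # subsample without replacement
        labels = cluster_fn(idx)
        for a in range(m):
            for b in range(m):
                i, j = idx[a], idx[b]
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```

With a perfectly stable clustering (for instance, thresholding a single value), within-group entries are 1.0 and between-group entries are 0.0; an unstable clustering produces intermediate values.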
Figure 3.
Invariant genomic features driving each genomic group. Bar plots representing the mutational profiles of GC-1 (A), GC-2 (B), GC-3 (C), and GC-4 (D) and their importance. Red asterisks mark the most important genomic features, based on an arbitrary importance cutoff of a mean decrease in accuracy ≥0.01. Circos diagrams showing the pairwise cooccurrence of mutations in all GCs are shown to the right of the bar graphs. The colors of the circos diagrams correspond to the GCs, and the color intensity of the ribbon connecting two genes represents the percentage of cooccurrence between the first and second gene mutations.
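The importance cutoff in Figure 3 is a mean decrease in accuracy, the drop in accuracy when one feature's values are shuffled. In a random forest this is usually computed tree by tree on out-of-bag samples; the sketch below uses the simpler dataset-level permutation version on an already-fitted classifier, which is an assumption rather than the authors' pipeline. The `predict` callback and function names are hypothetical.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean decrease in accuracy when each feature column is randomly shuffled.

    predict: callable mapping a list of rows to a list of predicted labels
    (any already-fitted classifier). X: list of rows; y: true labels.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        preds = predict(rows)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            shuffled = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                shuffled.append(new_row)
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances


def important_features(importances, names, cutoff=0.01):
    """Names whose mean decrease in accuracy meets the cutoff (0.01 as in Figure 3)."""
    return [name for name, imp in zip(names, importances) if imp >= cutoff]
```

A feature the classifier ignores (for example, a constant column) gets an importance of exactly 0, so it never passes the 0.01 cutoff.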
Figure 4.
Model validation and uniform resource locator. (A-D) Kaplan-Meier survival analyses (time in months) for the external validation of the model using external data from the MD Anderson Cancer Center (MDACC) vs the original data for each cluster: GC-1 (A), GC-2 (B), GC-3 (C), and GC-4 (D). (E) Screenshot of the Web site interface to our model.
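Figures 1, 2, and 4 rely on Kaplan-Meier survival curves, and Figure 4 compares them between the MDACC validation data and the original data for each cluster. A minimal sketch of the product-limit estimator follows, assuming follow-up times in months and an event flag of 1 for death and 0 for censoring; the function name is hypothetical. Observations censored at the same time as a death are kept in the risk set for that time, the usual convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times: follow-up times (e.g. months); events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Count every observation that shares this time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # conditional survival past time t
            curve.append((t, surv))
        at_risk -= deaths + censored  # both leave the risk set after time t
    return curve
```

For times `[1, 2, 2, 3, 4, 5]` with events `[1, 1, 0, 1, 0, 1]`, the estimate steps through 5/6, 2/3, 4/9, and 0 at months 1, 2, 3, and 5. Formally comparing two curves (for example, with a log-rank test) is a separate step not shown here.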

Comment in
