Leveraging machine learning to identify acute myeloid leukemia patients and their chemotherapy regimens in an administrative database
- PMID: 36815580
- PMCID: PMC10402395
- DOI: 10.1002/pbc.30260
Abstract
Background: Administrative datasets are useful for identifying rare disease cohorts such as pediatric acute myeloid leukemia (AML). Previously, cohorts were assembled using labor-intensive, manual reviews of patients' longitudinal chemotherapy data.
Methods: We used a two-step machine learning (ML) method to (i) identify pediatric patients with newly diagnosed AML and (ii) identify the chemotherapy courses of those patients in an administrative/billing database. Using 2558 patients who had previously been manually reviewed, multiple ML algorithms were derived from 75% of the study sample, and the selected model was tested in the remaining hold-out sample. The selected model was also applied to assemble a new pediatric AML cohort and was further assessed in an external validation against a standalone cohort established by manual chart abstraction.
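The derivation/hold-out partition described in Methods can be sketched as below. This is a minimal illustration, not the authors' code: the helper name `split_derivation_holdout`, the seed, and the use of integer patient IDs are assumptions, and the abstract does not say how the split was randomized or whether it was stratified.

```python
import random


def split_derivation_holdout(patient_ids, derivation_fraction=0.75, seed=0):
    """Randomly partition patient IDs into derivation and hold-out lists.

    Hypothetical helper: the split is made at the patient level so that
    every record for a given patient falls on the same side of the split.
    """
    ids = sorted(set(patient_ids))  # de-duplicate; sort so the seed is reproducible
    rng = random.Random(seed)       # local RNG, leaves global random state alone
    rng.shuffle(ids)
    cut = int(len(ids) * derivation_fraction)  # truncate toward zero
    return ids[:cut], ids[cut:]


# 2558 is the sample size from the abstract; the IDs themselves are placeholders.
derivation, holdout = split_derivation_holdout(range(2558))
print(len(derivation), len(holdout))
```

With 2558 IDs and a truncating cut, this sketch gives 1918 derivation and 640 hold-out patients; the actual counts in the study may differ.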
Results: For patient identification, the selected Support Vector Machine model yielded a sensitivity of 0.97 and a positive predictive value (PPV) of 0.97 in the hold-out test sample. For course-specific chemotherapy regimen and start date identification, the selected Random Forest model yielded overall PPV greater than or equal to 0.88 and sensitivity greater than or equal to 0.86 across all courses in the test sample. When applied to new cohort assembly, ML identified 3016 AML patients with 10,588 treatment courses. In the external validation subset, PPV was greater than or equal to 0.75 and sensitivity was greater than or equal to 0.82 for patient identification, and PPV was greater than or equal to 0.93 and sensitivity was greater than or equal to 0.94 for regimen identifications.
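For reference, the two metrics reported in Results follow the standard definitions: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The sketch below computes both from confusion counts. The counts in the example are hypothetical, chosen only so that both metrics come out to 0.97; they are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of true cases that the model flagged: TP / (TP + FN)."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("sensitivity is undefined when there are no true cases")
    return true_pos / total


def ppv(true_pos, false_pos):
    """Fraction of flagged cases that are true cases: TP / (TP + FP)."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("PPV is undefined when no cases were flagged")
    return true_pos / total


# Hypothetical counts for illustration only (not from the paper).
example_sensitivity = sensitivity(true_pos=97, false_neg=3)
example_ppv = ppv(true_pos=97, false_pos=3)
print(round(example_sensitivity, 2), round(example_ppv, 2))
```

Neither metric uses true negatives, which suits case identification in a large administrative database where non-cases vastly outnumber cases.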
Conclusion: A carefully designed ML model can accurately identify pediatric AML patients and their chemotherapy courses from administrative databases. This approach may be generalizable to other diseases and databases.
Keywords: acute myeloid leukemia; administrative database; case identification; machine learning.
© 2023 Wiley Periodicals LLC.
Conflict of interest statement
BTF receives funding from Pfizer, Merck, and Allovir. He also serves on a data safety monitoring board for Astellas. Other authors do not have a conflict of interest to disclose.