Pediatr Blood Cancer. 2023 May;70(5):e30260.
doi: 10.1002/pbc.30260. Epub 2023 Feb 23.

Leveraging machine learning to identify acute myeloid leukemia patients and their chemotherapy regimens in an administrative database

Lusha Cao et al. Pediatr Blood Cancer. 2023 May.

Abstract

Background: Administrative datasets are useful for identifying rare disease cohorts such as pediatric acute myeloid leukemia (AML). Previously, cohorts were assembled using labor-intensive, manual reviews of patients' longitudinal chemotherapy data.

Methods: We utilized a two-step machine learning (ML) method to (i) identify pediatric patients with newly diagnosed AML, and (ii) among the identified AML patients, their chemotherapy courses, in an administrative/billing database. Using 2558 patients previously manually reviewed, multiple ML algorithms were derived from 75% of the study sample, and the selected model was tested in the remaining hold-out sample. The selected model was also applied to assemble a new pediatric AML cohort and further assessed in an external validation, using a standalone cohort established by manual chart abstraction.
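The 75%/25% derivation and hold-out partition described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the split was drawn, so the seeded random shuffle, the `holdout_split` helper, and the placeholder patient IDs are all hypothetical and are not the authors' procedure.

```python
import random


def holdout_split(patient_ids, train_fraction=0.75, seed=0):
    """Shuffle IDs reproducibly, then split into derivation and hold-out lists.

    Hypothetical helper: the seeded shuffle is an assumption, not a detail
    taken from the abstract.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(ids)
    cut = int(len(ids) * train_fraction)  # size of the derivation sample
    return ids[:cut], ids[cut:]


# Placeholder identifiers standing in for the 2558 manually reviewed patients.
patient_ids = [f"patient_{i:04d}" for i in range(2558)]
derivation, holdout = holdout_split(patient_ids)
print(len(derivation), len(holdout))
```

Candidate models would be fitted and compared only on `derivation`, and the selected model would be scored once on `holdout`, which keeps the reported test metrics independent of model selection.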

Results: For patient identification, the selected Support Vector Machine model yielded a sensitivity of 0.97 and a positive predictive value (PPV) of 0.97 in the hold-out test sample. For course-specific chemotherapy regimen and start date identification, the selected Random Forest model yielded overall PPV greater than or equal to 0.88 and sensitivity greater than or equal to 0.86 across all courses in the test sample. When applied to new cohort assembly, ML identified 3016 AML patients with 10,588 treatment courses. In the external validation subset, PPV was greater than or equal to 0.75 and sensitivity was greater than or equal to 0.82 for patient identification, and PPV was greater than or equal to 0.93 and sensitivity was greater than or equal to 0.94 for regimen identifications.
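The two metrics reported above follow the standard definitions: sensitivity is the fraction of true cases the model identifies, and PPV is the fraction of model-identified cases that are true cases. A minimal sketch is given below; the counts are illustrative only and are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of true cases that the model identified: TP / (TP + FN)."""
    if true_pos + false_neg == 0:
        raise ValueError("no true cases to evaluate")
    return true_pos / (true_pos + false_neg)


def ppv(true_pos, false_pos):
    """Fraction of model-identified cases that are true cases: TP / (TP + FP)."""
    if true_pos + false_pos == 0:
        raise ValueError("no predicted cases to evaluate")
    return true_pos / (true_pos + false_pos)


# Hypothetical confusion counts chosen for illustration, not study data.
tp, fn, fp = 97, 3, 3
sens = sensitivity(tp, fn)
precision = ppv(tp, fp)
print(round(sens, 2), round(precision, 2))
```

Because PPV depends on how common true cases are among the records screened, a model can keep its sensitivity yet show a lower PPV when applied to a different cohort.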

Conclusion: A carefully designed ML model can accurately identify pediatric AML patients and their chemotherapy courses from administrative databases. This approach may be generalizable to other diseases and databases.

Keywords: acute myeloid leukemia; administrative database; case identification; machine learning.

Conflict of interest statement

BTF receives funding from Pfizer, Merck, and Allovir. He also serves on a data safety monitoring board for Astellas. Other authors do not have a conflict of interest to disclose.

Figures

FIGURE 1
Overview of study process. ML: machine learning, PHIS: Pediatric Health Information System, AML: acute myeloid leukemia, CV: cross-validation, SVM: Support Vector Machine, RF: Random Forest, HAFH: Home or Away from Home study, CHOP: Children’s Hospital of Philadelphia, eMRN: encrypted medical record number.
