BMC Med Inform Decis Mak. 2024 Dec 18;24(1):379.
doi: 10.1186/s12911-024-02781-z.

A potential predictive model based on machine learning and CPD parameters in elderly patients with aplastic anemia and myelodysplastic neoplasms

Yuxiang Qi et al. BMC Med Inform Decis Mak. 2024.

Abstract

Background: Aplastic anemia (AA) and myelodysplastic neoplasms (MDS) have similar peripheral blood manifestations and are clinically characterized by a reduced hematological triad. Distinguishing between these two diseases is challenging. Hence, using machine learning methods, we employed and validated an algorithm that uses cell population data (CPD) parameters to diagnose AA and MDS.

Methods: In this retrospective study, CPD parameters were obtained from the Beckman Coulter DxH800 analyzer for 160 individuals diagnosed with AA or MDS. The individuals were unselectively assigned to a training cohort (77%) and a testing cohort (23%). Additionally, an external validation cohort of 86 elderly patients with AA and MDS from two additional centers was established. Discriminative parameters were identified through univariate analysis, and the most predictive variables were selected using least absolute shrinkage and selection operator (LASSO) regression. Six machine learning algorithms were compared on their performance in predicting AA and MDS. The area under the curve (AUC), calibration curves, decision curve analysis (DCA), and Shapley additive explanations (SHAP) plots were used to interpret and assess the model's predictive accuracy, clinical utility, and stability.
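The cohort split and the AUC metric described in the Methods can be illustrated with a minimal sketch. It assumes that "unselectively assigned" means a random split, which the abstract does not state explicitly. The function names `split_cohort` and `roc_auc`, the seed, and the toy labels and scores are hypothetical and are not taken from the study.

```python
import random


def split_cohort(ids, train_fraction=0.77, seed=0):
    """Randomly assign ids to a training and a testing cohort.

    Assumption: a simple random shuffle; the study's actual
    assignment procedure is not described in the abstract.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# 160 individuals split 77% / 23%, as in the Methods.
train_ids, test_ids = split_cohort(range(160))
print(len(train_ids), len(test_ids))  # 123 37

# Toy labels and scores (hypothetical, not study data).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise definition of AUC is quadratic in cohort size, which is acceptable for cohorts of this scale; a rank-based formula would be the usual choice for larger data.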

Results: After comparative evaluation of the models, logistic regression emerged as the most suitable machine learning model for predicting the probability of AA and MDS. It used five principal variables (age, MNVLY, SDVLY, MNLALSEGC, and MNCEGC) to estimate the risk of these diseases. The best model delivered an AUC of 0.791 in the testing cohort, with high specificity (0.850) and positive predictive value (0.818). Furthermore, the calibration curve indicated excellent agreement between actual and predicted probabilities. The DCA curve further supported the clinical utility of the model in guiding treatment decisions. Moreover, the model's performance was consistent in an external validation cohort, with an AUC of 0.719.
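For readers less familiar with the reported metrics, the following sketch shows how specificity and positive predictive value follow from a confusion matrix. The helper `diagnostic_metrics` and the counts passed to it are hypothetical; they are chosen for round numbers and do not reproduce the study's confusion matrix, which the abstract does not report.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.

    tp/fn: positive cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=30, fp=10, tn=40, fn=20)
print(metrics["specificity"])  # 0.8
print(metrics["ppv"])          # 0.75
```

Unlike sensitivity and specificity, the predictive values depend on how common each diagnosis is in the cohort, so they may not transfer directly to a population with a different case mix.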

Conclusions: We developed a novel model that effectively distinguished elderly patients with AA from those with MDS. It has the potential to assist physicians in early diagnosis and appropriate treatment of elderly patients.

Keywords: Aplastic anemia; Cell population data; Machine learning; Myelodysplastic neoplasms; Parameters.

Conflict of interest statement

Declarations. Ethics statement and consent to participate: This study was approved by the Ethics Committee of the First Affiliated Hospital of Zhejiang Chinese Medical University with approval number 2024-KLS-348–01. Written informed consent to participate was obtained from all of the participants in the study. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Flow chart of the participants included in the study
Fig. 2
Screening the optimal predictors via LASSO regression. A Regression coefficient path plot in LASSO regression. Diverse colored lines indicate that different variables will gradually become zero, and the later they become zero, the more important the indicator. B The cross-validation curve of LASSO regression. The minimum standard is on the left line and the 1-SE standard is on the right line. In the current study, we selected 9 non-zero predictors according to the 1-SE standard. SE, the standard error
Fig. 3
ROC curves of each of the 9 predictors, assessed independently, for discriminating between the AA and MDS groups
Fig. 4
A The weight importance of nine filtered indicators. B Heat map of correlation of top five indicators. The correlation degree is from low (blue) to high (red)
Fig. 5
The performance of six machine learning models. A ROC curve of the training cohort; B ROC curve of the testing cohort; C Calibration curve; D Decision curve analysis
Fig. 6
Model explainability via the SHAP algorithm. A The horizontal axis shows the SHAP value, which represents the influence on the prediction result; the vertical axis lists each indicator; the contribution degree ranges from low (blue) to high (red). B The importance ranking of independent variables. C The SHAP force plot of patients with myelodysplastic neoplasms. D The SHAP force plot of patients with aplastic anemia
Fig. 7
ROC for the external validation cohort
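
The Fig. 2 caption refers to two ways of choosing the LASSO penalty from a cross-validation curve: the minimum-error standard and the 1-SE standard. A minimal sketch of both rules follows. The function name `lambda_min_and_1se` and the penalty grid, errors, and standard errors are hypothetical values for illustration, not the study's cross-validation results.

```python
def lambda_min_and_1se(lambdas, mean_errors, std_errors):
    """Pick the LASSO penalty by the minimum and 1-SE rules.

    lambda_min: penalty with the lowest mean cross-validation error.
    lambda_1se: largest penalty whose mean error is within one
    standard error of that minimum (a sparser model).
    """
    best = min(range(len(lambdas)), key=lambda i: mean_errors[i])
    threshold = mean_errors[best] + std_errors[best]
    eligible = [
        lam for lam, err in zip(lambdas, mean_errors) if err <= threshold
    ]
    return lambdas[best], max(eligible)


# Hypothetical cross-validation curve.
lambdas = [0.001, 0.01, 0.05, 0.1, 0.5]
mean_errors = [0.60, 0.55, 0.57, 0.58, 0.70]
std_errors = [0.04, 0.04, 0.04, 0.04, 0.05]

print(lambda_min_and_1se(lambdas, mean_errors, std_errors))  # (0.01, 0.1)
```

Because a larger penalty shrinks more coefficients to zero, the 1-SE choice retains fewer predictors than the minimum-error choice, which matches the caption's placement of the 1-SE line to the right of the minimum line.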
