Sci Rep. 2023 Feb 21;13(1):3043.
doi: 10.1038/s41598-023-29653-2.

A hyperaldosteronism subtypes predictive model using ensemble learning

Shigehiro Karashima et al. Sci Rep. 2023.

Abstract

This study aimed to develop a machine-learning algorithm that predicts the probability of aldosterone-producing adenoma (APA) to support its diagnosis. A retrospective cross-sectional analysis of the Japan Rare/Intractable Adrenal Diseases Study dataset was performed using the nationwide primary aldosteronism (PA) registry in Japan, which comprises 41 centers. Patients treated between January 2006 and December 2019 were included. Forty-six features at screening and 13 features at confirmatory testing were used for model development to calculate APA probability. Seven machine-learning algorithms were combined to develop the ensemble-learning model (ELM), which was externally validated. The strongest predictive factors for APA were serum potassium (s-K) at first visit, s-K after medication, plasma aldosterone concentration, aldosterone-to-renin ratio, and potassium supplementation dose. On average, the screening model achieved an area under the curve (AUC) of 0.899, and the confirmatory test model achieved an AUC of 0.913. In the external validation, the screening model achieved an AUC of 0.964 using an APA probability of 0.17. The clinical findings at screening predicted the diagnosis of APA with high accuracy. This novel algorithm can support PA practice in primary care settings and prevent potentially curable APA patients from falling outside the PA diagnostic flowchart.
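The abstract does not state how the seven base models were combined. As a minimal sketch, assuming simple averaging of per-model probabilities (soft voting) and treating the reported APA probability of 0.17 as a decision cutoff, the idea could look like this. The function names and the seven example probabilities are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def ensemble_probability(model_probs):
    """Average the APA probabilities from several base models (soft voting).

    Assumption: plain unweighted averaging; the paper may combine models differently.
    """
    if not model_probs:
        raise ValueError("need at least one model probability")
    return fmean(model_probs)


def flag_apa(probability, cutoff=0.17):
    """Return True when the ensemble probability reaches the cutoff.

    Assumption: 0.17 (from the abstract) is used as a decision threshold.
    """
    return probability >= cutoff


# Hypothetical outputs of seven base models for a single patient.
probs = [0.10, 0.25, 0.30, 0.05, 0.20, 0.15, 0.35]
p = ensemble_probability(probs)
print(round(p, 3), flag_apa(p))
```

A low cutoff such as 0.17 favors sensitivity over specificity, which would fit a screening setting where missing a curable APA case is the costlier error.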

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Importance ranking with random forest. (A) and (B) are rankings in the screening and confirmatory testing models, respectively. There are three different APA datasets depending on the definition of APA. This figure shows 6 of the top-10 importance score rankings. The ranking order is reflected in Set B and Set C with priority given to the order of Set A. ARR aldosterone-to-renin ratio, CCT captopril-challenge test, DDD The ATC/DDD index, FUT furosemide-upright test, HT hypertension, PAC plasma aldosterone concentration, s-K serum potassium level, s-Na serum sodium level, TG triglyceride.
Figure 2
Heatmap comparing the area under the receiver operating characteristic curve, sensitivity, and specificity. The values in each box are the average of 50 runs and are shown in units of 10⁻³. AUC area under the curve, ELM ensemble learning model, KNN k-nearest neighbor algorithm, LGBM light gradient boosting machine, LR logistic regression, MLP multilayer perceptron, NB naïve Bayes, RF Random Forest, SVM support vector machine.
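For readers unfamiliar with the three metrics in the heatmap, the following is a minimal sketch of how they can be computed from labels and predicted probabilities. It is not the authors' code: the function names, the toy labels and scores, and the cutoff are illustrative assumptions. AUC is computed here by its rank interpretation, the fraction of positive/negative pairs in which the positive case scores higher.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def to_milli(value):
    """Express a metric in units of 10^-3, as in the heatmap."""
    return round(value * 1000)


# Toy data (hypothetical): 1 = APA, 0 = non-APA.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.15, 0.4, 0.2, 0.1, 0.05]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, cutoff=0.17)
print(to_milli(auc), to_milli(sens), to_milli(spec))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for illustration; a sort-based rank formulation would be preferable for large datasets.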
Figure 3
Receiver operating characteristic curves for predictive diagnosis of aldosterone-producing adenoma using external validation data. There are three different APA datasets depending on the definition of APA (Set A, Set B, and Set C). The Top 5 model was developed using only the five features with the highest importance scores in the screening model. The black line shows the Top 5 model, the dashed line shows the screening model, and the gray line shows the confirmatory test model.
Figure 4
Machine-learning workflow for data processing and model development. MissForest used training data to impute missing values. Feature selection determined the best set of features based on the importance ranking obtained with Random Forest. Oversampling was performed with the Synthetic Minority Oversampling Technique (SMOTE) to resolve class imbalance. Training, validation, and test datasets were used to train prediction models, tune parameters, and evaluate generalization performance. The performance of each model was evaluated using the 50-replicate average. Finally, generalization performance was evaluated using an external database. APA aldosterone-producing adenoma, AUC area under the curve, ELM ensemble learning model, JRAS Japan Rare/Intractable Adrenal Diseases Study, KNN k-nearest neighbor algorithm, LGBM light gradient boosting machine, LR logistic regression, MLA machine-learning algorithm, MLP multilayer perceptron, NB naïve Bayes, RF Random Forest, SVM support vector machine.
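The oversampling step in the workflow can be illustrated with a simplified, standard-library sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbors. This is not the authors' implementation or a library's API; the function name `smote_sketch`, its parameters, and the toy feature vectors are assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by neighbor interpolation.

    minority: list of equal-length numeric tuples (minority-class feature vectors).
    k: number of nearest minority neighbors considered for each base sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Pick a random position on the segment between base and neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical two-feature minority class (values are not from the paper).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=6, k=2)
print(len(new_points))
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points never leak into the validation or test data.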