PLoS One. 2022 Jul 21;17(7):e0271610.
doi: 10.1371/journal.pone.0271610. eCollection 2022.

Analysis of epidemiological association patterns of serum thyrotropin by combining random forests and Bayesian networks


Ann-Kristin Becker et al. PLoS One. 2022.

Erratum in

Abstract

Background: Approaching epidemiological data with flexible machine learning algorithms is of great value for understanding disease-specific association patterns. However, it can be difficult to correctly extract and understand those patterns due to the lack of model interpretability.

Method: Here we propose a machine learning workflow that combines random forests with Bayesian network surrogate models to allow for a deeper level of interpretation of complex association patterns. We first evaluate the proposed workflow on synthetic data. We then apply it to data from the large population-based Study of Health in Pomerania (SHIP). Using this combination, we discover and interpret broad association patterns of individual serum TSH concentrations, an important marker of thyroid function.

Results: Evaluations using simulated data show that feature associations can be correctly recovered by combining random forests and Bayesian networks. The presented model achieves predictive accuracy similar to that of state-of-the-art models (root mean square error of 0.66, mean absolute error of 0.55, coefficient of determination R² = 0.15). We identify 62 relevant features from the final random forest model, ranging from general health variables through dietary and genetic factors to physiological, hematological and hemostasis parameters. The Bayesian network model is used to put these features into context and to make the black-box random forest model more understandable.
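The three reported accuracy measures have standard definitions, which can be sketched as follows. This is a minimal illustration in plain Python and not the authors' code; the function name `regression_metrics` and the example values are assumptions made for this sketch.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for two equally long sequences of numbers.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    # R^2 is undefined when the observed values have zero variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Made-up example values, only to show the call.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(metrics)
```

Under this definition, a model that always predicts the observed mean has R² = 0, so R² = 0.15 means the model explains about 15% of the variance in the outcome.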

Conclusion: We demonstrate that combining random forest and Bayesian network analysis helps to reveal and interpret broad association patterns of individual TSH concentrations. The discovered patterns are in line with state-of-the-art literature. They may be useful for future thyroid research and improved dosing of therapeutics.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Workflow.
Schematic representation of the workflow. After data preparation, a random forest (RF) model is trained using nested cross-validation. Relevant predictors are identified based on two feature importance measures and a mixture model approach. Lastly, feature interactions among the relevant predictors are examined in a Bayesian network analysis.
Fig 2. Inferred Bayesian network structure among the extracted relevant predictors and the TSH level.
The four hub nodes, sex, age, medication (taken during the last seven days), and hip circumference, are colored in blue. Arcs originating from the hub nodes are plotted in light gray to make the network more readable. The TSH level is colored in dark red, thyroid-related examinations in red. Yellow nodes refer to metabolic factors, green nodes to hematological and hemostasis factors, and gray nodes to socioeconomic parameters. Antibody titer against toxoplasmosis is presented in orange. Further information on the features can be found in S1 Table. The completed partially directed acyclic graph is shown.
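The nested cross-validation step named in the Fig 1 workflow can be sketched as an index-splitting scheme: outer folds hold out data for performance estimation, while inner folds, drawn only from the outer training part, are used for model tuning. The sketch below is an assumption-laden illustration and not the authors' implementation; the fold counts, the absence of shuffling, and the function names `k_fold_indices` and `nested_cv_splits` are all hypothetical.

```python
def k_fold_indices(indices, k):
    """Yield (train, test) index lists for k contiguous folds of `indices`."""
    n = len(indices)
    if k < 2 or k > n:
        raise ValueError("k must be between 2 and the number of samples")
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def nested_cv_splits(n_samples, outer_k=5, inner_k=3):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    inner_splits is a list of (inner_train, validation) pairs that only
    use indices from outer_train, so outer_test never influences tuning.
    The defaults outer_k=5 and inner_k=3 are assumptions for this sketch.
    """
    all_indices = list(range(n_samples))
    for outer_train, outer_test in k_fold_indices(all_indices, outer_k):
        inner_splits = list(k_fold_indices(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits


# Example: show the fold sizes for a toy data set of 10 samples.
for outer_train, outer_test, inner_splits in nested_cv_splits(10, 5, 2):
    print(len(outer_train), len(outer_test), len(inner_splits))
```

In practice the samples would usually be shuffled before splitting, and a model would be fitted on each inner training part, tuned on the validation part, and scored once on the outer test part.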


