PLoS One. 2023 Mar 3;18(3):e0282235.
doi: 10.1371/journal.pone.0282235. eCollection 2023.

Applied machine learning to identify differential risk groups underlying externalizing and internalizing problem behaviors trajectories: A case study using a cohort of Asian American children

Samrachana Adhikari et al. PLoS One.

Abstract

Background: Internalizing and externalizing problems account for over 75% of the mental health burden in children and adolescents in the US, with a higher burden among minority children. While complex interactions of multilevel factors are associated with these outcomes and may enable early identification of children at higher risk, prior research has been limited by data and by the application of traditional analysis methods. In this case example focused on Asian American children, we address the gap by applying data-driven statistical and machine learning methods to study clusters of mental health trajectories among children, investigate optimal prediction of children in the high-risk cluster, and identify key early predictors.

Methods: Data from the US Early Childhood Longitudinal Study 2010-2011 were used. Multilevel information provided by children, families, teachers, schools, and care-providers was considered as predictors. An unsupervised machine learning algorithm was applied to identify groups of internalizing and externalizing problems trajectories. For prediction of the high-risk group, an ensemble algorithm, Superlearner, was implemented by combining several supervised machine learning algorithms. The performance of Superlearner and the candidate algorithms, including logistic regression, was assessed using discrimination and calibration metrics via crossvalidation. Variable importance measures along with partial dependence plots were used to rank and visualize key predictors.
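
The ensemble step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the two candidate learners (`mean_learner`, `bin_learner`), the simulated data, the squared-error (Brier) loss, and the grid search over a single weight are all assumptions chosen to keep the example self-contained. The idea shown is the one a Super Learner-style ensemble relies on: each candidate produces held-out (crossvalidated) predictions, and the ensemble weights are chosen to minimize the crossvalidated loss of the weighted combination.

```python
import math
import random


def mean_learner(train):
    """Hypothetical candidate 1: always predicts the training prevalence."""
    p = sum(y for _, y in train) / len(train)
    return lambda x: p


def bin_learner(train):
    """Hypothetical candidate 2: prevalence on the same side of x = 0."""
    def rate(rows):
        # Laplace smoothing keeps the estimate inside (0, 1), even for an empty bin.
        return (sum(y for _, y in rows) + 1) / (len(rows) + 2)

    p_lo = rate([(x, y) for x, y in train if x < 0])
    p_hi = rate([(x, y) for x, y in train if x >= 0])
    return lambda x: p_hi if x >= 0 else p_lo


def out_of_fold(data, learners, k=5):
    """Return one list of held-out predictions per learner (k-fold CV)."""
    preds = [[None] * len(data) for _ in learners]
    for fold in range(k):
        train = [row for i, row in enumerate(data) if i % k != fold]
        for j, make in enumerate(learners):
            model = make(train)  # fitted without the held-out fold
            for i, (x, _) in enumerate(data):
                if i % k == fold:
                    preds[j][i] = model(x)
    return preds


def brier(preds, ys):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def best_weight(p1, p2, ys, steps=100):
    """Grid-search w in [0, 1] minimizing the CV loss of w*p1 + (1-w)*p2."""
    grid = [s / steps for s in range(steps + 1)]
    return min(
        grid,
        key=lambda w: brier([w * a + (1 - w) * b for a, b in zip(p1, p2)], ys),
    )


# Simulated data (an assumption): one standardized predictor, binary outcome.
random.seed(0)
data = []
for _ in range(200):
    x = random.gauss(0, 1)
    y = 1 if random.random() < 1 / (1 + math.exp(-2 * x)) else 0
    data.append((x, y))
ys = [y for _, y in data]

p_mean, p_bin = out_of_fold(data, [mean_learner, bin_learner])
w = best_weight(p_mean, p_bin, ys)
ensemble = [w * a + (1 - w) * b for a, b in zip(p_mean, p_bin)]
print("weight on mean learner:", w)
print("CV Brier (mean, bin, ensemble):",
      brier(p_mean, ys), brier(p_bin, ys), brier(ensemble, ys))
```

Because the grid includes the weights 0 and 1, the weighted combination can never have a worse crossvalidated loss than either candidate alone in this sketch. A full implementation would typically use more candidates, refit each on the complete data before applying the weights, and evaluate the ensemble itself with an outer layer of crossvalidation.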

Findings: We found two clusters suggesting high- and low-risk groups for both externalizing and internalizing problems trajectories. While Superlearner had the best overall discrimination performance, logistic regression had comparable performance for externalizing problems but worse performance for internalizing problems. Predictions from logistic regression were not as well calibrated as those from Superlearner; however, they were still better calibrated than those from a few of the candidate algorithms. Important predictors identified were a combination of test scores, child factors, teacher-rated scores, and contextual factors, which showed non-linear associations with predicted probabilities.
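
The distinction between discrimination and calibration drawn in the Findings can be made concrete with a small sketch. The abstract does not say which calibration metric was used, so the Brier score and the gap between mean predicted risk and observed prevalence are assumptions here, as are the toy labels and scores. The example shows two sets of predictions that rank children identically (same AUC) while only one of them matches the observed event rate.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties count as one half. This is the rank-based definition of the
    area under the ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier(scores, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)


def mean_gap(scores, labels):
    """Mean predicted risk minus observed prevalence (0 is ideal)."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)


# Toy data (an assumption, not from the study).
labels = [0, 0, 0, 1, 0, 1, 1, 1]
model_a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
# Model B halves every probability: the ordering is unchanged,
# but the predicted risks are now systematically too low.
model_b = [p * 0.5 for p in model_a]

for name, scores in [("A", model_a), ("B", model_b)]:
    print(name,
          "AUC:", auc(scores, labels),
          "Brier:", brier(scores, labels),
          "mean gap:", mean_gap(scores, labels))
```

Both models have the same AUC because AUC depends only on the ordering of the scores, whereas the Brier score and the mean gap are worse for model B. This is why a model can discriminate about as well as another and still be less well calibrated.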

Conclusions: We demonstrated the application of a data-driven analytical approach to predict mental health outcomes among Asian American children. Findings from the cluster analysis can inform the critical age for early intervention, while the prediction analysis has the potential to inform intervention programming prioritization decisions. However, to better understand the external validity, replicability, and value of machine learning in broader mental health research, more studies applying similar analytical approaches are needed.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Longitudinal trajectories of externalizing problem behavior scores.
Left panel shows individual observed trajectories with dark line representing the mean for each latent class. Right panel shows class mean with shaded region representing the bootstrapped 95% confidence band.
Fig 2. Longitudinal trajectories of internalizing problem behavior scores.
Left panel shows individual observed trajectories with dark line representing the mean for each latent class. Right panel shows class mean with shaded region representing the bootstrapped 95% confidence band.
Fig 3. Crossvalidated performance metrics at optimal threshold for predicting externalizing problem behavior (externalization; left panel) and internalizing problem behavior (internalization; right panel), using predictions from Superlearner ensemble as well as individual candidate algorithms.
Individual candidate algorithms included mean learner (Mean), logistic regression (Logistic), lasso regression (Lasso), group lasso regression (Group Lasso), random forest with different combinations of number of variables sampled at each split and number of trees, and support vector machine with radial basis kernel function (KSVM). AUC = area under the curve, AUCpr = area under the precision recall curve, TPR = true positive rate, TNR = true negative rate.
Fig 4. Variable importance metric as measured by mean decrease in Gini index (standardized between 0 and 100) for externalizing problem behavior, based on random forest with the number of variables sampled at each split (mtry) = 7 and number of trees (ntree) = 700 fitted on the entire dataset.
BMI = body mass index, SES = socioeconomic status.
Fig 5. Variable importance metric as measured by mean decrease in Gini index (standardized between 0 and 100) for internalizing problem behavior, based on random forest with the number of variables sampled at each split (mtry) = 5 and number of trees (ntree) = 100 fitted on the entire dataset.
BMI = body mass index, SES = socioeconomic status.
Fig 6. Partial dependence plots showing the dependence of the top ten important variables with prediction of externalizing problem behaviors.
The y-axis represents the log odds of predicted probability for a fixed value of the variable of interest, conditional on all other variables (marginal effect). The x-axis represents the observed values of the variables scaled to have mean of zero and standard deviation of 1. BMI = body mass index, SES = socioeconomic status.
Fig 7. Partial dependence plots showing the dependence of the top ten important variables with prediction of internalizing problem behaviors.
The y-axis represents the log odds of predicted probability for a fixed value of the variable of interest, conditional on all other variables (marginal effect). The x-axis represents the observed values of the variables scaled to have mean of zero and standard deviation of 1. BMI = body mass index, SES = socioeconomic status.
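
The partial dependence plots in Figs 6 and 7 can be illustrated with a minimal sketch of how such a curve is computed. Everything specific here is an assumption: `toy_model` is a hypothetical stand-in for a fitted classifier, the five rows are made-up standardized data, and the sketch averages the log odds across rows. Averaging the probabilities first and then taking the log odds is another common convention, and the captions do not say which one was used.

```python
import math


def logit(p, eps=1e-6):
    """Log odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def toy_model(row):
    """Hypothetical fitted classifier with a non-linear effect of x1."""
    x1, x2 = row
    return 1 / (1 + math.exp(-(x1 ** 2 - 1 + 0.5 * x2)))


def partial_dependence(predict, rows, feature, grid):
    """Average log odds of the prediction with one feature held fixed.

    For each grid value, the chosen feature is overwritten in every row,
    all other features keep their observed values, and the resulting
    log odds are averaged over the rows.
    """
    curve = []
    for v in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = v
            total += logit(predict(modified))
        curve.append(total / len(rows))
    return curve


# Made-up standardized data: (x1, x2) pairs; x2 averages to zero.
rows = [(-1.0, -1.0), (-0.5, 0.0), (0.0, 1.0), (0.5, 0.0), (1.0, 0.0)]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

pd_x1 = partial_dependence(toy_model, rows, 0, grid)
for v, value in zip(grid, pd_x1):
    print(f"x1 = {v:+.1f}  mean log odds = {value:+.3f}")
```

In this toy setup the curve is U-shaped in x1, the kind of non-linear association that a single logistic regression coefficient would not capture.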
