EBioMedicine. 2025 Jul:117:105811.
doi: 10.1016/j.ebiom.2025.105811. Epub 2025 Jun 17.

Statistical learning to identify and characterise neurodevelopmental outcomes at 2 years in babies born preterm: model development and validation using population-level data from England and Wales


Sadia Haider et al. EBioMedicine. 2025 Jul.

Abstract

Background: Children born preterm face elevated risks of neurodevelopmental impairments across domains. Prior studies have relied on expert-imposed typologies within single domains. This study applies statistical learning to a national database to identify transdomain clusters and their maternal and neonatal predictors.

Methods: Latent class analysis (LCA) was used to derive transdomain clusters from parent-reported visual, auditory, neuromotor, and communication impairments in preterm-born children at two years corrected age, using data from the UK National Neonatal Research Database (NNRD; N = 27,261). Replication was conducted in an independent sample from Wales (N = 975). Clusters were clinically validated against cerebral palsy diagnosis, the Bayley Scales of Infant and Toddler Development (3rd edition), and global neurodevelopmental delay. Random forest (RF) models were used to identify cluster-specific and shared predictors.
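For binary impairment indicators, LCA amounts to a finite mixture of independent Bernoulli distributions, usually fitted by expectation-maximisation (EM). The abstract does not say which software or settings the authors used, so the following is only a minimal pure-Python sketch of that general technique; the function names, the number of random starts, and the clipping constant are illustrative assumptions.

```python
import math
import random


def fit_lca(data, n_classes, n_iter=500, seed=0, tol=1e-8):
    """Fit a latent class model (mixture of independent Bernoullis) by EM.

    data: list of 0/1 rows, one per child, one column per impairment indicator.
    Returns (class weights, P(item = 1 | class), log-likelihood, posteriors).
    Hypothetical helper for illustration, not the authors' implementation.
    """
    rng = random.Random(seed)
    n, n_items = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting item-response probabilities, kept away from 0 and 1.
    probs = [[rng.uniform(0.2, 0.8) for _ in range(n_items)]
             for _ in range(n_classes)]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior class membership per child, via log-sum-exp.
        resp, ll = [], 0.0
        for row in data:
            logs = [math.log(weights[k]) + sum(
                        math.log(p if x else 1.0 - p)
                        for x, p in zip(row, probs[k]))
                    for k in range(n_classes)]
            m = max(logs)
            total = sum(math.exp(v - m) for v in logs)
            ll += m + math.log(total)
            resp.append([math.exp(v - m) / total for v in logs])
        # M-step: update class sizes and item-response probabilities.
        for k in range(n_classes):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            for j in range(n_items):
                s = sum(r[k] * row[j] for r, row in zip(resp, data))
                probs[k][j] = min(max(s / nk, 1e-6), 1 - 1e-6)  # avoid log(0)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, probs, ll, resp


def fit_lca_best(data, n_classes, n_starts=10):
    """EM only finds local optima, so keep the best of several random starts."""
    fits = [fit_lca(data, n_classes, seed=s) for s in range(n_starts)]
    return max(fits, key=lambda fit: fit[2])
```

In practice the number of classes is usually chosen by fitting several values of `n_classes` and comparing fit statistics; the abstract reports the four-class solution but not the selection criterion.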

Findings: Four homogeneous clusters were derived (silhouette score = 0.71) and replicated in Wales with high balanced accuracy (93%): (1) typically developing (84.8%), (2) communication impairments (8.4%), (3) neuro-motor impairments (4.1%), and (4) multiple neuro-morbidity (2.7%). Clusters had high clinical validity and were distinguishable by shared and cluster-specific predictors. Neonatal brain injuries were most predictive of neuro-motor and multiple neuro-morbidity clusters. Birthweight, gestational age, socio-economic deprivation, and sex were stronger predictors of the communication cluster than preterm co-morbidities.
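The two summary metrics quoted above can be computed as follows. This is a minimal pure-Python sketch, not the authors' code. Balanced accuracy is the mean of per-cluster recall, which keeps the small clusters (2.7% to 8.4%) from being swamped by the 84.8% typically developing group. The silhouette score needs a distance between children, and the abstract does not say which one was used, so Hamming distance on binary impairment indicators is an assumption here.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each cluster counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def hamming(a, b):
    """Number of indicators on which two children differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b))


def silhouette_score(X, labels, dist=hamming):
    """Mean silhouette width s(i) = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, xi in enumerate(X):
        own = labels[i]
        # Sum and count of distances to every other point, by cluster.
        sums, counts = defaultdict(float), defaultdict(int)
        for j, xj in enumerate(X):
            if i != j:
                sums[labels[j]] += dist(xi, xj)
                counts[labels[j]] += 1
        if counts[own] == 0:
            scores.append(0.0)  # singleton cluster: silhouette set to 0
            continue
        a = sums[own] / counts[own]  # mean intra-cluster distance
        b = min(sums[c] / counts[c] for c in clusters
                if c != own and counts[c] > 0)  # nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)
```

For example, with true labels `[0, 0, 0, 0, 1, 1]` and predictions `[0, 0, 0, 1, 1, 0]`, plain accuracy is 4/6 but balanced accuracy is (0.75 + 0.5) / 2 = 0.625, because the smaller cluster's poor recall is weighted equally.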

Interpretation: This study provides the first evidence of the transdomain nature of neurodevelopmental impairments after preterm birth using LCA. The finding that socio-demographic and perinatal factors, rather than co-morbidities, increase the risk of communication impairment highlights the importance of environmental modification alongside clinical interventions. Applying data-driven approaches to routinely collected data may offer a cost-effective way to stratify at-risk children and inform targeted support strategies.

Funding: UKRI Medical Research Council.

Keywords: Birth cohorts; Machine learning; Neonatal; Neurocognitive; Neurodevelopmental impairments; Preterm.


Conflict of interest statement

Declaration of interests AT has involvement in the following grants, which are broadly in the area of digital health, although they did not support the present manuscript: EU Horizon: “EUmetriosis: transforming endometriosis care in Europe via an integrated approach addressing current knowledge, diagnosis, tailored management and patient empowerment” (Universite Catholique de Louvain, Belgium); NIHR: EQUI-RESP-AFRICA (University of Edinburgh); UKRI “AI Centre for Doctoral Training in Biomedical Innovation (AI4BI)” (University of Edinburgh); Wellcome Trust Programme grant: 226944/Z/23/Z (University of Edinburgh); Dunhill Medical Trust PhD studentship: “Preventing rehospitalizations of elderly acute care survivors using longitudinal physical and mental health monitoring with wearable sensors and smartphones” (main PhD supervisor); Exogenous sex steroid hormones and asthma phenotypes: a population-based prospective cohort study using UK-wide primary care databases (University of Edinburgh); NES Tender, Digital Health and Care Transformation Leaders Programme in Scotland, building on the NHS Digital Academy Leads (University of Edinburgh); UKRI/Versus Arthritis APDP consortia: MR/W002426/1 (University of Cambridge); HEE for the further development of the NHS Digital Academy (renewal award), a collaborative project between Imperial College London, the University of Edinburgh, and HDRUK; Standard Life Grant, topped up by EXPPECT contribution and UoE CMVM funds, PhD studentship on Endometriosis and wearable technology. Supervisors: A. Tsanas, A. Horne, P. Saunders (University of Edinburgh); ESRC: “Beyond the 10,000 steps: Managing less visible aspects of healthy ageing at work” (Business School, University of Edinburgh); BHF: RG/20/10/34966 (University of Edinburgh); HDRUK: CFC0109 (University of Oxford); Wellcome Trust ISSF, 204826/Z/16/Z and 204826/Z/16/Z (University of Oxford); Asthma UK, Asthma: renewal funding bid (University of Edinburgh & QMUL); NHS England commissioning for the development of the NHS Digital Academy (Imperial College London and the University of Edinburgh, with input from Harvard University); HDRUK core site award (Reg. no: Edin1), Universities of Edinburgh (coordinating), Glasgow, Dundee, Aberdeen, Strathclyde, and St Andrews. AT received consulting fees from Mirador Analytics for statistical risk disclosure and dataset certification. AT received honoraria for talks in the area of digital health (World AI conference) and Cirrus Logic. SRC reports the following grants: US NIH Grant R01AG054628, U01AG083829, & 1RF1AG073593 (University of Edinburgh); BBSRC & ESRC Grant BB/W008793/1 (University of Edinburgh); Wellcome & Royal Society Grant 221890/Z/20/Z (University of Edinburgh). CB is supported by the National Institute for Health and Care Research (NIHR) via an Advanced Fellowship programme, and holds unpaid leadership roles at the NIHR Health Technology Assessment Prioritisation committee (Deputy Chair) and the British Association of Perinatal Medicine (Honorary Secretary). JPB holds an MRC UKRI Programme grant: “Preterm birth as a determinant of neurodevelopment and cognition in children: mechanisms and causal evidence”, MR/X003434/1, PI: J. Boardman (University of Edinburgh); reports book royalties from Wolters Kluwer for Avery and MacDonald's Neonatology: Pathophysiology and Management of the Newborn, Eighth edition. Editors: J P Boardman, A M Groves, J Ramasethu. Publisher: Lippincott Williams & Wilkins (LWW). ISBN: 978-1-97-512925-5; support for travel expenses from Perinatal Science International, International Neonatology Association, Witness to House of Lords select committee on preterm birth, and the Joint European Neonatal Societies to attend meetings; participation in the Data and Safety Monitoring Committee of the Pregnancy Outcome Prediction Study 2 (POPS2); and holds leadership roles at the NHS England Maternity Neonatal Programme, Member of Clinical Outcomes Group, Scientific Advisory Panel, Action Medical Research, PREMSTEM (Brain injury in the premature born infant: stem cell regeneration research network) scientific advisory board. EU programme. SH, GDB, RMR, HCW, and REM declare no competing interests.

Figures

Fig. 1
Overview of analytic steps. LCA was used to derive transdomain clusters of neurodevelopmental functioning from unvalidated parent-reported outcomes in the England cohort. Clusters were externally replicated in the Wales cohort, and the t-SNE algorithm was used to compare cluster structure in both cohorts. Clusters were clinically validated using three additional sources of data at the two-year follow-up, and RF was used to investigate shared and cluster-specific antenatal and perinatal features.
Fig. 2
Four clusters identified by latent class analysis in 27,261 children in a) England and 975 children in b) Wales at 2 years of age. Children were assigned to clusters according to the maximum posterior probability. The lines display the prevalence of each impairment conditional on cluster membership. TD = Typically developing, COMM = Communication impairments, NM = Neuro-motor impairments, MNM = Multiple neuro-morbidity.
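Assignment by maximum posterior probability follows from Bayes' rule: a child's posterior for class k is proportional to the class weight times the product of Bernoulli likelihoods of their impairment pattern. The sketch below illustrates that rule and an empirical per-cluster prevalence; it assumes a fitted latent class model is available as class weights and per-class item probabilities, and the function names and the two-item parameters in the usage line are hypothetical, not estimates from the study.

```python
import math


def posterior(x, weights, probs):
    """P(class k | 0/1 pattern x) under a latent class model."""
    logs = []
    for w, theta in zip(weights, probs):
        lp = math.log(w)
        for xj, p in zip(x, theta):
            lp += math.log(p if xj else 1.0 - p)
        logs.append(lp)
    m = max(logs)  # subtract the max before exponentiating for stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def modal_assignment(x, weights, probs):
    """Index of the class with the largest posterior probability."""
    post = posterior(x, weights, probs)
    return max(range(len(post)), key=post.__getitem__)


def conditional_prevalence(data, labels, k):
    """Share of children assigned to cluster k who have each impairment."""
    rows = [x for x, lab in zip(data, labels) if lab == k]
    return [sum(col) / len(rows) for col in zip(*rows)]
```

With hypothetical weights `[0.9, 0.1]` and item probabilities `[[0.05, 0.05], [0.9, 0.8]]`, a child with both impairments is assigned to the small high-risk class even though its prior weight is only 0.1, because the likelihood ratio outweighs the prior.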
Fig. 3
Two-dimensional representation of the a) England and b) Wales cohorts using t-SNE, colour-mapped by LCA cluster.
Fig. 4
Two-dimensional representation of the a) England and b) Wales cohorts using t-SNE, colour-mapped by impairment.
Fig. 5
Individual-level sequences stratified by a) LCA-derived clusters and b) healthcare professional (HCP) global assessment of developmental delay (England). TD = Typically developing, COMM = Communication impairments, NM = Neuro-motor impairments, MNM = Multiple neuro-morbidity. For developmental delay as assessed by an HCP: typically developing, <3 months delay; mild delay, 3–6 months delay; moderate delay, 6–12 months delay; severe delay, >12 months delay. Severity of impairments is defined in Table 1. 35% of children assigned to MNM had 5–8 impairments compared with 19% who were assessed as severely delayed.
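The delay bands in this caption amount to a simple threshold rule, and the 5–8 impairment figure is a count over each child's indicators. A hypothetical sketch of both follows; the published bands share their endpoints, so treating exactly 6 months and exactly 12 months as moderate is an assumption, as are the function names and the 0/1 indicator layout used to count impairments.

```python
def delay_category(months):
    """Map months of developmental delay to the caption's HCP categories.

    ASSUMPTION: the published bands share endpoints, so 6 months is treated
    as moderate and 12 months as moderate (severe means strictly > 12).
    """
    if months < 0:
        raise ValueError("delay cannot be negative")
    if months < 3:
        return "typically developing"
    if months < 6:
        return "mild"
    if months <= 12:
        return "moderate"
    return "severe"


def share_with_impairment_count(rows, lo=5, hi=8):
    """Fraction of children whose number of flagged impairments is in [lo, hi].

    rows: hypothetical 0/1 impairment indicators, one row per child.
    """
    counts = [sum(row) for row in rows]
    return sum(lo <= c <= hi for c in counts) / len(counts)
```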
Fig. 6
Alluvial plot showing changes in individual assignments between HCP assessment of developmental delay and LCA cluster assignment (England cohort). Agreement between the two classifications was low among children with impairments: for example, children from all four developmental delay categories were assigned to the TD cluster, and only 41% of children with severe developmental delay were assigned to MNM, while 10% were assigned to TD. N = 26,032 children with both an LCA assignment and an HCP assessment of developmental delay. Adjusted Rand Index: 0.457.
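The Adjusted Rand Index summarises agreement between two partitions of the same children, here HCP delay categories versus LCA clusters, correcting pair-counting agreement for chance: 1 means identical partitions up to relabelling and values near 0 mean chance-level agreement. A minimal pure-Python sketch of the standard Hubert–Arabie formula is shown below; the function name is illustrative and this is not the authors' code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(a)
    # Pairs of items placed together in both partitions (contingency cells).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)  # chance-expected agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both trivial partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

Because ARI depends only on which children are grouped together, the label names of the two classifications do not need to correspond, which suits comparing four delay categories with four LCA clusters.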
Fig. 7
Global interpretation of the contribution of the top fifteen features to the RF model's predictions for each cluster (England cohort). Each bar plot shows, for one cluster, the mean absolute SHAP value of each feature across all observations, that is, the average magnitude of that feature's impact on the model output for that cluster. Bars are ordered by contribution (feature importance), from highest (top) to lowest (bottom).
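The ranking in these bar plots reduces to averaging absolute SHAP values per feature and sorting. The sketch below assumes the per-observation SHAP values for one cluster's model output have already been computed (for example with a tree explainer) and are held as a plain list of rows; the function name and the feature names in the check are hypothetical.

```python
def top_features(shap_values, feature_names, k=15):
    """Rank features by mean absolute SHAP value for one cluster's output.

    shap_values: list of rows (observations), each a list of per-feature
    SHAP values for that cluster. Returns (name, mean |SHAP|) pairs.
    """
    n = len(shap_values)
    means = [sum(abs(row[j]) for row in shap_values) / n
             for j in range(len(feature_names))]
    # Highest mean |SHAP| first, matching the top-to-bottom bar order.
    ranked = sorted(zip(feature_names, means), key=lambda t: t[1], reverse=True)
    return ranked[:k]
```

Taking the absolute value before averaging matters: a feature that pushes some children towards a cluster and others away would otherwise cancel out and look unimportant.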
