Prequalification of genome-based newborn screening for severe childhood genetic diseases through federated training based on purifying hyperselection

Stephen F Kingsmore¹, Meredith Wright², Laurie D Smith³, Yupu Liang⁴, William R Mowrey⁴, Liana Protopsaltis², Matthew Bainbridge², Mei Baker⁵, Sergey Batalov², Eric Blincow², Bryant Cao², Sara Caylor², Christina Chambers⁶, Katarzyna Ellsworth², Annette Feigenbaum⁷, Erwin Frise⁸, Lucia Guidugli², Kevin P Hall⁹, Christian Hansen², Mark Kiel¹⁰, Lucita Van Der Kraan², Chad Krilow¹¹, Hugh Kwon², Lakshminarasimha Madhavrao², Sebastien Lefebvre¹², Jeremy Leipzig¹¹, Rebecca Mardach⁷, Barry Moore¹³, Danny Oh², Lauren Olsen², Eric Ontiveros², Mallory J Owen², Rebecca Reimers¹⁴, Gunter Scharer³, Jennifer Schleit³, Seth Shelnutt¹¹, Shyamal S Mehtalia⁹, Albert Oriol², Erica Sanford³, Steve Schwartz¹⁰, Kristen Wigby², Mary J Willis³, Mark Yandell¹³, Chris M Kunard⁹, Thomas Defay⁴

Affiliations

¹ Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA; Rady Children's Hospital, San Diego, CA 92123, USA. Electronic address: skingsmore@rchsd.org.
² Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA; Rady Children's Hospital, San Diego, CA 92123, USA.
³ Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA.
⁴ Alexion, AstraZeneca Rare Disease, Boston, MA 02210, USA.
⁵ Department of Pediatrics, University of Wisconsin School of Medicine and Public Health, Madison, WI 53706, USA.
⁶ Department of Pediatrics, University of California, San Diego, San Diego, CA 92093, USA.
⁷ Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA; Rady Children's Hospital, San Diego, CA 92123, USA; Department of Pediatrics, University of California, San Diego, San Diego, CA 92093, USA.
⁸ Fabric Genomics, Inc., Oakland, CA 94612, USA.
⁹ Illumina, Inc., San Diego, CA 92122, USA.
¹⁰ Genomenon Inc., Ann Arbor, MI 48108, USA.
¹¹ TileDB Inc., Cambridge, MA 02142, USA.
¹² SLC Consulting Inc., Boston, MA 02210, USA.
¹³ Department of Human Genetics, University of Utah, Salt Lake City, UT 84132, USA.
¹⁴ Rady Children's Institute for Genomic Medicine, San Diego, CA 92123, USA; Scripps Research Translational Institute, La Jolla, CA 92037, USA.

PMID: 39642867
PMCID: PMC11639087
DOI: 10.1016/j.ajhg.2024.10.021

Stephen F Kingsmore et al. Am J Hum Genet. 2024 Dec 5;111(12):2618-2642. doi: 10.1016/j.ajhg.2024.10.021.

Abstract

Genome-sequence-based newborn screening (gNBS) has substantial potential to improve outcomes in hundreds of severe childhood genetic disorders (SCGDs). However, a major impediment to gNBS is imprecision due to variants classified as pathogenic (P) or likely pathogenic (LP) that are not SCGD causal. gNBS with 53,855 P/LP variants, 342 genes, 412 SCGDs, and 1,603 therapies was positive in 74% of UK Biobank (UKB470K) adults, suggesting 97% false positives. We used the phenomenon of purifying hyperselection, which acts to decrease the frequency of SCGD causal diplotypes, to reduce false positives. Training of gene-disease-inheritance mode-diplotype tetrads in 618,290 control and affected subjects identified 293 variants or haplotypes and seven genes with variable inheritance contributing higher positive diplotype counts than consistent with purifying hyperselection and with little or no evidence of SCGD causality. With these changes, 2.0% of UKB470K adults were positive. In contrast, gNBS was positive in 7.2% of 3,118 critically ill children with suspected SCGDs and 7.9% of 705 infant deaths. When compared with rapid diagnostic genome sequencing (RDGS), gNBS had 99.1% recall. In eight true-positive children, gNBS was projected to decrease time to diagnosis by a median of 121 days and avoid life-threatening disease presentations in four children, organ damage in six children, ∼$1.25 million in healthcare cost, and ten (1.4%) infant deaths. Federated training predicated on purifying hyperselection provides a general framework to attain high precision in population screening. Federated training across many biobanks and clinical trials can provide a privacy-preserving mechanism for qualification of gNBS in diverse genetic ancestries.

Keywords: artificial intelligence; diplotype; false positive; genetic architecture; genome sequencing; infant mortality; newborn screening; purifying hyperselection; query federation; severe childhood genetic diseases.

Conflict of interest statement

Declaration of interests: K.P.H., C.M.K., and S.S.M. are employees and shareholders of Illumina, Inc. W.R.M., Y.L., and T.D. are employees and shareholders of Alexion, AstraZeneca Rare Disease. E.F. is an employee and shareholder of Fabric Genomics, Inc. M.K. and S. Schwartz are employees and shareholders of Genomenon, Inc. J.L., C.K., and S. Shelnutt are employees and shareholders of TileDB, Inc. M.Y. is a co-founder and consultant of Fabric Genomics, Inc. S.F.K. has filed a patent related to this work.

Figures

**Figure 1**
Technical approach to structured, adaptive development of the BeginNGS.2 SCGD screening, diagnosis, and treatment platform. (A) Development of a structured SCGD molecular and treatment knowledge base and screening algorithm that is trained in multicentric, large diplotype models. Federated training identifies variants in BeginNGS.2 genes contributing to diplotypes with frequencies ($f_{\text{diplotype } 1 \ldots n}$) inconsistent with purifying hyperselection, such that $f_{\text{diplotype } 1 \ldots n}$ are greater than $P$, the population prevalence of the corresponding genetic disease(s) after correction for penetrance ($p$), expressivity ($e$), diplotype heterogeneity ($d$), and locus heterogeneity ($l$) (Figure 2). (B) Highly automated platform for scalable population screening, diagnosis, and treatment that is empowered by the knowledge base and trained algorithm. ∗Automated interpretation includes a diplotype query and use of the Transformer tool. (C) Federated learning by (1) iterative queries of genomic sequences of UKB470K and RCIGM RDGS cohorts, with (2) return of positive diplotypes with zygosity and count of positive subjects and (3) removal of NSDCC variants and disorder MOI contributing excess positive counts. Rx, therapeutic intervention; GS, genome sequence; SME, subject matter expert; MOI, mode of inheritance; GTRx, Genome-to-Treatment; eCDS, electronic clinical decision support; DBS, dried blood spot; Exp., expected; TP, true positive rate; TN, true negative rate; AWS, Amazon web services; aiSNPs, ancestry-informative single-nucleotide polymorphisms; ETL, extract, transform, load; Db, database; VEP, variant effect predictor.

**Figure 2**
Training of the BeginNGS.2 genetic disease screening algorithm in multicentric, large diplotype models. (A) Federated training in large GS cohorts flags P or LP variants for evaluation as non-severe disease causing in childhood (NSDCC) based on absence of purifying hyperselection evidenced by contributing diplotype frequencies (f) that are greater than those expected based on the sum of the corresponding disease prevalences (P) following correction for penetrance (p), expressivity (e), diplotype heterogeneity (d), and locus heterogeneity (l). (B) Manhattan plot of counts of 2,785 diplotypes that were gNBS positive in UKB470K. The x axis shows chromosome number and relative nucleotide position from the lowest value (left) to the highest value (right). The y axis is the diplotype count in UKB470K. 113 diplotypes with counts ≥54 in UKB470K (frequency >1 in 8,703) are indicated in green if disease causal (n = 16), and in red if determined to be NSDCC (n = 97) using the method of (A). The top 109 *CFTR* diplotypes (with counts >3, 1 in 118,000) are also indicated as green if disease causal (n = 5) and red if not (n = 104). (C) Rank ordering of 2,785 diplotype counts in UKB470K from largest (left) to smallest (right). The x axis shows the diplotype rank from most common (left) to least common (right). The y axis is the diplotype count in UKB470K. The top 10 (darker shaded blue) and 100 (lighter shaded blue) diplotypes accounted for 91% and 97%, respectively, of the total diplotype count. The 113 diplotypes with frequencies >1 in 8,703 (counts ≥54) are indicated in green if disease causal (n = 16), and in red (n = 97) if determined to be NSDCC using the method of (A), indicating the power to reduce false positives.
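
The flagging step described in (A) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a cohort size of roughly 470,000 (inferred from the cohort name UKB470K), and it takes the expected frequency ceiling as a caller-supplied number, because the caption does not give the functional form that combines P, p, e, d, and l. The names `DiplotypeCount`, `one_in`, and `flag_for_review`, and the diplotype identifiers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DiplotypeCount:
    """Count of screen-positive subjects for one diplotype (hypothetical record)."""
    diplotype_id: str
    count: int


def one_in(count: int, cohort_size: int) -> float:
    """Express a positive count as the N in a '1 in N' frequency."""
    if count <= 0:
        raise ValueError("count must be positive")
    return cohort_size / count


def flag_for_review(
    diplotypes: Iterable[DiplotypeCount],
    cohort_size: int,
    max_expected_frequency: float,
) -> List[DiplotypeCount]:
    """Return diplotypes observed more often than the expected ceiling.

    max_expected_frequency is an assumed input: the highest frequency
    still consistent with purifying hyperselection for the disease(s)
    in question. Flagged diplotypes are candidates for expert review,
    not automatic removals.
    """
    return [
        d for d in diplotypes
        if d.count / cohort_size > max_expected_frequency
    ]


COHORT_SIZE = 470_000  # assumption: approximate size of UKB470K

# A count of 54 in ~470,000 subjects is about 1 in 8,704,
# in line with the caption's "1 in 8,703".
print(round(one_in(54, COHORT_SIZE), 1))

example = [
    DiplotypeCount("dip_A", 500),
    DiplotypeCount("dip_B", 60),
    DiplotypeCount("dip_C", 3),
]
flagged = flag_for_review(example, COHORT_SIZE, max_expected_frequency=1 / 8703)
print([d.diplotype_id for d in flagged])
```

In this sketch, the diplotypes with 500 and 60 positive subjects exceed the ceiling and are flagged, while the one with 3 is not. In the workflow the caption describes, flagged variants are then evaluated as NSDCC or disease causal rather than dropped on frequency alone.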
