Nat Genet. 2024 Oct;56(10):2046-2053.
doi: 10.1038/s41588-024-01910-8. Epub 2024 Sep 23.

Federated analysis of autosomal recessive coding variants in 29,745 developmental disorder patients from diverse populations

V Kartik Chundru et al. Nat Genet. 2024 Oct.

Abstract

Autosomal recessive coding variants are well-known causes of rare disorders. We quantified the contribution of these variants to developmental disorders in a large, ancestrally diverse cohort comprising 29,745 trios, of whom 20.4% had genetically inferred non-European ancestries. The estimated fraction of patients attributable to exome-wide autosomal recessive coding variants ranged from ~2-19% across genetically inferred ancestry groups and was significantly correlated with average autozygosity. Established autosomal recessive developmental disorder-associated (ARDD) genes explained 84.0% of the total autosomal recessive coding burden, and 34.4% of the burden in these established genes was explained by variants not already reported as pathogenic in ClinVar. Statistical analyses identified two novel ARDD genes: KBTBD2 and ZDHHC16. This study expands our understanding of the genetic architecture of developmental disorders across diverse genetically inferred ancestry groups and suggests that improving strategies for interpreting missense variants in known ARDD genes may help diagnose more patients than discovering the remaining genes.

Conflict of interest statement

K.M., M.J.G.S., H.O. and V.D.U. are employees of GeneDx. Z.Z., K.R. and R.T. were formerly employees of GeneDx and K.R. and R.T. are now employees of Geisinger Health System. A.B.A. and P.B. are employees of CENTOGENE. E.J.G. is an employee of and holds shares in Adrestia Therapeutics. K.E.S. has received support from Microsoft for work related to rare disease diagnostics. M.E.H. is a co-founder of, consultant to and holds shares in Congenica, a genetics diagnostic company. All other authors declare no conflicts of interest.

Figures

Fig. 1. Estimates of the fraction of patients attributable to autosomal recessive coding variants or de novo coding mutations in DDD and GeneDx across seven large GIA sub-groups (n = 25,523).
a, Estimated attributable fraction per GIA sub-group. The de novo attributable fractions (lighter shading) are stacked on the autosomal recessive attributable fractions (darker shading), with the total height of the bars being the sum of the attributable fractions. Lines show 95% confidence intervals (CIs). b, Estimated attributable fraction owing to de novo coding mutations (left) or autosomal recessive coding variants (right) versus average autozygosity (FROH) for these seven GIA sub-groups (see Table 1). Colored lines, 95% CIs. The black line is the line of best fit and gray shading shows its 95% CI. c, Comparison of the proportion of the total sample size (left) versus the proportion of the total autosomal recessive attributable fraction (right) accounted for by each GIA sub-group.
Fig. 2. Estimates of the fraction of patients attributable to autosomal recessive coding variants in different subsets of genes and patients.
These plots are focused on the individuals without cross-continental admixture from seven large GIA sub-groups, as in Fig. 1 and Table 1. a, Estimates in all individuals from DDD and GeneDx combined (n = 25,523), for all genes versus genes in the indicated lists. b, Estimates in all individuals for consensus + discordant genes split by cohort (n = 7,919 and 17,604 for DDD and GeneDx, respectively), comparing the estimates obtained with all variants versus after removing variants annotated as pathogenic or likely pathogenic (P/LP) in ClinVar. c, Estimates in undiagnosed individuals (n = 4,425 and 12,604 for DDD and GeneDx, respectively), for all genes versus the genes that are used for clinical filtering of diagnostic autosomal recessive variants in the respective cohorts, split by cohort and functional consequence of the variants. Error bars, 95% confidence intervals.
Extended Data Fig. 1. Defining broad-scale population structure.
UMAP of the first seven principal components (PCs) of the 1000 Genomes and HGDP samples with DDD and GeneDx samples projected onto the PCs. The genetically-inferred ancestry (GIA) groups were labelled based on the ancestry of the 1000 Genomes/HGDP reference samples within each cluster.
Extended Data Fig. 2. Defining fine-scale population structure.
UMAPs based on principal components from each continental-level genetically-inferred ancestry (GIA) group from Extended Data Fig. 1. The PCA was run on each GIA group separately using the 1000 Genomes/HGDP reference samples together with the unrelated parents from GeneDx, then the DDD samples and remaining GeneDx samples were projected onto these. The clusters indicated in the left-hand plots were determined using HDBSCAN. The right-hand plots show the same UMAP but instead coloured to indicate which samples come from each cohort versus the reference samples. The GIA groups were as follows: AB) African (AFR), CD) Latin American (AMR), EF) East Asian (EAS), GH) European (EUR), IJ) Middle Eastern (MDE), and KL) South Asian (SAS).
Extended Data Fig. 3. Exome-wide observed and expected number of biallelic genotypes per genetically-inferred ancestry (GIA) sub-group, for the four consequence classes.
This is after excluding trios with cross-continental admixture. This figure shows only GIA sub-groups with at least 200 trios; numbers for all GIA sub-groups are shown in Supplementary Table 4, together with estimates obtained with either no admixture filtering or stricter admixture filtering. The GIA sub-groups used in Fig. 1 are shown in blue bold text along the x-axis. Coloured points are the observed numbers, black points are the expected numbers, and black lines show 95% confidence intervals around the observed. For some GIA sub-groups, the black points and/or black lines are not visible as they lie under the coloured points. P-values are shown for those where there is a Bonferroni significant difference between the observed and expected values, according to a Poisson test (p < 0.05/88, since in total there were 4 tests from each of 22 populations; two-sided test for synonymous/synonymous, one-sided otherwise).
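The observed-versus-expected comparison described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example counts are hypothetical, and only the one-sided (excess) Poisson test and the 0.05/88 Bonferroni threshold are taken from the caption.

```python
import math

# Bonferroni threshold from the caption: 4 tests in each of 22 populations.
BONFERRONI_ALPHA = 0.05 / 88


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected).

    The upper tail is summed directly (in log space per term) so that
    very small p-values are not lost to rounding in 1 - CDF.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= 0:
        return 1.0
    total = 0.0
    i = observed
    while True:
        term = math.exp(-expected + i * math.log(expected) - math.lgamma(i + 1))
        total += term
        # Stop once we are past the mode and further terms are negligible.
        if i > expected and term <= total * 1e-17:
            break
        i += 1
    return min(1.0, total)


def is_significant(observed: int, expected: float) -> bool:
    """One-sided test for an excess of observed over expected genotypes."""
    return poisson_upper_tail(observed, expected) < BONFERRONI_ALPHA


# Hypothetical counts, for illustration only.
print(is_significant(30, 10.0))  # large excess
print(is_significant(12, 10.0))  # small excess
```

The caption notes a two-sided test for the synonymous/synonymous class; that variant is not shown here.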
Extended Data Fig. 4. De novo or autosomal recessive attributable fraction in different subsets of probands.
Fraction of patients in different groups attributable to de novo versus autosomal recessive coding variants [(observed-expected)/N]. The patients are split by (a) level of consanguinity (N = 1,087 and 24,436 for low and high consanguinity respectively), (b) cohort (N = 7,919 and 17,604 for DDD and GeneDx respectively), (c) diagnostic status (N = 8,494 and 17,029 for diagnosed and undiagnosed respectively) or (d) sex (N = 11,316 and 14,207 for female and male respectively). The bars show the attributable fraction estimates within the groups, and error bars show 95% confidence intervals.
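The attributable fraction defined in this caption, (observed - expected)/N, can be computed as in the sketch below. The point estimate follows the caption; the confidence interval is an assumption made for illustration (a normal approximation to the Poisson variance of the observed count, with the expected count treated as fixed) and may differ from the authors' method. The function name and the example numbers are hypothetical.

```python
import math


def attributable_fraction(observed: int, expected: float, n_patients: int,
                          z: float = 1.96):
    """Return (estimate, lower, upper) for (observed - expected) / N.

    Assumption: the interval reflects only Poisson sampling noise in the
    observed count, i.e. Var(observed) is approximated by observed.
    """
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    estimate = (observed - expected) / n_patients
    half_width = z * math.sqrt(observed) / n_patients
    return estimate, estimate - half_width, estimate + half_width


# Hypothetical example: 150 observed vs 100 expected genotypes in 1,000 trios.
est, lo, hi = attributable_fraction(150, 100.0, 1000)
print(f"attributable fraction = {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```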
Extended Data Fig. 5. Autosomal recessive attributable fraction in different gene lists.
Fraction of patients in each cohort attributable to autosomal recessive coding variants both across all genes and in the indicated ARDD gene lists (N = 7,919 and 17,604 for DDD and GeneDx respectively). Error bars show 95% confidence intervals.
Extended Data Fig. 6. Quantifying the contribution of as-yet-undetected multi-gene diagnoses.
A) The residual de novo and recessive attributable fraction in diagnosed individuals before and after diagnostic variants were removed (N = 2,031 and 4,624 diagnosed with non-de novo and de novo variants, respectively). B) The residual de novo attributable fraction in diagnosed patients, excluding the diagnostic variant, restricted to monoallelic or X-linked dominant DDG2P genes versus all other genes. Note that patients with partial diagnoses in DDD were included, but patients with known composite diagnoses or whose diagnostic variant(s) did not pass variant filters were excluded from the diagnosed sets. Error bars show 95% confidence intervals.
Extended Data Fig. 7. Assessing phenotypic similarity between patients with biallelic genotypes in the same gene.
Cumulative distribution functions for pairwise phenotypic similarity scores as calculated by Phenopy. The distribution of novel genes passing FDR < 5% (ATAD2B, CRELD1, HECTD4, KBTBD2, ZDHHC16) is shown in red, consensus/discordant genes passing FDR < 5% in blue, and the similarity scores of random pairs in grey. Random pairs were selected proportionally to match the occurrence of DDD/DDD, GeneDx/GeneDx and DDD/GeneDx pairs in the novel and consensus/discordant sets. The phenotypic similarity scores in patients with damaging biallelic genotypes in the novel genes were not significantly lower than those for patients with such genotypes in consensus/discordant genes (one-sided Wilcoxon rank sum p = 0.12), but they were significantly higher than random scores (one-sided Wilcoxon rank sum p = 0.0058).
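The one-sided Wilcoxon rank sum comparison mentioned in this caption can be illustrated with a small standard-library sketch. It is not the authors' analysis: it uses the large-sample normal approximation with no tie or continuity correction, and the function name and similarity scores are hypothetical.

```python
import math


def rank_sum_one_sided(x, y):
    """One-sided p-value for the alternative that x tends to exceed y.

    Uses average ranks for tied values and a normal approximation to the
    distribution of the Mann-Whitney U statistic (no tie correction).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)


# Hypothetical similarity scores, for illustration only.
gene_pairs = [0.42, 0.55, 0.61, 0.48, 0.70, 0.66]
random_pairs = [0.21, 0.35, 0.30, 0.44, 0.28, 0.39]
print(rank_sum_one_sided(gene_pairs, random_pairs))
```

For small samples or heavily tied data, an exact test from a statistics library would be the safer choice.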
