Nature. 2023 Jan;613(7944):519-525.
doi: 10.1038/s41586-022-05420-7. Epub 2023 Jan 18.

Mono- and biallelic variant effects on disease at biobank scale

H O Heyne et al. Nature. 2023 Jan.

Abstract

Identifying causal factors for Mendelian and common diseases is an ongoing challenge in medical genetics [1]. Population bottleneck events, such as those that occurred in the history of the Finnish population, enrich some homozygous variants to higher frequencies, which facilitates the identification of variants that cause diseases with recessive inheritance [2,3]. Here we examine the homozygous and heterozygous effects of 44,370 coding variants on 2,444 disease phenotypes using data from the nationwide electronic health records of 176,899 Finnish individuals. We find associations for homozygous genotypes across a broad spectrum of phenotypes, including known associations with retinal dystrophy and novel associations with adult-onset cataract and female infertility. Of the recessive disease associations that we identify, 13 out of 20 would have been missed by the additive model that is typically used in genome-wide association studies. We use these results to find many known Mendelian variants whose inheritance cannot be adequately described by a conventional definition of dominant or recessive. In particular, we find variants that are known to cause diseases with recessive inheritance with significant heterozygous phenotypic effects. Similarly, we find presumed benign variants with disease effects. Our results show how biobanks, particularly in founder populations, can broaden our understanding of complex dosage effects of Mendelian variants on disease.

Conflict of interest statement

A.P. is a member of the Pfizer Genetics Scientific Advisory Panel. M.J.D. is a founder of Maze Therapeutics. The remaining authors declare no competing interests.

Figures

Fig. 1. Schema of different effect sizes of monoallelic (heterozygous) versus biallelic variant states.
A is the wild-type and B is the mutant allele. We distinguish five main scenarios that are associated with different modes of inheritance used in rare-disease genetics (first row of the table at the bottom). In rare-disease genetics, the phenotypes associated with the mono- and the biallelic state in scenarios 2, 3 and 4 are usually viewed as distinct disease entities, with the monoallelic phenotype regarded as dominantly inherited and the biallelic phenotype, which is usually more severe, regarded as recessively inherited. In the schema, we focus on autosomal inheritance and do not show overdominant or underdominant inheritance (rare outside the HLA region). A perfectly linear additive genetic architecture (scenario 3) is also described, in which no dominance effect contributes to phenotypic variation.
Fig. 2. P values of the additive versus the recessive GWAS model of all genome-wide-significant variant–disease associations.
Associations in which the P value of the recessive model is two orders of magnitude lower than that of the additive model are shown in blue (category: ‘recessive’); all other associations are shown in black. a, All associations. b, Recessive associations, broken down by known inheritance modes of the respective disease gene (source: OMIM). Loci shown are independent: adjacent variants with r² > 0.25 that are associated with the same (parent) trait are counted as one locus. For clarity, −log10(P) values are capped at 50.
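The colour rule and the capping described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' plotting code: the function names, the inclusive threshold and the handling of P values that underflow to zero are assumptions.

```python
import math

def classify_association(p_additive, p_recessive, min_orders=2.0):
    """Label an association 'recessive' when the recessive-model P value is
    at least `min_orders` orders of magnitude below the additive-model one.
    The inclusive comparison (>=) is an assumption; the caption does not say."""
    gap = math.log10(p_additive) - math.log10(p_recessive)
    return "recessive" if gap >= min_orders else "other"

def capped_neg_log10(p, cap=50.0):
    """Return -log10(p), capped at `cap` for plotting.
    P values that underflow to 0 are mapped to the cap (assumption)."""
    if p <= 0.0:
        return cap
    return min(-math.log10(p), cap)

# Hypothetical P values, for illustration only.
label_a = classify_association(p_additive=1e-9, p_recessive=1e-16)
label_b = classify_association(p_additive=1e-12, p_recessive=3e-13)
```

With these hypothetical inputs, the first association has a gap of about seven orders of magnitude and would be drawn in blue; the second has a gap of well under one order and would be drawn in black.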
Fig. 3. Age at first diagnosis of variants with recessive disease associations.
Data are shown as survival plots. a, Missense variant in CASP7 associated with cataract; not previously described (P = 2.5 × 10⁻¹⁶); n = 176,899. b, Missense variant in C10orf90 associated with hearing loss (P = 2.2 × 10⁻¹²); only recently described; n = 176,899. c, Intronic variant in EBAG9 associated with female infertility (P = 1.6 × 10⁻¹¹); not previously described; n = 110,361 female individuals. Survival curves of wild-type individuals are coloured in blue, heterozygotes in yellow and homozygotes in red. The 95% confidence intervals of the point estimates are shaded in light blue, light yellow or light red.
Fig. 4. Age at first diagnosis of known disease-associated variants.
Data are shown as survival plots. a, Known likely pathogenic variant (known recessive inheritance) in GJB2 associated with hearing loss also in a heterozygous state (P = 0.02). The y axis is cut at 0.9 for clarity. b, Known likely pathogenic variant in XPA associated with skin cancer (P = 8 × 10⁻¹¹). In a homozygous state, this variant causes xeroderma pigmentosum with childhood-onset skin cancer. c, Likely benign missense variant in DBH protects from hypertension (P = 5.2 × 10⁻¹³). (DBH is associated with the recessively inherited disease dopamine beta-hydroxylase deficiency, which is characterized by severe hypotension.) a and b show R4 data (n = 176,899); c shows R6 data (n = 234,553). Survival curves of wild-type individuals are coloured in blue, heterozygous individuals in yellow and homozygous individuals in red. The 95% confidence intervals of the point estimates are shaded in light blue, light yellow or light red.
Extended Data Fig. 1. Variants known to cause disease with recessive inheritance are found at a higher MAF in Finnish Europeans than in other Europeans.
We show the MAF of 2,419 unique likely pathogenic variants (source: ClinVar) in 10,824 individuals from FIN and 10,824 from NFE populations (source: gnomAD). a, All 2,419 variants (violin plot). b, All 2,419 variants (histogram). c, Subset of 133 variants in Finnish disease heritage genes. Violin plots are scaled to have the same area. Box plots within violins show the 1st, 2nd and 3rd quartiles of the MAF distribution; whiskers extend to at most 1.5 times the interquartile range.
Extended Data Fig. 2. Simulations of recessive and additive effects at different MAFs.
Here, we generated genotype counts of wild types, heterozygotes and mutant homozygotes in 200,000 individuals for a variant with an allele frequency of 0.01 following Hardy-Weinberg equilibrium (R library: HardyWeinberg), and randomly assigned controls and cases of a disease with a prevalence of 0.05 (see also Supplementary Note 2). a, Simulated recessive association. We set the probability of homozygotes developing the disease to 5x that of wild types and the heterozygous effect to 1 (= no effect). The histogram shows, on the x axis, the log10 p-value of the recessive model minus the log10 p-value of the additive model (method: logistic regression). b, Simulated additive association. We set the heterozygous effect to 1.5x and the homozygous effect to 2.25x, that is, 1.5x per mutant allele. The histogram shows, on the x axis, the log10 p-value of the additive model minus the log10 p-value of the recessive model (method: logistic regression).
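The recessive scenario in panel a can be approximated with the Python standard library alone. The sketch below is not the authors' R simulation: it uses expected Hardy-Weinberg genotype counts and expected case counts instead of random draws, fits each logistic model by Newton-Raphson on grouped data, and uses a likelihood-ratio test for the P value. All function names are hypothetical, and treating the prevalence of 0.05 as the baseline risk of wild types is an assumption.

```python
import math

def hwe_counts(n, maf):
    """Expected (wild-type, heterozygous, homozygous) counts under
    Hardy-Weinberg equilibrium for n individuals."""
    p = 1.0 - maf
    return [n * p * p, n * 2.0 * p * maf, n * maf * maf]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_lik(b0, b1, x, cases, totals):
    """Binomial log-likelihood of logit(P(case)) = b0 + b1 * x on grouped data."""
    ll = 0.0
    for xi, ci, ni in zip(x, cases, totals):
        p = sigmoid(b0 + b1 * xi)
        ll += ci * math.log(p) + (ni - ci) * math.log(1.0 - p)
    return ll

def lrt_p_value(x, cases, totals, iters=100):
    """Fit the one-predictor logistic model by Newton-Raphson and return the
    likelihood-ratio P value against the intercept-only model (1 d.f.)."""
    b0 = math.log(sum(cases) / (sum(totals) - sum(cases)))  # null-model MLE
    b1 = 0.0
    ll_null = log_lik(b0, 0.0, x, cases, totals)
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ci, ni in zip(x, cases, totals):
            p = sigmoid(b0 + b1 * xi)
            r = ci - ni * p            # score contribution
            w = ni * p * (1.0 - p)     # information weight
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    stat = max(2.0 * (log_lik(b0, b1, x, cases, totals) - ll_null), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 d.f.

totals = hwe_counts(200_000, 0.01)
risks = [0.05, 0.05, 0.25]  # assumption: baseline risk 0.05, homozygotes 5x
cases = [n * r for n, r in zip(totals, risks)]

p_rec = lrt_p_value([0, 0, 1], cases, totals)  # recessive coding
p_add = lrt_p_value([0, 1, 2], cases, totals)  # additive coding
log10_gap = math.log10(p_rec) - math.log10(p_add)
```

Under these assumptions only about 20 homozygotes are expected, yet the recessive coding still yields a much smaller P value than the additive coding, so `log10_gap` is negative. A random-draw version would produce a distribution of such gaps, which is what the histogram in panel a summarises.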
Extended Data Fig. 3. EBAG9 is associated with female infertility.
a, Comparison of the number of offspring in 147,061 women who are wild type (n = 131,141), heterozygous (n = 15,455) or homozygous (n = 465) for the EBAG9 variant. Among 7,980 women with children and a diagnosis of infertility, 71 EBAG9 homozygotes (b) had fewer children and (c) had their first child significantly later (see Supplementary Note 5).
Extended Data Fig. 4. Age at first disease diagnosis of variant carriers in GJB2 (survival plot).
Wt, wild type; het, heterozygous; hom, homozygous; comp-het, compound heterozygous; GT, genotype. Genotypes of a known pathogenic missense and pLoF variant in GJB2 associated with hearing loss. Comp-het carry both the pathogenic missense and pLoF variant on different alleles.
Extended Data Fig. 5. Replications with UKBB data.
We replicated the 31 recessive associations in FinnGen in the UKBB with a recessive model in SAIGE. Of the 31 variants, 13 had ≥ 5 homozygotes in the UKBB, of which 8 had a significant recessive p-value in the UKBB for the same or a similar phenotype. Here, we show recessive betas in FinnGen versus UKBB (betas greater than 5 are set to 5). Point size corresponds to MAF in gnomAD non-Finnish Europeans, and red points indicate a recessive p-value < 0.05 in the UKBB.
Extended Data Fig. 6. Age at disease onset.
This forest plot shows the median age at first diagnosis for each variant with recessive associations in our FinnGen data. P-values indicate differences in disease onset between homozygous or heterozygous carriers and wild-type individuals (Wilcoxon rank tests). Bars represent the first and third quartiles of age at first diagnosis. Only variants with more than 5 affected homozygotes are shown. The y-axis lists gene-disease associations. Homozygotes had significantly earlier disease onset than wild types (or later, for a known homozygous protective variant in FUT2) for 7/31 variants (p-value < 0.0016 with Bonferroni correction for 31 tests; 14/31 tests had a nominal p-value < 0.05; Wilcoxon rank test).
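The significance threshold quoted in this caption follows from dividing the nominal level by the number of tests. A minimal sketch of that arithmetic, with a hypothetical helper name and hypothetical P values used only to show how the threshold is applied:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 31)  # 31 tests, as in the caption

# Hypothetical P values, not taken from the study.
example_p_values = [0.0004, 0.003, 0.04, 0.2]
n_bonferroni = sum(p < threshold for p in example_p_values)
n_nominal = sum(p < 0.05 for p in example_p_values)
```

Here `threshold` is roughly 0.0016, matching the caption. Of the four hypothetical values, one passes the corrected threshold and three pass the nominal 0.05 level.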
Extended Data Fig. 7. Global disease associations of variant categories.
Variants previously described as disease-causing (ClinVar likely pathogenic or conflicting variants) but also ClinVar likely benign variants are globally associated with disease phenotypes compared with randomly sampled intergenic variants matched to the same minor allele frequency in 15 bins. Disease-causing variants in OMIM genes that were described with only dominant inheritance, as well as genes with only recessive inheritance were globally disease-associated.

References

    1. Claussnitzer M, et al. A brief history of human disease genetics. Nature. 2020;577:179–189.
    2. Peltonen L, Jalanko A, Varilo T. Molecular genetics of the Finnish disease heritage. Hum. Mol. Genet. 1999;8:1913–1923.
    3. Lim ET, et al. Distribution and medical impact of loss-of-function variants in the Finnish founder population. PLoS Genet. 2014;10:e1004494.
    4. Zuk O, et al. Searching for missing heritability: designing rare variant association studies. Proc. Natl Acad. Sci. USA. 2014;111:E455–E464.
    5. Peltonen L, Palotie A, Lange K. Use of population isolates for mapping complex traits. Nat. Rev. Genet. 2000;1:182–190.