Science. 2021 Sep 24;373(6562):1499-1505.
doi: 10.1126/science.abg8289. Epub 2021 Sep 23.

Protein-coding repeat polymorphisms strongly shape diverse human phenotypes

Ronen E Mukamel et al. Science. 2021.

Abstract

Many human proteins contain domains that vary in size or copy number because of variable numbers of tandem repeats (VNTRs) in protein-coding exons. However, the relationships of VNTRs to most phenotypes are unknown because of difficulties in measuring such repetitive elements. We developed methods to estimate VNTR lengths from whole-exome sequencing data and impute VNTR alleles into single-nucleotide polymorphism haplotypes. Analyzing 118 protein-altering VNTRs in 415,280 UK Biobank participants for association with 786 phenotypes identified some of the strongest associations of common variants with human phenotypes, including height, hair morphology, and biomarkers of health. Accounting for large-effect VNTRs further enabled fine-mapping of associations to many more protein-coding mutations in the same genes. These results point to cryptic effects of highly polymorphic common structural variants that have eluded molecular analyses to date.

Figures

Figure 1. Kringle IV-2 repeat length variation and 23 LPA SNPs together explain ~90% of lipoprotein(a) heritability.
A, Serum lipoprotein(a) concentration vs. KIV-2 VNTR length in an effective-haploid model of Lp(a), involving N=24,969 LPA alleles (in exome-sequenced UKB participants of European ancestry) for which the allele on the homologous chromosome was predicted to produce negligible Lp(a) (<4 nmol/L). Colors indicate the 15 most common Lp(a)-modifying SNPs identified by fine-mapping analysis. Curves indicate parametric fits of Lp(a) to KIV-2 length (gray: alleles not carrying any Lp(a)-modifying SNP; red, blue, green: carriers of a single common Lp(a)-modifying SNP); large points, mean Lp(a) among such alleles in KIV-2 length bins (error bars, 95% CIs). Histograms (top/bottom), counts of Lp(a) measurements outside the reportable range (<3.8 nmol/L or >189 nmol/L), colored by Lp(a)-modifying SNPs (7). B, Observed and predicted median Lp(a) among individuals of African (AFR; N=893), European (EUR; N=42,162), South Asian (SA; N=954), and East Asian (EAS; N=156) ancestry. C, LPA allele frequencies by ancestry. VNTR alleles in cis with a large-effect Lp(a)-reducing variant (respectively, the Lp(a)-increasing 5’ UTR variant rs1800769) are indicated in gray (respectively, red). D,E, Myocardial infarction risk (respectively, type 2 diabetes prevalence) vs. measured or genetically predicted Lp(a). Error bars, 95% CIs.
Figure 2. Lengths of protein-coding repeat polymorphisms in ACAN and TENT5A associate with human height.
A, Genetic associations with height in UKB participants of European (top; EUR N=415,280) and African (bottom; AFR N=7,543) ancestry. B, ACAN VNTR allele length distributions. C, Height association statistics at ACAN in three consecutive steps of stepwise conditional analysis (EUR N=415,280). Large diamond/squares, likely-causal coding mutations; colored dots, variants in partial LD (R²>0.1) with labeled variants. Height phenotypes were adjusted for genetic predictions computed using the rest of the genome (7). D, Mean height of carriers (lines, left axis) and EUR allele frequencies (histograms, right axis) of ACAN alleles defined by VNTR length and missense SNP haplotype; error bars, 95% CIs. Rare long alleles (40-42 repeats) were grouped into one bin. E, Height associations at TENT5A. F, Mean height and EUR allele frequencies for TENT5A VNTR alleles; error bars, 95% CIs.
Figure 3. MUC1 VNTR length associates with multiple renal phenotypes.
A,C, Genetic associations with serum urea (A) and serum urate (C) at MUC1 (top; orange dots indicate variants in LD with MUC1 VNTR length (R²>0.1)) and genome-wide (bottom); N=415,280 UKB EUR participants. B,D, Mean phenotypes in carriers (B) or disease odds ratios (D) (lines, left axis) and allele frequencies (histograms, right axis) of MUC1 VNTR alleles. VNTR alleles were stratified into three groups for phenotype analyses: short (<55 repeat units), long (55-95 repeat units), and very long (>95 repeat units). Error bars, 95% CIs; eGFR, estimated glomerular filtration rate.
Figure 4. TCHH VNTR length and missense SNP rs11803731 associate independently with hair phenotypes.
A, Genetic associations with male pattern baldness at TCHH (N=189,537 male UKB EUR participants). Colors indicate partial LD (R² > 0.1) with missense SNP rs11803731 (blue), the TCHH VNTR (red), or both rs11803731 and VNTR length (purple). B, Mean baldness score in carriers (lines, left axis) and allele frequencies (histograms, right axis) of TCHH alleles. TCHH alleles were binned by VNTR length quintile and missense SNP rs11803731 status. C,D, Genetic associations with hair curl at TCHH in N=3,334 TwinsUK participants (conditioned on rs11803731 in D). E, Genome-wide associations with hair curl in TwinsUK. F, Relationship between TCHH allele length and hair curl (analogous to B).
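
The three-group stratification of MUC1 VNTR alleles described in the Figure 3 caption (short: <55 repeat units; long: 55-95; very long: >95) can be written as a small Python function. This is only an illustrative sketch: the function name `muc1_length_group` and the treatment of non-integer length estimates are assumptions, not the authors' code.

```python
def muc1_length_group(repeat_units: float) -> str:
    """Assign a MUC1 VNTR allele to a length group.

    Thresholds follow the Figure 3 caption: short (<55 repeat units),
    long (55-95), very long (>95). The function name and the handling of
    fractional estimates are hypothetical, chosen for illustration.
    """
    if repeat_units < 0:
        raise ValueError("repeat_units must be non-negative")
    if repeat_units < 55:
        return "short"
    if repeat_units <= 95:
        return "long"
    return "very long"


if __name__ == "__main__":
    # Example allele lengths (made up for demonstration, not study data).
    for length in (40, 55, 95, 120):
        print(length, muc1_length_group(length))
```

Both boundary values (55 and 95) fall in the "long" group, matching the inclusive 55-95 range stated in the caption.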

References

    1. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH-Y, Konkel MK, Malhotra A, Stütz AM, Shi X, Casale FP, Chen J, Hormozdiari F, Dayama G, Chen K, Malig M, Chaisson MJP, Walter K, Meiers S, Kashin S, Garrison E, Auton A, Lam HYK, Mu XJ, Alkan C, Antaki D, Bae T, Cerveira E, Chines P, Chong Z, Clarke L, Dal E, Ding L, Emery S, Fan X, Gujral M, Kahveci F, Kidd JM, Kong Y, Lameijer E-W, McCarthy S, Flicek P, Gibbs RA, Marth G, Mason CE, Menelaou A, Muzny DM, Nelson BJ, Noor A, Parrish NF, Pendleton M, Quitadamo A, Raeder B, Schadt EE, Romanovitch M, Schlattl A, Sebra R, Shabalin AA, Untergasser A, Walker JA, Wang M, Yu F, Zhang C, Zhang J, Zheng-Bradley X, Zhou W, Zichner T, Sebat J, Batzer MA, McCarroll SA, Mills RE, Gerstein MB, Bashir A, Stegle O, Devine SE, Lee C, Eichler EE, Korbel JO, An integrated map of structural variation in 2,504 human genomes. Nature (2015), doi:10.1038/nature15394.
    2. Sulovari A, Li R, Audano PA, Porubsky D, Vollger MR, Logsdon GA, Human Genome Structural Variation Consortium, Warren WC, Pollen AA, Chaisson MJP, Eichler EE, Human-specific tandem repeat expansion and differential gene expression during primate evolution. Proc. Natl. Acad. Sci. (2019), doi:10.1073/pnas.1912175116.
    3. Lalioti MD, Scott HS, Buresi C, Rossier C, Bottani A, Morris MA, Malafosse A, Antonarakis SE, Dodecamer repeat expansion in cystatin B gene in progressive myoclonus epilepsy. Nature 386, 847–851 (1997).
    4. Wijmenga C, Hewitt JE, Sandkuijl LA, Clark LN, Wright TJ, Dauwerse HG, Gruter A-M, Hofker MH, Moerer P, Williamson R, van Ommen G-JB, Padberg GW, Frants RR, Chromosome 4q DNA rearrangements associated with facioscapulohumeral muscular dystrophy. Nat. Genet. 2, 26–30 (1992).
    5. Marchini J, Howie B, Myers S, McVean G, Donnelly P, A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39, 906–913 (2007).
