PLoS Genet. 2015 Jan 28;11(1):e1004930.
doi: 10.1371/journal.pgen.1004930. eCollection 2015 Jan.

Imputation of the rare HOXB13 G84E mutation and cancer risk in a large population-based cohort

Thomas J Hoffmann et al. PLoS Genet.

Erratum in

Abstract

An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated by analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r² = 0.57, 95% CI = 0.37–0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4×10⁻¹²). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8×10⁻⁴) and more so in cases diagnosed with multiple cancer types, both those including and not including prostate cancer, strongly suggesting pleiotropic effects.
[corrected].

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Confirmation of HOXB13 G84E mutation status from classification and regression tree.
The top of the figure shows three CART trees produced for the computationally phased haplotypes of the enriched reference panel of 93 individuals (22 carriers) plus 1000 Genomes data (2 carriers). Listed in the trees are the splits that classify the G84E mutation. The leaves in the tree contain the best guess classification of G84E on the top, and the number of reference alleles on the left and the number of G84E mutations on the right. The first tree, in black, is formed from selecting amongst all 57 SNPs +/− 3 crossovers. The second tree, in green, is formed from selecting from the same set of SNPs except excluding the 3 found in the first tree. The third tree, in blue, is formed from selecting amongst the same set of SNPs except excluding the 7 found in the first and second trees. Below the trees is a local chromosome plot of the region in reference to the surrounding genes and recombination rate of the region, with the color of the rs# for each SNP indicating the tree from which it was derived. KGW, 1000 Genomes white race/ethnicity individuals; frq, frequency.
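The idea of growing successive trees while excluding SNPs already used can be sketched in a much reduced form. The Python below is only an illustration under loud assumptions: it replaces each full CART tree with a single Gini-based split (a decision stump), codes alleles as 0/1, and uses made-up toy haplotypes. The function names `gini`, `best_split`, and `successive_splits` are hypothetical and are not taken from the paper.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = carries G84E)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(haplotypes, labels, excluded=frozenset()):
    """Return (snp_index, weighted_gini) for the best single-SNP split.

    Assumption: each haplotype is a tuple of 0/1 alleles, one per SNP.
    SNP indices listed in `excluded` are skipped.
    """
    best_snp, best_score = None, float("inf")
    n = len(labels)
    for snp in range(len(haplotypes[0])):
        if snp in excluded:
            continue
        left = [y for h, y in zip(haplotypes, labels) if h[snp] == 0]
        right = [y for h, y in zip(haplotypes, labels) if h[snp] == 1]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_snp, best_score = snp, score
    return best_snp, best_score


def successive_splits(haplotypes, labels, rounds):
    """Pick the best SNP, exclude it, and repeat for `rounds` rounds."""
    used = set()
    picks = []
    for _ in range(rounds):
        snp, score = best_split(haplotypes, labels, frozenset(used))
        if snp is None:  # no SNPs left to try
            break
        picks.append((snp, score))
        used.add(snp)
    return picks


# Toy data (invented): 6 haplotypes, 3 SNPs; SNP 0 tags the mutation perfectly.
toy_haplotypes = [(1, 1, 0), (1, 1, 1), (1, 0, 0), (0, 0, 1), (0, 0, 0), (0, 1, 1)]
toy_labels = [1, 1, 1, 0, 0, 0]
picks = successive_splits(toy_haplotypes, toy_labels, rounds=2)
```

If a later round still classifies carriers reasonably well from a different set of SNPs, that gives independent support that the haplotype background, rather than one lucky marker, identifies the mutation.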
Figure 2. Genotyping cluster plot of the G84E variant.
A subset of the RPGEH GERA cohort and the CMHS cohort were additionally genotyped at the G84E variant. All carriers are imputed correctly, but some individuals are falsely identified as carriers (r² = 0.57, 95% CI = 0.37–0.77) because the ancestral haplotype is not specific to mutation carriers. Counts of (Exome array genotype call, GWAS imputation call) categories are given in brackets [.] for RPGEH GERA and Men’s Health cohort combined, and in parentheses (.) for RPGEH GERA alone. The most likely/best guess genotypes are given for the imputed data. Discordances are marked with larger points.
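To see how false-positive carrier calls pull the concordance statistic down even when every true carrier is recovered, here is a minimal sketch. It assumes the reported statistic is the squared Pearson correlation between directly genotyped and best-guess imputed carrier calls; the function name `squared_correlation` and the toy calls are illustrative, not from the paper.

```python
def squared_correlation(genotyped, imputed):
    """Squared Pearson correlation between two equal-length numeric lists."""
    n = len(genotyped)
    mean_g = sum(genotyped) / n
    mean_i = sum(imputed) / n
    cov = sum((g - mean_g) * (i - mean_i) for g, i in zip(genotyped, imputed))
    var_g = sum((g - mean_g) ** 2 for g in genotyped)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    return cov * cov / (var_g * var_i)


# Toy calls (invented): both true carriers are imputed as carriers,
# but one non-carrier is also imputed as a carrier.
genotyped_calls = [1, 1, 0, 0, 0, 0]
imputed_calls = [1, 1, 1, 0, 0, 0]
r2 = squared_correlation(genotyped_calls, imputed_calls)  # 0.5
```

In this toy case sensitivity is perfect, yet a single false positive among six individuals already halves the squared correlation.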
Figure 3. Ancestry of the HOXB13 G84E variant.
Using the first two principal components (PCs), we created a smoothed estimate of the G84E carrier frequency at each individual’s location by averaging the expected additive coding of the 2,000 closest individuals (Euclidean distance), excluding individuals with >25% Ashkenazi ancestry. The center of each Human Genome Diversity Project (HGDP) population is labeled to aid interpretation; the mutation is most prevalent in northwestern European and Russian groups. To further adjust for incomplete LD, we multiplied the imputation carrier frequency by the r² estimate of 0.57.
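The nearest-neighbor smoothing described in this caption can be sketched as follows. This is a toy illustration, not the authors' code: it uses a brute-force neighbor search that would be far too slow for a cohort of this size, it assumes each individual counts among their own nearest neighbors, and the function name `smoothed_carrier_frequency` and its parameters are hypothetical.

```python
import math


def smoothed_carrier_frequency(pcs, dosage, k, r2=1.0):
    """Smoothed carrier frequency at each individual's (PC1, PC2) location.

    pcs    : list of (pc1, pc2) tuples, one per individual
    dosage : expected additive coding per individual (0 = non-carrier)
    k      : number of nearest individuals to average over (2,000 in the caption)
    r2     : multiplier for incomplete LD (0.57 in the caption)
    """
    smoothed = []
    for x, y in pcs:
        # Brute force: rank everyone by Euclidean distance to (x, y).
        nearest = sorted(
            range(len(pcs)),
            key=lambda j: math.hypot(pcs[j][0] - x, pcs[j][1] - y),
        )[:k]
        smoothed.append(r2 * sum(dosage[j] for j in nearest) / k)
    return smoothed


# Toy data (invented): two well-separated clusters of three individuals.
toy_pcs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
toy_dosage = [1, 1, 0, 0, 0, 0]
freqs = smoothed_carrier_frequency(toy_pcs, toy_dosage, k=3, r2=0.57)
```

For real data a spatial index such as a k-d tree would be the usual replacement for the sorted scan.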
Figure 4. Age-specific risk of prostate cancer by HOXB13 G84E mutation carrier status.
One minus the usual Kaplan-Meier survival curve, with the probability of prostate cancer on the y-axis. The risk for G84E carriers is significantly higher than that for non-carriers. (a) Unadjusted. (b) Adjusted for incomplete LD.
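As a minimal sketch of what "one minus the Kaplan-Meier survival curve" computes, the following pure-Python function builds the curve from ages at diagnosis or censoring. The function name `one_minus_km` and the toy ages are invented for illustration; the adjustment for incomplete LD used in panel (b) is not specified here and is not attempted.

```python
def one_minus_km(ages, diagnosed):
    """Return [(age, 1 - S(age))] at each age where a diagnosis occurs.

    ages      : age at prostate cancer diagnosis or at censoring
    diagnosed : 1 if the age is a diagnosis, 0 if the person was censored
    S is the Kaplan-Meier product-limit survival estimate.
    """
    data = sorted(zip(ages, diagnosed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        age = data[i][0]
        events = 0   # diagnoses at this age
        leaving = 0  # everyone leaving the risk set at this age
        while i < len(data) and data[i][0] == age:
            events += data[i][1]
            leaving += 1
            i += 1
        if events > 0:
            survival *= 1.0 - events / at_risk
            curve.append((age, 1.0 - survival))
        at_risk -= leaving
    return curve


# Toy data (invented): six men, two of them censored.
toy_ages = [60, 65, 65, 70, 72, 80]
toy_diagnosed = [1, 0, 1, 1, 0, 1]
curve = one_minus_km(toy_ages, toy_diagnosed)
```

Running the function separately on carriers and non-carriers would give the two curves that a plot like this one compares.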
