HGG Adv. 2022 Jan 11;3(2):100090.
doi: 10.1016/j.xhgg.2022.100090. eCollection 2022 Apr 14.

Leveraging TOPMed imputation server and constructing a cohort-specific imputation reference panel to enhance genotype imputation among cystic fibrosis patients

Quan Sun et al. HGG Adv.

Abstract

Cystic fibrosis (CF) is a severe genetic disorder that can cause multiple comorbidities affecting the lungs, the pancreas, the luminal digestive system and beyond. In our previous genome-wide association studies (GWAS), we genotyped approximately 8,000 CF samples using a mixture of different genotyping platforms. More recently, the Cystic Fibrosis Genome Project (CFGP) performed deep (approximately 30×) whole genome sequencing (WGS) of 5,095 samples to better understand the genetic mechanisms underlying clinical heterogeneity among patients with CF. For mixtures of GWAS array and WGS data, genotype imputation has proven effective in increasing effective sample size. Therefore, we first performed imputation for the approximately 8,000 CF samples with GWAS array genotype using the Trans-Omics for Precision Medicine (TOPMed) freeze 8 reference panel. Our results demonstrate that TOPMed can provide high-quality imputation for patients with CF, boosting genomic coverage from approximately 0.3-4.2 million genotyped markers to approximately 11-43 million well-imputed markers, and significantly improving polygenic risk score (PRS) prediction accuracy. Furthermore, we built a CF-specific CFGP reference panel based on WGS data of patients with CF. We demonstrate that despite having approximately 3% the sample size of TOPMed, our CFGP reference panel can still outperform TOPMed when imputing some CF disease-causing variants, likely owing to allele and haplotype differences between patients with CF and general populations. We anticipate our imputed data for 4,656 samples without WGS data will benefit our subsequent genetic association studies, and the CFGP reference panel built from CF WGS samples will benefit other investigators studying CF.

Keywords: cystic fibrosis; genotype imputation; mendelian disease; polygenic risk score.

Conflict of interest statement

M.J.B. is the Editor-in-chief of HGG Advances. All other authors declare no competing interests.

Figures

Figure 1
Imputation concordance for F508del using the TOPMed and reduced CFGP reference panels. The true R² values for the TOPMed and reduced CFGP imputed results are 0.835 and 0.926, and the sums of squared errors are 117.58 and 82.42, respectively. The main reason that TOPMed is slightly worse is that it tends to underestimate the deletion frequency.
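As a minimal sketch of the two metrics quoted in this caption, the Python below computes them for a single variant. It assumes that true R² means the squared Pearson correlation between imputed dosages and sequenced genotypes, which the caption itself does not define. The function names and any input values are illustrative and do not come from the study.

```python
import math


def true_r2(dosages, truth):
    """Squared Pearson correlation between imputed dosages and true genotypes.

    Assumption: this is what "true R2" denotes; inputs are per-sample values
    for one variant (dosages in [0, 2], truth in {0, 1, 2}).
    """
    n = len(truth)
    if n != len(dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_d = sum(dosages) / n
    mean_t = sum(truth) / n
    cov = sum((d - mean_d) * (t - mean_t) for d, t in zip(dosages, truth))
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    if var_d == 0 or var_t == 0:
        return math.nan  # correlation is undefined for a monomorphic variant
    return (cov * cov) / (var_d * var_t)


def sum_squared_error(dosages, truth):
    """Sum over samples of (imputed dosage - true genotype)^2."""
    if len(dosages) != len(truth):
        raise ValueError("sequences must have equal length")
    return sum((d - t) ** 2 for d, t in zip(dosages, truth))
```

The two metrics respond differently to bias. Scaling every dosage by a constant factor leaves the squared correlation unchanged but increases the sum of squared errors, so systematic under-calling of an allele shows up more directly in the latter.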
Figure 2
Histograms of the difference between reduced CFGP true R² and TOPMed true R², comparing the imputation quality of the two reference panels. (A) All of chr7. Almost all variants fall in the left half, which means TOPMed is predominantly better than the reduced CFGP reference panel. (B) CFTR region only. The advantage of the TOPMed reference panel over the reduced CFGP panel becomes less pronounced.
Figure 3
Histograms of the mean true R² difference and the proportion of variants better imputed by reduced CFGP than by TOPMed, across 2,872 non-overlapping 1-Mb regions. We calculated the true R² difference between the two reference panels as reduced CFGP true R² minus TOPMed true R² for each variant, and then summarized the variant-level differences at the 1-Mb region level using two statistics: the mean difference in true R² (A) and the proportion of variants better imputed by reduced CFGP (B).
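The region-level summary described in this caption can be sketched as follows. This is only an illustration. The input layout (a tuple of chromosome, position, and the two true R² values per variant), the function name, and the choice to define regions by integer division of the position are all assumptions. Only the window size and the two statistics come from the caption.

```python
from collections import defaultdict

REGION_SIZE = 1_000_000  # 1-Mb non-overlapping windows, as in the caption


def summarize_regions(variants, region_size=REGION_SIZE):
    """Summarize per-variant true R2 differences by genomic region.

    variants: iterable of (chrom, pos, r2_cfgp, r2_topmed) tuples
              (hypothetical layout, not the authors' file format).
    Returns {(chrom, region_index): (mean_difference, proportion_cfgp_better)},
    where difference = reduced CFGP true R2 minus TOPMed true R2.
    """
    diffs_by_region = defaultdict(list)
    for chrom, pos, r2_cfgp, r2_topmed in variants:
        # Assumed binning: window k covers positions [k * size, (k + 1) * size).
        diffs_by_region[(chrom, pos // region_size)].append(r2_cfgp - r2_topmed)

    summary = {}
    for region, diffs in diffs_by_region.items():
        mean_diff = sum(diffs) / len(diffs)
        # A variant counts as better imputed by CFGP only if the difference is > 0.
        prop_better = sum(1 for d in diffs if d > 0) / len(diffs)
        summary[region] = (mean_diff, prop_better)
    return summary
```

Ties (a difference of exactly zero) are counted here as not better for reduced CFGP. The caption does not say how ties were handled, so that choice is also an assumption.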
Figure 4
Illustration of the impact of imputation on PRS construction. (A) Imputation performed in target cohorts. We started with four independent discovery cohorts (I–III are TOPMed imputed data, IV is WGS data), performed association analysis for each subset separately, and then meta-analyzed the association results. The meta-GWAS summary statistics were then used to construct a PRS using the P+T method. The constructed PRS was applied to the same 1,992 target samples at four different marker densities (highlighted in yellow): array genotype, TOPMed imputed, reduced CFGP imputed, or WGS data, to compare the benefit of imputation in the target cohort. (B) Imputation performed in discovery cohorts. We started with the same first three discovery cohorts as in (A), but adopted three different marker sets (again highlighted in yellow), as well as a fourth independent WGS cohort. We then performed association analysis and meta-analysis for each marker set, and constructed three different PRSs using the three sets of meta-GWAS summary statistics. The three PRSs were then applied to the same cohort to compare their performance.
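For readers unfamiliar with the P+T method named in this caption, the sketch below shows the general idea, taking P+T to mean LD-based pruning (here, greedy clumping) followed by P-value thresholding. It is a simplified illustration, not the authors' pipeline. The thresholds, the data structures, and the function names are hypothetical, and real tools also restrict clumping to a physical distance window, which is omitted here.

```python
def clump_and_threshold(stats, ld_r2, p_max=1e-3, r2_max=0.1):
    """Select variants for a PRS by P-value thresholding plus greedy clumping.

    stats: {variant_id: (beta, p_value)} from meta-GWAS summary statistics.
    ld_r2: {frozenset({id_a, id_b}): r2} pairwise LD; missing pairs are
           treated as unlinked (r2 = 0).
    p_max and r2_max are placeholder values, not those used in the study.
    """
    # Thresholding: keep variants with p <= p_max, most significant first.
    candidates = sorted(
        (v for v, (_, p) in stats.items() if p <= p_max),
        key=lambda v: stats[v][1],
    )
    kept = []
    for v in candidates:
        # Clumping: drop a variant if it is in LD with one already kept.
        if all(ld_r2.get(frozenset((v, k)), 0.0) < r2_max for k in kept):
            kept.append(v)
    return kept


def polygenic_score(dosages, stats, kept):
    """Weighted sum of allele dosages for one sample.

    dosages: {variant_id: dosage in [0, 2]}; variants absent from the
    sample's marker set contribute 0, which is how marker density
    (array vs. imputed vs. WGS) changes the resulting score.
    """
    return sum(stats[v][0] * dosages.get(v, 0.0) for v in kept)
```

In this form, a target sample typed only on an array would lack many of the selected variants and so lose their contribution to the score, while an imputed or sequenced sample would retain them.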
