Front Genet. 2019 Jul 17:10:673.
doi: 10.3389/fgene.2019.00673. eCollection 2019.

Strategies for Obtaining and Pruning Imputed Whole-Genome Sequence Data for Genomic Prediction


Shaopan Ye et al. Front Genet. 2019.

Abstract

Genomic prediction with imputed whole-genome sequencing (WGS) data is an attractive approach to improving predictive ability at low cost. However, high accuracy has not yet been realized with this method in livestock. In this study, we imputed 435 individuals from 600K single nucleotide polymorphism (SNP) chip data to WGS data using different reference panels. We also investigated the prediction accuracy of genomic best linear unbiased prediction (GBLUP) using imputed WGS data from different reference panels, linkage disequilibrium (LD)-based marker pruning, and variants pre-selected based on genome-wide association study (GWAS) results. The imputation accuracies from 600K to WGS data were 0.873 ± 0.038, 0.906 ± 0.036, and 0.979 ± 0.010 for the internal, external, and combined reference panels, respectively. For most traits in chickens, the prediction accuracy of imputed WGS data obtained from the internal reference panel was greater than or equal to that obtained from the combined reference panel; the external reference panel gave the lowest prediction accuracy. Compared with 600K chip data, GBLUP with imputed WGS data gave only a small increase (1-3%) in prediction accuracy. Using only variants selected from imputed WGS data based on GWAS results gave almost no increase for most traits and even increased the bias of the regression coefficient. The degree of LD among the selected variants and among the remaining variants affected prediction accuracy differently. For average daily gain (ADG), residual feed intake (RFI), intestine length (IL), and body weight at 91 days (BW91), the accuracy of GBLUP increased as the degree of LD of selected variants decreased, whereas the opposite relationship held for the remaining variants. For breast muscle weight (BMW) and average daily feed intake (ADFI), however, the accuracy of GBLUP increased as the degree of LD of selected variants increased, and the degree of LD of remaining variants had only a small effect on prediction accuracy.
Overall, the optimal imputation strategy to obtain WGS data for genomic prediction should consider the relationship between selected individuals and target population individuals to avoid heterogeneity of imputation. LD-based marker pruning can be used to improve the accuracy of genomic prediction using imputed WGS data.
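As a rough illustration of the GBLUP step named in the abstract, the following minimal sketch builds a genomic relationship matrix and predicts genomic values for individuals without phenotypes. Several things here are assumptions rather than details from the article: the relationship matrix follows the commonly used VanRaden form, the heritability `h2` is supplied by hand instead of being estimated, the mean is the only fixed effect, and the function names are hypothetical.

```python
import numpy as np

def vanraden_grm(genotypes):
    """Genomic relationship matrix from an (individuals x SNPs) array of 0/1/2 allele counts.

    Assumes at least one SNP is polymorphic, otherwise the scaling term is zero.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0              # allele frequency of each SNP
    Z = M - 2.0 * p                       # centre each SNP column
    denom = 2.0 * np.sum(p * (1.0 - p))   # scaling term
    return Z @ Z.T / denom

def gblup_predict(G, y, train_idx, test_idx, h2=0.3):
    """Predict values for test individuals from training phenotypes.

    h2 is an assumed heritability (not taken from the article); the
    variance ratio lambda = (1 - h2) / h2 shrinks the solutions.
    """
    lam = (1.0 - h2) / h2
    y_train = np.asarray(y, dtype=float)[train_idx]
    mu = y_train.mean()                   # simple mean as the only fixed effect
    G_tt = G[np.ix_(train_idx, train_idx)]
    G_vt = G[np.ix_(test_idx, train_idx)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + G_vt @ alpha
```

In practice the variance components would be estimated from the data (for example by REML) rather than fixed, and any additional fixed effects would be fitted jointly.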

Keywords: GWAS; LD-based marker pruning; chickens; genomic prediction; imputed WGS data.


Figures

Figure 1
Average imputation accuracies of Beagle 4.1 using different reference panels per chromosome. The imputation accuracy was assessed by the correlation between imputed and masked true genotypes per SNP.
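The accuracy measure in this caption can be sketched as a per-SNP Pearson correlation between the masked true genotypes and their imputed values. The function name, the 0/1/2 coding, and the choice to return NaN for SNPs without variation are assumptions made for illustration.

```python
import numpy as np

def per_snp_imputation_accuracy(true_geno, imputed_geno):
    """Pearson correlation between true and imputed genotypes for each SNP.

    Both inputs are (individuals x SNPs) arrays. SNPs with no variation in
    either array get NaN, because the correlation is undefined there.
    """
    T = np.asarray(true_geno, dtype=float)
    I = np.asarray(imputed_geno, dtype=float)
    Tc = T - T.mean(axis=0)
    Ic = I - I.mean(axis=0)
    cov = (Tc * Ic).sum(axis=0)
    denom = np.sqrt((Tc ** 2).sum(axis=0) * (Ic ** 2).sum(axis=0))
    r = np.full(T.shape[1], np.nan)
    ok = denom > 0
    r[ok] = cov[ok] / denom[ok]
    return r
```

Averaging the result with `np.nanmean` over the SNPs of one chromosome would give a per-chromosome summary of the kind plotted in the figure.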
Figure 2
Impact of linkage disequilibrium (LD)-based marker pruning on the predictive ability of imputed whole-genome sequencing (WGS) and chip data. Different R-squared cutoffs of LD (0.99, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, and 0.1) were used to prune markers of imputed WGS and chip data. The predictive ability was assessed by the Pearson correlation between the predicted genetic values and the observed phenotypes corrected for fixed effect(s) per trait. These traits were average daily gain (ADG), average daily feed intake (ADFI), residual feed intake (RFI), feed conversion ratio (FCR), carcass weight (CW), breast muscle weight (BMW), eviscerated weight with giblets (EWG), eviscerated weight (EW), drumstick weight (DW), abdominal fat weight (AFW), abdominal fat percentage (AFP), gizzard weight (GW), intestine length (IL), body weight at 45 days (BW45), body weight at 49 days (BW49), body weight at 56 days (BW56), body weight at 63 days (BW63), body weight at 70 days (BW70), body weight at 77 days (BW77), body weight at 84 days (BW84), and body weight at 91 days (BW91).
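To make the idea of pruning at an R-squared cutoff concrete, here is a simplified greedy sketch: SNPs are scanned in order, and a SNP is kept only if its squared correlation with every recently kept SNP is at or below the cutoff. This is an illustrative stand-in, not the pruning software or settings used in the article; the function name and the `window` parameter are assumptions.

```python
import numpy as np

def ld_prune(genotypes, r2_cutoff, window=50):
    """Greedy LD pruning on an (individuals x SNPs) genotype array.

    Returns the column indices of the SNPs that are kept. A SNP is dropped
    when its r^2 with any kept SNP within `window` positions exceeds
    r2_cutoff. Monomorphic SNPs are dropped as uninformative.
    """
    M = np.asarray(genotypes, dtype=float)
    Z = M - M.mean(axis=0)
    norms = np.sqrt((Z ** 2).sum(axis=0))
    kept = []
    for j in range(M.shape[1]):
        if norms[j] == 0:
            continue  # no variation, r^2 undefined
        in_ld = False
        for k in kept:
            if j - k > window:
                continue  # too far away to compare
            r = (Z[:, j] @ Z[:, k]) / (norms[j] * norms[k])
            if r * r > r2_cutoff:
                in_ld = True
                break
        if not in_ld:
            kept.append(j)
    return kept
```

Lower cutoffs remove more markers: a cutoff of 0.99 drops only near-duplicate SNPs, while 0.1 keeps only markers that are almost uncorrelated with their kept neighbours.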
Figure 3
Impact of pre-selected variants on the predictive ability of GBLUP using imputed WGS data. Different p-value cutoffs from 2 to 5 were used to select variants from imputed WGS data based on GWAS results for GBLUP. The red line is the prediction accuracy of GBLUP with all markers of imputed WGS data. The predictive ability was assessed by the Pearson correlation between the predicted genetic values and the observed phenotypes corrected for fixed effect(s) per trait. These traits were average daily gain (ADG), average daily feed intake (ADFI), residual feed intake (RFI), feed conversion ratio (FCR), carcass weight (CW), breast muscle weight (BMW), eviscerated weight with giblets (EWG), eviscerated weight (EW), drumstick weight (DW), abdominal fat weight (AFW), abdominal fat percentage (AFP), gizzard weight (GW), intestine length (IL), body weight at 45 days (BW45), body weight at 49 days (BW49), body weight at 56 days (BW56), body weight at 63 days (BW63), body weight at 70 days (BW70), body weight at 77 days (BW77), body weight at 84 days (BW84), and body weight at 91 days (BW91).
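The pre-selection step can be sketched as a simple threshold on GWAS p-values. Because cutoffs of 2 to 5 cannot be raw p-values, this sketch assumes they are on the -log10(p) scale; that reading, the function name, and the use of precomputed p-values are all assumptions, not details confirmed by the caption.

```python
import numpy as np

def select_by_gwas(p_values, neg_log10_cutoff):
    """Indices of variants whose -log10(p) meets or exceeds the cutoff.

    p_values: one GWAS p-value per variant (assumed to be precomputed).
    neg_log10_cutoff: threshold on the -log10(p) scale, e.g. 2 to 5.
    """
    p = np.asarray(p_values, dtype=float)
    # Clip to avoid log10(0) for p-values reported as exactly zero.
    scores = -np.log10(np.clip(p, 1e-300, 1.0))
    return np.flatnonzero(scores >= neg_log10_cutoff)
```

The selected indices would then be used to subset the genotype matrix before building the relationship matrix for GBLUP; the variants left out form the "remaining" set.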
Figure 4
Impact of the linkage disequilibrium (LD)-based marker pruning of selected or remaining variants on prediction accuracy. The selected (or remaining) variants were pruned at a fixed LD R-squared cutoff of 0.1 and then merged, for GBLUP, with the remaining (or selected) variants pruned at different LD R-squared cutoffs (0.99, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, and 0.1). The red line is the prediction accuracy of GBLUP with all markers of imputed WGS data. The predictive ability was assessed by the Pearson correlation between the predicted genetic values and the observed phenotypes corrected for fixed effect(s) per trait. These traits were average daily gain (ADG), intestine length (IL), breast muscle weight (BMW), residual feed intake (RFI), body weight at 91 days (BW91), and average daily feed intake (ADFI).
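The evaluation metric used throughout the figure captions, the Pearson correlation between predicted genetic values and corrected phenotypes, can be computed as below. The second function is an assumption about what the abstract means by the bias of the regression coefficient: it takes the common convention of regressing corrected phenotypes on predictions, where a slope of 1 indicates no bias. Function names are hypothetical.

```python
import numpy as np

def predictive_ability(predicted, corrected_pheno):
    """Pearson correlation between predictions and corrected phenotypes."""
    return float(np.corrcoef(predicted, corrected_pheno)[0, 1])

def regression_slope(predicted, corrected_pheno):
    """Slope of the regression of corrected phenotypes on predictions.

    Under the assumed convention, values far from 1 indicate that the
    predictions are over- or under-dispersed.
    """
    x = np.asarray(predicted, dtype=float)
    y = np.asarray(corrected_pheno, dtype=float)
    xc = x - x.mean()
    return float((xc @ (y - y.mean())) / (xc @ xc))
```

Both functions would typically be applied to validation individuals only, within each fold of a cross-validation, and then averaged per trait.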
