Genet Sel Evol. 2021 Feb 26;53(1):19.
doi: 10.1186/s12711-021-00607-4.

On the use of whole-genome sequence data for across-breed genomic prediction and fine-scale mapping of QTL

Theo Meuwissen et al. Genet Sel Evol.

Abstract

Background: Whole-genome sequence (WGS) data are increasingly available on large numbers of individuals in animal and plant breeding and in human genetics through second-generation resequencing technologies, 1000 genomes projects, and large-scale genotype imputation from lower marker densities. Here, we present a computationally fast implementation of a variable selection genomic prediction method that can handle WGS data on more than 35,000 individuals, test its accuracy for across-breed prediction, and assess its quantitative trait locus (QTL) mapping precision.

Methods: The Markov chain Monte Carlo (MCMC) variable selection model (Bayes GC) simultaneously fits a genomic best linear unbiased prediction (GBLUP) term, i.e. a polygenic effect whose correlations are described by a genomic relationship matrix (G), and a Bayes C term, i.e. a set of single nucleotide polymorphisms (SNPs) with large effects selected by the model. Computational speed is improved by a Metropolis-Hastings sampling scheme that directs computations to the SNPs that are, a priori, most likely to be included in the model. Speed is further improved by running many relatively short MCMC chains. Memory requirements are reduced by storing the genotype matrix in binary form. The model was tested on a WGS dataset containing Holstein, Jersey and Australian Red cattle. The data contained 4,809,520 genotypes on 35,549 individuals together with their milk, fat and protein yields, and fat and protein percentage traits.
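
The memory saving from binary genotype storage can be illustrated with a minimal sketch. The abstract does not specify the encoding, so this assumes each genotype is an allele count of 0, 1 or 2 (with 3 reserved for missing) packed into 2 bits, four genotypes per byte. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def pack_genotypes(genotypes):
    """Pack genotype codes 0..3 into 2 bits each (assumed encoding), four per byte.

    Returns the packed uint8 array and the original number of genotypes,
    which is needed to drop padding when unpacking.
    """
    g = np.asarray(genotypes, dtype=np.uint8)
    if g.size and g.max() > 3:
        raise ValueError("genotype codes must be in 0..3")
    n = g.size
    # Pad with zeros so the length is a multiple of four.
    padded = np.zeros((n + 3) // 4 * 4, dtype=np.uint8)
    padded[:n] = g
    quads = padded.reshape(-1, 4)
    # Place the four 2-bit codes at bit offsets 0, 2, 4 and 6 of one byte.
    packed = (
        quads[:, 0]
        | (quads[:, 1] << 2)
        | (quads[:, 2] << 4)
        | (quads[:, 3] << 6)
    ).astype(np.uint8)
    return packed, n


def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from a packed uint8 array."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (np.asarray(packed, dtype=np.uint8)[:, None] >> shifts) & 3
    return codes.reshape(-1)[:n]


if __name__ == "__main__":
    example = [0, 1, 2, 2, 1, 0, 3]
    packed, n = pack_genotypes(example)
    print(packed.nbytes, unpack_genotypes(packed, n).tolist())
```

Under this assumed encoding, and reading the 4,809,520 figure as the number of SNPs per individual, the genotypes of 35,549 individuals would occupy roughly 43 GB, compared with about 171 GB at one byte per genotype.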

Results: The prediction accuracies for the Jersey individuals improved by 1.5% when using across-breed GBLUP compared to within-breed predictions. Using WGS instead of 600k SNP-chip data yielded on average a 3% accuracy improvement for Australian Red cows. QTL were fine-mapped by locating the SNP with the highest posterior probability of being included in the model. Various QTL known from the literature were rediscovered, and a new SNP affecting milk production was discovered on chromosome 20 at 34.501126 Mb. Owing to the high mapping precision, it was clear that many of the discovered QTL were the same across the five dairy traits.
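
The fine-mapping step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the sampler has saved, for each of several short MCMC chains, a 0/1 indicator matrix (samples by SNPs, burn-in already discarded) recording whether each SNP was in the model. The function names and input layout are hypothetical.

```python
import numpy as np


def posterior_inclusion_probabilities(chains):
    """Pool inclusion indicators across chains and average per SNP.

    chains: list of 0/1 arrays of shape (n_samples, n_snps), one per chain
    (assumed layout). Chains may differ in the number of samples.
    """
    pooled = np.concatenate(
        [np.asarray(c, dtype=float) for c in chains], axis=0
    )
    return pooled.mean(axis=0)


def top_snp(pip, positions_bp):
    """Return (position, probability) of the SNP with the highest
    posterior inclusion probability."""
    i = int(np.argmax(pip))
    return positions_bp[i], float(pip[i])


if __name__ == "__main__":
    # Two toy chains over three SNPs.
    chains = [
        np.array([[0, 1, 0], [0, 1, 0]]),
        np.array([[0, 1, 1], [1, 0, 0]]),
    ]
    pip = posterior_inclusion_probabilities(chains)
    print(top_snp(pip, [100, 200, 300]))
```

Pooling samples from all chains before averaging weights each saved sample equally, which is one simple way to combine many short chains; other weighting schemes are possible.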

Conclusions: Across-breed Bayes GC genomic prediction improved prediction accuracies compared to GBLUP. The combination of across-breed WGS data and Bayesian genomic prediction proved remarkably effective for the fine-mapping of QTL.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1. Manhattan plots of the variance of the local GEBV within 250-kb regions for fat percentage
Fig. 2. Manhattan plot of the variance of the local GEBV within 250-kb regions for fat percentage on BTA20
Fig. 3. Fine-scale map of the posterior probabilities of the SNPs for affecting fat percentage in the region between 30 and 35 Mb on BTA20. The blue bar denotes the 95% credibility interval for the QTL, and the red dot the position of the COJO SNP detected by [23]
Fig. 4. Fine-scale map of the posterior probabilities of the SNPs for affecting milk production in the region between 30 and 35 Mb on BTA20. The blue bar denotes the 95% credibility interval for the QTL
