BMC Genomics. 2017 Aug 15;18(1):618.

doi: 10.1186/s12864-017-4030-x.

Application of a Bayesian non-linear model hybrid scheme to sequence data for genomic prediction and QTL mapping

Tingting Wang^{1,2,3}, Yi-Ping Phoebe Chen⁴, Iona M MacLeod^{5,6}, Jennie E Pryce^{5,6,7}, Michael E Goddard^{5,6,8}, Ben J Hayes^{5,6,9}

Affiliations

¹ School of Engineering and Mathematical Sciences, La Trobe University, Melbourne, VIC, 3083, Australia. tingting.wang@ecodev.vic.gov.au.
² Agriculture Victoria, AgriBio, Centre for AgriBioscience, Melbourne, VIC, 3083, Australia. tingting.wang@ecodev.vic.gov.au.
³ Dairy Futures Cooperative Research Centre, Melbourne, VIC, 3083, Australia. tingting.wang@ecodev.vic.gov.au.
⁴ School of Engineering and Mathematical Sciences, La Trobe University, Melbourne, VIC, 3083, Australia.
⁵ Agriculture Victoria, AgriBio, Centre for AgriBioscience, Melbourne, VIC, 3083, Australia.
⁶ Dairy Futures Cooperative Research Centre, Melbourne, VIC, 3083, Australia.
⁷ School of Applied Systems Biology, La Trobe University, Melbourne, VIC, 3083, Australia.
⁸ Faculty of Veterinary and Agricultural Science, University of Melbourne, Melbourne, VIC, 3010, Australia.
⁹ Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia.

PMID: 28810831
PMCID: PMC5558724
DOI: 10.1186/s12864-017-4030-x

Tingting Wang et al. BMC Genomics. 2017.

Abstract

Background: Whole genome sequence data might improve genomic prediction accuracy compared with high-density SNP arrays, and could lead to the identification of causal mutations affecting complex traits. For some traits, the most accurate genomic predictions are achieved with non-linear Bayesian methods. However, as the number of variants and the size of the reference population increase, the computational time required to implement these Bayesian methods (typically with Markov chain Monte Carlo sampling) becomes infeasibly long.

Results: Here, we applied a new method, HyB_BR (for Hybrid BayesR), to genomic prediction in a large dairy cattle population with imputed whole genome sequence data. HyB_BR implements a mixture model of normal distributions and hybridizes an Expectation-Maximization (EM) algorithm followed by Markov chain Monte Carlo (MCMC) sampling. The imputed whole genome sequence data included 994,019 variant genotypes of 16,214 Holstein and Jersey bulls and cows. Traits included fat yield, milk volume, protein yield (kg), fat% and protein% in milk, as well as fertility and heat tolerance. HyB_BR achieved genomic prediction accuracies as high as the full MCMC implementation of BayesR, both for predicting a validation set of Holstein and Jersey bulls (multi-breed prediction) and a validation set of Australian Red bulls (across-breed prediction). HyB_BR reduced compute time more than tenfold compared with the MCMC implementation of BayesR (48 hours versus 594 hours). We also demonstrate that, in many cases, the sequence variants that HyB_BR identified as having a high posterior probability of affecting the milk production or fertility traits were similar to those identified by BayesR. For heat tolerance, both HyB_BR and BayesR found variants in or close to promising candidate genes associated with this trait and not detected by previous studies.

Conclusions: The results demonstrate that HyB_BR is a feasible method for simultaneous genomic prediction and QTL mapping with whole genome sequence data in large reference populations.
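
The abstract describes a mixture model of normal distributions and refers to the posterior probability that a variant affects a trait. As a rough illustration only, the sketch below computes that probability for a single variant from one effect estimate. It is not the HyB_BR or BayesR algorithm, which fit all variants jointly. The function names, the mixing proportions, the component variances and the single-estimate setup are all assumptions made for this example and are not taken from the paper.

```python
import math


def component_responsibilities(b_hat, se2, pis, variances):
    """Posterior probability of each mixture component for one effect estimate.

    Assumed toy model (not from the paper):
      true effect  ~ sum_k pis[k] * Normal(0, variances[k])
      b_hat | true ~ Normal(true effect, se2)
    so under component k, b_hat ~ Normal(0, variances[k] + se2).
    """
    log_weights = []
    for pi_k, var_k in zip(pis, variances):
        total_var = var_k + se2
        # Log of the normal density of b_hat under component k.
        log_density = -0.5 * (math.log(2.0 * math.pi * total_var)
                              + b_hat ** 2 / total_var)
        log_weights.append(math.log(pi_k) + log_density)
    # Normalise in log space to avoid underflow for large |b_hat|.
    max_log = max(log_weights)
    weights = [math.exp(lw - max_log) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def prob_nonzero(b_hat, se2, pis, variances):
    """Posterior probability that the effect is not in the zero-variance component."""
    return 1.0 - component_responsibilities(b_hat, se2, pis, variances)[0]


# Illustrative (assumed) settings: component 0 has zero variance, i.e. "no effect".
pis = [0.97, 0.02, 0.008, 0.002]
variances = [0.0, 1e-4, 1e-3, 1e-2]
for b_hat in (0.0, 0.02, 0.2):
    print(b_hat, prob_nonzero(b_hat, 1e-4, pis, variances))
```

In this toy setting an estimate near zero is attributed mostly to the zero-variance component, while a large estimate is attributed almost entirely to the non-zero components. This is the intuition behind ranking variants by posterior probability for QTL mapping.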

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

**Fig. 1**
The pseudo-code of the EM module

**Fig. 2**
Comparison of computational time between GBLUP, BayesR and HyB_BR on 600 K and SEQ data. Three reference sets (Ref1, Ref2 and Ref3) with the same number of variants (600 K or SEQ) are used here. Ref1 contains 3049 Holstein bulls; Ref2 contains 12,527 Holstein bulls and cows; Ref3 contains 16,214 Holstein and Jersey bulls and cows

**Fig. 3**
The prediction accuracy of GBLUP, BayesR, and HyB_BR on 600 K and SEQ data for five milk production traits: Fat Yield (a), Milk Yield (b), Protein Yield (c), Fat Percent (d), and Protein Percent (e)

**Fig. 4**
Posterior probabilities of all the variants for fat yield estimated from BayesR (a) and HyB_BR (b) according to their positions (base pairs) across the whole genome. The top SNPs with the highest posterior probabilities are labelled with *blue circles*

**Fig. 5**
Posterior probabilities of all the variants for milk yield estimated from BayesR (a) and HyB_BR (b) according to their positions (base pairs) across the whole genome. The top SNPs with the highest posterior probabilities are labelled with *blue circles*

**Fig. 6**
Posterior probabilities of all the variants for protein yield estimated from BayesR (a) and HyB_BR (b) according to their positions (base pairs) across the whole genome. The top SNPs with the highest posterior probabilities are labelled with *blue circles*

**Fig. 7**
Posterior probabilities of all the variants for fat percent estimated from BayesR (a) and HyB_BR (b) according to their positions (base pairs) across the whole genome. The top SNPs with the highest posterior probabilities are labelled with *blue circles*

**Fig. 8**
Posterior probabilities of all the variants for fertility estimated from BayesR (a) and HyB_BR (b) according to their positions (base pairs) across the whole genome. The top SNPs with the highest posterior probabilities are labelled with *blue circles*

**Fig. 9**
Mapping the posterior probabilities of all the variants estimated from BayesR (a) and HyB_BR (b) according to their positions (base pairs) across the whole chromosome related to fat yield affected by heat tolerance. The top SNPs with the highest posterior probabilities are labelled with *blue circles*

**Fig. 10**
Mapping the posterior probabilities of all the variants estimated from BayesR (a) and HyB_BR (b) according to their positions (base pairs) across the whole chromosome related to milk yield affected by heat tolerance. The top SNPs with the highest posterior probabilities are labelled with *blue circles*

**Fig. 11**
Mapping the posterior probabilities of all the variants estimated from BayesR (a) and HyB_BR (b) according to their positions (base pairs) across the whole chromosome related to protein yield affected by heat tolerance. The top SNPs with the highest posterior probabilities are labelled with *blue circles*
