Data-driven encoding for quantitative genetic trait prediction
- PMID: 25707435
- PMCID: PMC4571493
- DOI: 10.1186/1471-2105-16-S1-S10
Abstract
Motivation: Given a set of biallelic molecular markers, such as SNPs, with genotype values measured on a collection of plant, animal or human samples, the goal of quantitative genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Quantitative genetic trait prediction is usually formulated as a linear regression model, which requires a quantitative encoding of the genotypes: the three distinct genotype values, corresponding to the heterozygous genotype and the two homozygous genotypes, are usually coded as integers and manipulated algebraically in the model. Further, epistasis between multiple markers is modeled as multiplication between the encoded markers, and it is unclear whether the regression model remains effective under this representation. In this work we investigate the effects of encodings on the quantitative genetic trait prediction problem.
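To make the encoding question concrete, here is a minimal sketch that codes two markers under two common integer schemes and forms the pairwise product used to model epistasis. The genotype labels ("AA", "Aa", "aa"), the two coding dictionaries and the helper names are illustrative assumptions, not taken from the paper.

```python
# Illustrative genotype labels (assumption): "AA" and "aa" are the two
# homozygotes, "Aa" is the heterozygote.
ADDITIVE = {"AA": 0, "Aa": 1, "aa": 2}    # 0/1/2 allele-count style coding
CENTERED = {"AA": -1, "Aa": 0, "aa": 1}   # -1/0/1 coding


def encode(genotypes, coding):
    """Map a list of genotype labels to integers under the given coding."""
    return [coding[g] for g in genotypes]


def epistasis_feature(m1, m2, coding):
    """Pairwise epistasis modeled as the elementwise product of two encoded markers."""
    return [a * b for a, b in zip(encode(m1, coding), encode(m2, coding))]


# Hypothetical genotypes of four samples at two markers.
marker1 = ["AA", "Aa", "aa", "Aa"]
marker2 = ["aa", "aa", "AA", "Aa"]

print(epistasis_feature(marker1, marker2, ADDITIVE))  # [0, 2, 0, 1]
print(epistasis_feature(marker1, marker2, CENTERED))  # [-1, 0, -1, 0]
```

The same genotype pairs receive different interaction values: under 0/1/2 the second and fourth samples get 2 and 1, while under -1/0/1 both get 0. Because -1/0/1 is 0/1/2 shifted by one, an unpenalized model that also contains both main effects and an intercept spans the same features either way; the choice starts to matter once coefficients are penalized, or for codings that are not affine shifts of one another, such as one that reorders the heterozygote.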
Results: We first show that, in many test cases, different encodings lead to different prediction accuracies. We then propose a data-driven encoding strategy, in which the genotypes are encoded according to their distribution over the phenotypes and each marker is allowed to have its own encoding. Our experiments show that this strategy improves the performance of the genetic trait prediction method and is more helpful for oligogenic traits, whose values depend on a relatively small set of markers. To the best of our knowledge, this is the first paper that discusses the effects of encodings on the genetic trait prediction problem.
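The abstract does not spell out the exact data-driven rule, so the following is only one plausible reading, sketched under that assumption: at each marker, rank the three genotypes by the mean phenotype of the samples carrying them and assign codes 0, 1, 2 in that order. The function names, the toy data and the R² comparison are hypothetical illustrations, not the authors' implementation.

```python
from statistics import mean


def data_driven_coding(genotypes, phenotypes):
    """Assumed scheme: order the genotypes observed at ONE marker by the mean
    phenotype of their carriers, then assign codes 0, 1, 2 in increasing order.
    Calling this separately per marker gives each marker its own encoding."""
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    ranked = sorted(groups, key=lambda g: mean(groups[g]))
    return {g: code for code, g in enumerate(ranked)}


def r_squared(x, y):
    """R^2 of an ordinary least-squares fit y ~ a + b*x (one feature)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical marker with heterozygote advantage: "Aa" has the highest mean trait.
genotypes = ["AA", "AA", "Aa", "Aa", "aa", "aa"]
phenotypes = [1.0, 1.2, 3.0, 3.2, 2.0, 2.2]

ADDITIVE = {"AA": 0, "Aa": 1, "aa": 2}
learned = data_driven_coding(genotypes, phenotypes)

x_add = [ADDITIVE[g] for g in genotypes]
x_dd = [learned[g] for g in genotypes]

print(learned)                              # {'AA': 0, 'aa': 1, 'Aa': 2}
print(round(r_squared(x_add, phenotypes), 3))  # 0.246
print(round(r_squared(x_dd, phenotypes), 3))   # 0.985
```

A fixed 0/1/2 coding cannot express the heterozygote advantage with a single linear coefficient, while the learned ordering can. In a real evaluation the encoding should be learned from training samples only; fitting it on the same phenotypes used to measure accuracy, as in this toy example, would inflate the result.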
Similar articles
- Does encoding matter? A novel view on the quantitative genetic trait prediction problem. BMC Bioinformatics. 2016 Jul 19;17 Suppl 9(Suppl 9):272. doi: 10.1186/s12859-016-1127-1. PMID: 27454886.
- Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction. Bioinformatics. 2016 Jun 15;32(12):i37-i43. doi: 10.1093/bioinformatics/btw249. PMID: 27307640.
- Accuracy of prediction of simulated polygenic phenotypes and their underlying quantitative trait loci genotypes using real or imputed whole-genome markers in cattle. Genet Sel Evol. 2015 Dec 23;47:99. doi: 10.1186/s12711-015-0179-4. PMID: 26698091.
- The use of molecular genetics in the improvement of agricultural populations. Nat Rev Genet. 2002 Jan;3(1):22-32. doi: 10.1038/nrg701. PMID: 11823788. Review.
- [Analysis of gene effects on performance characteristics]. Dtsch Tierarztl Wochenschr. 1996 Oct;103(10):378-83. PMID: 9035965. Review. In German.
Cited by
- Genomic prediction with epistasis models: on the marker-coding-dependent performance of the extended GBLUP and properties of the categorical epistasis model (CE). BMC Bioinformatics. 2017 Jan 3;18(1):3. doi: 10.1186/s12859-016-1439-1. PMID: 28049412.
- Influence of epistasis on response to genomic selection using complete sequence data. Genet Sel Evol. 2017 Aug 25;49(1):66. doi: 10.1186/s12711-017-0340-3. PMID: 28841821.
- Epistasis and covariance: how gene interaction translates into genomic relationship. Theor Appl Genet. 2016 May;129(5):963-76. doi: 10.1007/s00122-016-2675-5. Epub 2016 Feb 16. PMID: 26883048.
- Incorporating Genome Annotation Into Genomic Prediction for Carcass Traits in Chinese Simmental Beef Cattle. Front Genet. 2020 May 15;11:481. doi: 10.3389/fgene.2020.00481. eCollection 2020. PMID: 32499816.
- A consistent approach to the genotype encoding problem in a genome-wide association study of continuous phenotypes. PLoS One. 2020 Jul 15;15(7):e0236139. doi: 10.1371/journal.pone.0236139. eCollection 2020. PMID: 32667944.