L2-Boosting algorithm applied to high-dimensional problems in genomic selection
- PMID: 20667166
- DOI: 10.1017/S0016672310000261
Abstract
The L2-Boosting algorithm is one of the most promising machine-learning techniques to have appeared in recent decades. It can be applied to high-dimensional problems such as whole-genome studies, and it is relatively simple from a computational point of view. In this study, we used this algorithm in a genomic selection context to predict yet-to-be-observed outcomes. Two data sets were used: (1) productive lifetime predicted transmitting abilities from 4702 Holstein sires genotyped for 32 611 single nucleotide polymorphisms (SNPs) derived from the Illumina BovineSNP50 BeadChip, and (2) progeny averages of food conversion rate, pre-corrected for environmental and mate effects, in 394 broilers genotyped for 3481 SNPs. Each data set was split into training and testing sets, the latter comprising dairy or broiler sires whose ancestors were in the training set. Two weak learners, ordinary least squares (OLS) and non-parametric (NP) regression, were used in the L2-Boosting algorithm to provide a stringent evaluation of the procedure. The algorithm was compared with Bayesian LASSO (BL; least absolute shrinkage and selection operator) and BayesA regression. Models were trained on the training set and validated on the testing set. Pearson correlations between predicted and observed responses in the dairy cattle (broiler) data set were 0.65 (0.33), 0.53 (0.37), 0.66 (0.26) and 0.63 (0.27) for OLS-Boosting, NP-Boosting, BL and BayesA, respectively. OLS-Boosting gave the smallest bias and mean-squared error (MSE) in both the dairy cattle (0.08 and 1.08, respectively) and broiler (-0.011 and 0.006) data sets. In the dairy cattle data set, BL was more accurate (bias = 0.10, MSE = 1.10) than BayesA (bias = 1.26, MSE = 2.81), whereas no differences between these two methods were found in the broiler data set.
L2-Boosting with a suitable learner was found to be a competitive alternative for genomic selection applications, providing high accuracy and low bias in genome-assisted evaluations with a relatively short computational time.
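To make the OLS-Boosting procedure and the reported evaluation metrics more concrete, here is a minimal sketch of componentwise L2-Boosting, assuming a weak learner that regresses the current residuals on one SNP at a time by least squares and adds only the best-fitting SNP, shrunk by a step size. It is not the authors' implementation, and the NP learner is not shown. The step size `nu = 0.1`, the 200 iterations, the simulated 0/1/2 genotypes, and the helper names (`fit_l2_boost`, `predict`, `pearson`, `bias_and_mse`, `simulate`) are all illustrative assumptions. Bias is taken here as the mean of observed minus predicted values, which is also an assumption about how it was defined in the study.

```python
import random


def fit_l2_boost(X, y, n_iter=200, nu=0.1):
    """Componentwise L2-Boosting with a one-SNP least-squares (OLS) weak learner.

    X: list of n rows of p genotype codes; y: list of n responses.
    Each iteration regresses the residuals on every centred SNP, keeps the one
    giving the largest drop in residual sum of squares, and adds nu * its slope.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]  # centred SNPs
    ss = [sum(v * v for v in col) for col in cols]
    intercept = sum(y) / n
    coef = [0.0] * p
    resid = [yi - intercept for yi in y]
    for _ in range(n_iter):
        best_j, best_gain, best_beta = None, 0.0, 0.0
        for j in range(p):
            if ss[j] == 0.0:
                continue  # monomorphic SNP: carries no information
            xr = sum(a * b for a, b in zip(cols[j], resid))
            gain = xr * xr / ss[j]  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_gain, best_beta = j, gain, xr / ss[j]
        if best_j is None:
            break
        step = nu * best_beta  # shrunken OLS slope
        coef[best_j] += step
        resid = [r - step * x for r, x in zip(resid, cols[best_j])]
    return intercept, means, coef


def predict(model, X):
    intercept, means, coef = model
    return [intercept + sum(c * (x - m) for c, x, m in zip(coef, row, means))
            for row in X]


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def bias_and_mse(observed, predicted):
    # Assumption: bias = mean(observed - predicted).
    diffs = [o - p for o, p in zip(observed, predicted)]
    return sum(diffs) / len(diffs), sum(d * d for d in diffs) / len(diffs)


def simulate(rng, n, p, causal):
    # Hypothetical data: 0/1/2 genotypes, unit effects at the causal SNPs.
    X = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(p)]
         for _ in range(n)]
    y = [sum(row[j] for j in causal) + rng.gauss(0.0, 0.5) for row in X]
    return X, y


if __name__ == "__main__":
    rng = random.Random(1)
    causal = [3, 17, 42]
    X_train, y_train = simulate(rng, 200, 50, causal)
    X_test, y_test = simulate(rng, 100, 50, causal)
    model = fit_l2_boost(X_train, y_train)
    y_hat = predict(model, X_test)
    bias, mse = bias_and_mse(y_test, y_hat)
    print(f"r = {pearson(y_test, y_hat):.2f}, bias = {bias:.3f}, MSE = {mse:.3f}")
```

As in the study's design, the model is learned on a training set only, and correlation, bias and MSE are computed on a separate testing set. The shrinkage step `nu` trades extra iterations for less overfitting; in practice it and the number of iterations would be tuned, for example by cross-validation within the training set.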
Similar articles
- The gradient boosting algorithm and random boosting for genome-assisted evaluation in large data sets. J Dairy Sci. 2013 Jan;96(1):614-24. doi: 10.3168/jds.2012-5630. Epub 2012 Oct 24. PMID: 23102953
- Comparison of methods for the implementation of genome-assisted evaluation of Spanish dairy cattle. J Dairy Sci. 2013 Jan;96(1):625-34. doi: 10.3168/jds.2012-5631. Epub 2012 Oct 24. PMID: 23102955
- Accuracy of direct genomic values derived from imputed single nucleotide polymorphism genotypes in Jersey cattle. J Dairy Sci. 2010 Nov;93(11):5423-35. doi: 10.3168/jds.2010-3149. PMID: 20965358
- Genetic evaluation of dairy cattle using a simple heritable genetic ground. J Sci Food Agric. 2010 Aug 30;90(11):1765-73. doi: 10.1002/jsfa.4041. PMID: 20564310. Review.
- Invited review: Genomic selection in dairy cattle: progress and challenges. J Dairy Sci. 2009 Feb;92(2):433-43. doi: 10.3168/jds.2008-1646. PMID: 19164653. Review.
Cited by
- Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics. 2013 Feb;193(2):327-45. doi: 10.1534/genetics.112.143313. Epub 2012 Jun 28. PMID: 22745228. Review.
- Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice. G3 (Bethesda). 2022 Apr 4;12(4):jkac039. doi: 10.1093/g3journal/jkac039. PMID: 35166767
- Digitalizing breeding in plants: A new trend of next-generation breeding based on genomic prediction. Front Plant Sci. 2023 Jan 19;14:1092584. doi: 10.3389/fpls.2023.1092584. eCollection 2023. PMID: 36743488. Review.
- A review of machine learning models applied to genomic prediction in animal breeding. Front Genet. 2023 Sep 6;14:1150596. doi: 10.3389/fgene.2023.1150596. eCollection 2023. PMID: 37745853. Review.
- Genomic Prediction for 25 Agronomic and Quality Traits in Alfalfa (Medicago sativa). Front Plant Sci. 2018 Aug 20;9:1220. doi: 10.3389/fpls.2018.01220. eCollection 2018. PMID: 30177947