BMC Proc. 2011 May 27;5 Suppl 3(Suppl 3):S11.
doi: 10.1186/1753-6561-5-S3-S11.

A comparison of random forests, boosting and support vector machines for genomic selection

Joseph O Ogutu et al. BMC Proc.

Abstract

Background: Genomic selection (GS) involves estimating breeding values using molecular markers spanning the entire genome. Accurate prediction of genomic breeding values (GEBVs) presents a central challenge to contemporary plant and animal breeders. The existence of a wide array of marker-based approaches for predicting breeding values makes it essential to evaluate and compare their relative predictive performances to identify approaches able to accurately predict breeding values. We evaluated the predictive accuracy of random forests (RF), stochastic gradient boosting (boosting) and support vector machines (SVMs) for predicting genomic breeding values using dense SNP markers and explored the utility of RF for ranking the predictive importance of markers for pre-screening markers or discovering chromosomal locations of QTLs.

Methods: We predicted GEBVs for one quantitative trait in a dataset simulated for the QTLMAS 2010 workshop. Predictive accuracy was measured as the Pearson correlation between GEBVs and observed values using 5-fold cross-validation and between predicted and true breeding values. The importance of each marker was ranked using RF and plotted against the position of the marker and associated QTLs on one of five simulated chromosomes.
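The accuracy measure described in the Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say whether correlations were pooled across folds or averaged per fold (the sketch pools out-of-fold predictions), and the marginal-regression model, the toy data generator, and all function names here are hypothetical stand-ins, not the RF, boosting, or SVM models the authors fitted.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split into k disjoint, nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_marker_effects(X, y):
    """Stand-in model (assumption): one marginal least-squares slope per marker."""
    n, p = len(X), len(X[0])
    my = sum(y) / n
    means, slopes = [], []
    for j in range(p):
        col = [row[j] for row in X]
        mj = sum(col) / n
        sxx = sum((v - mj) ** 2 for v in col)
        sxy = sum((v - mj) * (t - my) for v, t in zip(col, y))
        means.append(mj)
        slopes.append(sxy / sxx if sxx > 0 else 0.0)
    return my, means, slopes


def predict(model, X):
    """Predicted value = training mean + sum of centered marker effects."""
    my, means, slopes = model
    return [my + sum(b * (v - m) for b, v, m in zip(slopes, row, means))
            for row in X]


def cv_accuracy(X, y, k=5, seed=0):
    """Correlation between pooled out-of-fold predictions and observed values."""
    n = len(y)
    preds = [0.0] * n
    for fold in kfold_indices(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        model = fit_marker_effects([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(fold, predict(model, [X[i] for i in fold])):
            preds[i] = p
    return pearson(preds, y)


def simulate_toy_data(n=200, effects=(1.0, -0.5, 0.8, 0.0, 0.3),
                      noise_sd=0.5, seed=1):
    """Hypothetical toy genotypes coded 0/1/2 with additive effects plus noise."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in effects] for _ in range(n)]
    y = [sum(b * v for b, v in zip(effects, row)) + rng.gauss(0, noise_sd)
         for row in X]
    return X, y
```

Calling `cv_accuracy(*simulate_toy_data())` returns a single correlation for the toy data. Swapping `fit_marker_effects` and `predict` for any other learner leaves the cross-validation and correlation steps unchanged, which is what makes the accuracy figures comparable across methods.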

Results: The correlations between the predicted and true breeding values were 0.547 for boosting, 0.497 for SVMs, and 0.483 for RF, indicating better performance for boosting than for SVMs and RF.

Conclusions: Accuracy was highest for boosting, intermediate for SVMs, and lowest for RF, but it differed little among the three methods and relative to ridge regression BLUP (RR-BLUP).


Figures

Figure 1
Importance ranking of the 10031 SNP markers by random forest using percent increase in mean squared error. Positions of the simulated additive (triangle), epistatic (circle) and imprinted (diamond) QTLs are indicated on each chromosome.
Figure 2
Importance ranking of the 10031 SNP markers by random forest using tree node impurity. Positions of the simulated additive (triangle), epistatic (circle) and imprinted (diamond) QTLs are indicated on each chromosome.
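The idea behind ranking markers by increase in mean squared error can be illustrated with a generic permutation scheme: shuffle one marker's genotypes, re-predict, and record how much the error grows. The sketch below is a simplification under loud assumptions. It uses a held-out set and a hypothetical additive stand-in model rather than a random forest and its out-of-bag samples, it does not reproduce the node-impurity measure of Figure 2, and every function name and the toy data are invented for illustration.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def fit_additive(X, y):
    """Stand-in model (not a random forest): one marginal slope per marker."""
    n, p = len(X), len(X[0])
    my = sum(y) / n
    means, slopes = [], []
    for j in range(p):
        col = [row[j] for row in X]
        mj = sum(col) / n
        sxx = sum((v - mj) ** 2 for v in col)
        sxy = sum((v - mj) * (t - my) for v, t in zip(col, y))
        means.append(mj)
        slopes.append(sxy / sxx if sxx > 0 else 0.0)
    return my, means, slopes


def predict_additive(model, X):
    """Predicted value = training mean + sum of centered marker effects."""
    my, means, slopes = model
    return [my + sum(b * (v - m) for b, v, m in zip(slopes, row, means))
            for row in X]


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Percent increase in held-out MSE after shuffling each marker column."""
    rng = random.Random(seed)
    base = mse(y, predict_additive(model, X))
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            # Rebuild the matrix with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(mse(y, predict_additive(model, X_perm)) - base)
        scores.append(100.0 * (sum(increases) / n_repeats) / base)
    return scores


def simulate_toy_markers(n=200, effects=(1.0, 0.0, 0.0), noise_sd=0.5, seed=2):
    """Hypothetical toy genotypes coded 0/1/2; only the first marker has an effect."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in effects] for _ in range(n)]
    y = [sum(b * v for b, v in zip(effects, row)) + rng.gauss(0, noise_sd)
         for row in X]
    return X, y
```

Fitting on one part of the toy data and scoring on the rest gives one importance value per marker; plotting those values against marker position is the kind of display the figure captions describe.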
