Comparative Study
Theor Appl Genet. 2012 Mar;124(5):825-33.
doi: 10.1007/s00122-011-1747-9. Epub 2011 Nov 19.

Partial least squares regression, support vector machine regression, and transcriptome-based distances for prediction of maize hybrid performance with gene expression data

Junjie Fu et al. Theor Appl Genet. 2012 Mar.

Abstract

The performance of hybrids can be predicted with gene expression data from their parental inbred lines. Implementing such prediction approaches in breeding programs promises to increase the efficiency of hybrid breeding. The objectives of our study were to compare the accuracy of prediction models employing multiple linear regression (MLR), partial least squares regression (PLS), support vector machine regression (SVM), and transcriptome-based distances (D(B)). For a factorial of 7 flint and 14 dent maize lines, the grain yield of the hybrids was assessed and the gene expression of the parental lines was profiled with a 56k microarray. The accuracy of the prediction models was measured by the correlation between predicted and observed yield employing two cross-validation schemes. The first modeled the prediction of hybrids when testcross data are available for both parental lines (type 2 hybrids), and the second modeled the prediction of hybrids when no testcross data are available for either parental line (type 0 hybrids). MLR, SVM, and PLS resulted in a high correlation between predicted and observed yield for type 2 hybrids, whereas for type 0 hybrids D(B) had greater prediction accuracy. The regression methods were robust to the choice of the set of profiled genes and required only a few hundred genes. In contrast, for an accurate hybrid prediction with D(B), 1,000-1,500 genes were required, and the prediction accuracy depended strongly on the set of profiled genes. We conclude that for prediction within one set of genetic material MLR is a promising approach, and for transferring prediction models from one set of genetic material to a related one, the transcriptome-based distance D(B) is most promising.
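
The distinction between type 2 and type 0 hybrids, and the correlation used as the accuracy measure, can be illustrated with a short sketch. It is not the authors' procedure: the abstract does not give the sampling details, so the line labels, the number of training lines (4 flint, 8 dent), the 70% observed fraction, and the helper names `split_factorial` and `pearson` are all assumptions made for illustration. Only the 7 x 14 factorial layout and the definitions of the two hybrid types come from the abstract.

```python
import random
from itertools import product


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_factorial(flint, dent, n_flint_train, n_dent_train,
                    frac_observed, seed=0):
    """Split a flint x dent factorial into training, type 2 and type 0 hybrids.

    Hypothetical scheme: a subset of lines is drawn, and a fraction of the
    crosses among them is treated as observed (the training hybrids).
    """
    rng = random.Random(seed)
    flint_train = rng.sample(flint, n_flint_train)
    dent_train = rng.sample(dent, n_dent_train)

    crosses = list(product(flint_train, dent_train))
    rng.shuffle(crosses)
    n_obs = max(1, round(frac_observed * len(crosses)))
    train = set(crosses[:n_obs])

    # A line has testcross data if it is a parent of any training hybrid.
    tested = {line for hybrid in train for line in hybrid}

    type2, type0 = [], []
    for hybrid in product(flint, dent):
        if hybrid in train:
            continue
        n_tested = sum(parent in tested for parent in hybrid)
        if n_tested == 2:      # both parents have testcross data
            type2.append(hybrid)
        elif n_tested == 0:    # neither parent has testcross data
            type0.append(hybrid)
        # hybrids with exactly one tested parent are not used here
    return train, type2, type0


# Placeholder labels for the 7 flint and 14 dent lines of the factorial.
flint = [f"F{i}" for i in range(1, 8)]
dent = [f"D{i}" for i in range(1, 15)]

train, type2, type0 = split_factorial(flint, dent, 4, 8, 0.7)
print(len(train), len(type2), len(type0))
```

Under this sketch, a model would be fitted on the `train` hybrids, and its accuracy for each hybrid type would be `pearson(predicted, observed)` computed separately over `type2` and `type0`, typically averaged over many random splits.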
