Genet Sel Evol. 2015 May 6;47(1):38.
doi: 10.1186/s12711-015-0116-6.

Optimization of genomic selection training populations with a genetic algorithm

Deniz Akdemir et al. Genet Sel Evol. 2015.

Abstract

In this article, we consider a breeding scenario in which a population of individuals has been genotyped but not phenotyped. We derived a computationally efficient statistic that uses this genetic information to measure the reliability of genomic estimated breeding values (GEBV) for a given set of individuals (the test set) based on a training set of individuals. We used this reliability measure within a genetic algorithm to find an optimized training set from a larger set of candidate individuals. This subset was then phenotyped to create the training set used in a genomic selection model to estimate GEBV in the test set. Our results show that training on individuals selected by our method gave more accurate GEBV than training on a random sample of the same size. We applied the proposed training set selection methodology to four data sets: Arabidopsis, wheat, rice and maize. This dynamic model-building process, which takes the genotypes of the individuals in the test sample into account when selecting the training individuals, improves the performance of genomic selection models.
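
The selection loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the reliability statistic, so the sketch substitutes a ridge-regression-style mean prediction error variance computed on leading principal components of the marker matrix, and the genetic algorithm (elitism, crossover by resampling from the union of two parents, swap mutation) is a generic one. The names `pev_mean` and `select_training_set`, the shrinkage parameter `lam`, and the simulated markers are all hypothetical.

```python
import numpy as np

def pev_mean(train_idx, X, X_test, lam=1.0):
    """Assumed criterion: mean ridge-style prediction error variance
    of the test individuals given a training subset (lower is better)."""
    Xt = X[train_idx]
    A = Xt.T @ Xt + lam * np.eye(X.shape[1])
    sol = np.linalg.solve(A, X_test.T)
    # diag(X_test A^{-1} X_test^T), averaged over the test individuals
    return float(np.mean(np.sum(X_test.T * sol, axis=0)))

def select_training_set(X, X_test, n_train, pop_size=30, n_gen=40,
                        n_elite=5, mut_rate=0.1, lam=1.0, seed=0):
    """Generic genetic algorithm over subsets of candidate rows of X."""
    rng = np.random.default_rng(seed)
    n_cand = X.shape[0]
    score = lambda s: pev_mean(s, X, X_test, lam)
    # Initial population: random subsets of the candidates.
    pop = [rng.choice(n_cand, n_train, replace=False) for _ in range(pop_size)]
    for _ in range(n_gen):
        pop.sort(key=score)
        elite = pop[:n_elite]  # elitism: the best subsets survive unchanged
        children = []
        while len(children) < pop_size - n_elite:
            i, j = rng.choice(n_elite, 2, replace=False)
            # Crossover: resample a child from the union of two parents.
            pool = np.union1d(elite[i], elite[j])
            child = rng.choice(pool, n_train, replace=False)
            # Mutation: swap some members for candidates not in the child.
            for k in range(n_train):
                if rng.random() < mut_rate:
                    unused = np.setdiff1d(np.arange(n_cand), child)
                    child[k] = rng.choice(unused)
            children.append(child)
        pop = elite + children
    best = min(pop, key=score)
    return np.sort(best), score(best)

# Usage on simulated genotypes (120 individuals, 300 markers coded -1/0/1).
rng = np.random.default_rng(1)
markers = rng.choice([-1.0, 0.0, 1.0], size=(120, 300))
centered = markers - markers.mean(axis=0)
U, S, _ = np.linalg.svd(centered, full_matrices=False)
# Intercept column plus the first 10 principal component scores.
pcs = np.hstack([np.ones((120, 1)), U[:, :10] * S[:10]])
cand, test = pcs[:100], pcs[100:]  # last 20 individuals form the test set
chosen, crit = select_training_set(cand, test, n_train=25)
random_crit = pev_mean(rng.choice(100, 25, replace=False), cand, test)
print(f"optimized criterion {crit:.4f}, random sample {random_crit:.4f}")
```

Only genotypes enter the criterion, which matches the scenario in the abstract: the training set is chosen before any phenotypes exist, and the test individuals' genotypes steer the choice. In practice the selected individuals would then be phenotyped and used to fit the genomic selection model.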

Figures

Figure 1
Arabidopsis data. The difference between the accuracies of the models trained on optimized populations and those trained on random samples. Positive values indicate cases in which the optimized population performed better than a random sample.
Figure 2
Rice data: accuracies. Comparisons of mean accuracies (measured by correlation) for the traits florets per panicle (FP), panicle fertility (PF), seed length (SL), seed weight (SW), seed surface area (SSA) and straighthead susceptibility (SHS) for different training sample sizes. Error bars at three standard error units are also included. Optimized samples outperform random samples almost exclusively.
Figure 3
Rice data: structure. Left: Summary of the rice genotypic data using the first principal components, displaying the lines that were most frequently selected by the optimization algorithm. Right: Lines that were most frequently selected by the optimization algorithm displayed on a neighbor-joining tree based on a genotypic distance matrix.
Figure 4
Wheat data. Comparisons of the mean accuracies (measured by correlation) when the test data set is selected from years 2007 through 2009 for different training sample sizes. For each of these cases, the training set was selected from the individuals in the years preceding the test year. Error bars at three standard error units are also included.
Figure 5
Maize data. Comparisons of the accuracies for prediction across clusters in the highly structured maize data set. A test data set of size n_Test = 50 was selected at random in a particular cluster, and a training population of size n_Train = 50, 100 or 200 individuals was selected from the remaining clusters. Error bars at three standard error units are also included.
