Optimization of genomic selection training populations with a genetic algorithm

Deniz Akdemir¹, Julio I Sanchez², Jean-Luc Jannink³

Affiliations

¹ Department of Plant Breeding & Genetics, Cornell University, Ithaca, NY, USA. da346@cornell.edu.
² Department of Plant Breeding & Genetics, Cornell University, Ithaca, NY, USA. ji66@cornell.edu.
³ Robert W. Holley Center for Agriculture and Health, USDA-ARS, Ithaca, NY, USA. jeanluc.jannink@ars.usda.gov.

PMID: 25943105
PMCID: PMC4422310
DOI: 10.1186/s12711-015-0116-6

Deniz Akdemir et al. Genet Sel Evol. 2015 May 6;47(1):38.

Abstract

In this article, we consider a breeding scenario with a population of individuals that have been genotyped but not phenotyped. We derived a computationally efficient statistic that uses this genotypic information to measure the reliability of genomic estimated breeding values (GEBV) for a given set of individuals (the test set), based on a given set of training individuals. We combined this reliability measure with a genetic algorithm to find an optimized training set within a larger set of candidate individuals. This subset was then phenotyped, and the resulting training set was used to fit a genomic selection model that estimated GEBV for the test set. Our results show that, compared with a random sample of the same size, training on individuals selected by our method improved prediction accuracy. We applied the proposed training-set selection method to four data sets from Arabidopsis, wheat, rice and maize. This dynamic model-building process, which takes the genotypes of the test individuals into account when selecting the training individuals, improves the performance of genomic selection models.
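The abstract does not spell out the reliability statistic or the genetic-algorithm operators, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the reliability criterion is the mean prediction error variance (PEV) of ridge-regression GEBVs for the test individuals, which (in units of the residual variance, ignoring the intercept, with centered markers) is the mean of the diagonal of X_test (X_train' X_train + lam I)^-1 X_test'; lower values mean more reliable predictions. The function names `mean_pev` and `ga_select`, the shrinkage `lam`, and the GA settings (population size, elitism, union-based crossover, single-swap mutation) are hypothetical choices for illustration.

```python
import numpy as np


def mean_pev(X_train, X_test, lam=1.0):
    """Mean PEV (in units of the residual variance) of ridge-regression GEBVs
    for the test individuals, given centered marker matrices (assumed model)."""
    p = X_train.shape[1]
    A = X_train.T @ X_train + lam * np.eye(p)
    # Solve A Z = X_test' rather than forming A^-1 explicitly.
    Z = np.linalg.solve(A, X_test.T)
    # Diagonal of X_test A^-1 X_test', averaged over the test individuals.
    return float(np.mean(np.sum(X_test.T * Z, axis=0)))


def ga_select(X_cand, X_test, n_train, lam=1.0, pop_size=40, n_gen=100,
              n_elite=10, mut_rate=0.3, seed=0):
    """Search for a size-n_train subset of candidate rows minimizing mean PEV.
    Returns the sorted candidate indices and their mean PEV (hypothetical GA)."""
    rng = np.random.default_rng(seed)
    n_cand = X_cand.shape[0]

    def fitness(idx):
        return mean_pev(X_cand[idx], X_test, lam)

    # Initial population: random training subsets of size n_train.
    pop = [rng.choice(n_cand, n_train, replace=False) for _ in range(pop_size)]
    for _ in range(n_gen):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)  # lower mean PEV is better
        elite = [pop[i] for i in order[:n_elite]]
        children = []
        while len(children) < pop_size - n_elite:
            a, b = rng.choice(n_elite, 2, replace=False)
            # Crossover: sample the child from the union of two elite parents.
            union = np.union1d(elite[a], elite[b])
            child = rng.choice(union, n_train, replace=False)
            # Mutation: replace one member with a random non-member.
            if rng.random() < mut_rate:
                outside = np.setdiff1d(np.arange(n_cand), child)
                child[rng.integers(n_train)] = rng.choice(outside)
            children.append(child)
        pop = elite + children
    scores = np.array([fitness(ind) for ind in pop])
    best = int(np.argmin(scores))
    return np.sort(pop[best]), float(scores[best])


# Usage on simulated markers coded -1/0/1 (illustrative data only).
rng = np.random.default_rng(1)
M = rng.choice([-1.0, 0.0, 1.0], size=(220, 50))
M -= M.mean(axis=0)  # center each marker column
X_cand, X_test = M[:200], M[200:]
best_idx, best_pev = ga_select(X_cand, X_test, n_train=30)
rand_idx = rng.choice(200, 30, replace=False)
print("optimized:", best_pev, "random:", mean_pev(X_cand[rand_idx], X_test))
```

Because the elite subsets are carried over unchanged, the best mean PEV in the population never increases from one generation to the next, and drawing each child from the union of two parents keeps it at exactly n_train distinct candidates.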

Figures

**Figure 1**
Arabidopsis data. Difference in accuracy between models trained on optimized populations and models trained on random samples. Positive values indicate cases in which the optimized population performed better than the random sample.

**Figure 2**
Rice data: accuracies. Comparison of mean accuracies (measured by correlation) for the traits florets per panicle (FP), panicle fertility (PF), seed length (SL), seed weight (SW), seed surface area (SSA) and straighthead susceptibility (SHS) across training sample sizes. Error bars of three standard errors are shown. Optimized samples outperform random samples in almost every case.

**Figure 3**
Rice data: structure. Left: the first principal components of the rice genotypic data, with the lines most frequently selected by the optimization algorithm marked. Right: the most frequently selected lines shown on a neighbor-joining tree built from a genotypic distance matrix.

**Figure 4**
Wheat data. Comparison of mean accuracies (measured by correlation) for different training sample sizes when the test set is drawn from each of the years 2007 through 2009. In each case, the training set was selected from individuals in the years preceding the test year. Error bars of three standard errors are shown.

**Figure 5**
Maize data. Comparison of accuracies for prediction across clusters in the highly structured maize data set. A test set of n_Test = 50 individuals was selected at random from one cluster, and training populations of n_Train = 50, 100 or 200 individuals were selected from the remaining clusters. Error bars of three standard errors are shown.
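The captions summarize accuracy as the correlation between GEBVs and observed values, averaged over replicates with error bars of three standard errors, and Figure 1 reports optimized-minus-random differences. A minimal sketch of that bookkeeping follows, assuming accuracy is the Pearson correlation within the test set and that the standard error is taken across replicate runs; the function names and the simulated data are hypothetical and are not results from the study.

```python
import numpy as np


def accuracy(gebv, observed):
    """Prediction accuracy: Pearson correlation of GEBVs with observed values."""
    return float(np.corrcoef(gebv, observed)[0, 1])


def mean_and_error_bar(accs, k=3.0):
    """Mean accuracy over replicates and the half-width of a k-SE error bar."""
    accs = np.asarray(accs, dtype=float)
    se = accs.std(ddof=1) / np.sqrt(accs.size)
    return float(accs.mean()), float(k * se)


def accuracy_gain(accs_opt, accs_rand):
    """Mean accuracy of optimized minus random training sets; positive values
    favor the optimized set, the sign convention described for Figure 1."""
    return float(np.mean(accs_opt) - np.mean(accs_rand))


# Usage on simulated values only: noisier GEBVs stand in for a weaker model.
rng = np.random.default_rng(0)
observed = rng.normal(size=100)
accs_opt = [accuracy(observed + rng.normal(scale=0.5, size=100), observed)
            for _ in range(10)]
accs_rand = [accuracy(observed + rng.normal(scale=1.0, size=100), observed)
             for _ in range(10)]
print(mean_and_error_bar(accs_opt), mean_and_error_bar(accs_rand))
print(accuracy_gain(accs_opt, accs_rand))
```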
