Comparative Study

BMC Genomics. 2014 Aug 4;15(1):646.

doi: 10.1186/1471-2164-15-646.

The importance of phenotypic data analysis for genomic prediction - a case study comparing different spatial models in rye

Angela-Maria Bernal-Vasquez, Jens Möhring, Malthe Schmidt, Manfred Schönleben, Chris-Carolin Schön, Hans-Peter Piepho¹

PMID: 25087599
PMCID: PMC4133075
DOI: 10.1186/1471-2164-15-646

Angela-Maria Bernal-Vasquez et al. BMC Genomics. 2014.

Affiliation

¹ Bioinformatics Unit, Institute of Crop Science, University of Hohenheim, Fruwirthstrasse 23, 70599 Stuttgart, Germany. piepho@uni-hohenheim.de.

Abstract

Background: Genomic prediction is becoming a routine tool for plant breeders. It uses genotypic information to make predictions that inform selection decisions. The accuracy of the predictions depends on the number of genotypes used in the calibration; hence, data need to be combined across years. A proper phenotypic analysis is a crucial prerequisite for accurate calibration of genomic prediction procedures. We compared stage-wise approaches to analyse a real dataset from a multi-environment trial (MET) in rye, which was connected between years only through one check, and used different spatial models to obtain better estimates and thus improved predictive abilities for genomic prediction. The aims of this study were (i) to assess the advantage of using spatial models for the predictive abilities of genomic prediction, (ii) to identify suitable procedures to analyse a MET weakly connected across years using different stage-wise approaches, and (iii) to explore genomic prediction as a tool for selecting models for phenotypic data analysis.

Results: Using complex spatial models did not significantly improve the predictive ability of genomic prediction, but using row and column effects yielded the highest predictive abilities of all models. For a MET poorly connected between years, analysing each year separately and fitting year as a fixed effect in the genomic prediction stage yielded the most realistic predictive abilities. Predictive abilities can also be used to select models for phenotypic data analysis. The predictive abilities did not follow the same trend as the traditionally used Akaike information criterion (AIC), but in the end favoured the same models.

Conclusions: Making predictions using weakly linked datasets is of utmost interest for plant breeders. We provide an example with suggestions on how to handle such cases. Rather than relying on checks, we show how to use year means across all entries to integrate data across years. We further show that fitting row and column effects captures most of the heterogeneity in the field trials analysed.
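
The idea of integrating data across years through year means can be illustrated with a minimal sketch. It assumes that the correction consists of subtracting the simple mean of all entries tested in a year from each genotype's adjusted mean for that year; the exact formula is not given in this abstract, and the function name `center_by_year` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_by_year(adjusted_means):
    """Subtract the simple year mean from each genotype's adjusted mean.

    adjusted_means: list of (genotype, year, value) tuples.
    Returns a list of (genotype, year, corrected_value) tuples.
    Assumption: year effects are removed by centring within year.
    """
    values_by_year = defaultdict(list)
    for _, year, value in adjusted_means:
        values_by_year[year].append(value)

    # Simple (unweighted) mean over all entries tested in each year.
    year_means = {year: mean(vals) for year, vals in values_by_year.items()}

    return [(g, y, v - year_means[y]) for g, y, v in adjusted_means]


# Hypothetical toy data: two genotypes per year, no genotype in both years.
toy = [
    ("g1", 2009, 8.0),
    ("g2", 2009, 9.0),
    ("g3", 2010, 6.0),
    ("g4", 2010, 7.0),
]
corrected = center_by_year(toy)
```

Centring within year is only reasonable if the entries tested in different years can be regarded as samples from the same population, so that differences between year means mostly reflect the year effect rather than genetic differences.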

Figures

**Figure 1**
**General representation of stage-wise approaches to compare year-effect adjustment.** Factors were genotype (G), tester (T), location (L), year (A), trial (S), replicate (R) and block (B). Grain dry matter yield (Y) is the response variable in the first stage, $M^{(1)}$ is the adjusted mean of genotypes across locations used in the second stage, $M^{(1*)}$ is the year effect-corrected genotype adjusted mean, and [symbol not shown] represents the simple mean of genotypes of the r-th year. In the genomic prediction (GP) stage, $M^{(2)}$ is the n×1 vector of adjusted means of genotypes by year for *Approach 1a* and across years for *Approach 2*, $M^{(2*)}$ is the n×1 vector of adjusted means of year effect-corrected genotypes in *Approach 1b*, X and β are respectively the design matrix and parameter vector of fixed effects, Z is the n×p marker matrix, u is the p-dimensional vector of SNP effects and e the error vector. Y=G·T:S/R/B is the shorthand notation of model eq. (1) in the text: $Y_{hijkv} = (GT)_{hv} + S_i + R_{ij} + B_{ijk} + e_{hijkv}$. $M^{(1)}$=G×L×T stands for model eq. (2) in the text, and $M^{(1)}$=(A/T)×G×L represents the extended model eq. (4) in the text. The final predictive abilities (ρ) are presented in the ellipses.
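
The symbols defined in this caption suggest the usual linear mixed model form for the GP stage. The following is a sketch assembled from those symbols only; the distributional assumptions and the variance symbols $\sigma_u^2$ and $R$ are standard ridge-regression BLUP conventions and are assumptions here, not taken from the caption.

```latex
\[
M^{(2)} = X\beta + Zu + e, \qquad
u \sim N\!\left(0,\; I_p\,\sigma_u^2\right), \qquad
e \sim N(0,\; R)
\]
```

Under this reading, fitting year as a fixed effect in the GP stage (*Approach 1a*) would correspond to X containing one indicator column per year, whereas in *Approach 1b* the response would be $M^{(2*)}$ and X would reduce to a column of ones.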

**Figure 2**
**General representation of model comparison through all the stages of the analysis.** Datasets were generated from 9 spatial and non-spatial models, plus two mixed datasets generated from the best models according to the Akaike information criterion (Mix1) and the predictive abilities (Mix2). Factors in the second stage were genotype (G), location (L) and tester (T). $M^{(1)}$ represents the adjusted mean of genotypes across locations and years. $M^{(1)}$=G×L×T is the shorthand notation for model eq. (2) in the text. In the genomic prediction (GP) stage, $M^{(2)}$ is the adjusted mean of genotypes across locations, X and β are respectively the design matrix and parameter vector of fixed effects, Z is the n×p marker matrix, u is the p-dimensional vector of SNP effects and e the error vector. Sampling methods in cross validation (CV) were across crosses (AC) and within crosses (WC). The final predictive abilities (ρ) are presented in the ellipses.
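
How a predictive ability ρ can be obtained from a marker matrix Z and a vector of adjusted means by cross validation is sketched below. This is an illustration under stated assumptions, not the authors' procedure: it uses a plain ridge regression with a fixed penalty `lam` instead of estimated variance components, an intercept as the only fixed effect, and simple random k-fold sampling rather than the across-crosses (AC) or within-crosses (WC) schemes. All function names and the simulated data are hypothetical.

```python
import numpy as np


def ridge_snp_effects(Z, y, lam):
    """Estimate an intercept and ridge-penalised SNP effects.

    Z: (n, p) marker matrix, y: (n,) adjusted means, lam: ridge penalty.
    Returns (intercept, column means of Z, SNP effect estimates).
    """
    z_mean = Z.mean(axis=0)
    Zc = Z - z_mean                      # centre markers on the training set
    mu = y.mean()
    p = Z.shape[1]
    u = np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ (y - mu))
    return mu, z_mean, u


def cv_predictive_ability(Z, y, lam=1.0, k=5, seed=0):
    """Random k-fold CV; returns the correlation between observed and predicted values."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    pred = np.empty(len(y))
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)  # all indices not in the test fold
        mu, z_mean, u = ridge_snp_effects(Z[train], y[train], lam)
        pred[test] = mu + (Z[test] - z_mean) @ u
    return np.corrcoef(y, pred)[0, 1]


# Simulated toy data (hypothetical): 120 genotypes, 300 SNPs coded 0/1/2.
rng = np.random.default_rng(1)
n, p = 120, 300
Z = rng.integers(0, 3, size=(n, p)).astype(float)
u_true = rng.normal(0.0, 0.1, size=p)
y = Z @ u_true + rng.normal(0.0, 1.0, size=n)

rho = cv_predictive_ability(Z, y, lam=50.0)
print(f"predictive ability: {rho:.3f}")
```

Here the correlation is computed once over all held-out predictions; averaging fold-wise correlations is an equally common choice. Running the same routine on the adjusted means produced by different first-stage models would give one ρ per model, which is the basis for using predictive ability as a model selection criterion.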

**Figure 3**
**General representation of strategies to compare model selection methods.** Factors were genotype (G), tester (T), trial (S), replicate (R) and block (B). Grain dry matter yield (Y) is the response variable in the first stage. Y=G·T:S/R/B is the shorthand notation for the model $Y_{hijkv} = (GT)_{hv} + S_i + R_{ij} + B_{ijk} + e_{hijkv}$. Datasets of 9 spatial and non-spatial models are shown, plus one mixed dataset (Mix1) generated from the best models according to the Akaike information criterion (AIC) and another mixed dataset (Mix2) generated from the best models according to the predictive abilities (ρ-GP-CV).

**Figure 4**
**Comparison of approaches for year adjustment.** The x-axis shows the genotype adjusted means from the across-year analysis. The y-axis shows the year-effect-corrected adjusted means from the year-wise analysis.

**Figure 5**
**Comparison between approaches to fit the year effect.** The y-axis represents the genotype adjusted means in **(A)**, $M^{(2)}$ in **(B)** and $M^{(2*)}$ in **(C)**, and the x-axis represents the genomic estimated breeding values (GEBV). (A) Year-wise analysis (*Approach 1a*), fitting year as a fixed effect in the GP stage; (B) across-years analysis (*Approach 2*), using year in the second stage; and (C) year-wise analysis using the year effect-corrected genotype means (*Approach 1b*). $\rho_{GP}$ represents the predictive ability.

**Figure 6**
**Marker-based relationship heat-map.** Pairwise relationship coefficients estimated from the marker data are visualised for genotypes of years 2009 and 2010. Higher values represent a stronger relationship.
