Comparative Study
BMC Genomics. 2014 Aug 4;15(1):646.
doi: 10.1186/1471-2164-15-646.

The importance of phenotypic data analysis for genomic prediction - a case study comparing different spatial models in rye

Angela-Maria Bernal-Vasquez et al. BMC Genomics.

Abstract

Background: Genomic prediction is becoming a routine tool for plant breeders. It uses genotypic information to make predictions that inform selection decisions. The accuracy of the predictions depends on the number of genotypes used in the calibration; hence, data need to be combined across years. A proper phenotypic analysis is a crucial prerequisite for accurate calibration of genomic prediction procedures. We compared stage-wise approaches to analyse a real dataset of a multi-environment trial (MET) in rye, which was connected between years only through one check, and used different spatial models to obtain better estimates and thus improved predictive abilities for genomic prediction. The aims of this study were to assess the advantage of using spatial models for the predictive abilities of genomic prediction, to identify suitable procedures to analyse a MET weakly connected across years using different stage-wise approaches, and to explore genomic prediction as a tool for selecting models for phenotypic data analysis.

Results: Using complex spatial models did not significantly improve the predictive ability of genomic prediction, but using row and column effects yielded the highest predictive abilities of all models. For a MET poorly connected between years, analysing each year separately and fitting year as a fixed effect in the genomic prediction stage yielded the most realistic predictive abilities. Predictive abilities can also be used to select models for phenotypic data analysis. The predictive abilities did not follow the same trend as the traditionally used Akaike information criterion, but in the end favoured the same models.

Conclusions: Making predictions using weakly linked datasets is of utmost interest for plant breeders. We provide an example with suggestions on how to handle such cases. Rather than relying on checks, we show how to use year means across all entries for integrating data across years. We further show that fitting row and column effects captures most of the heterogeneity in the field trials analysed.
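The year-mean adjustment mentioned in the conclusions can be illustrated with a minimal sketch. It assumes that the correction is a plain subtraction of each year's simple mean across all entries from that year's genotype adjusted means; the abstract does not spell out the formula, and the function name `center_by_year` and the numbers below are hypothetical.

```python
import numpy as np

def center_by_year(adj_means, years):
    """Subtract each year's simple mean across all entries from that year's adjusted means.

    Assumption: the year effect is removed by simple centering within year.
    """
    adj_means = np.asarray(adj_means, dtype=float)
    years = np.asarray(years)
    corrected = np.empty_like(adj_means)
    for year in np.unique(years):
        mask = years == year
        corrected[mask] = adj_means[mask] - adj_means[mask].mean()
    return corrected

# Hypothetical adjusted means for three genotypes tested in each of two years
means = [50.0, 54.0, 52.0, 61.0, 63.0, 59.0]
years = [2009, 2009, 2009, 2010, 2010, 2010]
corrected = center_by_year(means, years)
```

After centering, the entries of both years share a common scale, so they could be pooled for calibration without depending on a single check for the link between years.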

Figures

Figure 1
General representation of stage-wise approaches to compare year-effect adjustment. Factors were genotype (G), tester (T), location (L), year (A), trial (S), replicate (R) and block (B). Grain dry matter yield (Y) is the response variable in the first stage, $M^{(1)}$ is the adjusted mean of genotypes across locations used in the second stage, $M^{(1*)}$ is the year effect-corrected genotype adjusted mean, and [formula image] represents the simple mean of genotypes of the r-th year. In the genomic prediction (GP) stage, $M^{(2)}$ is the n×1 vector of adjusted means of genotypes by year for Approach 1a and across years for Approach 2, $M^{(2*)}$ is the n×1 vector of adjusted means of year effect-corrected genotypes in Approach 1b, X and β are respectively the design matrix and parameter vector of fixed effects, Z is the n×p marker matrix, u is the p-dimensional vector of SNP effects and e is the error vector. $Y = G \cdot T : S/R/B$ is the shorthand notation of the model eq. (1) in the text: $Y_{hijkv} = (GT)_{hv} + S_i + R_{ij} + B_{ijk} + e_{hijkv}$. $M^{(1)} = G \times L \times T$ stands for the model eq. (2) in the text: [formula image], and $M^{(1)} = (A/TG) \times L$ represents the extended model eq. (4) in the text: [formula image]. The final predictive abilities (ρ) are presented in the ellipses.
Figure 2
General representation of model comparison through all the stages of the analysis. Datasets were generated from 9 spatial and non-spatial models, plus two mixed datasets generated from the best models according to the Akaike information criterion (Mix1) and the predictive abilities (Mix2). Factors in the second stage were genotype (G), location (L) and tester (T). $M^{(1)}$ represents the adjusted mean of genotypes across locations and years. $M^{(1)} = G \times L \times T$ is the shorthand notation for [formula image]. In the genomic prediction (GP) stage, $M^{(2)}$ is the adjusted mean of genotypes across locations, X and β are respectively the design matrix and parameter vector of fixed effects, Z is the n×p marker matrix, u is the p-dimensional vector of SNP effects and e is the error vector. Sampling methods in cross validation (CV) were across crosses (AC) and within crosses (WC). The final predictive abilities (ρ) are presented in the ellipses.
Figure 3
General representation of strategies to compare model selection methods. Factors were genotype (G), tester (T), trial (S), replicate (R) and block (B). Grain dry matter yield (Y) is the response variable in the first stage. $Y = G \cdot T : S/R/B$ is the shorthand notation for the model $Y_{hijkv} = (GT)_{hv} + S_i + R_{ij} + B_{ijk} + e_{hijkv}$. Datasets were generated from 9 spatial and non-spatial models, plus one mixed dataset (Mix1) generated from the best models according to the Akaike information criterion (AIC) and another mixed dataset (Mix2) generated from the best models according to the predictive abilities (ρ-GP-CV).
Figure 4
Comparison of approaches for year adjustment. The x-axis shows the genotype adjusted means from the across-year analysis. The y-axis shows the year-effect-corrected adjusted means from the year-wise analysis.
Figure 5
Comparison between approaches to fit the year effect. The y-axis represents the genotype adjusted means ([formula image] in (A), $M^{(2)}$ in (B) and $M^{(2*)}$ in (C)) and the x-axis represents the genomic estimated breeding values, GEBV ([formula image]). (A) Year-wise analysis (Approach 1a), fitting year as a fixed effect in the GP stage; (B) across-years analysis (Approach 2), using year in the second stage; and (C) year-wise analysis using the year effect-corrected genotype means (Approach 1b). $\rho_{GP}$ represents the predictive ability.
Figure 6
Marker-based relationship heat-map. Pairwise relationship coefficients estimated from the marker data are shown for genotypes of years 2009 and 2010. Higher values represent a stronger relationship.
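The genomic prediction stage described in the captions (adjusted means modelled through fixed effects Xβ, marker effects Zu and an error e) and the cross-validated predictive ability ρ can be sketched as follows. This is an illustrative ridge-regression sketch under several assumptions that are not taken from the source: the shrinkage parameter `lam` is fixed rather than estimated from variance components, folds are drawn at random rather than across or within crosses, ρ is computed as the Pearson correlation between the fixed-effect-corrected adjusted means and the predicted marker values in the held-out fold, and all data are simulated. The function names are hypothetical.

```python
import numpy as np

def fit_ridge_gp(y, X, Z, lam):
    """Fit y = X b + Z u + e with a ridge penalty lam on u only (b is unpenalised)."""
    n_fixed = X.shape[1]
    W = np.hstack([X, Z])
    penalty = np.zeros(W.shape[1])
    penalty[n_fixed:] = lam  # shrink marker effects, leave fixed effects free
    coef = np.linalg.solve(W.T @ W + np.diag(penalty), W.T @ y)
    return coef[:n_fixed], coef[n_fixed:]

def cv_predictive_ability(y, X, Z, lam, n_folds=5, seed=0):
    """Correlation between held-out corrected means and predicted marker values."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    observed = np.empty(len(y))
    gebv = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta, u = fit_ridge_gp(y[train], X[train], Z[train], lam)
        observed[test] = y[test] - X[test] @ beta  # remove estimated year effect
        gebv[test] = Z[test] @ u                   # predicted genomic value
    return float(np.corrcoef(observed, gebv)[0, 1])

# Simulated example: 60 genotypes, 20 markers, two years fitted as a fixed effect
rng = np.random.default_rng(1)
n, p = 60, 20
Z = rng.integers(0, 3, size=(n, p)).astype(float)  # marker scores coded 0/1/2
year = np.repeat([0, 1], n // 2)
X = np.eye(2)[year]                                # one indicator column per year
u_true = rng.normal(0.0, 1.0, p)
y = X @ np.array([50.0, 60.0]) + Z @ u_true + rng.normal(0.0, 0.5, n)

rho = cv_predictive_ability(y, X, Z, lam=1.0)
```

Fitting year through X, rather than correcting the means beforehand, corresponds in spirit to the year-wise analysis with year as a fixed effect in the GP stage; the simulated effect sizes here say nothing about the results reported for rye.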
