Front Plant Sci. 2019 Nov 27:10:1491.
doi: 10.3389/fpls.2019.01491. eCollection 2019.

Combining Crop Growth Modeling and Statistical Genetic Modeling to Evaluate Phenotyping Strategies

Daniela Bustos-Korts et al. Front Plant Sci. 2019.

Abstract

Genomic prediction of complex traits such as yield benefits from including information on correlated component traits. Statistical criteria for deciding which yield components to include in the prediction model are the heritability of the component traits and their genetic correlation with yield. Not all component traits are easy to measure, so it may be attractive to include proxies for yield components that are measured on (high-throughput) phenotyping platforms during the growing season. Using the Agricultural Production Systems Simulator (APSIM)-wheat cropping systems model, we simulated phenotypes for a wheat diversity panel segregating for a set of physiological parameters that regulate phenology, biomass partitioning, and the ability to capture environmental resources. The distribution of the additive quantitative trait locus effects regulating the APSIM physiological parameters approximated the distribution of quantitative trait locus effects estimated from real phenotypic data for yield and heading date. Phenotypes were simulated in three Australian environments with contrasting water deficit patterns. The APSIM output contained the dynamics of biomass and canopy cover, plus yield at the end of the growing season. Each water deficit pattern triggered different adaptive mechanisms, and the impact of component traits differed between drought scenarios. We evaluated multiple phenotyping schedules by adding plot and measurement error to the dynamics of biomass and canopy cover. We fitted parametric models and P-splines to these trait dynamics to extract parameters with a larger heritability than the phenotypes at individual time points, and used those parameters in multi-trait prediction models for final yield. The combined use of crop growth models and multi-trait genomic prediction models provides a procedure to assess the efficiency of phenotyping strategies and to compare methods for modeling trait dynamics. It also allows us to quantify the impact of yield components on yield prediction accuracy in different environment types. In scenarios with mild or no water stress, yield prediction accuracy benefitted from including biomass and green canopy cover parameters. The advantage of the multi-trait model was smaller for the early-drought scenario because of the reduced correlation between the secondary traits and the target trait. Therefore, multi-trait genomic prediction models for yield require scenario-specific correlated traits.
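
The error-layering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: a logistic curve stands in for the APSIM biomass output, and every numeric value (curve parameters, error standard deviations, measurement day) is a placeholder chosen for illustration. Only the structure follows the text: a direct measurement is the simulated trait plus plot error, a high-throughput phenotyping (HTP) measurement adds measurement error on top, and HTP quality is summarized as the R² between the two. The function names are hypothetical.

```python
import math
import random


def logistic_biomass(day, asymptote, rate=0.08, midpoint=90.0):
    """Noise-free biomass on a given day (hypothetical stand-in for APSIM output)."""
    return asymptote / (1.0 + math.exp(-rate * (day - midpoint)))


def phenotyping_days(first_day, last_day, interval):
    """Measurement days for a schedule with a fixed interval between visits."""
    return list(range(first_day, last_day + 1, interval))


def pearson_r2(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate_htp(n_genotypes=199, day=100, plot_sd=50.0,
                 measurement_sd=100.0, seed=1):
    """Return (direct, htp) measurements for one day; all values are placeholders."""
    rng = random.Random(seed)
    direct, htp = [], []
    for _ in range(n_genotypes):
        # Genotype-specific curve parameter (assumed distribution, not from the paper).
        asymptote = rng.gauss(1500.0, 150.0)
        true_value = logistic_biomass(day, asymptote)
        with_plot_error = true_value + rng.gauss(0.0, plot_sd)
        direct.append(with_plot_error)                       # trait + plot error
        htp.append(with_plot_error + rng.gauss(0.0, measurement_sd))  # + measurement error
    return direct, htp
```

Calling `pearson_r2(*simulate_htp(measurement_sd=s))` for a range of `s` values, on each day returned by `phenotyping_days`, gives a grid of measurement quality by measurement interval of the kind the evaluated phenotyping schedules span.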

Keywords: APSIM model; P-spline; crop growth model; dynamic traits; genomic prediction; genotype to phenotype; trait hierarchy; wheat.


Figures

Figure 1
Simulation steps to generate phenotypes for a set of genotypes across environments. Bottom left: an Australian wheat panel is defined as a sample of the target population of genotypes. For this sample of genotypes, single nucleotide polymorphism (SNP) data are available, and phenotypic data for yield and heading date have been collected in eight field trials. The phenotypic data are associated with SNP data in univariate genome-wide association study analyses. From these analyses, empirical distributions for the additive effects of quantitative trait loci underlying these phenotypes are obtained. Physiological knowledge on trait correlations is used to define genetic correlations between Agricultural Production Systems Simulator (APSIM) parameters ($y_{i}^{P}$). These correlations are included in a multivariate description of the quantitative trait loci underlying APSIM parameters. From this distribution, genotype-specific APSIM parameters ($y_{i}^{P}$) are generated and assigned to a subset of SNPs. Bottom right: historical environmental data define the target population of environments (TPE). We use APSIM to identify environment scenarios (water deficit patterns). The environmental data of the selected scenarios and the genotype-dependent APSIM physiological parameters are used to generate intermediate traits over time ($y_{ij}^{I}$). In a breeding programme, these intermediate traits are unknown, but they can be approximated by high-throughput phenotyping techniques, in which case the intermediate traits come with plot error ($e_{ij}^{plot}$) and measurement error ($e_{ij}^{measurement}$). The target trait ($y_{ij}^{T}$) is modeled as a function of intermediate traits.
Figure 2
Additive main effect and multiplicative interaction biplot for grain yield in Emerald, Merredin, Narrabri, and Yanco during 1993–2013. Grey squares represent genotype scores and grey arrows represent environment scores. Environments that were sampled from different environment types (ET) for a more detailed characterization of traits over time are indicated by coloured arrows. ET1 corresponds to trials without water deficit (represented in the sample by “Yanco_2010”). ET2 corresponds to intermediate drought starting around flowering (represented in the sample by “Narrabri_2008”). ET3 corresponds to intense drought starting early during the growing season (represented in the sample by “Emerald_1993”).
Figure 3
Quadratic function relating the measurement error size (R² between high-throughput phenotyping and direct measurement of biomass) to the green canopy cover observed for a genotype on a specific day. As genotypes differ in green canopy cover in a given environment and on a given day, their measurement error is also genotype-specific.
Figure 4
Correlations between APSIM parameters, parameters of biomass accumulation and canopy cover, and yield. Trait descriptions are given in Table 1.
Figure 5
Green canopy cover dynamics for a random sample of five genotypes (left panels) and genotype-specific R² between high-throughput phenotyping and direct measurement of green canopy cover during the growing season (right panels) in three trials representing different environment types (i.e. different patterns of drought).
Figure 6
Heritability of the parameters from curves fitted to the dynamics of biomass accumulation and green canopy cover for the collection of genotypes, measured with high-throughput phenotyping (HTP). BL_asy is the asymptote for biomass fitted with a logistic curve, BS_asy is the asymptote for biomass fitted with a spline, BL_slope is the maximum slope of biomass fitted with a logistic curve, BS_slope is the maximum slope of biomass fitted with a spline, CC_max is maximum green canopy cover calculated from a cubic curve and CS_max is maximum green canopy cover calculated from a spline fit. The x-axis indicates the interval for different analyses, expressed as the number of days (5, 10, 15, or 20) between two consecutive HTP measurements. The z-axis (H² time point) indicates the quality of the HTP measurement, quantified as the R² between the direct phenotypic measurements (APSIM biomass plus plot error, Equation 2) and HTP (APSIM biomass plus plot and measurement error, Equation 3).
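
For orientation, the two logistic-curve summaries named in this caption can be written out for a three-parameter logistic curve. The parameterization below, with symbols $A$, $k > 0$ and $t_0$, is an assumption made for illustration; the caption does not state which logistic form was fitted.

```latex
B(t) = \frac{A}{1 + e^{-k\,(t - t_0)}}, \qquad
\lim_{t \to \infty} B(t) = A, \qquad
\max_{t} \frac{dB}{dt} = \left.\frac{dB}{dt}\right|_{t = t_0} = \frac{A\,k}{4}
```

Under this assumed form, BL_asy would correspond to $A$ and BL_slope to $A k / 4$, the growth rate at the inflection point $t_0$.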
Figure 7
Heritability of curve estimates as a function of the heritability at single time points. Each box contains H² estimates obtained across interval sizes between two consecutive measurements. Trait descriptions are given in Table 1.
Figure 8
Heritability of curve estimates as a function of the interval between two consecutive measurements, expressed in days. Each box contains H² estimates obtained across measurement error sizes. Trait descriptions are given in Table 1.
Figure 9
Yield prediction accuracy and standard error in ET1, ET2, and ET3 calculated with the multi-trait prediction models M2S and M2P, considering yield and parameters estimated from the biomass and green canopy cover dynamics using P-splines (BS_asy, BS_slope, and CS_max) or parametric models (BL_asy, BL_slope, or CL_max), for the scenario nG_allt (target and secondary traits missing in the validation set). The x-axis indicates the heritability of individual time points measured with HTP, quantified as the R² between the direct phenotypic measurements (APSIM biomass plus plot error, Equation 2) and HTP (APSIM biomass plus plot and measurement error, Equation 3). Symbol colour indicates the interval, expressed as the number of days between two consecutive HTP measurements. Black horizontal lines show yield prediction accuracy for a single-trait model trained with yield data for the genotypes in the training set (M1). Single- and multi-trait models were trained with 100 genotypes, whereas 99 genotypes were used for validation. Bars indicate the confidence interval for the mean, calculated across 30 realizations of the training-validation sets.
Figure 10
Yield prediction accuracy and standard error in ET1, ET2, and ET3 calculated with the multi-trait prediction models M2S and M2P, considering yield and parameters estimated from the biomass and green canopy cover dynamics using P-splines (BS_asy, BS_slope, and CS_max) or parametric models (BL_asy, BL_slope, or CL_max), for the scenario nG_yld (target trait missing in the validation set, but secondary traits present in both training and validation sets). The x-axis indicates the heritability of individual time points measured with HTP, quantified as the R² between the direct phenotypic measurements (APSIM biomass plus plot error, Equation 2) and HTP (APSIM biomass plus plot and measurement error, Equation 3). Symbol colour indicates the interval, expressed as the number of days between two consecutive HTP measurements. Black horizontal lines show yield prediction accuracy for a single-trait model trained with yield data for the genotypes in the training set (M1). Single- and multi-trait models were trained with 100 genotypes, whereas 99 genotypes were used for validation. Bars indicate the confidence interval for the mean, calculated across 30 realizations of the training-validation sets.
Figure 11
Yield prediction accuracy and standard error in ET1, ET2, and ET3 calculated with the multi-trait prediction models M3S and M3P, considering the target trait and summaries of biomass over time, when biomass had an error that was a function of canopy cover. Predictions indicated with nGall_param and nGyld_param considered yield plus BL_asy, BL_slope, and CL_max, and models nGall_spline and nGyld_spline considered yield plus BS_asy, BS_slope, and CS_max. The x-axis indicates the interval, expressed as the number of days between two consecutive HTP measurements. Symbol colour indicates the combination of prediction scenario (nG_all: target and secondary traits missing in the validation set; or nG_yld: target trait missing in the validation set, but secondary traits present in both training and validation sets) and method to model biomass over time (spline or logistic model). Black horizontal lines show yield prediction accuracy for a single-trait model trained with yield data for the genotypes in the training set (M1). Single- and multi-trait models were trained with 100 genotypes, whereas 99 genotypes were used for validation. Bars indicate the confidence interval for the mean, calculated across 30 realizations of the training-validation sets.
Figure 12
Yield prediction accuracy and standard error in ET1, ET2, and ET3 calculated with a multi-trait prediction model (M4) considering the target trait and three APSIM parameters (y_rue, photo_sens, and vern_sens). The x-axis indicates the heritability of the HTP measurement for the APSIM parameter. Symbol colour indicates the prediction scenario: nG_all (target and secondary traits missing in the validation set) or nG_yld (target trait missing in the validation set, but secondary traits present in both training and validation sets). Black horizontal lines show yield prediction accuracy for a single-trait model trained with yield data for the genotypes in the training set (M1). Single- and multi-trait models were trained with 100 genotypes, whereas 99 genotypes were used for validation. Bars indicate the confidence interval for the mean, calculated across 30 realizations of the training-validation sets.

