Plant Commun. 2024 Jul 8;5(7):100975.
doi: 10.1016/j.xplc.2024.100975. Epub 2024 May 15.

TrG2P: A transfer-learning-based tool integrating multi-trait data for accurate prediction of crop yield

Jinlong Li et al. Plant Commun. 2024.

Abstract

Yield prediction is the primary goal of genomic selection (GS)-assisted crop breeding. Because yield is a complex quantitative trait, predicting it from genotypic data is challenging. Transfer learning can produce an effective model for a target task by leveraging knowledge from a different but related source domain, and it is considered a promising method for improving yield prediction by integrating multi-trait data. However, it has not previously been applied to genotype-to-phenotype prediction owing to the lack of an efficient implementation framework. We therefore developed TrG2P, a transfer-learning-based framework. TrG2P first employs convolutional neural networks (CNNs) to train models on non-yield-trait phenotypic data and genotypic data, yielding pre-trained models. The convolutional layer parameters of these pre-trained models are then transferred to the yield prediction task and the fully connected layers are retrained, yielding fine-tuned models. Finally, the convolutional layers and first fully connected layers of the fine-tuned models are fused, and the last fully connected layer is trained to enhance prediction performance. We applied TrG2P to five sets of genotypic and phenotypic data from maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum) and compared its predictive accuracy with that of seven other popular GS tools: ridge regression best linear unbiased prediction (rrBLUP), random forest, support vector regression, light gradient boosting machine (LightGBM), CNN, DeepGS, and deep neural network for genomic prediction (DNNGP). TrG2P improved the accuracy of yield prediction by 39.9%, 6.8%, and 1.8% in rice, maize, and wheat, respectively, compared with the best-performing comparison model. Our work therefore demonstrates that transfer learning is an effective strategy for improving yield prediction by integrating information from non-yield-trait data. We attribute its enhanced prediction accuracy to the valuable information available from traits associated with yield and to training dataset augmentation. The Python implementation of TrG2P is available at https://github.com/lijinlong1991/TrG2P. The web-based tool is available at http://trg2p.ebreed.cn:81.
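
The three stages described in the abstract (pre-training, fine-tuning with transferred convolutional weights, and fusion) can be sketched roughly as follows. This is a minimal NumPy illustration on synthetic data, not the TrG2P implementation: the layer sizes, the non-overlapping convolution windows, the plain gradient-descent training, and every name used (`init_model`, `train`, `fuse`, the simulated traits) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_model(n_snps, width=4, filters=3, hidden=8):
    """Random weights for one conv layer plus two fully connected (FC) layers."""
    flat = (n_snps // width) * filters
    return {"conv": rng.normal(0, 0.3, (width, filters)),
            "fc1": rng.normal(0, 0.3, (flat, hidden)),
            "fc2": rng.normal(0, 0.3, (hidden, 1))}

def forward(p, X):
    """Return the intermediate activations and the prediction."""
    windows = X.reshape(len(X), -1, p["conv"].shape[0])  # non-overlapping SNP windows
    a0 = np.maximum(windows @ p["conv"], 0)              # conv layer + ReLU
    flat = a0.reshape(len(X), -1)
    a1 = np.maximum(flat @ p["fc1"], 0)                  # first FC layer + ReLU
    return windows, a0, flat, a1, (a1 @ p["fc2"]).ravel()

def train(p, X, y, trainable, epochs=300, lr=0.01):
    """Gradient descent on MSE; only the layers named in `trainable` are updated."""
    p = dict(p)  # shallow copy so the caller's model is left untouched
    for _ in range(epochs):
        windows, a0, flat, a1, pred = forward(p, X)
        d_out = (2 * (pred - y) / len(y))[:, None]
        d1 = (d_out @ p["fc2"].T) * (a1 > 0)
        d0 = (d1 @ p["fc1"].T).reshape(a0.shape) * (a0 > 0)
        grads = {"fc2": a1.T @ d_out, "fc1": flat.T @ d1,
                 "conv": np.einsum("npw,npk->wk", windows, d0)}
        for name in trainable:
            p[name] = p[name] - lr * grads[name]
    return p

def fuse(models, X, y, epochs=300, lr=0.01):
    """Concatenate frozen conv + first-FC outputs and train a new last layer."""
    feats = np.hstack([forward(p, X)[3] for p in models])
    w = np.zeros(feats.shape[1])
    for _ in range(epochs):
        w -= lr * 2 * feats.T @ (feats @ w - y) / len(y)
    return w, feats

# Synthetic stand-in data (hypothetical): 60 lines, 40 SNPs coded -1/0/1.
n, m = 60, 40
X = rng.integers(0, 3, (n, m)).astype(float) - 1.0
effects = rng.normal(0, 1, m)

def simulate_trait(noise):
    t = X @ effects + rng.normal(0, noise, n)
    return (t - t.mean()) / t.std()

plant_height, grain_weight = simulate_trait(1.0), simulate_trait(1.0)
grain_yield = simulate_trait(3.0)

# Stage 1: pre-train one CNN per non-yield trait (all layers trainable).
pretrained = [train(init_model(m), X, t, ["conv", "fc1", "fc2"])
              for t in (plant_height, grain_weight)]

# Stage 2: transfer the conv weights, then retrain only the FC layers on yield.
fine_tuned = []
for pre in pretrained:
    model = init_model(m)
    model["conv"] = pre["conv"]
    fine_tuned.append(train(model, X, grain_yield, ["fc1", "fc2"]))

# Stage 3: fuse the fine-tuned models and train the last layer.
w, feats = fuse(fine_tuned, X, grain_yield)
fused_pred = feats @ w
```

The sketch fits and predicts on the same samples purely to keep it short; any real assessment of predictive accuracy would hold out test lines, for example through cross-validation.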

Keywords: crop; genotype to phenotype; multi-trait; transfer learning; yield prediction.

Figures

Figure 1
Population structures and phenotype distribution of the studied datasets. (A–C) A panel of 299 accessions was used for rice (A), and a panel of 487 accessions was used for wheat (B). Three distinct panels were used for maize: CNGWAS, USNAM, and G2F (C). (D–F) Depiction of six traits in the rice population (D), six traits in the wheat population (E), and seven traits in the three maize populations (F). The traits evaluated for transfer learning in the Rice299 dataset were plant height (PH), flag leaf area (LA), grain weight (GW), grain length (GL), and yield per plant (YPP). Evaluated traits in the Wheat487 dataset were PH, GW, number of spikes (SN), plant biomass (Biomass), and harvest index (HI). Evaluated traits in maize were days to pollen (DTP), ear height (EH), GW, and grain moisture (GM).
Figure 2
Design of TrG2P. (A) Schematic overview of the transfer-learning-based genomic prediction (TrG2P) framework. (B–D) The TrG2P comprises pre-training (B), fine-tuning (C), and fusion model construction (D). “Pre-trained” indicates the weight transfer from pre-trained models, whereas “Train” involves retraining for the target task to acquire new weights. The terms “CONV layer” and “FC layer” denote the convolutional and fully connected layers, respectively.
Figure 3
Results of the pre-training step using the CNN algorithm. The traits trained in the Rice299 dataset were PH, LA, 1000 GW, GL, and YPP. Evaluated traits in the Wheat487 dataset were PH, GW, SN, Biomass, and HI. Evaluated traits in maize were DTP, EH, GW, and GM.
Figure 4
Results of yield predictions with TrG2P and comparative methods. Predictive accuracy of yield in the Rice299 (A), Wheat487 (B), and G2F (C) datasets. Four traditional algorithms and three deep-learning-based algorithms were used for comparison. The traditional algorithms were ridge regression best linear unbiased prediction (rrBLUP), random forest (RF), support vector regression (SVR), and LightGBM. Deep learning algorithms were convolutional neural networks (CNN), DeepGS, and DNNGP. “FT-∗” denotes the fine-tuned model with each corresponding pre-training model. “FT-∗ + FT-∗” denotes a fused model composed of the corresponding fine-tuning models. The traits evaluated for transfer learning in the Rice299 dataset were PH, LA, 1000 GW, GL, and YPP. Evaluated traits in the Wheat487 dataset were PH, GW, SN, Biomass, and HI. Evaluated traits in maize were DTP, EH, GW, and GM. The red and blue dashed lines denote the highest predictive accuracy of the traditional and deep learning algorithms, respectively.
Figure 5
Comparison of predictive performance between TrG2P and other methods. (A–C) Predictive accuracy comparison for the rice (A), wheat (B), and maize (C) datasets. (D–F) Mean-squared error (MSE) comparison for the rice (D), wheat (E), and maize (F) datasets. The label “TrG2P” represents the best-performing model from the fusing step. Statistical significance was determined using Student’s t-test. Significance is indicated as ∗∗∗p < 0.001, ∗∗p < 0.01, and ∗p < 0.05.
Figure 6
Correlation analysis of the predictive abilities between fine-tuned models and fused models for rice (A), wheat (B), and maize (C). The blue solid lines represent linear regression, and the gray shaded areas denote the 95% confidence intervals. Correlation coefficients (r) and corresponding statistical tests (p values) were calculated using Pearson’s method.
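
The evaluation quantities named in the captions of Figures 5 and 6 (MSE, Pearson's correlation, and Student's t-test) can be computed as in the sketch below, which uses only the Python standard library. The function names and the example numbers are hypothetical, and the pooled-variance two-sample form of the t statistic is an assumption, since the captions do not say which variant was used.

```python
from math import sqrt
from statistics import mean, variance

def mse(observed, predicted):
    """Mean-squared error between observed and predicted phenotypes."""
    return mean((o - p) ** 2 for o, p in zip(observed, predicted))

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical numbers, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(mse(observed, predicted), pearson_r(observed, predicted))

# Hypothetical per-replicate accuracies for two methods.
print(student_t([0.5, 0.6, 0.7], [0.3, 0.4, 0.5]))
```

The t statistic would still need to be compared against a t distribution with `len(a) + len(b) - 2` degrees of freedom to obtain the p values that the significance stars summarize.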
