Plant Commun. 2024 Jul 8;5(7):100975.

doi: 10.1016/j.xplc.2024.100975. Epub 2024 May 15.

TrG2P: A transfer-learning-based tool integrating multi-trait data for accurate prediction of crop yield

Affiliations

¹ Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China.
² Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68583, USA.
³ Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China. Electronic address: wangky@nercita.org.cn.
⁴ Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China. Electronic address: zhaochunjiang@nercita.org.cn.

PMID: 38751121
PMCID: PMC11287160
DOI: 10.1016/j.xplc.2024.100975

Jinlong Li et al. Plant Commun. 2024.

Abstract

Yield prediction is the primary goal of genomic selection (GS)-assisted crop breeding. Because yield is a complex quantitative trait, making predictions from genotypic data is challenging. Transfer learning can produce an effective model for a target task by leveraging knowledge from a different but related source domain, and it is considered a promising method for improving yield prediction by integrating multi-trait data. However, it has not previously been applied to genotype-to-phenotype prediction owing to the lack of an efficient implementation framework. We therefore developed TrG2P, a transfer-learning-based framework. TrG2P first employs convolutional neural networks (CNNs) to train models on non-yield-trait phenotypic and genotypic data, thus obtaining pre-trained models. Subsequently, the convolutional layer parameters from these pre-trained models are transferred to the yield prediction task, and the fully connected layers are retrained, thus obtaining fine-tuned models. Finally, the convolutional layer and the first fully connected layer of the fine-tuned models are fused, and the last fully connected layer is trained to enhance prediction performance. We applied TrG2P to five sets of genotypic and phenotypic data from maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum) and compared its model precision to that of seven other popular GS tools: ridge regression best linear unbiased prediction (rrBLUP), random forest, support vector regression, light gradient boosting machine (LightGBM), CNN, DeepGS, and deep neural network for genomic prediction (DNNGP). TrG2P improved the accuracy of yield prediction by 39.9%, 6.8%, and 1.8% in rice, maize, and wheat, respectively, compared with predictions generated by the best-performing comparison model. Our work therefore demonstrates that transfer learning is an effective strategy for improving yield prediction by integrating information from non-yield-trait data. We attribute its enhanced prediction accuracy to the valuable information available from traits associated with yield and to training dataset augmentation. The Python implementation of TrG2P is available at https://github.com/lijinlong1991/TrG2P. The web-based tool is available at http://trg2p.ebreed.cn:81.
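The three stages described in the abstract (pre-training, fine-tuning, fusion) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of PyTorch, the class and function names (`G2PNet`, `FusionNet`, `fit`, `fine_tune`), the layer sizes, the synthetic random data, freezing the transferred convolutional weights, and fusing by concatenation are all assumptions made for illustration.

```python
import torch
from torch import nn


class G2PNet(nn.Module):
    """Hypothetical small 1D CNN: conv block -> first FC layer -> last FC layer."""

    def __init__(self, n_filters=8, kernel=5, hidden=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, n_filters, kernel), nn.ReLU(),
            nn.AdaptiveMaxPool1d(4), nn.Flatten(),
        )
        self.fc1 = nn.Sequential(nn.Linear(n_filters * 4, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, 1)

    def features(self, x):
        # Output of the conv block followed by the first FC layer.
        return self.fc1(self.conv(x))

    def forward(self, x):
        return self.out(self.features(x)).squeeze(-1)


class FusionNet(nn.Module):
    """Concatenates frozen fine-tuned branches; only the last FC layer trains."""

    def __init__(self, branches, hidden=16):
        super().__init__()
        self.branches = nn.ModuleList(branches)
        for p in self.branches.parameters():
            p.requires_grad = False
        self.out = nn.Linear(hidden * len(branches), 1)

    def forward(self, x):
        feats = torch.cat([b.features(x) for b in self.branches], dim=1)
        return self.out(feats).squeeze(-1)


def fit(model, x, y, epochs=50, lr=1e-2):
    """Full-batch training of whichever parameters are still trainable."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model


def fine_tune(pretrained, x, y_yield):
    """Copy conv weights from a pre-trained model, then retrain the FC layers."""
    model = G2PNet()
    model.conv.load_state_dict(pretrained.conv.state_dict())
    for p in model.conv.parameters():
        p.requires_grad = False  # assumption: transferred weights stay fixed
    return fit(model, x, y_yield)


torch.manual_seed(0)
# Synthetic stand-ins: 60 lines x 200 SNPs coded 0/1/2, random phenotypes.
x = torch.randint(0, 3, (60, 1, 200)).float()
traits = {"PH": torch.randn(60), "GW": torch.randn(60)}
y_yield = torch.randn(60)

pretrained = {name: fit(G2PNet(), x, y) for name, y in traits.items()}  # stage 1
fine_tuned = [fine_tune(m, x, y_yield) for m in pretrained.values()]    # stage 2
fused = fit(FusionNet(fine_tuned), x, y_yield)                          # stage 3
pred = fused(x)
```

With real data, `x` would hold encoded genotypes and each entry of `traits` a measured non-yield phenotype; the train/test split and hyperparameter search that a real evaluation needs are omitted here.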

Keywords: crop; genotype to phenotype; multi-trait; transfer learning; yield prediction.

Figures

**Figure 1**
Population structures and phenotype distribution of the studied datasets. **(A–C)** A panel of 299 accessions was used for rice **(A)**, and a panel of 487 accessions was used for wheat **(B)**. Three distinct panels were used for maize: CNGWAS, USNAM, and G2F **(C)**. **(D–F)** Depiction of six traits in the rice population **(D)**, six traits in the wheat population **(E)**, and seven traits in the three maize populations **(F)**. The traits evaluated for transfer learning in the Rice299 dataset were plant height (PH), flag leaf area (LA), grain weight (GW), grain length (GL), and yield per plant (YPP). Evaluated traits in the Wheat487 dataset were PH, GW, number of spikes (SN), plant biomass (Biomass), and harvest index (HI). Evaluated traits in maize were days to pollen (DTP), ear height (EH), GW, and grain moisture (GM).

**Figure 2**
Design of TrG2P. **(A)** Schematic overview of the transfer-learning-based genomic prediction (TrG2P) framework. **(B–D)** TrG2P comprises pre-training **(B)**, fine-tuning **(C)**, and fusion model construction **(D)**. “Pre-trained” indicates weights transferred from pre-trained models, whereas “Train” indicates layers retrained on the target task to acquire new weights. The terms “CONV layer” and “FC layer” denote the convolutional and fully connected layers, respectively.

**Figure 3**
Results of the pre-training step using the CNN algorithm. The traits trained in the Rice299 dataset were PH, LA, 1000 GW, GL, and YPP. Evaluated traits in the Wheat487 dataset were PH, GW, SN, Biomass, and HI. Evaluated traits in maize were DTP, EH, GW, and GM.

**Figure 4**
Results of yield predictions with TrG2P and comparative methods. Predictive accuracy of yield in the Rice299 **(A)**, Wheat487 **(B)**, and G2F **(C)** datasets. Four traditional algorithms and three deep-learning-based algorithms were used for comparison. The traditional algorithms were ridge regression best linear unbiased prediction (rrBLUP), random forest (RF), support vector regression (SVR), and LightGBM. Deep learning algorithms were convolutional neural networks (CNN), DeepGS, and DNNGP. “FT-∗” denotes the fine-tuned model with each corresponding pre-training model. “FT-∗ + FT-∗” denotes a fused model composed of the corresponding fine-tuning models. The traits evaluated for transfer learning in the Rice299 dataset were PH, LA, 1000 GW, GL, and YPP. Evaluated traits in the Wheat487 dataset were PH, GW, SN, Biomass, and HI. Evaluated traits in maize were DTP, EH, GW, and GM. The red and blue dashed lines denote the highest predictive accuracy of the traditional and deep learning algorithms, respectively.

**Figure 5**
Comparison of predictive performance between TrG2P and other methods. **(A–C)** Predictive accuracy comparison for the rice **(A)**, wheat **(B)**, and maize **(C)** datasets. **(D–F)** Mean-squared error (MSE) comparison for the rice **(D)**, wheat **(E)**, and maize **(F)** datasets. The label “TrG2P” represents the best-performing model from the fusing step. Statistical significance was determined using Student’s t-test. Significance is indicated as ∗∗∗p < 0.001, ∗∗p < 0.01, and ∗p < 0.05.

**Figure 6**
Correlation analysis of the predictive abilities between fine-tuned models and fused models for rice **(A)**, wheat **(B)**, and maize **(C)**. The blue solid lines represent linear regression, and the gray shaded areas denote the 95% confidence intervals. Correlation coefficients (r) and corresponding statistical tests (p values) were calculated using Pearson’s method.
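As a reminder of what Pearson's method computes, the sketch below implements the coefficient r and the t statistic from which its p value is derived (with n − 2 degrees of freedom). The function names and the accuracy values are hypothetical illustrations, not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r != 0; undefined when |r| == 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Made-up predictive abilities of fine-tuned models and the fused models
# built from them, for illustration only.
fine_tuned_acc = [0.30, 0.35, 0.40, 0.45, 0.50]
fused_acc = [0.36, 0.38, 0.47, 0.49, 0.55]

r = pearson_r(fine_tuned_acc, fused_acc)
t = t_statistic(r, len(fine_tuned_acc))
print(f"r = {r:.3f}, t = {t:.3f}")
```

Converting `t` to a p value requires the Student's t distribution, which the Python standard library does not provide; a statistics package would be needed for that step.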

Cited by

Using the Pearson's correlation coefficient as the sole metric to measure the accuracy of quantitative trait prediction: is it sufficient?
Pan S, Liu Z, Han Y, Zhang D, Zhao X, Li J, Wang K. Front Plant Sci. 2024 Dec 10;15:1480463. doi: 10.3389/fpls.2024.1480463. eCollection 2024. PMID: 39719937.
Advances in multi-trait genomic prediction approaches: classification, comparative analysis, and perspectives.
Mbebi AJ, Mercado F, Hobby D, Tong H, Nikoloski Z. Brief Bioinform. 2025 May 1;26(3):bbaf211. doi: 10.1093/bib/bbaf211. PMID: 40358423. Review.
Fast-forwarding plant breeding with deep learning-based genomic prediction.
Gao S, Yu T, Rasheed A, Wang J, Crossa J, Hearne S, Li H. J Integr Plant Biol. 2025 Jul;67(7):1700-1705. doi: 10.1111/jipb.13914. Epub 2025 Apr 14. PMID: 40226955. Review.
WheatGP, a genomic prediction method based on CNN and LSTM.
Wang C, Zhang D, Ma Y, Zhao Y, Liu P, Li X. Brief Bioinform. 2025 Mar 4;26(2):bbaf191. doi: 10.1093/bib/bbaf191. PMID: 40275535.
PSR-MAPMS: A new approach for the interpretable prediction of myelin autoantigenic peptides in multiple sclerosis using multi-source propensity scores.
Charoenkwan P, Schaduangrat N, Chumnanpuen P, Shoombuatong W. Protein Sci. 2025 Aug;34(8):e70010. doi: 10.1002/pro.70010. PMID: 40673425.
