Review

Brief Bioinform. 2025 May 1;26(3):bbaf211.

doi: 10.1093/bib/bbaf211.

Advances in multi-trait genomic prediction approaches: classification, comparative analysis, and perspectives

Alain J Mbebi¹,², Facundo Mercado¹, David Hobby¹, Hao Tong¹,², Zoran Nikoloski¹,²

Affiliations

¹ Bioinformatics Department, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Brandenburg, Germany.
² Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Brandenburg, Germany.

PMID: 40358423
PMCID: PMC12070487
DOI: 10.1093/bib/bbaf211

Alain J Mbebi et al. Brief Bioinform. 2025.

Abstract

Traits in any organism are not independent but show considerable integration, observed in the form of couplings and trade-offs. Therefore, improvement in one trait may affect other traits, often in an undesired direction. To account for this problem, crop breeding increasingly relies on multi-trait genomic prediction (MT-GP) approaches that leverage the availability of genetic markers from different populations along with advances in high-throughput precision phenotyping. While significant progress has been made in jointly modeling multiple traits using a variety of statistical and machine learning approaches, there is no systematic comparison of the advantages and shortcomings of the existing classes of MT-GP models. Here, we fill this knowledge gap by first classifying the existing MT-GP models and briefly summarizing their general principles, modeling assumptions, and potential limitations. We then perform an extensive comparative analysis with 10 traits measured in an Oryza sativa diversity panel, using cross-validation scenarios relevant to breeding practice. Finally, we discuss directions that can enable the building of next-generation MT-GP models that address pressing challenges in crop breeding.

Keywords: breeding; crop improvement; deep learning; genomic prediction; machine learning; multi-trait.

Conflict of interest statement

All authors declare that they have no conflict of interest.

Figures

**Figure 1**
Schematic overview of genomic selection (GS). Shown are the main steps of GS, starting with the collection of phenotypic and genotypic data from a training population (e.g. inbreds or hybrids). Depending on the prediction objective and the sample size, different cross-validation (CV) schemes are used with the collected data to train the predictive models, which are subsequently used to determine genomic estimated breeding values (GEBVs). The GEBVs are then computed for a testing population that is only genotyped, from which individuals with the desired performance are selected without the need for direct phenotyping. Briefly, in k-fold CV, the population under consideration is partitioned into k folds of approximately equal size; the model is trained on k − 1 folds and validated on the remaining fold, and this is repeated until each fold has been used once as the validation set. Leave-one-out CV is the special case in which each validation set contains a single individual. In contrast, CV0, CV00, CV1, and CV2 are used in multi-environment settings and correspond, respectively, to predicting seen genotypes in unseen environments, unseen genotypes in unseen environments, unseen genotypes in seen environments, and genotypes observed in some environments in other, already seen environments.
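The CV schemes summarized in the Figure 1 caption can be sketched as index-splitting functions. This is a minimal illustration under stated assumptions, not the authors' code: the function names `kfold_indices`, `kfold_splits`, `loo_splits`, and `multi_env_split` are hypothetical, individuals are identified by integer indices, and multi-environment records are assumed to be `(genotype, environment)` pairs with held-out genotypes, environments, or cells supplied by the caller.

```python
import random
from typing import FrozenSet, List, Sequence, Tuple

Record = Tuple[str, str]  # hypothetical record layout: (genotype, environment)


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and partition them into k folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold f takes every k-th shuffled index starting at position f,
    # so fold sizes differ by at most one.
    return [idx[f::k] for f in range(k)]


def kfold_splits(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train, validation) pairs: train on k-1 folds, validate on the held-out fold."""
    folds = kfold_indices(n, k, seed)
    splits = []
    for f, val in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        splits.append((sorted(train), sorted(val)))
    return splits


def loo_splits(n: int) -> List[Tuple[List[int], List[int]]]:
    """Leave-one-out CV is k-fold CV with k = n: each individual is validated once."""
    return kfold_splits(n, n)


def multi_env_split(
    records: Sequence[Record],
    scheme: str,
    held_genos: FrozenSet[str] = frozenset(),
    held_envs: FrozenSet[str] = frozenset(),
    held_cells: FrozenSet[Record] = frozenset(),
) -> Tuple[List[Record], List[Record]]:
    """Split (genotype, environment) records into train/test for CV0, CV00, CV1, or CV2."""
    train: List[Record] = []
    test: List[Record] = []
    for rec in records:
        g, e = rec
        if scheme == "CV1":  # unseen genotypes in seen environments
            is_test = g in held_genos
            is_train = not is_test
        elif scheme == "CV2":  # seen genotypes, missing genotype-environment cells
            is_test = rec in held_cells
            is_train = not is_test
        elif scheme == "CV0":  # seen genotypes in unseen environments
            is_test = e in held_envs
            is_train = not is_test
        elif scheme == "CV00":  # unseen genotypes in unseen environments
            is_test = g in held_genos and e in held_envs
            # Drop any record sharing a held-out genotype or environment from training.
            is_train = g not in held_genos and e not in held_envs
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        if is_test:
            test.append(rec)
        elif is_train:
            train.append(rec)
    return train, test


if __name__ == "__main__":
    for train, val in kfold_splits(10, 5, seed=1):
        print(len(train), val)
    recs = [(g, e) for g in "ABC" for e in ("E1", "E2")]
    print(multi_env_split(recs, "CV00", held_genos=frozenset({"A"}), held_envs=frozenset({"E2"})))
```

Repeating `kfold_splits` with different seeds gives repeated k-fold CV; the strided assignment is one simple way to keep fold sizes within one of each other.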

**Figure 2**
Comparison of predictabilities for MT-GP methods and a baseline single-trait GP method on a rice data set. We used five MT-GP models, namely MT-BMORS, MT-MOR, MT-SVD, MT-PLS, and MT-DL, together with the ST-GBLUP baseline, to predict the levels of five metabolites (i.e. mr1198, mr1234, mr1246, mr1268, and mr1418; see the Metabolic traits section for a full description) as well as five yield-related traits (i.e. yield, GW, HD, PSSR, and PH). Predictability is computed as the average Pearson correlation coefficient between observed and predicted values in the validation set for each of the ten traits, based on 20 repetitions of 5- and 10-fold CV for CV-A (panels a and b), CV-B (panels d and e), and CV-C (panel c). Bar heights show the average accuracy obtained from the repeated CVs, with error bars showing the standard errors. Panels a and b correspond to CV schemes in which models trained on Indica (Japonica) accessions were used to predict traits in Indica (Japonica) accessions, respectively. In contrast, panels d and e correspond to the CV scenarios in which models were trained on data from Indica (Japonica) and used to predict performance in Japonica (Indica), respectively. Finally, panel c corresponds to random splits in which varying proportions of the combined Indica/Japonica samples are used for training to predict the remaining mixed Indica and Japonica samples.
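The predictability summary described in the Figure 2 caption (mean Pearson correlation between observed and predicted values over repeated CV, reported with a standard error) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the names `pearson`, `predictability`, and `predictability_by_trait` are hypothetical, each `(observed, predicted)` pair is assumed to be one validation set (the caption does not state whether folds were pooled within a repetition), and the standard error is assumed to be the sample standard deviation of the correlations divided by the square root of their count.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (observed, predicted) for one validation set


def pearson(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    if n != len(pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    if so == 0 or sp == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (so * sp)


def predictability(validation_sets: List[Pair]) -> Tuple[float, float]:
    """Mean Pearson r across validation sets and its standard error (assumed SD / sqrt(count))."""
    rs = [pearson(obs, pred) for obs, pred in validation_sets]
    se = stdev(rs) / math.sqrt(len(rs)) if len(rs) > 1 else float("nan")
    return mean(rs), se


def predictability_by_trait(results: Dict[str, List[Pair]]) -> Dict[str, Tuple[float, float]]:
    """Apply `predictability` separately to each trait's list of validation sets."""
    return {trait: predictability(sets) for trait, sets in results.items()}


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the study.
    toy = {
        "trait_1": [([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]),
                    ([2.0, 3.0, 5.0, 6.0], [2.5, 2.9, 4.1, 6.3])],
    }
    for trait, (m, se) in predictability_by_trait(toy).items():
        print(f"{trait}: mean r = {m:.3f}, SE = {se:.3f}")
```

Computing the statistic per trait keeps the comparison across MT-GP models and the single-trait baseline on the same footing, since each model is scored on identical validation sets.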

Cited by

Elucidating Genotypic Variation in Quinoa via Multidimensional Agronomic, Physiological, and Biochemical Assessments.
Nazeer S, Akram MZ. Plants (Basel). 2025 Jul 28;14(15):2332. doi: 10.3390/plants14152332. PMID: 40805681.

Grants and funding

101060393/BOLERO