Genes (Basel). 2024 Jul 23;15(8):969.

doi: 10.3390/genes15080969.

A Penalized Regression Method for Genomic Prediction Reduces Mismatch between Training and Testing Sets

Osval A Montesinos-López¹, Cristian Daniel Pulido-Carrillo¹, Abelardo Montesinos-López², Jesús Antonio Larios Trejo³, José Cricelio Montesinos-López⁴, Afolabi Agbona⁵,⁶, José Crossa⁷,⁸,⁹,¹⁰

Affiliations

¹ Facultad de Telemática, Universidad de Colima, Colima 28040, Mexico.
² Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico.
³ Facultad de Ciencias de la Educación, Universidad de Colima, Colima 28040, Mexico.
⁴ Department of Public Health Sciences, University of California Davis, Davis, CA 95616, USA.
⁵ International Institute of Tropical Agriculture (IITA), Ibadan 200113, Nigeria.
⁶ Molecular & Environmental Plant Sciences, Texas A&M University, College Station, TX 77843, USA.
⁷ International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, Texcoco 52640, Mexico.
⁸ Louisiana State University, Baton Rouge, LA 70803, USA.
⁹ Distinguished Scientist Fellowship Program and Department of Statistics and Operations Research, King Saud University, Riyadh 11451, Saudi Arabia.
¹⁰ Colegio de Postgraduados, Montecillos 56230, Mexico.

PMID: 39202329
PMCID: PMC11353568
DOI: 10.3390/genes15080969

Osval A Montesinos-López et al. Genes (Basel). 2024.

Abstract

Genomic selection (GS) is changing plant breeding by significantly reducing the resources needed for phenotyping. However, its accuracy can be compromised by mismatches between training and testing sets, which impact efficiency when the predictive model does not adequately reflect the genetic and environmental conditions of the target population. To address this challenge, this study introduces a straightforward method using binary-Lasso regression to estimate β coefficients. In this approach, the response variable assigns 1 to testing set inputs and 0 to training set inputs. Subsequently, Lasso, Ridge, and Elastic Net regression models use the inverse of these β coefficients (in absolute values) as weights during training (WLasso, WRidge, and WElastic Net). This weighting method gives less importance to features that discriminate more between training and testing sets. The effectiveness of this method is evaluated across six datasets, demonstrating consistent improvements in terms of the normalized root mean square error. Importantly, the model's implementation is facilitated using the glmnet library, which supports straightforward integration for weighting β coefficients.
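The two-step idea in the abstract can be sketched in a few lines of code. The following is a minimal illustration under stated assumptions, not the authors' implementation and not a use of glmnet. The function names (`lasso_cd`, `mismatch_weights`, `fit_weighted_lasso`), the synthetic data, the penalty values, and the small constant `eps` that keeps the inverse finite when a coefficient is exactly zero are all hypothetical. For brevity, step 1 uses a squared-error Lasso on the 0/1 membership label as a stand-in for the binary-Lasso regression. The abstract does not state how the weights enter the second fit, so the sketch assumes they rescale the feature columns. Under a uniform L1 penalty this acts like a per-coefficient penalty proportional to 1/weight, which is one way to give less importance to features that separate the two sets.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*sum((y - b0 - X b)^2) + lam*sum(|b|) by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        shift = sum(resid) / n  # re-fit the unpenalised intercept
        intercept += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                resid = [resid[i] - delta * X[i][j] for i in range(n)]
                beta[j] = new_beta
    return intercept, beta


def mismatch_weights(X_train, X_test, lam=0.1, eps=0.05):
    """Step 1: regress a 0/1 set label on the features, then invert |beta|."""
    X = X_train + X_test
    z = [0.0] * len(X_train) + [1.0] * len(X_test)  # 0 = training, 1 = testing
    _, beta = lasso_cd(X, z, lam)
    raw = [1.0 / (abs(b) + eps) for b in beta]  # eps is an assumption, see lead-in
    top = max(raw)
    return [w / top for w in raw]  # rescaled so the largest weight is 1


def fit_weighted_lasso(X_train, y_train, weights, lam=0.01):
    """Step 2 (assumed form): Lasso on feature columns rescaled by the weights."""
    Xw = [[x * w for x, w in zip(row, weights)] for row in X_train]
    intercept, beta_scaled = lasso_cd(Xw, y_train, lam)
    # Map coefficients back to the scale of the original features.
    return intercept, [b * w for b, w in zip(beta_scaled, weights)]


# Synthetic data: feature 0 is shifted in the testing set, features 1 and 2 are not.
rng = random.Random(0)
X_train = [[rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
X_test = [[rng.gauss(3, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
y_train = [2.0 * row[1] + rng.gauss(0, 0.1) for row in X_train]

weights = mismatch_weights(X_train, X_test)
intercept, coefs = fit_weighted_lasso(X_train, y_train, weights)
print("weights:", weights)
print("coefficients:", coefs)
```

In this toy setting, feature 0 receives the smallest weight because it is the one that distinguishes testing rows from training rows. With glmnet itself, a per-coefficient penalty of this kind would typically be passed through the `penalty.factor` argument. Whether the authors used that argument, and in what form, is not stated in the abstract.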

Keywords: Elastic Net regression; Lasso regression; Ridge regression; genomic selection; mismatch; weighted regression.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

**Figure 1**
Normalized root mean square error (NRMSE) for the “Maize_1” dataset. A comparison is presented for the 6 evaluated models (Enet, Lasso, Ridge, WEnet, WLasso, and WRidge) and the 2 traits (GDD_ASI and GDD_DTT).

**Figure 2**
Normalized root mean square error (NRMSE) for the “Maize 3” dataset. A comparison is presented for the 6 evaluated models (Enet, Lasso, Ridge, WEnet, WLasso, and WRidge) and the 2 traits (GDD_ASI and GDD_DTT).

**Figure 3**
Normalized root mean square error (NRMSE) for the “Soybean 1” dataset. A comparison is presented for the 6 evaluated models (Enet, Lasso, Ridge, WEnet, WLasso, and WRidge) and the 2 traits (Height and R8).

**Figure 4**
Normalized root mean square error (NRMSE) for the “Soybean 2” dataset. A comparison is presented for the 6 evaluated models (Enet, Lasso, Ridge, WEnet, WLasso, and WRidge) and the 2 traits (Height and R8).

**Figure 5**
Normalized root mean square error (NRMSE) for all datasets. A comparison is presented for the 6 evaluated models (Enet, Lasso, Ridge, WEnet, WLasso, and WRidge).

**Figure A1**
Dataset Maize 2. Normalized root mean square error (NRMSE) for the Maize_2 dataset. A comparison is presented for the 6 evaluated models (Enet, Lasso, Ridge, WEnet, WLasso, and WRidge) and the 2 traits (GDD_ASI and GDD_DTT).

**Figure A2**
Dataset Maize 4. Normalized root mean square error (NRMSE) for the Maize_4 dataset. A comparison is presented for the 6 evaluated models (Enet, Lasso, Ridge, WEnet, WLasso, and WRidge) and the 2 traits (GDD_ASI and GDD_DTT).

**Figure A3**
Dataset Soybean 3. Normalized root mean square error (NRMSE) for the Soybean_3 dataset. A comparison is presented for the 6 evaluated models (Enet, Lasso, Ridge, WEnet, WLasso, and WRidge) and the 2 traits (Height and R8).

**Figure A4**
Dataset Soybean 4. Normalized root mean square error (NRMSE) for the Soybean_4 dataset. A comparison is presented for the 6 evaluated models (Enet, Lasso, Ridge, WEnet, WLasso, and WRidge) and the 2 traits (Height and R8).

Grants and funding

INV-003439, BMGF/FCDO, Accelerating Genetic Gains in Maize and Wheat for Improved Livelihoods (AG2MW) / Bill and Melinda Gates Foundation
