Front Plant Sci. 2022 May 3:13:883280.
doi: 10.3389/fpls.2022.883280. eCollection 2022.

Exploring Machine Learning Algorithms to Unveil Genomic Regions Associated With Resistance to Southern Root-Knot Nematode in Soybeans

Caio Canella Vieira et al. Front Plant Sci.

Abstract

Southern root-knot nematode [SRKN, Meloidogyne incognita (Kofoid & White) Chitwood] is a plant-parasitic nematode that is challenging to control because of its short life cycle, wide host range, and limited management options, among which genetic resistance is the main means of efficiently controlling the damage it causes. To date, a major quantitative trait locus (QTL) mapped on chromosome (Chr.) 10 plays an essential role in resistance to SRKN in soybean varieties. Confidence in trait-locus associations discovered by traditional methods is often limited by the assumptions that individual single nucleotide polymorphisms (SNPs) always act independently and that the phenotype follows a Gaussian distribution. Therefore, the objective of this study was to conduct machine learning (ML)-based genome-wide association studies (GWAS) utilizing Random Forest (RF) and Support Vector Machine (SVM) algorithms to unveil novel regions of the soybean genome associated with resistance to SRKN. A total of 717 breeding lines derived from 330 unique bi-parental populations were genotyped with the Illumina Infinium BARCSoySNP6K BeadChip and phenotyped for SRKN resistance in a greenhouse. A GWAS pipeline was proposed that combines supervised feature dimension reduction based on Variable Importance in Projection (VIP) with SNP detection based on classification accuracy. The proposed ML-GWAS methodology detected minor-effect SNPs that were not identified by the Bayesian-information and linkage-disequilibrium Iteratively Nested Keyway (BLINK), Fixed and Random Model Circulating Probability Unification (FarmCPU), and Enriched Compressed Mixed Linear Model (ECMLM) models. Besides the genomic region on Chr. 10 that can explain most of the variance in SRKN resistance, additional minor-effect SNPs were also identified on Chrs. 10 and 11. The findings of this study demonstrated that overfitting in GWAS may lead to lower prediction accuracy, and that detecting significant SNPs based on classification accuracy limited false-positive associations. Expanding the basis of genetic resistance to SRKN can potentially reduce the selection pressure on the major QTL on Chr. 10 and achieve higher levels of resistance.
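
For readers unfamiliar with VIP, the score is conventionally defined for a partial least squares (PLS) model. The abstract does not say how VIP was computed, so the formula below is the standard textbook definition under that assumption, not necessarily the authors' exact implementation. Here p is the number of SNPs, A the number of PLS components, w_ja the weight of SNP j on component a, t_a the score vector of component a, and q_a its response loading.

```latex
\mathrm{VIP}_j
  = \sqrt{\, p \,
      \frac{\sum_{a=1}^{A} SS_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^{2}}
           {\sum_{a=1}^{A} SS_a} },
\qquad
SS_a = q_a^{2}\, \mathbf{t}_a^{\top} \mathbf{t}_a
```

Under this definition the squared VIP scores average to 1 across all SNPs, so a cutoff of 2.0 (the value used in Figure 1) retains only SNPs that contribute well above average to the projection.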

Keywords: GWAS; feature selection; machine learning; root-knot nematode; soybean.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Variable Importance in Projection (VIP)-based Manhattan plot of the 4,974 SNPs. The SNPs with VIP scores higher than 2.0 are highlighted in blue, and the 29 non-correlated SNPs with VIP scores higher than 2.0 selected to be used in the ML-based GWAS are colored in red.
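
The two-step selection described in this caption (keep SNPs with VIP above 2.0, then keep only non-correlated ones) can be sketched as a greedy filter. This is a minimal illustration, not the authors' code: the function names, the toy VIP scores and genotypes, and the squared-correlation cutoff `r2_cutoff=0.8` are all assumptions, since the caption does not state how correlation was measured or thresholded.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP: treat as uncorrelated
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def select_snps(vip, genotypes, vip_cutoff=2.0, r2_cutoff=0.8):
    """Keep SNPs with VIP > vip_cutoff, dropping any that is highly
    correlated with an already kept, higher-scoring SNP.

    vip: dict mapping SNP name -> VIP score
    genotypes: dict mapping SNP name -> list of 0/1/2 allele counts
    r2_cutoff is a hypothetical threshold chosen for illustration.
    """
    candidates = sorted(
        (snp for snp in vip if vip[snp] > vip_cutoff),
        key=lambda snp: -vip[snp],
    )
    kept = []
    for snp in candidates:
        if all(pearson_r2(genotypes[snp], genotypes[k]) < r2_cutoff for k in kept):
            kept.append(snp)
    return kept


# Toy data (invented): snpB duplicates snpA, snpD falls below the VIP cutoff.
vip = {"snpA": 3.1, "snpB": 2.6, "snpC": 2.2, "snpD": 1.4}
genotypes = {
    "snpA": [0, 0, 2, 2, 0, 2],
    "snpB": [0, 0, 2, 2, 0, 2],
    "snpC": [0, 2, 0, 2, 2, 0],
    "snpD": [2, 2, 0, 0, 2, 0],
}
selected = select_snps(vip, genotypes)
print(selected)
```

Ranking by VIP before filtering means that, within a group of correlated SNPs, the one with the highest score is the one retained.
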
FIGURE 2
Prediction accuracy of RF models by the number of SNPs included as predictors.
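
A curve of this kind can be produced by adding ranked SNPs one at a time and re-estimating held-out accuracy at each step. The sketch below is an assumption-laden illustration: it swaps in a simple nearest-centroid classifier with leave-one-out validation in place of the Random Forest models used in the study, and the genotype matrix, labels, and SNP ranking are invented toy values.

```python
def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier
    (a stand-in for RF, used here only to keep the sketch dependency-free)."""
    correct = 0
    for i in range(len(X)):
        train = [(row, lab) for j, (row, lab) in enumerate(zip(X, y)) if j != i]
        cents = {
            lab: centroid([r for r, l in train if l == lab])
            for lab in sorted({l for _, l in train})
        }
        pred = min(cents, key=lambda lab: sq_dist(X[i], cents[lab]))
        correct += pred == y[i]
    return correct / len(X)


def accuracy_curve(X, y, ranked_snps):
    """Accuracy when the top-k ranked SNP columns are used, for k = 1..len(ranked_snps)."""
    curve = []
    for k in range(1, len(ranked_snps) + 1):
        cols = ranked_snps[:k]
        Xk = [[row[c] for c in cols] for row in X]
        curve.append((k, loo_accuracy(Xk, y)))
    return curve


# Toy data (invented): 8 lines x 3 SNPs; column 0 tracks the phenotype,
# columns 1 and 2 are unrelated to it.
X = [
    [2, 0, 2], [2, 2, 0], [2, 0, 0], [2, 2, 2],
    [0, 0, 2], [0, 2, 0], [0, 0, 0], [0, 2, 2],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = resistant, 0 = susceptible
curve = accuracy_curve(X, y, ranked_snps=[0, 1, 2])
for k, acc in curve:
    print(k, acc)
```

On real data, a plateau or decline in such a curve as more SNPs are added is one way to spot the overfitting the abstract warns about.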
