Sci Rep. 2022 Aug 15;12(1):13823.
doi: 10.1038/s41598-022-16075-9.

Deep polygenic neural network for predicting and identifying yield-associated genes in Indonesian rice accessions

Nicholas Dominic et al. Sci Rep.

Abstract

As the fourth most populous country in the world, Indonesia must increase its annual rice production rate to achieve national food security by 2050. One possible solution comes from the nanoscopic level: a genetic variant called a Single Nucleotide Polymorphism (SNP), which can express significant yield-associated genes. The prior benchmark for this study used a statistical genetics model that involved neither SNP position information nor an attention mechanism. Hence, we developed a novel deep polygenic neural network, named the NucleoNet model, to address these limitations. The NucleoNets were constructed by combining several prominent components: positional SNP encoding, the context vector, wide models, Elastic Net, and Shannon's entropy loss. This polygenic modeling obtained a Mean Squared Error (MSE) of up to 2.779 with a Symmetric Mean Absolute Percentage Error (SMAPE) of 47.156%, while revealing 15 new important SNPs. Furthermore, the NucleoNets reduced the MSE by up to 32.28% compared to the Ordinary Least Squares (OLS) model. Through the ablation study, we learned that combining Xavier-distributed weight initialization with Normal-distributed bias initialization yielded a more diverse set of important SNPs across the 12 chromosomes. Our findings confirmed that the NucleoNet model successfully outperformed the OLS model and identified SNPs important to Indonesian rice yields.
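The two error metrics named in the abstract, and the relative MSE reduction against a baseline, can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state which SMAPE variant was used, so the common form with the mean of absolute values in the denominator is assumed here, and the function names `mse`, `smape`, and `relative_reduction` are hypothetical.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error: average of squared residuals."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric MAPE in percent.

    Assumed variant: 100 * mean(|t - p| / ((|t| + |p|) / 2)).
    A pair with t == p == 0 contributes zero error by convention here.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2.0
        if denom != 0.0:
            total += abs(t - p) / denom
    return 100.0 * total / len(y_true)


def relative_reduction(baseline: float, model: float) -> float:
    """Percentage by which `model` error is lower than `baseline` error."""
    if baseline == 0.0:
        raise ValueError("baseline error must be non-zero")
    return 100.0 * (baseline - model) / baseline


if __name__ == "__main__":
    # Made-up yield values for illustration only; not data from the study.
    observed = [4.2, 5.1, 3.8, 6.0]
    predicted = [4.0, 5.5, 3.0, 6.3]
    print("MSE:", mse(observed, predicted))
    print("SMAPE (%):", smape(observed, predicted))
```

Under this convention, `relative_reduction(mse_ols, mse_model)` gives the kind of percentage the abstract reports when comparing the NucleoNets with the OLS baseline, assuming the reduction is measured relative to the OLS score.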

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. Number of SNPs for each chromosome.
Figure 2. Data preprocessing step.
Figure 3. The NucleoNet model.
Figure 4. Ablation study results testing for one random sample.
Figure 5. The NucleoNets training plots.
Figure 6. NucleoNetV3 testing results under different seeds.
Figure 7. Important SNPs emitted per attention score.
