BioData Min. 2008 Jul 17;1(1):3.

doi: 10.1186/1756-0381-1-3.

Neural networks for genetic epidemiology: past, present, and future

Alison A Motsinger-Reif¹, Marylyn D Ritchie^#²

Affiliations

¹ Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, NC, USA.
² Center for Human Genetics Research, Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA.

^# Contributed equally.

PMID: 18822147
PMCID: PMC2553772
DOI: 10.1186/1756-0381-1-3

Alison A Motsinger-Reif et al. BioData Min. 2008.

Abstract

During the past two decades, the field of human genetics has experienced an information explosion. The completion of the human genome project and the development of high throughput SNP technologies have created a wealth of data; however, the analysis and interpretation of these data have created a research bottleneck. While technology facilitates the measurement of hundreds or thousands of genes, statistical and computational methodologies are lacking for the analysis of these data. New statistical methods and variable selection strategies must be explored for identifying disease susceptibility genes for common, complex diseases. Neural networks (NN) are a class of pattern recognition methods that have been successfully implemented for data mining and prediction in a variety of fields. The application of NN for statistical genetics studies is an active area of research. Neural networks have been applied in both linkage and association analysis for the identification of disease susceptibility genes. In the current review, we consider how NN have been used for both linkage and association analyses in genetic epidemiology. We discuss both the successes of these initial NN applications, and the questions that arose during the previous studies. Finally, we introduce evolutionary computing strategies, Genetic Programming Neural Networks (GPNN) and Grammatical Evolution Neural Networks (GENN), for using NN in association studies of complex human diseases that address some of the caveats illuminated by previous work.

Figures

**Figure 1**
**A Typical Feed-Forward NN**. A feed-forward neural network with one input layer consisting of eight nodes (X_i), two hidden layers with four and two nodes respectively (Σ), and one output layer (O). The connections between layers have associated connection strengths or weights (a_i).
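
The architecture in this caption (eight inputs, hidden layers of four and two nodes, one output) can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions: the logistic activation, the bias term on each node, the random weight initialization, and the 0/1/2 genotype encoding are not given in the caption, and the names `init_layer` and `forward` are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Logistic activation (an assumed choice; the caption only shows summation nodes)."""
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Return n_out weight vectors, each holding n_in weights plus a trailing bias (assumed)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(x, layers):
    """Propagate the input vector x through each layer in turn."""
    activations = list(x)
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(node[:-1], activations)) + node[-1])
            for node in layer
        ]
    return activations

rng = random.Random(0)
layer_sizes = [8, 4, 2, 1]  # input, hidden 1, hidden 2, output, as in Figure 1
network = [
    init_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

genotypes = [0, 1, 2, 0, 1, 1, 2, 0]  # hypothetical encoding of eight SNP genotypes
output = forward(genotypes, network)[0]
```

With a logistic output node, `output` lies strictly between 0 and 1 and could be thresholded to assign case or control status.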

**Figure 2**
**Overview of the GPNN method (adapted from Ritchie et al. 2003)**. First, GPNN has a set of parameters to be initialized before beginning the evolution of NN models. Second, the data are divided into 10 equal parts for 10-fold cross-validation. Third, training begins by generating an initial population of random solutions. Fourth, each NN is evaluated on the training set and its fitness (classification error) recorded. Fifth, the best solutions are selected for crossover and reproduction using a fitness-proportionate selection technique. The new generation begins the cycle again. This continues until a stopping criterion (classification error of zero or limit on the number of generations) is met. At the end of the GPNN evolution, the overall best solution is selected as the optimal NN. Sixth, this best GPNN model is tested on the 1/10 of the data left out to estimate the prediction error of the model. Steps two through six are performed ten times with the same parameter settings, each time using a different 9/10 of the data for training and 1/10 of the data for testing. The loci that are consistently present in the GPNN models are selected as the functional loci and are used as input to a final GPNN evolutionary process to estimate the classification and prediction error of the GPNN model.
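
Two pieces of this workflow, the 10-fold split and fitness-proportionate selection, can be sketched in a few lines. This is not the GPNN implementation: the helper names `k_fold_indices` and `roulette_select` are hypothetical, the candidate networks are stand-in labels, and converting classification error to a selection weight as `1 - error` is an assumption made here because the caption records fitness as an error to be minimized.

```python
import random

def k_fold_indices(n_samples, k, rng):
    """Shuffle sample indices and yield (train, test) index lists for each of k folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def roulette_select(population, errors, n_parents, rng):
    """Fitness-proportionate selection with weight = 1 - error (assumed transform)."""
    weights = [1.0 - e for e in errors]
    if sum(weights) == 0:
        # Every candidate misclassifies everything: fall back to uniform sampling.
        weights = [1.0] * len(population)
    return rng.choices(population, weights=weights, k=n_parents)

rng = random.Random(42)

# Step two: split 100 hypothetical samples into ten train/test partitions.
splits = list(k_fold_indices(100, 10, rng))

# Step five: sample parents in proportion to (1 - classification error).
candidates = ["nn_a", "nn_b", "nn_c"]   # placeholder labels for evolved networks
errors = [0.10, 0.45, 1.00]             # hypothetical training errors
parents = roulette_select(candidates, errors, 4, rng)
```

Under this weighting, a candidate with a classification error of 1.0 receives zero weight and is never chosen as a parent, while lower-error candidates are sampled more often but not exclusively.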

**Figure 3**
**A binary expression tree representation of a NN**. This is an example of one NN optimized by GPNN. The O is the output node, Σ indicates the activation function, a_i indicates a weight, and X₁-X₈ are the NN inputs. The C nodes are constants.
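
To make the tree encoding concrete, the sketch below evaluates a small tuple-encoded binary expression tree by recursion. It is a simplified illustration, not the GPNN grammar: the node labels (`"act"`, `"w"`, `"x"`, `"c"`), the logistic activation, and the rule that a weight node multiplies a constant subtree by an input are assumptions chosen for this example.

```python
import math

def evaluate(node, x):
    """Recursively evaluate a tuple-encoded binary expression tree on input vector x."""
    kind = node[0]
    if kind == "c":      # constant leaf: ("c", value)
        return node[1]
    if kind == "x":      # input leaf: ("x", index), 1-based like X1..X8
        return x[node[1] - 1]
    left, right = evaluate(node[1], x), evaluate(node[2], x)
    if kind == "w":      # weight node: weight value (left) times its input (right)
        return left * right
    if kind == "+":      # arithmetic on constants
        return left + right
    if kind == "-":
        return left - right
    if kind == "act":    # activation node: logistic function of the summed children
        return 1.0 / (1.0 + math.exp(-(left + right)))
    raise ValueError(f"unknown node type: {kind}")

# One activation node with two weighted inputs; the second weight is itself
# a small constant subtree (2.0 - 3.0), mirroring the C nodes in Figure 3.
tree = (
    "act",
    ("w", ("c", 0.5), ("x", 1)),
    ("w", ("-", ("c", 2.0), ("c", 3.0)), ("x", 4)),
)
inputs = [2, 0, 1, 1, 0, 2, 1, 0]  # hypothetical values for X1..X8
output = evaluate(tree, inputs)    # sigmoid(0.5*2 + (-1.0)*1) = sigmoid(0) = 0.5
```

Because the whole network is a single tree, subtree crossover can exchange inputs, weights, or entire weighted branches between two candidate networks.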

Cited by

Optimization of nonlinear dose- and concentration-response models utilizing evolutionary computation.
Beam AL, Motsinger-Reif AA. Dose Response. 2011;9(3):387-409. doi: 10.2203/dose-response.09-030.Beam. Epub 2010 Jun 25. PMID: 22013401. Free PMC article.
Simple Scoring System and Artificial Neural Network for Knee Osteoarthritis Risk Prediction: A Cross-Sectional Study.
Yoo TK, Kim DW, Choi SB, Oh E, Park JS. PLoS One. 2016 Feb 9;11(2):e0148724. doi: 10.1371/journal.pone.0148724. eCollection 2016. PMID: 26859664. Free PMC article.
An investigation of gene-gene interactions in dose-response studies with Bayesian nonparametrics.
Beam AL, Motsinger-Reif AA, Doyle J. BioData Min. 2015 Feb 6;8:6. doi: 10.1186/s13040-015-0039-3. eCollection 2015. PMID: 25691918. Free PMC article.
A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology.
Koo CL, Liew MJ, Mohamad MS, Salleh AH. Biomed Res Int. 2013;2013:432375. doi: 10.1155/2013/432375. Epub 2013 Oct 21. PMID: 24228248. Free PMC article.
Identification of Clinically Relevant HIV Vif Protein Motif Mutations through Machine Learning and Undersampling.
Altamirano-Flores JS, Alvarado-Hernández LÁ, Cuevas-Tello JC, Tino P, Guerra-Palomares SE, Garcia-Sepulveda CA. Cells. 2023 Feb 28;12(5):772. doi: 10.3390/cells12050772. PMID: 36899908. Free PMC article.
