BioData Min. 2008 Jul 17;1(1):3.
doi: 10.1186/1756-0381-1-3.

Neural networks for genetic epidemiology: past, present, and future


Alison A Motsinger-Reif et al. BioData Min.

Abstract

During the past two decades, the field of human genetics has experienced an information explosion. The completion of the Human Genome Project and the development of high-throughput SNP genotyping technologies have created a wealth of data; however, analyzing and interpreting these data has become a research bottleneck. Although technology now allows hundreds or thousands of genes to be measured, statistical and computational methods for analyzing these data are lacking. New statistical methods and variable selection strategies must be explored to identify susceptibility genes for common, complex diseases. Neural networks (NN) are a class of pattern recognition methods that have been used successfully for data mining and prediction in a variety of fields. Their application to statistical genetics is an active area of research: NN have been applied in both linkage and association analysis to identify disease susceptibility genes.

In the current review, we consider how NN have been used for linkage and association analyses in genetic epidemiology. We discuss both the successes of these initial NN applications and the questions raised by those studies. Finally, we introduce two evolutionary computing strategies, Genetic Programming Neural Networks (GPNN) and Grammatical Evolution Neural Networks (GENN), for using NN in association studies of complex human diseases; these strategies address some of the caveats revealed by previous work.


Figures

Figure 1
A Typical Feed-Forward NN. A feed-forward neural network with one input layer consisting of eight nodes (Xi), two hidden layers with four and two nodes respectively (Σ), and one output layer (O). The connections between layers have associated connection strengths or weights (ai).
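As a rough illustration of the architecture in Figure 1, the sketch below builds a fully connected 8-4-2-1 network and propagates one input vector through it. It assumes a logistic (sigmoid) activation at every Σ node and at the output, no bias terms, and randomly initialized weights; the names `init_weights` and `forward` and the 0/1/2 genotype coding are hypothetical choices for illustration, not details taken from the review.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic activation (an assumption; the figure does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_weights(layer_sizes, seed=0):
    """One weight matrix per pair of adjacent layers.

    weights[k][j][i] is the weight a_i on the connection from node i of
    layer k to node j of layer k + 1. Values are random in [-1, 1].
    """
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def forward(x, weights):
    """Feed-forward pass: each node sums its weighted inputs, then applies
    the activation; the resulting activations feed the next layer."""
    activations = list(x)
    for layer in weights:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations))) for row in layer
        ]
    return activations


LAYER_SIZES = [8, 4, 2, 1]  # eight inputs, hidden layers of four and two nodes, one output
weights = init_weights(LAYER_SIZES)
genotypes = [0, 1, 2, 1, 0, 0, 2, 1]  # hypothetical SNP genotypes coded 0/1/2
output = forward(genotypes, weights)
print(len(output), 0.0 < output[0] < 1.0)  # prints: 1 True
```

With these layer sizes the network has 8×4 + 4×2 + 2×1 = 42 weights, which is the parameter set a training procedure (for example backpropagation, or the evolutionary search used by GPNN) would have to optimize.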
Figure 2
Overview of the GPNN method (adapted from Ritchie et al. 2003). First, a set of GPNN parameters is initialized before the evolution of NN models begins. Second, the data are divided into 10 equal parts for 10-fold cross-validation. Third, training begins by generating an initial population of random solutions. Fourth, each NN is evaluated on the training set and its fitness (classification error) is recorded. Fifth, the best solutions are selected for crossover and reproduction using a fitness-proportionate selection technique, and the new generation begins the cycle again. This continues until a stopping criterion (a classification error of zero or a limit on the number of generations) is met. At the end of the GPNN evolution, the overall best solution is selected as the optimal NN. Sixth, this best GPNN model is tested on the 1/10 of the data held out, to estimate the model's prediction error. Steps two through six are performed ten times with the same parameter settings, each time using a different 9/10 of the data for training and the remaining 1/10 for testing. The loci that are consistently present in the GPNN models are selected as the functional loci and are used as input to a final GPNN evolutionary process that estimates the classification and prediction error of the GPNN model.
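The cross-validation and selection bookkeeping described in Figure 2 can be sketched with a few helper functions. This is a minimal illustration under stated assumptions, not the GPNN implementation: fitness is mapped from classification error as 1 - error so that roulette-wheel selection favors low-error networks, "consistently present" loci are assumed to be those appearing in at least a chosen number of folds, and all function names and data are hypothetical. The evolutionary search itself (crossover, mutation, stopping criteria) is omitted.

```python
import random
from collections import Counter


def k_fold_splits(n_samples, k=10, seed=0):
    """Shuffle sample indices, deal them into k near-equal folds, and yield
    (train, test) index lists so that each fold is held out exactly once."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def fitness_proportionate_select(population, errors, n, rng):
    """Roulette-wheel selection of n parents (with replacement).

    GPNN's fitness is a classification error (lower is better), so this
    sketch ASSUMES fitness = 1 - error. The tiny epsilon keeps the wheel
    valid even if every model has an error of 1.0.
    """
    weights = [max(1.0 - e, 0.0) + 1e-12 for e in errors]
    return rng.choices(population, weights=weights, k=n)


def consistent_loci(best_model_loci, min_folds):
    """Count in how many folds' best models each locus appears and keep the
    loci seen in at least `min_folds` folds (an assumed threshold rule)."""
    counts = Counter(locus for loci in best_model_loci for locus in set(loci))
    return sorted(locus for locus, c in counts.items() if c >= min_folds)


# Usage with hypothetical data: 100 samples, three candidate NNs,
# and the loci used by the best model from three folds.
splits = list(k_fold_splits(100))
rng = random.Random(1)
parents = fitness_proportionate_select(
    ["nn_a", "nn_b", "nn_c"], [0.10, 0.30, 0.45], n=4, rng=rng
)
selected = consistent_loci([["X1", "X4"], ["X1", "X4", "X7"], ["X1"]], min_folds=2)
print(selected)  # ['X1', 'X4']
```

Fitness-proportionate selection still gives weaker networks a nonzero chance of reproducing, which helps preserve diversity in the population rather than converging immediately on the current best model.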
Figure 3
A binary expression tree representation of an NN. This is an example of one NN optimized by GPNN. O is the output node, Σ indicates the activation function, ai indicates a weight, and X1-X8 are the NN inputs. The C nodes are constants.
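To make the tree representation in Figure 3 concrete, the sketch below encodes a tiny NN as a binary expression tree and evaluates it recursively. Several details are assumptions for illustration: each Σ node takes exactly two weighted inputs and applies a logistic activation, weights are small arithmetic subtrees built from constant (C) nodes, and the output node O thresholds the root activation at 0.5 to give a case/control call. The node classes and function names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Union


@dataclass
class Input:
    name: str  # an NN input such as "X1"


@dataclass
class Const:
    value: float  # a "C" node


@dataclass
class BinOp:
    op: str  # "+", "-" or "*": combines constants into a weight (assumption)
    left: Node
    right: Node


@dataclass
class Weighted:
    weight: Node  # the weight a_i, itself a subtree
    child: Node  # the signal being weighted


@dataclass
class Neuron:
    left: Node  # a Σ node: sums two weighted inputs, then activates
    right: Node


Node = Union[Input, Const, BinOp, Weighted, Neuron]


def evaluate(node: Node, inputs: dict[str, float]) -> float:
    """Recursively evaluate the tree bottom-up for one individual's inputs."""
    if isinstance(node, Input):
        return inputs[node.name]
    if isinstance(node, Const):
        return node.value
    if isinstance(node, BinOp):
        a, b = evaluate(node.left, inputs), evaluate(node.right, inputs)
        return {"+": a + b, "-": a - b, "*": a * b}[node.op]
    if isinstance(node, Weighted):
        return evaluate(node.weight, inputs) * evaluate(node.child, inputs)
    if isinstance(node, Neuron):
        z = evaluate(node.left, inputs) + evaluate(node.right, inputs)
        return 1.0 / (1.0 + math.exp(-z))  # logistic activation (assumed)
    raise TypeError(f"unknown node type: {type(node).__name__}")


def classify(root: Node, inputs: dict[str, float], threshold: float = 0.5) -> int:
    """Output node O: threshold the root activation into case (1) or control (0)."""
    return int(evaluate(root, inputs) >= threshold)


# Hypothetical tree: weight 1.5 on X1, and weight (0.5 - 2.0) = -1.5 on X4.
tree = Neuron(
    Weighted(Const(1.5), Input("X1")),
    Weighted(BinOp("-", Const(0.5), Const(2.0)), Input("X4")),
)
print(classify(tree, {"X1": 2, "X4": 0}), classify(tree, {"X1": 0, "X4": 2}))  # 1 0
```

Representing the network as a tree is what allows genetic programming to recombine two parent networks by swapping subtrees, which changes both the topology and the inputs a network uses.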
