Methods. 2008 Dec;46(4):248-55.
doi: 10.1016/j.ymeth.2008.10.002. Epub 2008 Oct 16.

Identification of mitochondrial disease genes through integrative analysis of multiple datasets

Raeka S Aiyar et al. Methods. 2008 Dec.

Abstract

Determining the genetic factors in a disease is crucial to elucidating its molecular basis. This task is challenging due to a lack of information on gene function. The integration of large-scale functional genomics data has proven to be an effective strategy to prioritize candidate disease genes. Mitochondrial disorders are a prevalent and heterogeneous class of diseases that are particularly amenable to this approach. Here we explain the application of integrative approaches to the identification of mitochondrial disease genes. We first examine various datasets that can be used to evaluate the involvement of each gene in mitochondrial function. The data integration methodology is then described, accompanied by examples of common implementations. Finally, we discuss how gene networks are constructed using integrative techniques and applied to candidate gene prioritization. Relevant public data resources are indicated. This report highlights the success and potential of data integration as well as its applicability to the search for mitochondrial disease genes.

Figures

Figure 1
Data integration procedure for prioritization of mitochondrial disease candidate genes. Input datasets informative about mitochondrial function (left) and reference sets (center, below) of known mitochondrial (green spots) and non-mitochondrial genes (black spots) are collected. The reference sets are used to train the data integration method to combine the input datasets to calculate a score for each gene reflecting the probability that it is involved in mitochondrial function. The ranked genes are then cross-referenced with positional candidates in a disease locus (right, highlighted in yellow), providing a basis for prioritization.
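The procedure in Figure 1 can be sketched in a few lines. This is only an illustrative sketch, not the method used in the paper: it assumes each input dataset is a plain set of gene identifiers, combines them with a naive Bayes-style sum of log-likelihood ratios estimated from the reference sets, and then ranks the positional candidates. All gene names, dataset names, and the functions `train_llr`, `score_gene`, and `prioritize` are hypothetical.

```python
import math

def train_llr(datasets, positives, negatives, pseudo=1.0):
    """Estimate, for each dataset, a log-likelihood ratio for a gene being
    present in it and one for being absent, using the reference sets.
    A pseudocount avoids division by zero (an assumption of this sketch)."""
    weights = {}
    for name, members in datasets.items():
        p_pos = (len(members & positives) + pseudo) / (len(positives) + 2 * pseudo)
        p_neg = (len(members & negatives) + pseudo) / (len(negatives) + 2 * pseudo)
        weights[name] = (
            math.log(p_pos / p_neg),              # gene present in dataset
            math.log((1 - p_pos) / (1 - p_neg)),  # gene absent from dataset
        )
    return weights

def score_gene(gene, datasets, weights):
    """Sum the per-dataset log-likelihood ratios for one gene."""
    return sum(
        weights[name][0] if gene in members else weights[name][1]
        for name, members in datasets.items()
    )

def prioritize(locus_genes, datasets, weights):
    """Rank positional candidates by integrated score, best first."""
    return sorted(
        locus_genes,
        key=lambda g: score_gene(g, datasets, weights),
        reverse=True,
    )

# Toy, made-up data: two input datasets and two small reference sets.
datasets = {
    "localization": {"g1", "g2", "g5"},
    "proteomics": {"g1", "g3", "g5"},
}
positives = {"g1", "g2"}   # known mitochondrial genes
negatives = {"g3", "g4"}   # known non-mitochondrial genes

weights = train_llr(datasets, positives, negatives)
ranked = prioritize(["g4", "g5", "g6"], datasets, weights)
print(ranked)
```

In this toy example the "localization" dataset separates the two reference sets and so receives a non-zero weight, while "proteomics" covers both equally and contributes nothing to the score.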
Figure 2
Comparing integration methods by sensitivity and specificity. Performance of three integration methods in predicting yeast mitochondrial proteins is shown by continuous curves obtained by assigning a threshold to the score ranging from its most stringent value (bottom-right corner) to its least stringent value (top-left corner). At a given threshold, sensitivity is calculated as the fraction of the reference set covered by the predicted set, and specificity is calculated as the fraction of the predicted set confirmed by the reference set. The two machine-learning-based methods (linear predictor [21] and MitoP2 SVM [16]) outperform the original heuristic method (MitoP2 2004 [8]). The SVM outperforms the linear predictor only in the range of 45–65% specificity. The 24 input datasets (black dots) are A: Neurospora ortholog with mitochondrial localization, B: Huh et al., 2003 [32] (mitochondrial localization), C: Kumar et al., 2002 [33] (mitochondrial localization), D: Sickmann et al., 2003 [50] (proteomics), E: Steinmetz et al., 2002 [24] (deletion phenotype), F: Prokisch et al., 2004 [8] (proteomics), G: Lascaris et al., 2003 [30] (Hap4-induced genes), H: von Mering et al., 2002 [54] (medium confidence interaction with mitochondrial protein), I: Dimmer et al., 2002 [35] (petite phenotype), J: human ortholog with mitochondrial localization, K: R. prowazekii ortholog, L: Ohlmeier et al., 2004 [87] (proteomics), M: Bayesian prediction [88], N: Predotar [46] (signal peptide, score >50), O: Pflieger et al., 2002 [89] (proteomics), P: von Mering et al., 2002 [54] (high confidence interaction with mitochondrial protein), Q: MitoProt [90] (import prediction, score >0.80), R: Prokisch et al. [16] (Neurospora proteomics), S: PSORT [44] (signal peptide), T: Prokisch et al., 2004 [8] (>1.2-fold differential expression, glucose versus lactate), U: Marc et al., 2002 [48] (mitochondrion-bound polysomes, MLR>80), V: von Mering et al., 2002 [54] (low confidence interaction with mitochondrial protein), W: DeRisi et al., 1997 [29] (>2-fold increase in diauxic shift when OD600 = 7.3), X: E. cuniculi ortholog (negative predictor).
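The two measures defined in the Figure 2 caption can be computed directly from a table of gene scores. The sketch below follows the caption's definitions literally: sensitivity is the fraction of the reference set that is predicted, and specificity is the fraction of the predicted set that is in the reference set (a quantity more commonly called precision). The scores, gene names, and function names are made up for illustration.

```python
def sensitivity_specificity(scores, reference, threshold):
    """Return (sensitivity, specificity) as defined in the caption.
    Returns None if no gene passes the threshold, because the
    specificity is then undefined."""
    predicted = {gene for gene, s in scores.items() if s >= threshold}
    if not predicted:
        return None
    confirmed = predicted & reference
    sensitivity = len(confirmed) / len(reference)
    specificity = len(confirmed) / len(predicted)
    return sensitivity, specificity

def performance_curve(scores, reference):
    """Sweep the threshold from most to least stringent, using each
    observed score as a cut-off, and collect one point per threshold."""
    points = []
    for threshold in sorted(set(scores.values()), reverse=True):
        result = sensitivity_specificity(scores, reference, threshold)
        if result is not None:
            points.append((threshold, *result))
    return points

# Toy, made-up integrated scores and reference set.
scores = {"a": 0.9, "b": 0.8, "c": 0.6, "d": 0.4, "e": 0.1}
reference = {"a", "b", "d"}

for threshold, sens, spec in performance_curve(scores, reference):
    print(f"threshold={threshold:.1f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

A single binary input dataset corresponds to one point on such a plot, which is why the 24 datasets appear as dots while the integrated scores trace continuous curves.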
Figure 3
Figure 3a. Symptom matching in gene networks to identify disease candidates. Modules containing genes in a hypothetical disease locus for a neuromuscular dystrophy with ataxia are shown: green nodes represent genes implicated in diseases causing ataxia, blue represents genes implicated in diseases with different symptoms, and gray represents genes not associated with disease. One of the positional candidate genes (outlined in red) shares a module with genes implicated in diseases causing ataxia. This candidate would therefore be prioritized relative to the others in the locus.
Figure 3b. Predicting candidate gene combinations for multigenic diseases using networks. Three genes, one from each hypothetical disease locus (highlighted in yellow), are network neighbours and therefore functionally related: these comprise the combination most likely to be responsible for the disease. This approach reduces the number of gene combinations that must be screened for mutations, given the size of typical linkage intervals.
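The idea in Figure 3b can be illustrated with a small search over candidate combinations. This sketch assumes that "network neighbours" means every pair of genes in the combination is directly linked in the network; the paper may use a looser criterion. The loci, gene names, edges, and the function `connected_combinations` are hypothetical.

```python
from itertools import combinations, product

def connected_combinations(loci, edges):
    """Return every combination of one gene per locus in which all
    pairs of genes are directly linked in the network."""
    links = {frozenset(edge) for edge in edges}  # undirected edges
    hits = []
    for combo in product(*loci):
        if all(frozenset(pair) in links for pair in combinations(combo, 2)):
            hits.append(combo)
    return hits

# Toy, made-up data: three disease loci with two positional candidates each.
loci = [["A1", "A2"], ["B1", "B2"], ["C1", "C2"]]
edges = [("A1", "B2"), ("B2", "C1"), ("A1", "C1"), ("A2", "B1")]

candidates = connected_combinations(loci, edges)
print(candidates)
```

Here only one of the eight possible combinations passes the filter. With realistic linkage intervals the full product grows quickly, so a practical version would likely start from the network edges rather than enumerate every combination.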

References

    1. Online Mendelian Inheritance in Man, OMIM (TM). McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information. Bethesda, MD: National Library of Medicine; [October 2, 2008]. World Wide Web URL www.ncbi.nlm.nih.gov/omim.
    2. Botstein D, Risch N. Nat Genet. 2003;33 Suppl:228–237.
    3. Lander ES, Botstein D. Genetics. 1989;121:185–199.
    4. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Nucleic Acids Res. 2005;33(Database Issue)
    5. Dudbridge F, Gusnanto A, Koeleman BP. Hum Genomics. 2006;2:310–317.