Identification of mitochondrial disease genes through integrative analysis of multiple datasets

Raeka S Aiyar¹, Julien Gagneur, Lars M Steinmetz

PMID: 18930150
PMCID: PMC2774125
DOI: 10.1016/j.ymeth.2008.10.002

Methods. 2008 Dec;46(4):248-55. doi: 10.1016/j.ymeth.2008.10.002. Epub 2008 Oct 16.

Affiliation

¹ European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.

Abstract

Determining the genetic factors in a disease is crucial to elucidating its molecular basis. This task is challenging due to a lack of information on gene function. The integration of large-scale functional genomics data has proven to be an effective strategy to prioritize candidate disease genes. Mitochondrial disorders are a prevalent and heterogeneous class of diseases that are particularly amenable to this approach. Here we explain the application of integrative approaches to the identification of mitochondrial disease genes. We first examine various datasets that can be used to evaluate the involvement of each gene in mitochondrial function. The data integration methodology is then described, accompanied by examples of common implementations. Finally, we discuss how gene networks are constructed using integrative techniques and applied to candidate gene prioritization. Relevant public data resources are indicated. This report highlights the success and potential of data integration as well as its applicability to the search for mitochondrial disease genes.

Figures

**Figure 1**
Data integration procedure for prioritization of mitochondrial disease candidate genes. Input datasets informative about mitochondrial function (left) and reference sets (center, below) of known mitochondrial (green spots) and non-mitochondrial genes (black spots) are collected. The reference sets are used to train the data integration method to combine the input datasets to calculate a score for each gene reflecting the probability that it is involved in mitochondrial function. The ranked genes are then cross-referenced with positional candidates in a disease locus (right, highlighted in yellow), providing a basis for prioritization.
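The procedure in Figure 1 can be illustrated with a minimal sketch. It assumes a naive Bayes style combination of binary datasets, which is only one possible integration method and is not stated by the source to be the authors' method. All gene names, dataset names, and function names below are hypothetical toy values.

```python
import math

def train_log_likelihood_ratios(datasets, positives, negatives, pseudocount=1.0):
    """Estimate, per dataset, the log-likelihood ratio of a gene being
    mitochondrial given that it is present in (or absent from) the dataset.
    The reference sets play the role of training data."""
    weights = {}
    for name, members in datasets.items():
        # Smoothed fraction of each reference set that the dataset covers.
        p_pos = (len(members & positives) + pseudocount) / (len(positives) + 2 * pseudocount)
        p_neg = (len(members & negatives) + pseudocount) / (len(negatives) + 2 * pseudocount)
        weights[name] = (
            math.log(p_pos / p_neg),              # evidence when the gene is in the dataset
            math.log((1 - p_pos) / (1 - p_neg)),  # evidence when it is not
        )
    return weights

def score_genes(genes, datasets, weights):
    """Sum the per-dataset evidence into one score per gene."""
    scores = {}
    for gene in genes:
        total = 0.0
        for name, members in datasets.items():
            present, absent = weights[name]
            total += present if gene in members else absent
        scores[gene] = total
    return scores

def prioritize_locus(scores, locus_genes):
    """Rank positional candidates in a disease locus by descending score."""
    return sorted(locus_genes, key=lambda g: scores.get(g, float("-inf")), reverse=True)

# Toy inputs (hypothetical): three input datasets and two reference sets.
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
positives = {"g1", "g2", "g3"}   # known mitochondrial genes
negatives = {"g4", "g5", "g6"}   # known non-mitochondrial genes
datasets = {
    "localization": {"g1", "g2", "g7"},
    "proteomics": {"g1", "g3", "g4", "g7"},
    "coexpression": {"g2", "g3", "g5", "g8"},
}

weights = train_log_likelihood_ratios(datasets, positives, negatives)
scores = score_genes(genes, datasets, weights)
ranked = prioritize_locus(scores, ["g6", "g7", "g8"])
print(ranked)
```

In this toy case the candidate supported by both the localization and proteomics datasets ranks first in the locus. Real datasets are rarely independent, which is one reason a trained linear predictor or an SVM may be preferred over this simple sum.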

**Figure 2**
Comparing integration methods by sensitivity and specificity. The performance of three integration methods in predicting yeast mitochondrial proteins is shown by continuous curves obtained by varying the score threshold from its most stringent value (bottom-right corner) to its least stringent value (top-left corner). At a given threshold, sensitivity is calculated as the fraction of the reference set covered by the predicted set, and specificity is calculated as the fraction of the predicted set confirmed by the reference set. The two machine-learning based methods (linear predictor [21] and MitoP2 SVM [16]) outperform the original heuristic method (MitoP2 2004 [8]). The SVM outperforms the linear predictor only in the range of 45–65% specificity. The 24 input datasets (black dots) are A: *Neurospora* ortholog with mitochondrial localization, B: Huh *et al.*, 2003 [32] (mitochondrial localization), C: Kumar *et al.*, 2002 [33] (mitochondrial localization), D: Sickmann *et al.*, 2003 [50] (proteomics), E: Steinmetz *et al.*, 2002 [24] (deletion phenotype), F: Prokisch *et al.*, 2004 [8] (proteomics), G: Lascaris *et al.*, 2003 [30] (Hap4-induced genes), H: von Mering *et al.*, 2002 [54] (medium confidence interaction with mitochondrial protein), I: Dimmer *et al.*, 2002 [35] (petite phenotype), J: human ortholog with mitochondrial localization, K: *R. prowazekii* ortholog, L: Ohlmeier *et al.*, 2004 [87] (proteomics), M: Bayesian prediction [88], N: Predotar [46] (signal peptide, score >50), O: Pflieger *et al.*, 2002 [89] (proteomics), P: von Mering *et al.*, 2002 [54] (high confidence interaction with mitochondrial protein), Q: MitoProt [90] (import prediction, score >0.80), R: Prokisch *et al.* [16] (*Neurospora* proteomics), S: PSORT [44] (signal peptide), T: Prokisch *et al.*, 2004 [8] (>1.2-fold differential expression, glucose versus lactate), U: Marc *et al.*, 2002 [48] (mitochondrion-bound polysomes, MLR>80), V: von Mering *et al.*, 2002 [54] (low confidence interaction with mitochondrial protein), W: DeRisi *et al.*, 1997 [29] (>2-fold increase in diauxic shift when OD600 = 7.3), X: *E. cuniculi* ortholog (negative predictor).
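The two quantities plotted in Figure 2 can be computed as in the following sketch, which follows the caption's definitions literally. Note that "specificity" as defined here (fraction of the predicted set confirmed by the reference set) is what is often called precision elsewhere. The scores, gene names, and function names are hypothetical.

```python
def sensitivity_specificity(scores, reference, threshold):
    """Return (sensitivity, specificity) for one score threshold,
    using the definitions given in the Figure 2 caption."""
    predicted = {gene for gene, score in scores.items() if score >= threshold}
    if not predicted:
        # Nothing predicted: sensitivity is 0, specificity is undefined.
        return 0.0, None
    confirmed = predicted & reference
    sensitivity = len(confirmed) / len(reference)   # reference covered by prediction
    specificity = len(confirmed) / len(predicted)   # prediction confirmed by reference
    return sensitivity, specificity

def performance_curve(scores, reference):
    """Sweep the threshold from most to least stringent, one point per score."""
    thresholds = sorted(set(scores.values()), reverse=True)
    return [(t, *sensitivity_specificity(scores, reference, t)) for t in thresholds]

# Toy example (hypothetical scores and reference set).
scores = {"a": 0.9, "b": 0.8, "c": 0.6, "d": 0.4, "e": 0.2}
reference = {"a", "b", "d"}
points = performance_curve(scores, reference)
for threshold, sens, spec in points:
    print(threshold, round(sens, 2), round(spec, 2))
```

The most stringent threshold yields high specificity with low sensitivity, and the least stringent one the reverse, matching the bottom-right to top-left sweep described in the caption.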

**Figure 3**
Figure 3a. Symptom matching in gene networks to identify disease candidates. Modules containing genes in a hypothetical disease locus for a neuromuscular dystrophy with ataxia are shown: green nodes represent genes implicated in diseases causing ataxia, blue nodes represent genes implicated in diseases with different symptoms, and gray nodes represent genes not associated with disease. One of the positional candidate genes (outlined in red) shares a module with genes implicated in diseases causing ataxia. This candidate would therefore be prioritized relative to the others in the locus.

Figure 3b. Predicting candidate gene combinations for multigenic diseases using networks. Three genes, one from each hypothetical disease locus (highlighted in yellow), are network neighbours and therefore functionally related: these comprise the combination most likely to be responsible for the disease. This approach reduces the number of gene combinations that must be screened for mutations, given the size of typical linkage intervals.
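Both uses of gene networks in Figure 3 can be sketched on toy data. The example assumes that modules are given as plain gene sets and the network as an undirected edge list. The counting rule for symptom matching and the requirement that all chosen genes be pairwise neighbours are illustrative assumptions, and every gene and function name is hypothetical.

```python
from itertools import product

def symptom_match_counts(candidates, modules, gene_symptoms, symptom):
    """For each positional candidate, count module co-members that are
    implicated in a disease showing the given symptom (Figure 3a)."""
    counts = {}
    for gene in candidates:
        count = 0
        for module in modules:
            if gene in module:
                count += sum(
                    1 for other in module
                    if other != gene and symptom in gene_symptoms.get(other, set())
                )
        counts[gene] = count
    return counts

def linked_combinations(loci, edges):
    """Return combinations with one gene per locus in which every pair
    of chosen genes is directly connected in the network (Figure 3b)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    hits = []
    for combo in product(*loci):
        pairwise_linked = all(
            b in neighbours.get(a, set())
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )
        if pairwise_linked:
            hits.append(combo)
    return hits

# Figure 3a style toy data: three positional candidates and their modules.
modules = [{"cand1", "x1", "x2"}, {"cand2", "y1"}, {"cand3"}]
gene_symptoms = {"x1": {"ataxia"}, "x2": {"ataxia", "myopathy"}, "y1": {"deafness"}}
counts = symptom_match_counts(["cand1", "cand2", "cand3"], modules, gene_symptoms, "ataxia")
print(counts)

# Figure 3b style toy data: three loci, two genes each.
loci = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
edges = [("a1", "b2"), ("b2", "c1"), ("a1", "c1"), ("a2", "b1")]
combos = linked_combinations(loci, edges)
print(combos)
```

Here eight possible combinations across the three loci reduce to a single linked triple, which illustrates how network structure can shrink the set of combinations to screen.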

References

1. Online Mendelian Inheritance in Man, OMIM (TM). McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information. Bethesda, MD: National Library of Medicine; [October 2, 2008]. World Wide Web URL: www.ncbi.nlm.nih.gov/omim.
2. Botstein D, Risch N. Nat Genet. 2003;33 Suppl:228–237.
3. Lander ES, Botstein D. Genetics. 1989;121:185–199.
4. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Nucleic Acids Res. 2005;33(Database Issue)
5. Dudbridge F, Gusnanto A, Koeleman BP. Hum Genomics. 2006;2:310–317.

Grants and funding

P01 HG000205/HG/NHGRI NIH HHS/United States
