Proc Natl Acad Sci U S A. 2003 Jan 21;100(2):605-10.
doi: 10.1073/pnas.242716699. Epub 2003 Jan 14.

Identification of a gene causing human cytochrome c oxidase deficiency by integrative genomics

Vamsi K Mootha et al. Proc Natl Acad Sci U S A. 2003.

Abstract

Identifying the genes responsible for human diseases requires combining information about gene position with clues about biological function. The recent availability of whole-genome data sets of RNA and protein expression provides powerful new sources of functional insight. Here we illustrate how such data sets can expedite disease-gene discovery, by using them to identify the gene causing Leigh syndrome, French-Canadian type (LSFC, Online Mendelian Inheritance in Man no. 220111), a human cytochrome c oxidase deficiency that maps to chromosome 2p16-21. Using four public RNA expression data sets, we assigned to all human genes a "score" reflecting their similarity in RNA-expression profiles to known mitochondrial genes. Using a large survey of organellar proteomics, we similarly classified human genes according to the likelihood of their protein product being associated with the mitochondrion. By intersecting this information with the relevant genomic region, we identified a single clear candidate gene, LRPPRC. Resequencing identified two mutations on two independent haplotypes, providing definitive genetic proof that LRPPRC indeed causes LSFC. LRPPRC encodes an mRNA-binding protein likely involved with mtDNA transcript processing, suggesting an additional mechanism of mitochondrial pathophysiology. Similar strategies to integrate diverse genomic information can be applied likewise to other disease pathways and will become increasingly powerful with the growing wealth of diverse, functional genomics data.
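The intersection step described in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the authors' pipeline: the `Gene` record, the gene names, the coordinates, the scores, and the `min_score` cutoff are all hypothetical placeholders, and requiring both lines of evidence at once is an assumption made for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical record combining position with two functional clues."""
    name: str
    chrom: str
    start: int
    end: int
    expression_score: int   # e.g. similarity to known mitochondrial genes
    proteomics_hit: bool    # protein seen in a mitochondrial proteomics survey


def in_region(gene, chrom, start, end):
    """True if the gene overlaps the interval chrom:start-end."""
    return gene.chrom == chrom and gene.start <= end and gene.end >= start


def candidates(genes, chrom, start, end, min_score):
    """Names of genes in the region supported by both data types."""
    return [
        g.name
        for g in genes
        if in_region(g, chrom, start, end)
        and g.expression_score >= min_score
        and g.proteomics_hit
    ]


# Toy data: every value below is invented for illustration.
genes = [
    Gene("GENE_A", "chr2", 1200, 1400, 12, True),   # passes every filter
    Gene("GENE_B", "chr2", 1500, 1700, 3, True),    # expression score too low
    Gene("GENE_C", "chr2", 1800, 2100, 15, False),  # no proteomics support
    Gene("GENE_D", "chr2", 5000, 5200, 20, True),   # outside the region
    Gene("GENE_E", "chr3", 1200, 1400, 20, True),   # wrong chromosome
]

print(candidates(genes, "chr2", 1000, 2000, min_score=10))
```

In practice the two functional scores might be ranked or weighted rather than thresholded; the hard filter here is just the simplest way to show how positional and functional evidence narrow a list together.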

Figures

Figure 1
DNA, mRNA, and protein data sets that are used in this study.
Figure 2
Physical map of the LSFC candidate region (Human Genome, August 2001, chr2:46994838–48992238). Microsatellite markers and genetic distances are shown to the left of the chromosome map. Genes with varying levels of annotation support are shown with different colors (RefSeq gene, blue; Ensembl gene, green; human mRNA, orange). An additional 15 computationally predicted genes lie within this region but are not shown. Genes represented in mRNA expression sets are indicated with a check to the right of the gene names.
Figure 3
Evaluating mRNA expression neighborhoods for enrichment in mitochondrial genes. (A) Schematic illustration of the mitochondria neighborhood index. The coordinate of each gene (circle) is defined by its expression vector in an mRNA microarray experiment. Genes are close to one another if they have similar expression profiles (based on an appropriate distance metric, see Methods). The mitochondria neighborhood index, N_R(G), is defined as the number of known mitochondrial genes (orange circles) among the R nearest neighbors of the query gene, G (blue circle). In this cartoon, N_10 = 5 because there are five mitochondrial genes within the query's 10 nearest-neighboring genes. (B) Distribution of N_100 values. The blue histogram shows the distribution of N_100 for all genes, and the red histogram plots N_100 for known mitochondrial genes, in expression set 4. *, the histogram bin containing LRPPRC (see text and Table 1).
Figure 4
Organelle proteomics. (A) Western blot of human HepG2 homogenate (H), crude mitochondrial fraction (M), and Percoll-purified mitochondria (P) probed with an antibody against cytochrome c, a marker for mitochondria, and calreticulin, a marker for contamination by endoplasmic reticulum. Two different loading volumes (5 and 10 μl) were used for each sample. (B) Representative tandem mass spectrum showing y-ion and b-ion series along with the deduced peptide sequence. (C) The predicted LRPPRC (GenBank accession no. XP_031527.3) amino acid sequence with high-scoring peptides, identified by organelle proteomics, marked in red.
Figure 5
Mutations identified in LRPPRC. LRPPRC has 38 exons (blue) predicted to encode a 1,394-aa protein. The amino acid sequences corresponding to exons 9 and 35 are shown, as well as the aligned sequences from mouse, rat, and Fugu. The exon 9 missense mutation, A354V, and the exon 35 truncation, C1277STOP, are shown in red. Conserved residues are shaded in gray. *, a stop codon.
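
The mitochondria neighborhood index defined in the Figure 3 caption, N_R(G), counts the known mitochondrial genes among the R nearest expression neighbors of a query gene G. A minimal sketch of that definition follows. The caption defers the distance metric to the Methods, so Euclidean distance is used here purely as an assumed stand-in, and the gene names and expression vectors are invented toy values.

```python
import math


def neighborhood_index(query, profiles, mito_genes, r, distance=math.dist):
    """Count known mitochondrial genes among the r nearest neighbors of query.

    profiles   -- dict mapping gene name to its expression vector
    mito_genes -- set of gene names known to be mitochondrial
    distance   -- assumed metric; Euclidean by default (Python 3.8+)
    """
    others = [g for g in profiles if g != query]
    # Rank every other gene by how close its expression profile is to the query.
    others.sort(key=lambda g: distance(profiles[query], profiles[g]))
    return sum(1 for g in others[:r] if g in mito_genes)


# Toy expression vectors; all names and numbers are hypothetical.
profiles = {
    "Q":  [1.0, 2.0, 3.0],   # query gene
    "M1": [1.1, 2.0, 3.0],   # known mitochondrial, nearest to Q
    "M2": [1.0, 2.2, 3.0],   # known mitochondrial, second nearest
    "X1": [1.0, 2.0, 3.3],   # not mitochondrial, third nearest
    "X2": [0.0, 0.0, 0.0],   # not mitochondrial, distant
    "M3": [5.0, 5.0, 5.0],   # known mitochondrial, most distant
}
mito_genes = {"M1", "M2", "M3"}

print(neighborhood_index("Q", profiles, mito_genes, r=3))  # prints 2
```

Passing a different `distance` callable (for example, one based on correlation) changes the neighborhoods without changing the counting logic.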
