Review. Trends Genet. 2011 Sep;27(9):377-86.
doi: 10.1016/j.tig.2011.06.004. Epub 2011 Jul 20.

Phylomedicine: an evolutionary telescope to explore and diagnose the universe of disease mutations

Sudhir Kumar et al. Trends Genet. 2011 Sep.

Abstract

Modern technologies have made the sequencing of personal genomes routine. They have revealed thousands of nonsynonymous (amino acid altering) single nucleotide variants (nSNVs) of protein-coding DNA per genome. What do these variants foretell about an individual's predisposition to diseases? The experimental technologies required to carry out such evaluations at a genomic scale are not yet available. Fortunately, the process of natural selection has lent us an almost infinite set of tests in nature. During long-term evolution, new mutations and existing variations have been evaluated for their biological consequences in countless species, and outcomes are readily revealed by multispecies genome comparisons. We review studies that have investigated evolutionary characteristics and in silico functional diagnoses of nSNVs found in thousands of disease-associated genes. We conclude that the patterns of long-term evolutionary conservation and permissible sequence divergence are essential and instructive modalities for functional assessment of human genetic variations.

Figures

Figure 1
Profiles of personal and population variations. (a) Counts of various types of genetic variants profiled by 23andMe using the Illumina HumanOmniExpress BeadChip. 733,202 SNP identifiers (rsIDs) were retrieved from the Illumina website and mapped to the dbSNP database. Disease-related variants were identified by cross-referencing rsIDs against the HGMD [12] and VARIMED [96] datasets. (b) The numbers of different types of variants found per human genome [97]. (c) The numbers of known nonsynonymous single nucleotide variants (nSNVs) in the human nuclear and mitochondrial genomes that are associated with Mendelian diseases, complex diseases, and somatic cancers. nSNVs related to Mendelian diseases account for more of the variants discovered to date than those related to complex diseases or somatic cancers. Data were retrieved from HGMD [12], VARIMED [96], COSMIC [98], MITOMAP [41], and HapMap3 [99] resources. (d) The number of nSNVs in each gene related to Mendelian diseases. The majority of genes have only one or a few mutations, whereas some genes host hundreds or even more than 1,000 mutations. Data were retrieved from HGMD. The numbers of variants in panels a–c are on a log10 scale. Disease-associated variants are shown in red, and personal and population variations are shown in blue.
Figure 2
Novel SNV discovery with genome and exome sequencing. (a) The number of novel SNVs discovered by sequencing one or more genomes [97]. As more genomes are sequenced, the number of novel SNVs decreases (bars), whereas the cumulative count of SNVs increases (filled circles). (b) The number of nSNVs discovered by sequencing one or more exomes [14]. As more exomes are sequenced, the number of novel SNVs discovered decreases (bars), whereas the cumulative count of nSNVs increases (filled circles). Panels a and b are redrawn with permission from [97] and [14], respectively.
Figure 3
Evolutionary properties of positions afflicted with disease-associated nonsynonymous single nucleotide variants (nSNVs). (a) The observed and expected numbers of disease-associated nSNVs in positions that have evolved at different evolutionary rates in the CFTR protein [29]. The disease-associated nSNVs are enriched in the positions evolving at the lowest rates, which belong to rate category 0. (b) The ratio of observed to expected numbers of nSNVs in different rate categories for all CFTR variants (solid pattern; 431 variants) and for those reported in publications profiling one or more families (hatched pattern; 59 variants). Data and publications were obtained from HGMD for all variants with deposition dates up to the year 2000. This comparison shows that the initial practice of using all available variants, including those reported by clinicians from individual patients (>80% of the variants), did not bias the observed trends. (c) The proteome-scale relationship between the observed/expected ratios of Mendelian disease-associated nSNVs and the evolutionary rates of the affected positions. The results are from an analysis of disease-associated nSNVs from 2,717 genes (public release of HGMD). Just as for individual diseases, nSNVs are enriched in the positions evolving at the lowest rates. Panel a is redrawn with permission from ref. [29].
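The observed/expected ratio plotted in this figure can be illustrated with a minimal sketch. The caption does not say how the expected counts were derived, so the null model here is an assumption: variants are taken to fall uniformly across positions, making the expected count in a rate category proportional to the number of positions in it. The function name and the toy data are hypothetical.

```python
from collections import Counter


def observed_expected_ratio(position_categories, variant_categories):
    """Return {rate_category: observed / expected} for disease variants.

    position_categories: rate category of every position in the protein.
    variant_categories:  rate category of the position hit by each variant.
    Assumed null model (not taken from the paper): variants are spread
    uniformly over positions, so expected = n_variants * share of positions.
    """
    n_positions = len(position_categories)
    n_variants = len(variant_categories)
    if n_positions == 0 or n_variants == 0:
        raise ValueError("need at least one position and one variant")

    positions_per_cat = Counter(position_categories)
    observed_per_cat = Counter(variant_categories)

    ratios = {}
    for cat, n_pos in sorted(positions_per_cat.items()):
        expected = n_variants * n_pos / n_positions
        ratios[cat] = observed_per_cat.get(cat, 0) / expected
    return ratios


# Toy data: 10 positions in three rate categories (0 = slowest evolving)
# and 10 variants, most of which hit the slowest-evolving positions.
positions = [0] * 4 + [1] * 4 + [2] * 2
variants = [0] * 6 + [1] * 3 + [2] * 1
ratios = observed_expected_ratio(positions, variants)
print(ratios)
```

A ratio above 1 marks a category with more disease-associated variants than the uniform null predicts, which is the pattern the caption reports for rate category 0.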
Figure 4
The enrichment of disease-associated nSNVs (red) and the deficit of population polymorphisms (blue) in human amino acid positions (a) evolving at different rates and (b) with different degrees of insertion-deletions [35]. In both cases, smaller numbers on the x axis correspond to more conserved positions. There is an enrichment of disease-associated nSNVs and a deficit of population nSNPs in conserved positions. This trend is reversed for the fastest-evolving positions. (c) The cumulative distributions of the evolutionary conservation scores for nSNVs associated with Mendelian diseases (solid red line), complex diseases (open red circles), and population polymorphisms (green line). The shift towards the left for Mendelian nSNVs indicates higher position-specific evolutionary conservation. Conversely, the shift towards the right for complex disease nSNVs indicates lower evolutionary conservation, overlapping with the normal variation observed in the population. Data for the neutral model (black line) were generated by simulation [37]. Panels a–c are reproduced with permission from refs. [35], [35], and [37], respectively.
Figure 5
Some applications of evolutionary in silico tools in diagnosing pathogenic variants. (a) The comparison of PolyPhen [100] and SIFT [69] predictions for 7,534 high-quality variants present within the Venter genome [2]. The numbers of variants diagnosed as probably damaging (PolyPhen) and intolerant (SIFT) are shown. The in silico diagnosis of personal variants by different tools produces highly discordant results. (b) ROC (receiver operating characteristic [101]) curves produced by PolyPhen-2 (pph2), SIFT, MAPP, Mutation Assessor (masses), Log R Pfam E-value (logre), and Condel (WAS). Condel used a weighted average of the normalized scores of the other five methods and outperformed each of them [85]. The ROC curve for Condel rises much more quickly, meaning that it correctly diagnoses damaging variants (true positives) at a much higher rate while incurring a much lower rate of incorrect diagnoses (false positives). (c) The relationship between the accuracy of PolyPhen predictions for disease-associated nSNVs and the long-term evolutionary rate of the affected positions (categories 0–5 run from the slowest- to the fastest-evolving sites) [57]. PolyPhen accuracy is highest at the slowest-evolving positions. Panels a–c are redrawn with permission from [2], [85], and [57], respectively.
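The idea behind panel (b), combining several tools through a weighted average of normalized scores, can be sketched as follows. This is an illustration only and not Condel's published procedure: the min-max normalization, the tool names, the score ranges, and the weights are all assumptions chosen for the example.

```python
def normalize(score, lo, hi, higher_is_damaging=True):
    """Min-max scale a raw tool score to [0, 1], where 1 = most damaging.

    lo and hi are the assumed bounds of the tool's raw score range.
    Tools that report low scores for damaging variants are inverted.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (score - lo) / (hi - lo)
    scaled = min(1.0, max(0.0, scaled))  # clamp out-of-range scores
    return scaled if higher_is_damaging else 1.0 - scaled


def weighted_average(normalized_scores, weights):
    """Combine per-tool normalized scores using per-tool weights."""
    total_weight = sum(weights[tool] for tool in normalized_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[tool] * s for tool, s in normalized_scores.items())
    return weighted_sum / total_weight


# Hypothetical variant scored by two hypothetical tools.
scores = {
    "tool_a": normalize(0.5, 0.0, 1.0),                             # 0.5
    "tool_b": normalize(0.25, 0.0, 1.0, higher_is_damaging=False),  # 0.75
}
weights = {"tool_a": 1.0, "tool_b": 3.0}  # illustrative weights only
consensus = weighted_average(scores, weights)
print(consensus)
```

Bringing every tool onto a common scale and orientation before averaging is what makes the scores comparable. In practice the weights would need to reflect how reliable each tool is, which this sketch does not attempt to estimate.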
Textbox 1 Figure I
Disease-associated genetic variants identified in patients with Miller syndrome.

References

    1. Li Y, Agarwal P. A Pathway-Based View of Human Diseases and Disease Relationships. PLoS ONE. 2009;4:e4346.
    2. Chun S, Fay JC. Identification of deleterious mutations within three human genomes. Genome Res. 2009;19:1553–1561.
    3. Hindorff LA, et al. A Catalog of Published Genome-Wide Association Studies. 2011. www.genome.gov/gwastudies.
    4. Hindorff LA, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A. 2009;106:9362–9367.
    5. McCarthy MI, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet. 2008;9:356–369.