Biol Direct. 2011 Feb 22;6:13.
doi: 10.1186/1745-6150-6-13.

Nonsynonymous substitution rate (Ka) is a relatively consistent parameter for defining fast-evolving and slow-evolving protein-coding genes


Dapeng Wang et al. Biol Direct. 2011.

Abstract

Background: Mammalian genome sequence data are being acquired in large quantities and at enormous speeds. Through comparative genomics, we now have a tremendous opportunity to better understand which genes are the most variable or conserved, and what their particular functions and evolutionary dynamics are.

Results: We chose the genomes of human and eleven other high-coverage mammals, with an avian genome as an outgroup, to analyze orthologous protein-coding genes using nonsynonymous (Ka) and synonymous (Ks) substitution rates. After evaluating eight commonly used methods for calculating Ka and Ks, we observed that these methods yielded nearly uniform results when estimating Ka, but not Ks (or Ka/Ks). When sorting genes by Ka, we noticed that fast-evolving and slow-evolving genes often belonged to different functional classes, with respect to species-specificity and lineage-specificity. In particular, we identified two functional classes of genes in the acquired immune system: fast-evolving genes coded for signal-transducing proteins, such as receptors, ligands, cytokines, and CDs (clusters of differentiation, mostly surface proteins), whereas slow-evolving genes coded for function-modulating proteins, such as kinases and adaptor proteins. In addition, among slow-evolving genes with functions related to the central nervous system, neurodegenerative disease-related pathways were significantly enriched in most mammalian species. We also confirmed that gene expression was negatively correlated with evolution rate, i.e. slow-evolving genes were expressed at higher levels than fast-evolving genes. Our results indicated that the functional specializations of the three major mammalian clades were sensory perception and oncogenesis in primates, reproduction and hormone regulation in large mammals, and immunity and angiotensin in rodents.
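To make concrete what a single Ka or Ks estimate involves, here is a minimal Python sketch. It assumes the numbers of nonsynonymous and synonymous differences and sites have already been counted from a codon alignment (that step is omitted) and applies the Jukes-Cantor correction used in the Nei-Gojobori counting approach. It is an illustration only, not a reproduction of any of the eight methods compared in the study; the function names and counts are hypothetical.

```python
import math
from typing import Tuple


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        # The correction is undefined once p reaches 0.75 (saturation).
        raise ValueError("p-distance must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> Tuple[float, float, float]:
    """Return (Ka, Ks, Ka/Ks) from counts of differences and sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    ratio = ka / ks if ks > 0 else float("inf")  # ratio is undefined when Ks is 0
    return ka, ks, ratio


# Hypothetical counts for one orthologous gene pair.
ka, ks, ratio = ka_ks(nonsyn_diffs=10, nonsyn_sites=300, syn_diffs=20, syn_sites=100)
print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={ratio:.3f}")
```

Synonymous p-distances typically grow faster than nonsynonymous ones, so they approach the 0.75 saturation limit sooner, and a small Ks makes the ratio unstable. This offers one plausible intuition, not the authors' analysis, for why Ks and Ka/Ks may vary more across methods than Ka.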

Conclusion: Our study suggests that Ka calculation, which is less biased than Ks and Ka/Ks, can be used as a parameter to sort genes by evolution rate, and can also provide a way to categorize common protein functions and define their interaction networks, either pair-wise or within defined lineages or subgroups. Gene evolution can be evaluated with Ka and Ks calculations on large datasets, such as mammalian genomes.
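Sorting genes by Ka, as proposed in the conclusion, can be sketched as a simple ranking. The Python example below assumes a mapping from gene identifiers to Ka values and labels the lowest and highest fractions of genes as slow-evolving and fast-evolving. The 10% default cut-off, the function name, and the gene values are hypothetical, not the thresholds used in the study.

```python
from typing import Dict, List, Tuple


def split_by_ka(ka: Dict[str, float], fraction: float = 0.1) -> Tuple[List[str], List[str]]:
    """Return (slow, fast): the genes with the lowest and highest Ka values."""
    if not 0.0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(ka, key=ka.get)          # ascending Ka
    k = max(1, int(len(ranked) * fraction))  # number of genes in each tail
    return ranked[:k], ranked[-k:]


# Hypothetical Ka values for eight genes.
example = {"geneA": 0.002, "geneB": 0.150, "geneC": 0.010, "geneD": 0.090,
           "geneE": 0.030, "geneF": 0.220, "geneG": 0.060, "geneH": 0.005}
slow, fast = split_by_ka(example, fraction=0.25)
print("slow-evolving:", slow)
print("fast-evolving:", fast)
```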

Reviewers: This article has been reviewed by Drs. Anamaria Necsulea (nominated by Nicolas Galtier), Subhajyoti De (nominated by Sarah Teichmann) and Claus O. Wilke.


Figures

Figure 1
Divergence index (standard deviation/mean) of Ka and Ks, as determined by the eight different methods, for the twelve vertebrate species. In the boxplots, the boxes show the lower quartile, median, and upper quartile, and the dots show mean values. Outliers were omitted for clarity. The number codes for the vertebrate species are: 1, chimp; 2, orangutan; 3, macaque; 4, horse; 5, dog; 6, cow; 7, guinea pig; 8, mouse; 9, rat; 10, opossum; 11, platypus; and 12, chicken.
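The divergence index is the ratio of standard deviation to mean (the coefficient of variation) of one gene's estimates across methods. Below is a minimal Python sketch. It assumes the sample standard deviation, since the caption does not say which form was used, and the gene names and per-method values are hypothetical.

```python
import statistics
from typing import Dict, List


def divergence_index(estimates: List[float]) -> float:
    """Standard deviation divided by mean of one gene's estimates across methods."""
    if len(estimates) < 2:
        raise ValueError("need estimates from at least two methods")
    mean = statistics.mean(estimates)
    if mean == 0:
        return float("nan")  # undefined when every method returns zero
    return statistics.stdev(estimates) / mean  # sample standard deviation (assumption)


def per_gene_index(table: Dict[str, List[float]]) -> Dict[str, float]:
    """Map each gene to the divergence index of its per-method estimates."""
    return {gene: divergence_index(values) for gene, values in table.items()}


# Hypothetical Ka (or Ks) estimates for two genes from eight methods.
ka_table = {"gene1": [0.10, 0.10, 0.11, 0.10, 0.10, 0.09, 0.10, 0.10],
            "gene2": [0.40, 0.55, 0.30, 0.62, 0.48, 0.35, 0.70, 0.52]}
print(per_gene_index(ka_table))
```

A gene whose methods agree closely, like the hypothetical gene1, has an index near zero. The boxplots summarize the distribution of such per-gene values.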
Figure 2
Percentage of shared genes for Ka, Ks, and Ka/Ks between GY and each of the other seven methods, by cut-off (A, B), method (C, D), and species (E, F). Outliers were omitted for clarity. The species number codes are the same as in Figure 1.
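One way to read "percentage of shared genes" at a given cut-off is as the overlap between the top-ranked genes from GY and from another method. The Python sketch below assumes that reading: both methods score the same gene set, the top fraction is taken by descending value, and overlap is expressed as a percentage of the GY set. The caption does not give the exact definition used, and the values are hypothetical.

```python
from typing import Dict, Set


def top_fraction(values: Dict[str, float], cutoff: float) -> Set[str]:
    """Genes in the top `cutoff` fraction when ranked by descending value."""
    ranked = sorted(values, key=values.get, reverse=True)
    k = max(1, int(len(ranked) * cutoff))
    return set(ranked[:k])


def percent_shared(reference: Dict[str, float], other: Dict[str, float],
                   cutoff: float) -> float:
    """Percentage of the reference method's top genes also among the other method's top genes."""
    ref_top = top_fraction(reference, cutoff)
    other_top = top_fraction(other, cutoff)
    return 100.0 * len(ref_top & other_top) / len(ref_top)


# Hypothetical Ka values from GY and from a second method for the same genes.
gy = {"g1": 0.50, "g2": 0.40, "g3": 0.30, "g4": 0.20}
other = {"g1": 0.10, "g2": 0.45, "g3": 0.35, "g4": 0.15}
print(percent_shared(gy, other, cutoff=0.5))
```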
Figure 3
A network of fast-evolving and slow-evolving genes among the twelve mammalian species. For any two given species, we counted the shared fast-evolving or slow-evolving genes and divided this count by the total number of genes shared by the two species to normalize the correlation coefficients. Species were connected based on the two largest correlation coefficients for each pair. Red and green lines represent fast-evolving and slow-evolving genes, respectively, and yellow lines represent the sum of both.
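The normalization and edge selection can be sketched as follows. This Python example assumes that "the two largest correlation coefficients" means that each species keeps links to its two highest-scoring partners. That reading, the tie-breaking by species name, and all data are assumptions. Only fast-evolving genes are shown; slow-evolving genes would use a second set in the same way.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set

Pair = FrozenSet[str]


def normalized_overlap(fast: Dict[str, Set[str]],
                       genes: Dict[str, Set[str]]) -> Dict[Pair, float]:
    """Shared fast-evolving genes per species pair, divided by all genes the pair shares."""
    scores: Dict[Pair, float] = {}
    for a, b in combinations(sorted(fast), 2):
        common = genes[a] & genes[b]
        if common:
            scores[frozenset((a, b))] = len(fast[a] & fast[b] & common) / len(common)
    return scores


def top_two_edges(scores: Dict[Pair, float]) -> Set[Pair]:
    """For every species, keep the edges to its two highest-scoring partners."""
    species = sorted({s for pair in scores for s in pair})
    edges: Set[Pair] = set()
    for sp in species:
        ranked = sorted((pair for pair in scores if sp in pair),
                        key=lambda pair: (-scores[pair], sorted(pair)))  # ties broken by name
        edges.update(ranked[:2])
    return edges


# Hypothetical data: all four species share the same ten orthologs.
universe = {f"g{i}" for i in range(1, 11)}
genes = {sp: universe for sp in ("human", "chimp", "mouse", "rat")}
fast = {"human": {"g1", "g2", "g3", "g4", "g5"}, "chimp": {"g1", "g2", "g3", "g4"},
        "mouse": {"g5", "g6", "g7", "g8"}, "rat": {"g1", "g6", "g7", "g8", "g9"}}
edges = top_two_edges(normalized_overlap(fast, genes))
for edge in sorted(sorted(e) for e in edges):
    print(" - ".join(edge))
```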
Figure 4
Correlation between expression level and evolution rate. S, M, and F stand for slow-evolving, intermediately evolving, and fast-evolving genes, respectively. Expression levels were measured as the number of transcripts per million (TPM). Outliers were omitted for clarity.
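The negative relationship between expression and evolution rate can be checked with a rank correlation. The Python sketch below is a minimal, dependency-free Spearman correlation (the Pearson correlation of average ranks), together with a split of genes into S, M, and F thirds by Ka. The tertile split, the use of Spearman's rho, and the example values are assumptions, not the procedure stated in the study.

```python
import math
import statistics
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def rate_groups(ka: Dict[str, float]) -> Dict[str, List[str]]:
    """Split genes into S, M, and F thirds by ascending Ka (tertiles are an assumption)."""
    ranked = sorted(ka, key=ka.get)
    n = len(ranked)
    return {"S": ranked[: n // 3], "M": ranked[n // 3: 2 * n // 3], "F": ranked[2 * n // 3:]}


# Hypothetical Ka values and expression levels (TPM) for six genes.
ka = {"g1": 0.01, "g2": 0.02, "g3": 0.05, "g4": 0.08, "g5": 0.12, "g6": 0.20}
tpm = {"g1": 850.0, "g2": 400.0, "g3": 410.0, "g4": 90.0, "g5": 35.0, "g6": 12.0}
genes = sorted(ka)
print("Spearman rho:", spearman([ka[g] for g in genes], [tpm[g] for g in genes]))
for label, members in rate_groups(ka).items():
    print(label, statistics.median([tpm[g] for g in members]))
```

A rank-based statistic is a reasonable default here because TPM values are typically highly skewed, although the caption does not say which correlation measure was used.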
Figure 5
Three-dimensional conservation grading of ISG20 (A) and RAB30 (B). The 3-D backbone structures of ISG20 and RAB30 were retrieved from the PDB (codes 1WLJ and 2EW1, respectively). (A) The putative conservation grades were based on an alignment of twenty mammalian protein sequences from: Human (Homo sapiens), Chimpanzee (Pan troglodytes), Orangutan (Pongo pygmaeus), Gorilla (Gorilla gorilla), Macaque (Macaca mulatta), Cow (Bos taurus), Dog (Canis familiaris), Horse (Equus caballus), Cat (Felis catus), Guinea Pig (Cavia porcellus), Mouse (Mus musculus), Rat (Rattus norvegicus), Megabat (Pteropus vampyrus), Microbat (Myotis lucifugus), Pika (Ochotona princeps), Hyrax (Procavia capensis), Tree Shrew (Tupaia belangeri), Dolphin (Tursiops truncatus), Opossum (Monodelphis domestica), Platypus (Ornithorhynchus anatinus). (B) The conservation grades were based on an alignment of twenty-two mammalian protein sequences from: Human (Homo sapiens), Cow (Bos taurus), Dog (Canis familiaris), Guinea Pig (Cavia porcellus), Horse (Equus caballus), Cat (Felis catus), Elephant (Loxodonta africana), Macaque (Macaca mulatta), Mouse Lemur (Microcebus murinus), Opossum (Monodelphis domestica), Mouse (Mus musculus), Microbat (Myotis lucifugus), Pika (Ochotona princeps), Platypus (Ornithorhynchus anatinus), Rabbit (Oryctolagus cuniculus), Chimpanzee (Pan troglodytes), Orangutan (Pongo pygmaeus), Hyrax (Procavia capensis), Megabat (Pteropus vampyrus), Rat (Rattus norvegicus), Tree Shrew (Tupaia belangeri), Dolphin (Tursiops truncatus). The color bar, read from left to right, runs from variable to conserved residues. Yellow marks residues for which there were insufficient data to support a conservation grade.
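The caption does not say how the conservation grades were computed. As a simplified stand-in, not the method behind the figure, the Python sketch below grades each alignment column from 1 (variable) to 9 (conserved) using Shannon entropy. It returns None for columns with too few residues, which corresponds to the yellow "insufficient data" positions. The entropy scoring, the nine-step scale, the `min_residues` threshold, and the toy alignment are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Optional


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the non-gap residues in one alignment column."""
    residues = [c for c in column if c != "-"]
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def conservation_grades(alignment: List[str], min_residues: int = 10) -> List[Optional[int]]:
    """Grade each column 1 (variable) to 9 (conserved); None means insufficient data."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    max_entropy = math.log2(20)  # 20 amino acids equally frequent
    grades: List[Optional[int]] = []
    for col in zip(*alignment):
        column = "".join(col)
        if sum(c != "-" for c in column) < min_residues:
            grades.append(None)  # too few residues to grade this position
            continue
        # Map entropy 0 (invariant column) to grade 9 and maximal entropy to grade 1.
        score = 1.0 - min(column_entropy(column) / max_entropy, 1.0)
        grades.append(1 + min(int(score * 9), 8))
    return grades


# Hypothetical toy alignment of ten sequences: invariant, highly variable, all-gap columns.
aligned = ["M" + aa + "-" for aa in "ACDEFGHIKL"]
print(conservation_grades(aligned))
```

Established tools typically estimate per-site evolutionary rates on a phylogeny rather than using raw column entropy, so this sketch only illustrates the idea of mapping alignment variability onto a graded scale.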


