BMC Bioinformatics. 2007 Aug 24;8:308.
doi: 10.1186/1471-2105-8-308.

Analysis of the role of retrotransposition in gene evolution in vertebrates

Zhan Yu et al. BMC Bioinformatics.

Abstract

Background: The dynamics of gene evolution are influenced by several genomic processes. One such process is retrotransposition, where an mRNA transcript is reverse-transcribed and reintegrated into the genomic DNA.

Results: We have surveyed eight vertebrate genomes (human, chimp, dog, cow, rat, mouse, chicken and the puffer-fish T. nigroviridis) for putatively retrotransposed copies of genes. To gain a complete picture of the role of retrotransposition, a robust strategy to identify putative retrogenes (PRs) was derived, in tandem with an adaptation of previous procedures to annotate processed pseudogenes, also called retropseudogenes (RψGs). Mammalian genomes are estimated to contain 400-800 PRs (corresponding to approximately 3% of genes), with fewer PRs and RψGs in the non-mammalian vertebrates. Focussing on human and mouse, we estimated the ages of the PRs, analysed them for evidence of transcription and selection pressures, and assigned functional categories. Compared to the proteome in general, the PRs have significantly less transcription evidence mappable to them, are significantly less likely to arise from alternatively-spliced genes, and are statistically overrepresented for ribosomal-protein genes. We find evidence for spurts of gene retrotransposition in human and mouse since the lineage of either species split from the dog lineage, with >200 PRs formed in mouse since its divergence from rat. To test for selection, we calculated: (i) Ka/Ks values (ratios of non-synonymous and synonymous substitutions in codons), and (ii) the significance of conservation of reading frames in PRs. In both human and mouse, we found >50 PRs formed since divergence from dog that are under pressure to maintain the integrity of their coding sequences. For different subsets of PRs formed at different stages of mammalian evolution, we find some evidence for non-neutral evolution, despite significantly less expression evidence for these sequences.

Conclusion: These results indicate that retrotranspositions are a significant source of novel coding sequences in mammalian gene evolution.

Figures

Figure 1
Pipeline summarizing the annotation of PRs and retropseudogenes. The main panel summarizes the pipeline for PR annotation. An inset at the bottom summarizes the tests for local gene order and chromosomal milieu.
Figure 2
Rapid annotation of retropseudogenes. (1) TBLASTN matches (e-value ≤ 10^-4) of the annotated proteome against the genomic DNA are sorted by coordinates and collated for each protein to form a set of matches {M}. (2) The sets {M} are filtered using length-based heuristics. (3) Each protein is realigned to the genomic DNA using FASTY; at each point, the best-matching proteins that have disablements and that match >70% of the length of the parent sequence are picked as retropseudogene annotations.
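The final selection in step (3) can be sketched as a simple predicate. This is only an illustration under stated assumptions, not the authors' implementation: the `Alignment` record and its field names are hypothetical, "disablements" is taken here to mean frameshifts or premature stop codons, and the choice of the best-matching protein at each genomic position is not shown.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one protein-to-genome realignment."""
    protein_id: str
    protein_length: int   # length of the parent protein, in residues
    aligned_start: int    # 1-based, inclusive, on the parent protein
    aligned_end: int      # 1-based, inclusive, on the parent protein
    frameshifts: int      # assumed kind of disablement
    premature_stops: int  # assumed kind of disablement


def is_retropseudogene_candidate(aln: Alignment, min_coverage: float = 0.70) -> bool:
    """Return True if the alignment is disabled and covers more than
    min_coverage of the parent protein length (the >70% rule in the caption)."""
    aligned_residues = aln.aligned_end - aln.aligned_start + 1
    coverage = aligned_residues / aln.protein_length
    disabled = (aln.frameshifts + aln.premature_stops) > 0
    return disabled and coverage > min_coverage


if __name__ == "__main__":
    example = Alignment("P1", 200, 1, 150, frameshifts=1, premature_stops=0)
    print(is_retropseudogene_candidate(example))
```

The coverage threshold is strict (greater than 70%), matching the wording of the caption; an alignment with no disablements is rejected regardless of coverage.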
Figure 4
Lineage-specific lists of PRs: The number of species-specific PRs relative to other species. PRs specific relative to other species were obtained by comparing the Ks between the PR and its parent (KsPR_parent) with the Ks between the parent and the ortholog of the parent in the other species (Ksparent_ortholog). PRs with KsPR_parent < Ksparent_ortholog were defined as specific PRs relative to the other species. Only PRs whose amino acid identity to their parents is more than 70% and whose parents have an ortholog in the other species were subjected to this calculation. Orthology criteria used are 40% identity over 60% length overlap. 'Human-specific' and 'Chimp-specific' PRs are those formed since the species diverged from each other; similarly, for 'Mouse-specific' and 'Rat-specific' PRs. 'Other primate-specific' are any other PRs formed in human or chimp since divergence from dog (in bold typeface), or from cow (in italic typeface); similarly, for 'Other rodent-specific'.
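The lineage-specificity rule in this caption amounts to one comparison of two Ks values, applied only to eligible PRs. The sketch below is a minimal illustration, assuming the Ks values and the identity fraction have already been computed elsewhere; the function name and parameters are hypothetical and are not taken from the paper.

```python
from typing import Optional


def is_lineage_specific(
    ks_pr_parent: float,
    ks_parent_ortholog: float,
    identity_to_parent: float,
    has_ortholog: bool,
    min_identity: float = 0.70,
) -> Optional[bool]:
    """Classify a PR relative to one other species.

    Returns None when the PR is not eligible (identity to parent not above
    min_identity, or no ortholog in the other species). Otherwise returns
    True when Ks(PR, parent) < Ks(parent, ortholog), which suggests the PR
    formed after the two species diverged.
    """
    if not has_ortholog or identity_to_parent <= min_identity:
        return None
    return ks_pr_parent < ks_parent_ortholog


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    print(is_lineage_specific(0.05, 0.12, 0.95, True))
```

Returning None for ineligible PRs keeps them distinct from PRs that were tested and found not to be lineage-specific.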
Figure 3
Ks distributions: (A) Ks distribution for human PRs meeting the local gene order test with threshold of Nhomologs = 0, from comparison to their parent sequences. Labelled are the median values for the 'Human-specific' set, and those PRs formed between divergence from dog and from chimp [see panel (C)]. A similar distribution is observed with an Nhomologs threshold of ≤ 1 for the local gene order test. (B) Ks distribution for mouse PRs meeting the local gene order test with threshold of Nhomologs = 0, from comparison to their parent sequences. Labelled are the median values for the 'Mouse-specific' set, and those PRs formed between divergence from dog and from chimp [see panel (C)]. A similar distribution is observed with an Nhomologs threshold of ≤ 1 for the local gene order test.
Figure 5
Distributions of percentage protein sequence identity between PRs and parents. (A) Distribution of % protein sequence identity for all human PRs that pass the local gene order test (Nhomologs ≤ 1). These are broken down into 'transcribed' and 'not transcribed'. (B) The fraction that are transcribed in each bin of the histogram in panel A. (C) Distribution of % protein sequence identity for all mouse PRs that pass the local gene order test (Nhomologs ≤ 1). These are broken down into 'transcribed' and 'not transcribed'. (D) The fraction that are transcribed in each bin of the histogram in panel C.
Figure 6
Ka/Ks distributions for PRs and for retropseudogenes (RψGs). (A) Distribution of Ka/Ks for human PRs (n = 262) meeting the local gene order test (Nhomologs ≤ 1), compared to Ka/Ks for the RψGs (n = 183). All sequences were required to have protein sequence identity ≥ 60.0% with their parent sequences. (B) As in (A), but for mouse PRs (n = 318) and RψGs (n = 220).
