Genome Res. 2003 Dec;13(12):2559-67.
doi: 10.1101/gr.1455503.

A genome-wide survey of human pseudogenes

David Torrents et al. Genome Res. 2003 Dec.

Abstract

We screened all intergenic regions in the human genome to identify pseudogenes with a combination of homology searches and a functionality test based on the ratio of replacement (nonsynonymous) to silent (synonymous) nucleotide substitutions (KA/KS). We identified 19,724 regions, of which 95% ± 3% are estimated to evolve neutrally and thus are likely to encode pseudogenes. Half of these have no detectable truncation in their pseudocoding regions and therefore are not identifiable by methods that require the presence of truncations to prove nonfunctionality. A comparative analysis with the mouse genome showed that 70% of these pseudogenes have a retrotranspositional origin (processed), and the rest arose by segmental duplication (nonprocessed). Although the spread of both types of pseudogenes correlates with chromosome size, nonprocessed pseudogenes appear to be enriched in regions with high gene density. It is likely that the human pseudogenes identified here represent only a small fraction of the total, which probably exceeds the number of genes.
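As a worked illustration of the figures quoted in the abstract, the following minimal sketch converts the 95% ± 3% neutral-evolution estimate into an approximate count of pseudogene regions. The function name and the rounding to whole regions are assumptions made for this example; only the three input numbers come from the abstract.

```python
# Figures taken from the abstract.
TOTAL_REGIONS = 19_724      # candidate regions identified
NEUTRAL_FRACTION = 0.95     # estimated fraction evolving neutrally
UNCERTAINTY = 0.03          # quoted +/- on that fraction


def estimated_pseudogenes(total, fraction, uncertainty):
    """Return (low, central, high) region counts implied by fraction +/- uncertainty.

    Hypothetical helper for illustration; rounding to whole regions is our choice.
    """
    low = round(total * (fraction - uncertainty))
    central = round(total * fraction)
    high = round(total * (fraction + uncertainty))
    return low, central, high


low, central, high = estimated_pseudogenes(TOTAL_REGIONS, NEUTRAL_FRACTION, UNCERTAINTY)
print(f"about {central} likely pseudogene regions (range {low} to {high})")
```

Under these assumptions the estimate corresponds to roughly 18,700 regions, with a range of roughly 18,100 to 19,300.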


Figures

Figure 1
General overview of the strategy for pseudogene search and evaluation. Our analysis can be divided into three different parts: homology search, analysis of orthology for the selection of KA/KS benchmark sets, and the functionality test based on KA/KS. Green, red, and blue boxes denote the intermediate steps, the excluded sequences, and the final results for each of the sections, respectively. See text for details.
Figure 2
KA/KS distributions of benchmark and candidate sets. The KA/KS distributions (as log KA/KS) associated with the functional (green) and pseudogenic (red) benchmark sets (A) as well as the test sequence set (B) are shown. On average, 40% of the sequences analyzed in this study satisfied our requirements for the KA/KS calculation. The subsets of sequences with KA/KS values (1659 for the functional benchmark set, 1703 for the pseudogenic benchmark set, and 3291 for the test set) are expected to be representative of the corresponding complete sets, because the factors that determine whether a KA/KS value can be calculated for a sequence (availability of homologous sequences and restrictions on the KA/KS calculation; see Methods) are likely to affect genes and pseudogenes equally. Using least-squares fitting against the benchmark distributions, we evaluated the fraction of pseudogenic (red) and functional (green) sequences in each bin of the test distribution and combined them to determine that up to 95% of the sequences analyzed correspond to pseudogenes.
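The least-squares step described in this caption can be sketched as follows. This is a simplified, single-parameter illustration and not the authors' procedure: it assumes the test histogram is a mixture f * pseudogenic + (1 - f) * functional over shared log KA/KS bins and solves for f in closed form. The function name and the example bin counts are invented for illustration.

```python
def mixture_fraction(test, pseudo, func):
    """Least-squares estimate of f in: test ~ f * pseudo + (1 - f) * func.

    All three arguments are histograms (bin counts) over the same bins.
    Hypothetical helper; a simplification of the fit described in the caption.
    """
    if not (len(test) == len(pseudo) == len(func)):
        raise ValueError("histograms must share the same bins")

    def normalize(hist):
        total = sum(hist)
        if total <= 0:
            raise ValueError("histogram must have a positive total")
        return [count / total for count in hist]

    t, p, g = normalize(test), normalize(pseudo), normalize(func)

    # Minimizing sum((t - g - f * (p - g))**2) over f gives this closed form.
    numerator = sum((ti - gi) * (pi - gi) for ti, pi, gi in zip(t, p, g))
    denominator = sum((pi - gi) ** 2 for pi, gi in zip(p, g))
    if denominator == 0:
        raise ValueError("benchmark distributions are identical")

    # A fraction must lie in [0, 1], so clip the unconstrained solution.
    return min(1.0, max(0.0, numerator / denominator))


# Made-up bin counts, not data from the paper.
pseudo_bench = [1, 2, 4, 2, 1]
func_bench = [4, 3, 2, 1, 0]
test_set = [1.3, 2.1, 3.8, 1.9, 0.9]  # built as 0.9 * pseudo + 0.1 * func

print(f"estimated pseudogenic fraction: {mixture_fraction(test_set, pseudo_bench, func_bench):.2f}")
```

Normalizing each histogram first makes the fit independent of how many sequences each set contains, which matters here because the benchmark and test sets differ in size.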
Figure 3
Distribution of genes and the different types of pseudogenes for each of the human chromosomes. For each human chromosome, the number of pseudogenes (separated by type; see chart legend for details) and the number of genes per megabase are shown. Chromosomes are ordered by pseudogene density (highest on top).

