Nucleic Acids Res. 2005 Apr 28;33(8):2374-83.
doi: 10.1093/nar/gki531. Print 2005.

Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability

Paul M Harrison et al. Nucleic Acids Res.

Abstract

Pseudogenes, in the case of protein-coding genes, are gene copies that have lost the ability to code for a protein; they are typically identified through annotation of disabled, decayed or incomplete protein-coding sequences. Processed pseudogenes (PPΨgs) are made through mRNA retrotransposition. There is overwhelming genomic evidence for thousands of human PPΨgs and also for dozens of human processed genes that comprise complete retrotransposed copies of other genes. Here, we survey for an intermediate entity, the transcribed processed pseudogene (TPΨg), which is disabled but nonetheless transcribed. TPΨgs may affect expression of paralogous genes, as observed in the case of the mouse makorin1-p1 TPΨg. To elucidate their role, we identified human TPΨgs by mapping expressed sequences onto PPΨgs and, reciprocally, by extracting TPΨgs from known mRNAs. We consider only those PPΨgs that are homologous to either non-mammalian eukaryotic proteins or protein domains of known structure, and we require detection of identical coding-sequence disablements in both the expressed and genomic sequences. Oligonucleotide microarray data provide further verification of expression. Overall, we find 166-233 TPΨgs (approximately 4-6% of PPΨgs). Proteins/transcripts with the highest numbers of homologous TPΨgs generally have many homologous PPΨgs and are abundantly expressed. TPΨgs are significantly over-represented near both the 5′ and 3′ ends of genes; this suggests that TPΨgs can be formed through co-option of gene promoters or through intrusion into untranslated regions. However, roughly half of the TPΨgs are located away from genes in intergenic DNA and thus may be co-opting cryptic promoters of undesignated origin. Furthermore, TPΨgs differ from other PPΨgs and from processed genes in two ways: (i) they do not show a significant tendency either to deposit on or to originate from the X chromosome; and (ii) only 5% of human TPΨgs have potential orthologs in mouse.
This latter finding indicates that the vast majority of TPΨgs are lineage specific, which is likely linked to the well-documented, extensive lineage-specific SINE/LINE activity. The list of TPΨgs is available at http://www.biology.mcgill.ca/faculty/harrison/tppg/bppg.tov or http://pseudogene.org.
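The key filter above, requiring the same coding-sequence disablement in both the genomic pseudogene and the expressed sequence, can be sketched as below. This is a minimal illustration under simplifying assumptions, not the authors' pipeline: it treats only premature in-frame stop codons as disablements (frameshifts are ignored), assumes the expressed sequence (EST) is already aligned to the pseudogene without gaps starting at a known codon offset, and uses hypothetical function names (`stop_codon_indices`, `shares_disablement`).

```python
# Minimal sketch (hypothetical helper names): check that a premature stop codon
# in a processed pseudogene is also present in an expressed sequence (EST).
# Assumptions: both sequences are in the parent gene's reading frame (frame 0),
# the EST aligns to the pseudogene without gaps starting at codon `offset`,
# and only stop codons (not frameshifts) are treated as disablements.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_indices(seq: str) -> set[int]:
    """Return the codon indices of all frame-0 stop codons in seq."""
    seq = seq.upper()
    return {
        i for i in range(len(seq) // 3)
        if seq[3 * i:3 * i + 3] in STOP_CODONS
    }


def shares_disablement(genomic: str, est: str, offset: int) -> bool:
    """True if the EST covers at least one premature stop of the genomic
    pseudogene and shows exactly the same premature stops within its span."""
    last = len(genomic) // 3 - 1  # terminal codon: the expected, non-disabling stop
    premature = stop_codon_indices(genomic) - {last}
    span = range(offset, offset + len(est) // 3)
    genomic_in_span = {i for i in premature if i in span}
    # Shift EST codon indices into pseudogene coordinates before comparing.
    est_stops = {offset + i for i in stop_codon_indices(est)} - {last}
    return bool(genomic_in_span) and genomic_in_span == est_stops


if __name__ == "__main__":
    # Toy sequences (illustrative only): codon 2 (TAA) is a premature stop.
    pseudogene = "ATG" "AAA" "TAA" "GGG" "TGA"
    print(shares_disablement(pseudogene, "AAATAAGGG", offset=1))  # True
    print(shares_disablement(pseudogene, "AAACAAGGG", offset=1))  # False
```

Requiring the identical disablement matters because a transcript of the functional parent gene would lack the stop codon, so an expressed sequence that carries it is evidence of transcription from the pseudogene copy itself.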

Figures

Figure 1
Origination and deposition of TPΨgs for different chromosomes. (A) Origination of TPΨgs: the number of parent genes of TPΨgs on each chromosome versus chromosome size (in Mb). (B) Deposition of TPΨgs: the number of TPΨgs per chromosome versus chromosome size (in Mb). Only retrotranspositions from one chromosome to another are considered in each plot. The X chromosome is circled. For each plot, chromosome sizes are corrected for the probability of X and Y chromosome inclusion in gametes [i.e. the size of X is multiplied by 0.75 and that of Y by 0.25; for comparison see figure 1 in (13)].
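The gamete-inclusion size correction described in this caption can be sketched as follows. The caption only specifies the scaling factors; the proportional-to-size null model used here to turn corrected sizes into expected counts is our assumption, the chromosome sizes are toy values rather than real assembly lengths, and the function names are hypothetical.

```python
# Sketch of the gamete-inclusion size correction from the Figure 1 caption:
# X sizes are scaled by 0.75 and Y by 0.25; autosomes are left unscaled.
# The proportional-to-size null model and the toy sizes are assumptions.

GAMETE_FACTOR = {"X": 0.75, "Y": 0.25}


def corrected_size(chrom: str, size_mb: float) -> float:
    """Chromosome size (Mb) weighted by its probability of gamete inclusion."""
    return size_mb * GAMETE_FACTOR.get(chrom, 1.0)


def expected_counts(sizes_mb: dict[str, float], total: int) -> dict[str, float]:
    """Distribute `total` events across chromosomes in proportion to corrected size."""
    corrected = {c: corrected_size(c, s) for c, s in sizes_mb.items()}
    denom = sum(corrected.values())
    return {c: total * v / denom for c, v in corrected.items()}


if __name__ == "__main__":
    toy_sizes = {"1": 100.0, "2": 100.0, "X": 100.0, "Y": 100.0}  # illustrative only
    print(expected_counts(toy_sizes, total=30))
    # {'1': 10.0, '2': 10.0, 'X': 7.5, 'Y': 2.5}
```

Comparing observed counts against such expectations is one way to judge whether the circled X chromosome departs from the size trend; the abstract reports no significant X-chromosome tendency for TPΨgs.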
Figure 2
Examples of TPΨgs. (A) A TPΨg derived from the human prohibitin gene. The prohibitin gene contains both a protein-coding region and an RNA in its 3′-UTR (45), but only the segment of the TPΨg corresponding to the protein-coding sequence is shown. In the center is an alignment of the TPΨg (in red) with the prohibitin protein (in green). The graphic above it shows the position of the TPΨg (red segment) in the 3′-UTR of an mRNA that codes for a Zn-finger-containing protein (blue segment). (B) An example of a TPΨg that maps to a known globular protein domain. The TPΨg derives from the mRNA for the precursor sequence of mitochondrial 2-amino-3-ketobutyrate coenzyme A ligase. The domain is from the closest-matching protein structure (from E. coli, PDB code 1fc4a). In the Molscript (54) picture, the color of the protein chain trace changes at the position of each disablement. The alignment of the E. coli domain sequence and the human TPΨg sequence is shown. The part of the sequence that maps to an EST (gi|6138420) is boxed and italicized.

References

    1. Dermitzakis E.T., Reymond A., Scamuffa N., Ucla C., Kirkness E., Rossier C., Antonarakis S.E. Evolutionary discrimination of mammalian conserved non-genic sequences (CNGs). Science. 2003;302:1033–1035.
    2. Bejerano G., Pheasant M., Makunin I., Stephen S., Kent W.J., Mattick J.S., Haussler D. Ultraconserved elements in the human genome. Science. 2004;304:1321–1325.
    3. ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004;306:636–640.
    4. Harrison P., Gerstein M. Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J. Mol. Biol. 2002;318:1155–1174.
    5. Harrison P.M., Hegyi H., Balasubramanian S., Luscombe N.M., Bertone P., Echols N., Johnson T., Gerstein M. Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22. Genome Res. 2002;12:272–280.