Nucleic Acids Res. 2005 Jan 1;33(Database issue):D59-66.

doi: 10.1093/nar/gki084.

HOPPSIGEN: a database of human and mouse processed pseudogenes

Adel Khelifi¹, Laurent Duret, Dominique Mouchiroud

Affiliation

¹ Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard-Lyon 1, 43 bd. du 11 Novembre 1918, 69622 Villeurbanne Cedex, France. khelifi@biomserv.univ-lyon1.fr

PMID: 15608268
PMCID: PMC540038
DOI: 10.1093/nar/gki084

Adel Khelifi et al. Nucleic Acids Res. 2005.

Erratum in

Nucleic Acids Res. 2005;33(1):448. Adel, Khelifi [corrected to Khelifi, Adel]; Laurent, Duret [corrected to Duret, Laurent]; Dominique, Mouchiroud [corrected to Mouchiroud, Dominique]

Abstract

Processed pseudogenes result from reverse-transcribed mRNAs. Because processed pseudogenes generally lack promoters, they are non-functional from the moment they are inserted into the genome. Subsequently, they freely accumulate substitutions, insertions and deletions. Moreover, the ancestral structure of a processed pseudogene can easily be inferred from the sequence of its functional homologous gene. Owing to these characteristics, processed pseudogenes are good neutral markers for studying genome evolution. Interest in these markers has recently increased, particularly to help gene prediction in the fields of genome annotation, functional genomics and genome evolution analysis (patterns of substitution). For these reasons, we have developed a method to annotate processed pseudogenes in complete genomes. To make them useful to different fields of research, we annotated them and then stored them in a nucleic acid database. In this work, we screened both the mouse and human complete genomes from ENSEMBL to find processed pseudogenes generated from functional genes with introns. We used a conservative method to detect processed pseudogenes in order to minimize the rate of false positives. Among the sequences detected, some still have a conserved open reading frame and some overlap gene locations. We designate as retroelements all reverse-transcribed sequences and, more strictly, as processed pseudogenes all retroelements that fall into neither of these two categories (having a conserved open reading frame or overlapping a gene location). We annotated 5823 retroelements (5206 processed pseudogenes) in the human genome and 3934 (3428 processed pseudogenes) in the mouse genome. Compared with previous estimates, these totals underestimate the number of processed pseudogenes, but the aim of the procedure was to generate a high-quality dataset.
To facilitate the use of processed pseudogenes in studying genome structure and evolution, DNA sequences from processed pseudogenes, and their functional reverse transcribed homologs, are now stored in a nucleic acid database, HOPPSIGEN. HOPPSIGEN can be browsed on the PBIL (Pôle Bioinformatique Lyonnais) World Wide Web server (http://pbil.univ-lyon1.fr/) or fully downloaded for local installation.

Figures

**Figure 1**
Description of the method used to identify processed pseudogenes.

**Figure 2**
Duplication and reverse transcription. In complete genomes, we can identify four kinds of homologous sequences to a functional gene with introns. (A) A complete duplication of a functional gene leads to the formation of a paralogous gene. If the duplicated gene became non-functional after its insertion, it is classified as an unprocessed pseudogene. However, it may have evolved to give another gene with a new function. If the duplication is recent, we still detect similarities on the introns. (B) Another case of old duplicated gene. In this case, the duplicated gene may be still functional but introns have diverged much faster than exons, therefore we can detect a similarity between the two homologous copies only on the exons. (C) Single exon sequences are generated either by partial or old reverse transcription, or by partial or old gene duplication. (D) A retroelement is generated by reverse transcription. The retroelement lacks introns and is similar only to exons and UTR regions.

**Figure 3**
An alignment calculated by SIM (17) between a genome sequence and a functional gene. The distance between two consecutive positive matches *i* and *i* + 1 in the case of a homologous sequence (designated as potential region or PR) is defined by d(*a_i*, *a_i+1*). The splicing sites located on the functional gene are designated by *S_j* for the splicing site number *j*. Based on these known features, we built a simple algorithm to discriminate between the different cases (Figure 2):
(i) If (SIM score > 20 and PR similar to at least two exons) → PR is retained;
else if (SIM score > 20 and PR similar to one exon) → PR is a single exon (case C);
else if (SIM score < 20) → PR is discarded;
(ii) For each splicing site: test if a_i or a_i±1 is near the splicing site S_j on the gene (S_j ± 10 bp);
if test succeeded: then if d(a_i, a_i+1) > 50 bp → PR is an unprocessed paralog (case A and B);
if test failed → PR is a retroelement (case D).
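
The decision procedure in the Figure 3 caption can be sketched in Python. This is only an illustrative reading of the caption, not the authors' implementation. The `Match` structure, the function name `classify_pr` and the coordinate conventions are hypothetical. Three points are assumptions because the caption leaves them open: a SIM score of exactly 20 is treated as discarded, a boundary that lies near a splicing site but has a gap of 50 bp or less is not counted as evidence of an intron, and all coordinates are taken on the forward strand.

```python
from dataclasses import dataclass
from typing import List

SCORE_MIN = 20      # SIM score threshold from the caption
SPLICE_WINDOW = 10  # "near" a splicing site means S_j +/- 10 bp
GAP_MIN = 50        # d(a_i, a_i+1) must exceed 50 bp to indicate an intron


@dataclass
class Match:
    """One aligned block a_i (hypothetical structure, forward strand)."""
    gene_start: int    # position on the functional gene sequence
    gene_end: int
    genome_start: int  # position on the genome sequence
    genome_end: int


def classify_pr(score: float, exons_matched: int,
                matches: List[Match], splice_sites: List[int]) -> str:
    """Classify a potential region (PR) following the Figure 3 caption."""
    # Step (i): filter on SIM score and number of exons covered.
    # Assumption: score == 20 or zero matched exons means discarded.
    if score <= SCORE_MIN or exons_matched < 1:
        return "discarded"
    if exons_matched == 1:
        return "single exon (case C)"

    # Step (ii): look for an intron-sized gap at a splicing site.
    ordered = sorted(matches, key=lambda m: m.gene_start)
    for left, right in zip(ordered, ordered[1:]):
        near_site = any(
            abs(left.gene_end - s) <= SPLICE_WINDOW
            or abs(right.gene_start - s) <= SPLICE_WINDOW
            for s in splice_sites
        )
        gap = right.genome_start - left.genome_end  # d(a_i, a_i+1)
        if near_site and gap > GAP_MIN:
            return "unprocessed paralog (case A or B)"

    # No intron found at any splicing site.
    return "retroelement (case D)"
```

For example, a single contiguous block spanning two splicing sites is classified as a retroelement, while two blocks separated by 500 bp of genome sequence at a splicing site are classified as an unprocessed paralog.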

**Figure 4**
Structure of a retroelement in HOPPSIGEN. After being annotated, retroelements were classified into three classes: (i) putative ORF: retroelements having a conserved_ORF; (ii) putative retroelements, overlapping an ENSEMBL gene position; and (iii) other cases: all retroelements that are processed pseudogenes.
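
The three-way classification in the Figure 4 caption can be written as a small Python function. This is a sketch: the `Retroelement` fields and the function name are hypothetical and do not reflect the HOPPSIGEN schema. The caption does not say which class wins when a retroelement both has a conserved ORF and overlaps an ENSEMBL gene, so checking the ORF first is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Retroelement:
    """Minimal hypothetical record for an annotated retroelement."""
    has_conserved_orf: bool
    overlaps_ensembl_gene: bool


def hoppsigen_class(r: Retroelement) -> str:
    """Assign one of the three classes named in the Figure 4 caption."""
    if r.has_conserved_orf:          # class (i); checked first by assumption
        return "putative ORF"
    if r.overlaps_ensembl_gene:      # class (ii)
        return "putative retroelement"
    return "processed pseudogene"    # class (iii): all other cases


def class_counts(elements: Iterable[Retroelement]) -> Counter:
    """Count how many retroelements fall into each class."""
    return Counter(hoppsigen_class(r) for r in elements)
```

Under this reading, the processed pseudogenes are the retroelements left over once the first two classes are set aside, which matches the stricter definition given in the abstract.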

**Figure 5**
Browsing HOPPSIGEN database. HOPPSIGEN help pages (http://pbil.univ-lyon1.fr/databases/hoppsigen.html) contain information on how to query the database. The WWW-Query tools (http://pbil.univ-lyon1.fr/search/query_fam.php) are a powerful way to make queries to HOPPSIGEN.

References

1. Vanin,E.F. (1985) Processed pseudogenes: characteristics and evolution. Annu. Rev. Genet., 19, 253–272.
2. Esnault,C., Maestre,J. and Heidmann,T. (2000) Human LINE retrotransposons generate processed pseudogenes. Nature Genet., 24, 363–367.
3. Mighell,A.J., Smith,N.R., Robinson,P.A. and Markham,A.F. (2000) Vertebrate pseudogenes. FEBS Lett., 468, 109–114.
4. Pavlicek,A., Paces,J., Zika,R. and Hejnar,J. (2002) Length distribution of long interspersed nucleotide elements (LINEs) and processed pseudogenes of human endogenous retroviruses: implications for retrotransposition and pseudogene detection. Gene, 300, 189–194.
5. Crollius,H.R., Jaillon,O., Dasilva,C., Ozouf-Costaz,C., Fizames,C., Fischer,C., Bouneau,L., Billault,A., Quetier,F., Saurin,W., Bernot,A. and Weissenbach,J. (2000) Characterization and repeat analysis of the compact genome of the freshwater pufferfish Tetraodon nigroviridis. Genome Res., 10, 939–949.
