Mol Biol Evol. 2003 Jan;20(1):38-46.
doi: 10.1093/molbev/msg011.

The esterase and PHD domains in CR1-like non-LTR retrotransposons

Vladimir V Kapitonov et al. Mol Biol Evol. 2003 Jan.

Abstract

Most active non-LTR (long terminal repeat) retrotransposons carry two open reading frames (ORFs) encoding the ORF1p and ORF2p proteins. The ORF2p proteins are relatively well studied and are known to contain endonuclease/reverse transcriptase domains. In contrast, the biological function of ORF1p proteins remains poorly understood, except that they nonspecifically bind single-stranded mRNA/DNA molecules. CR1-like elements form the most widely distributed clade/superfamily of non-LTR retrotransposons. We found that ORF1p proteins encoded by diverse CR1-like elements contain a conserved esterase domain (ES) or a plant homeodomain (PHD). This indicates that CR1-like ORF1p proteins either are lipolytic enzymes or are involved in protein-protein interactions related to chromatin remodeling. Sequence conservation of ES suggests that interaction with cellular membranes is an important phase in the life cycles of CR1-like elements. Presumably such interaction helps in penetrating host cells. As a consequence, the presence of multiple young CR1 families characterized by approximately 10% intrafamily and 40% interfamily identities may be explained by a relatively frequent horizontal transfer of these CR1-like elements. Unexpectedly, ES links together non-LTR retrotransposons and single-stranded RNA viruses such as influenza C and coronaviruses, which are known to depend on their own ES.

Figures

Fig. 1.
Schematic structure of complete CR1-like retrotransposons from fishes and insects. CR1-1_DR, CR1-2_DR and CR1-3_DR are consensus sequences of retrotransposons that belong to the three families of retrotransposons identified in the Danio rerio genome. Maui and Rex1 are the consensus sequences of two retrotransposons from CR1-like families present in the Fugu rubripes genome. CR1_OL is a slightly damaged element identified in the Oryzias latipes genome. Horizontally shaded boxes mark ORF1s and ORF2s. ORF2s encode proteins composed of the apurinic/apyrimidinic endonuclease (APE) and reverse transcriptase (RT) domains. Proteins encoded by ORF1s are composed of putative zinc finger/leucine zipper (ZL) motifs, the plant homeodomain (PHD) and the esterase (ES) domains. Black squares, diamonds and hexagons indicate different unclassified domains. The 3′ termini of all retrotransposons, excluding CR1-2_DR and CR1_DM, are shown starting from the polyadenylation signal, followed by terminal microsatellite repeats composed of different 4–7-bp units repeated 2–8 times. The average number of repetitions is shown as a subscript index.
Fig. 2.
Phylogeny of the CR1-like non-LTR retrotransposons based on their endonuclease and reverse transcriptase domains. The phylogenetic tree also includes several retrotransposons from the Jockey, LOA, I, and L1 clades. Numbers next to each node indicate bootstrap values calculated as percentages of similar topologies out of 1,000 replicates for the neighbor-joining method. The names of non-LTR retrotransposon families and their host species are shown adjacent to the tree nodes. A scale of distances between the protein sequences is indicated. Solid triangles denote retrotransposons whose ORF1s code for the esterase. GenBank protein identification numbers are as follows: Jockey (134083), Juan-C (1079026), Doc (8823), Lian (7511795), I (903726), CR1_BF (17529698), CR1 (2331059), CR1_PS (6576738), Q (11359829), T1 (159644), L1 (2072977). Sequences of the remaining retrotransposons have been deposited in the following sections of Repbase Update: humrep.ref (L2 and L3), dmrep.ref (CR1_DM, BAGGINS1, IVK), fugrep.ref (Maui, REX1), zebrep.ref (CR1-1_DR, CR1-2_DR, CR1-3_DR) and invrep.ref (SR1).
Fig. 3.
Zinc finger motifs in proteins encoded by ORF1s of different CR1-like non-LTR retrotransposons. C denotes cysteine; L, leucine; H, histidine; X, any residue; the subscript index indicates the number of the amino acid residues marked by it. A, Putative zinc finger/leucine zipper domains in CR1 (Haas et al. 2001), CR1_PS (Kajikawa, Ohshima, and Okada 1997), Maui (Poulter, Butler, and Ormandy 1999) and CR1_OL from chicken, turtle, pufferfish, and medaka, respectively. B, ORF1 proteins encoded by the fruit fly CR1_DM and the malaria mosquito Q1 and T non-LTR retrotransposons that harbor the PHD domain. Conserved cysteine and histidine residues matching the PHD consensus sequence are highlighted. Numbers at the beginning and the end of the amino acid sequences indicate positions of the corresponding amino acid residues in the protein sequences deposited in GenBank and Repbase Update.
Fig. 4.
Multiple sequence alignment of the putative conserved esterase domains encoded by ORF1s in CR1-like non-LTR retrotransposons and other esterases. Solid arrowheads mark the catalytic serine-aspartate-histidine triad. Ambiguous amino acids are denoted by Xs. GenBank protein identification numbers are as follows: CR1_PS (6576737), CR1 (2331058), Maui (4378024), NeuA (13876786, CMP-N-acetylneuraminic acid synthetase from Streptococcus agalactiae), TesA (267107, Acyl-CoA thioesterase I from Escherichia coli), RGAE (7766904, rhamnogalacturonan acetylesterase from Aspergillus aculeatus), PAF-AH (2624421, platelet-activating factor acetylhydrolase from Bos taurus). Amino acid sequences of ORF1p proteins encoded by CR1-1_DR, CR1-2_DR, CR1-3_DR, CR1-1_TN, CR1_OL and L3 are deposited in Repbase Update.
Fig. 5.
Flowchart of the identification of ORF1p encoded by the ancient L3 retrotransposon fossilized in the human genome. Arrows indicate information flow directions. Rectangles illustrate different computational processes indicated by corresponding program names. Parallelograms indicate specific sets of data. Cans symbolize databases. GenBank accession numbers of sequences containing DNA regions which encode protein sequences TBLASTN-similar to CR1_PS1p are indicated together with corresponding E values. The “smiley face” marks the esterase domain found in different CR1-like elements.
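
The Fig. 1 legend describes 3′ termini that end in microsatellites built from 4–7-bp units repeated 2–8 times. As an illustration only, the following Python sketch shows one way such a terminal tandem repeat could be detected in a DNA string. The function name `terminal_microsatellite`, its parameters, and the example sequence are hypothetical and are not taken from the paper; the authors' actual procedure is not described in this record.

```python
def terminal_microsatellite(seq, min_unit=4, max_unit=7, min_repeats=2):
    """Find a tandem repeat that ends exactly at the 3' end of `seq`.

    Hypothetical helper for illustration. Returns (unit, repeats) for the
    repeat covering the most bases, or None if no unit of length
    min_unit..max_unit is repeated at least `min_repeats` times.
    """
    seq = seq.upper()
    best = None  # (unit, repeats, bases_covered)
    for unit_len in range(min_unit, max_unit + 1):
        if len(seq) < unit_len * min_repeats:
            continue  # sequence too short for this unit length
        unit = seq[-unit_len:]  # candidate unit: the last unit_len bases
        repeats = 1
        pos = len(seq) - 2 * unit_len
        # Walk towards the 5' end while exact copies of the unit continue.
        while pos >= 0 and seq[pos:pos + unit_len] == unit:
            repeats += 1
            pos -= unit_len
        covered = repeats * unit_len
        if repeats >= min_repeats and (best is None or covered > best[2]):
            best = (unit, repeats, covered)
    return None if best is None else (best[0], best[1])


# Made-up example: an arbitrary prefix followed by three copies of ATTCT.
example = "ACGTTGCA" + "ATTCT" * 3
result = terminal_microsatellite(example)
```

The sketch requires exact copies of the unit; real terminal repeats in diverged elements would usually need a tolerance for mismatches, which is left out here for brevity.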

References

    1. Aasland, R., T. J. Gibson, and A. F. Stewart. 1995. The PHD finger: implications for chromatin-mediated transcriptional regulation. Trends Biochem. Sci 20:56-59.
    2. Aasland, R., and A. F. Stewart. 1995. The chromo shadow domain, a second chromo domain in heterochromatin-binding protein 1, HP1. Nucleic Acids Res 23:3168-3174.
    3. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.
    4. Arpigny, J. L., and K. E. Jaeger. 1999. Bacterial lipolytic enzymes: classification and properties. Biochem. J 343:177-183.
    5. Bailey, T. L., and W. N. Grundy. 1999. Classifying proteins by family using the product of correlated p-values, pp. 10–14 in P. Istrail, P. Pevzner, and M. Waterman, eds. Proceedings of the Third International Conference on Computational Molecular Biology (RECOMB99). ACM, New York.
