Evol Bioinform Online. 2007 Sep 6;3:179-95.

Identification of novel retroid agents in Danio rerio, Oryzias latipes, Gasterosteus aculeatus and Tetraodon nigroviridis

Holly A Basta et al. Evol Bioinform Online.

Abstract

Retroid agents are genomes that encode a reverse transcriptase (RT) and replicate or transpose by way of an RNA intermediate. The Genome Parsing Suite (GPS) is software created to identify and characterize Retroid agents in any genome database (McClure et al. 2005). The detailed analysis of all Retroid agents found by the GPS in Danio rerio (zebrafish), Oryzias latipes (medaka), Gasterosteus aculeatus (stickleback) and Tetraodon nigroviridis (spotted green pufferfish) reveals extensive Retroid agent diversity in the compact genomes of all four fish. Novel Retroid agents were identified by the GPS software: the telomerase reverse transcriptase (TERT) in O. latipes, G. aculeatus and T. nigroviridis and a potential TERT in D. rerio, a retrotransposon in D. rerio, and multiple lineages of endogenous retroviruses (ERVs) in D. rerio, O. latipes and G. aculeatus.

Keywords: Danio rerio; Gasterosteus aculeatus; Genome Parsing Suite software; Oryzias latipes; Retroid; Tetraodon nigroviridis; transposable elements.

Figures

Figure 1.
A Representation of the GPS Software. The output of Stage I GPS includes Raw, Unique and Perfect RT hits as defined in the text. Unique hits are assessed for the presence of the Ordered Series of Motifs (OSM), as illustrated in the Motif Distribution box. The columns indicate the number of motifs the Unique RT hits have (zero through six), and the rows indicate queries. For example, there are 17 copies of an RT sequence with all six motifs that is more closely related to the retrotransposon SUZU than to any other query. All Unique hits are passed to Stage II GPS, and a 14kb+ length of host DNA inclusive of each RT hit is excised and assessed. The results of this stage are full length Retroid genomes, classified as those with one stop codon (1SC), those with one frame shift (1FS), and those with complete, error-free open reading frames (perfect). Given the observation of translational recoding in Retroid agents, these three classes are considered potentially active. Gene abbreviations are as follows: LTR = long terminal repeat, GAG = group specific antigen, PRO = protease, RT = reverse transcriptase, RH = ribonuclease H, IN = integrase, and ENV = envelope.
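The two bookkeeping steps described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the GPS implementation: the function names, the input shapes (a list of query/motif-count pairs, and per-genome counts of stop codons and frame shifts) and the "other" label for genomes outside the three named classes are all assumptions made for the example.

```python
from collections import defaultdict


def motif_distribution(hits):
    """Tally Unique RT hits by closest query and number of OSM motifs found.

    `hits` is a hypothetical input: an iterable of (query_name, n_motifs)
    pairs, where n_motifs runs from 0 to 6 as in the Motif Distribution box.
    Returns {query_name: [count with 0 motifs, ..., count with 6 motifs]}.
    """
    table = defaultdict(lambda: [0] * 7)
    for query, n_motifs in hits:
        if not 0 <= n_motifs <= 6:
            raise ValueError(f"motif count out of range: {n_motifs}")
        table[query][n_motifs] += 1
    return dict(table)


def classify_full_length(n_stop_codons, n_frame_shifts):
    """Label a full length genome using the three classes named in the caption.

    Anything outside those classes is labeled "other"; that label is an
    assumption of this sketch, not a GPS category.
    """
    if n_stop_codons == 0 and n_frame_shifts == 0:
        return "perfect"
    if n_stop_codons == 1 and n_frame_shifts == 0:
        return "1SC"
    if n_stop_codons == 0 and n_frame_shifts == 1:
        return "1FS"
    return "other"


# The caption's own example: 17 Unique hits closest to SUZU with all six motifs.
example_hits = [("SUZU", 6)] * 17 + [("SUZU", 3)]
example_table = motif_distribution(example_hits)
```

Here `example_table["SUZU"][6]` holds the count for the six-motif column of the SUZU row, which mirrors how one cell of the Motif Distribution box is read.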
Figure 2.
Example Query Library Gene Component Maps. The GPS accesses a database populated by Retroid genomes. This query library contains all the genes and non-coding components, which the GPS uses to identify and reconstruct potential Retroid agents found in organismal genome databases. Those gene abbreviations not found in Figure 1 are as follows: APE = apurinic endonuclease, UN/UNK = unknown region, EN = putative PDD endonuclease (Xiong et al. 1988), TE = tether, 5UTR = 5′ untranslated region, 3UTR = 3′ untranslated region, 5LTR = 5′ long terminal repeat, 3LTR = 3′ long terminal repeat. The FTERT is divided into a Carboxyl portion (CARB) and the RT. The red box highlights the RT genes. If a potential Retroid agent encodes all the genes in a specific query component library, it is considered full length. Retroid agent accession numbers and the hosts in which the agents were discovered are presented in Figure 3.
Figure 3.
Phylogenetic Tree of the Query Host Organisms. The query names and NCBI accession numbers are listed next to respective host organisms. The tree was created using TaxBrowser on the NCBI website (Benson et al. 2000; Wheeler et al. 2000). Those sequences that are considered fish-specific (see Results) are enclosed in the box.
Figure 4.
Stage I GPS: low frequency RT hits. Unique RT hits retrieved by fish specific queries are shown in blue. Purple indicates those hits retrieved by non-fish specific queries, while those RT hits that are less than 100 bp in length are shown in off-white. The y axis indicates Unique RT hits, while the x axis indicates organism. See Figure 3 for fish and non-fish specific query names, host organisms and accession numbers.
Figure 5.
Stage II GPS: number of full length copies by Retroid class and organism. Organisms are represented in different colors: D. rerio is light green, O. latipes is pink, G. aculeatus is off-white and T. nigroviridis is light blue. The y axis is the number of full length copies and the x axis is type of agent. Full length copies are separated into retroposons, retrotransposons and retroviruses. Numbers on the bars indicate the number of different retroposon, retrotransposon or retrovirus families that comprise the full length copies indicated by each bar.
Figure 6.
Stage II GPS results for those queries identified as full length in D. rerio (DR), O. latipes (OL), G. aculeatus (GA) and T. nigroviridis (TN). Full length sequences are shown in grey for D. rerio, red for O. latipes, light purple for G. aculeatus and green for T. nigroviridis. Full length agents are further classified as potentially active, which includes full lengths with one frame shift, one stop codon, and those agents that have no frame shifts and no stop codons (Perfect). Potentially active sequences are shown in black for D. rerio, dark red for O. latipes, dark purple for G. aculeatus and green for T. nigroviridis. The y axis is full length copy numbers and the x axis is the query name. Retroid agents are grouped into retroposon and retrotransposon families. A two-dimensional square indicates zero copies. Their hosts of origin and accession numbers are listed in Figure 3.
Figure 7.
Stage II GPS results for those queries identified as full length in three out of the four fish genomes. Color scheme is as in Figure 6. The y axis is full length copy numbers and the x axis is the query name. Retroid agents are grouped into retroposon, retrotransposon and retrovirus families. A two-dimensional square indicates zero copies. Their hosts of origin and accession numbers are listed in Figure 3.
Figure 8.
Stage II GPS results for potential full length Retroid agents shared between two of the four fish genomes. Color scheme is as in Figure 6. The y axis is full length copy numbers and the x axis is the query name. Retroid agents are grouped into retroposon, retrotransposon and retrovirus families. Their hosts of origin and accession numbers are listed in Figure 3. Note that the O. latipes full length REX6 copies extend beyond the graph, and there are a total of 279 copies. A two-dimensional square indicates zero copies.
Figure 9.
Alignment of the two sections (CARB and RT) of FTERT with the potential TERT proteins found in T. nigroviridis (TNTERT), O. latipes (OLTERT), G. aculeatus (GATERT) and D. rerio (DRTERT). The TERT sequence is actually a single long RT, but it is divided into two to increase the chances of finding the entire TERT through multiple introns, as well as to keep it uniform with the rest of the query RT sizes. Note the string of N’s in the D. rerio sequence, which is caused by unsequenced regions annotated as “N” in the chromosome sequence. Large regions of unsequenced data correspond to only a small portion of the DRTERT because they primarily make up introns that are spliced out when the mRNA is made. This unsequenced portion falls over the second, third, and fourth RT motifs. The OSM (see Methods) is indicated by boxes, and the splice points are indicated by stars. The alignment was created using ClustalX in the MEGA 3.0 software package (Kumar et al. 2004).
Figure 10.
Phylogenetic tree of fish retroviruses. The tree was constructed using RT amino acid sequences from 23 retroviruses (both endogenous and exogenous) and only includes RTs that have all six motifs of the OSM (see Methods). This tree was made using the MEGA 3.0 software (Kumar et al. 2004), using the UPGMA method (Sneath et al. 1973) with bootstrap values (3000 repetitions) (Felsenstein et al. 1985). The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method (Zuckerkandl et al. 1985) and are in units of the number of amino acid substitutions per site. Tree tips for novel agents pulled out by a query are labeled with organism name, chromosome number, and then Retroid name. The retroviruses and accession numbers that are not included on the query host organism tree (Figure 3) are Walleye epidermal hyperplasia virus type 1 (WEHV1) (AF014792), Walleye epidermal hyperplasia virus type 2 (WEHV2) (AF014793), Atlantic salmon swim bladder sarcoma virus (SSSV) (DQ174103) and Rous sarcoma virus (RSV) (NC_001407). In addition to RSV, GYPSY, HTLV1, HIV1 and SRV2 (Figure 3) are included as non-fish-retrovirus outgroups.
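For readers unfamiliar with the Poisson correction named in this caption, the sketch below shows the standard formula, d = -ln(1 - p), where p is the proportion of aligned amino acid sites that differ between two sequences. It is an illustration only and not the MEGA code path: the function names are hypothetical, and skipping any site where either sequence has a gap (pairwise deletion) is an assumption, since the caption does not state how gaps were handled.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned amino acid sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion);
    this gap policy is an assumption of the sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def poisson_distance(seq_a, seq_b, gap="-"):
    """Poisson-corrected distance in amino acid substitutions per site."""
    p = p_distance(seq_a, seq_b, gap)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


# Toy aligned fragments (made up for illustration): 1 difference in 10 sites.
toy_distance = poisson_distance("ACDEFGHIKL", "ACDEFGHIKV")
```

The correction matters most for divergent pairs: as p grows, the corrected distance rises faster than p itself because multiple substitutions at the same site are accounted for.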

References

    1. Altschul SF, Gish W, Miller W, et al. Basic local alignment search tool. J. Mol. Biol. 1990;215(3):403–10.
    2. Baranov PV, Gesteland RF, Atkins JF. Recoding: translational bifurcations in gene expression. Gene. 2002;286(2):187–201.
    3. Bell MA, Foster SA, editors. The evolutionary biology of the threespine stickleback. Oxford University Press; Oxford: 1994.
    4. Benson DA, Karsch-Mizrachi I, Lipman DJ, et al. GenBank. Nucleic Acids Res. 2000;28(1):15–8.
    5. Blond JL, Lavillette D, Cheynet V, et al. An envelope glycoprotein of the human endogenous retrovirus HERV-W is expressed in the human placenta and fuses cells expressing the type D mammalian retrovirus receptor. J. Virol. 2000;74(7):3321–9.