Front Plant Sci. 2022 Dec 14;13:1011565.
doi: 10.3389/fpls.2022.1011565. eCollection 2022.

Genome-wide identification of Reverse Transcriptase domains of recently inserted endogenous plant pararetrovirus (Caulimoviridae)

Carlos de Tomás et al. Front Plant Sci. 2022.

Abstract

Endogenous viral elements (EVEs) are viral sequences that have been integrated into the nuclear chromosomes. Endogenous pararetroviruses (EPRVs) are a class of EVEs derived from DNA viruses of the family Caulimoviridae. Previous work based on a limited number of genome assemblies demonstrated that EPRVs are abundant in plants and are present in several species. The availability of genome sequences has increased immensely in recent years, and we took advantage of these resources to obtain a more extensive view of the presence of EPRVs in plant genomes. We analyzed 278 genome assemblies corresponding to 267 species (254 from Viridiplantae) using tBLASTn against a collection of conserved domains of the Reverse Transcriptases (RT) of Caulimoviridae. We concentrated our search on complete and well-conserved RT domains with an uninterrupted ORF encoding at least 300 amino acids. We obtained 11,527 sequences from the genomes of 202 species spanning the whole Tracheophyta clade. These elements were grouped into 57 clusters and classified into 13 genera, including a newly proposed genus we called Wendovirus. Wendoviruses are characterized by the presence of four open reading frames, two of which encode aspartic proteinases. Comparing plant genomes, we observed important differences between plant families and genera in the number and type of EPRVs found. In general, florendoviruses are the most abundant and widely distributed EPRVs. The presence of multiple identical RT domain sequences in some of the genomes suggests their recent amplification.

Keywords: Caulimoviridae; Reverse Transcriptase (RT); endogenous; pararetrovirus; virus.
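
The filtering criterion described in the abstract (an uninterrupted ORF encoding at least 300 amino acids) and the observation about identical RT copies can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `translate`, `has_uninterrupted_orf`, and `identical_copy_counts` are hypothetical, and the sketch assumes each tBLASTn hit has already been extracted as a nucleotide string in the correct reading frame and strand, using the standard genetic code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate an in-frame nucleotide string; unknown codons become 'X'."""
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def has_uninterrupted_orf(nt, min_aa=300):
    """True if the hit translates to >= min_aa residues with no stop codon."""
    protein = translate(nt)
    return len(protein) >= min_aa and "*" not in protein


def identical_copy_counts(proteins):
    """Return sequences that occur more than once, with their copy number."""
    return {seq: n for seq, n in Counter(proteins).items() if n > 1}
```

A hit interrupted by a stop codon is rejected even when its total length exceeds the threshold, and sequences reported by `identical_copy_counts` would be candidates for the kind of recent amplification the abstract mentions.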

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Phylogenetic relationships within the episomal and endogenous Caulimoviridae. Phylogram obtained from a maximum likelihood analysis of protein sequence data from RT conserved domains using 500 bootstrap replications. The size of each point indicates the bootstrap support of the tree branch. Known episomal and endogenous pararetroviruses are shown in grey, small letters. New endogenous Clusters60 are shown in bold letters. The color of the branch indicates the genus of Caulimoviridae: Bad, Badnavirus; Dio, Dioscovirus; Yen, yendovirus; Tun, tungrovirus; Zen, zendovirus; Vac, vaccinivirus; Ros, rosadnavirus; Flo, florendovirus; Gym1 and Gym2, gymnendovirus1 and 2; Pet, petuvirus; Fer, fernendovirus; Cav, cavemovirus; Sol, solendovirus; Cau, caulimovirus; Ruf, ruflodivirus; Soy, soymovirus; Xen, xendovirus; and Wen, wendovirus.
Figure 2
Phylogenetic relationships of representative sequences of the Cluster100. Representative sequences of the RT-EPRV Cluster100 (in red) were aligned with RT sequences of pararetroviral elements (in black), and a phylogenetic tree was constructed using the NJ method and 1000 bootstrap replications.
Figure 3
Schematic representation of wendovirus endogenous pararetroviruses. A scaled linear view of the genome organization of Wendovirus. The names of the sequences are the same as in Supplementary Data 4. Grey arrows mark open reading frames, and colored regions within ORFs are conserved protein domains: blue, zinc finger typically present in the coat proteins; green, Movement Protein; yellow, Aspartic Proteinase; red, Reverse Transcriptase; pink, RNaseH.

