Mob DNA. 2022 Dec 3;13(1):31.
doi: 10.1186/s13100-022-00288-w.

CAULIFINDER: a pipeline for the automated detection and annotation of caulimovirid endogenous viral elements in plant genomes

Héléna Vassilieff et al. Mob DNA.

Abstract

Plant, animal and protist genomes often contain endogenous viral elements (EVEs), which correspond to partial and sometimes entire viral genomes that have been captured in the genome of their host organism through a variety of integration mechanisms. While the number of sequenced eukaryotic genomes is rapidly increasing, the annotation and characterization of EVEs remain largely overlooked. EVEs that derive from members of the family Caulimoviridae are widespread across tracheophyte plants, and sometimes they occur in very high copy numbers. However, existing programs for annotating repetitive DNA elements in plant genomes are poor at identifying and then classifying these EVEs. Beyond improving the accuracy of plant genome annotation, there is intrinsic value in a tool that could identify caulimovirid EVEs, as they testify to recent or ancient host-virus interactions and provide valuable insights into virus evolution. In response to this research need, we have developed CAULIFINDER, an automated and sensitive annotation software package. CAULIFINDER consists of two complementary workflows, one to reconstruct, annotate and group caulimovirid EVEs in a given plant genome and the second to classify these genetic elements into officially recognized or tentative genera in the Caulimoviridae. We have benchmarked the CAULIFINDER package using the Vitis vinifera reference genome, which contains a rich assortment of caulimovirid EVEs that have previously been characterized using manual methods. The CAULIFINDER package is distributed in the form of a Docker image.

Keywords: Bioinformatics; Caulimoviridae; Endogenous viral elements; Genome annotation; Paleovirology; Plant genomes; Repetitive elements.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of CAULIFINDER Branch A workflow. The line of arrows at the top represents the four main steps of the workflow. Grey boxes indicate successive sub-steps with the main tools highlighted in red font. The dark grey box comprises several analyses that are run on the consensus library. The main output files are shown in blue boxes. The input datasets are shown in khaki boxes with arrows indicating in which sub-step they are used. The red two-headed arrow represents the “blastclust_supplementation” option (default = FALSE). The purple two-headed arrow represents the “filter_chimeras” option (default = TRUE)
Fig. 2
Overview of CAULIFINDER Branch B workflow. The line of three arrows on the top represents the three main steps of the workflow. Grey boxes indicate successive sub-steps with the main tools highlighted with red font. The main output files are shown in blue boxes. The input datasets are shown in khaki boxes with arrows indicating in which sub-step they are used. The grey looping arrows in steps 2 and 3 indicate the number of iterations of sequence selection using protein alignment with MUSCLE, followed by trimAl with empirical parameters
Fig. 3
Overview of the multiple sequence alignments obtained for the VvinAV-VvinBV cluster using CAULIFINDER Branch A. The alignments were obtained using MAFFT with the ginsi and leave gappy regions 0.8 settings and visualized in the overview window of the Jalview program [39] with the following nucleotide colours: A (green), T (blue), G (red), C (orange). The colour densities are smoothed in the overview. The reference sequences are highlighted in red (VvinAV), blue (VvinBV_compA) or turquoise (VvinBV_compB). Branch A output sequences are not highlighted and concatemers have been removed for the ease of visualization. The raw alignments can be visualized in Supplementary Fig. 1. The alignments correspond to run 1 (A), run 2 (B) and run 3 (C). For the latter, only the VvinBV_compB sequences are shown
Fig. 4
Caulimovirid endogenous viral element diversity in Vitis vinifera. Phylogenetic tree of reverse transcriptase domains built from the Newick file obtained from CAULIFINDER Branch B applied on the V. vinifera PN40024 genome. All caulimovirid, Gypsy and retroviral reference sequences contain the tag “REF” in their label. Branches are colored as follows: Retroviridae (brown), Gypsy elements (blue), Caulimoviridae (black) and Branch B representative sequences (red). Several clades have been collapsed for ease of visualization. The reference RT from Vitis endovirus is indicated as “Unclassified Vvin”. Bootstrap values above 70% are highlighted using purple disks in the branch nodes
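
The conventions described for Fig. 4 (reference sequences tagged “REF” in their label, bootstrap values above 70% highlighted) can be applied to a Newick file with a few lines of Python. The following is a minimal sketch and not part of CAULIFINDER: the example tree, its labels and support values, and all function names are hypothetical, and the parser assumes unquoted labels with bootstrap values written as numeric internal-node labels.

```python
import re

# Hypothetical Newick string for illustration only; not CAULIFINDER output.
NEWICK = (
    "((REF_VirusA:0.1,seq_1:0.2)95:0.05,"
    "(REF_Gypsy1:0.3,(seq_2:0.1,seq_3:0.1)42:0.2)71:0.1);"
)


def leaf_labels(newick):
    """Return leaf labels, i.e. names that directly follow '(' or ','."""
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def support_values(newick):
    """Return numeric internal-node labels, i.e. numbers directly after ')'."""
    return [float(v) for v in re.findall(r"\)\s*(\d+(?:\.\d+)?)", newick)]


def split_by_ref_tag(labels, tag="REF"):
    """Separate reference sequences (label contains the tag) from the rest."""
    refs = [label for label in labels if tag in label]
    others = [label for label in labels if tag not in label]
    return refs, others


def well_supported(values, threshold=70.0):
    """Keep support values strictly above the threshold (70% in Fig. 4)."""
    return [v for v in values if v > threshold]


if __name__ == "__main__":
    refs, others = split_by_ref_tag(leaf_labels(NEWICK))
    print("reference sequences:", refs)
    print("other sequences:", others)
    print("supports above 70:", well_supported(support_values(NEWICK)))
```

A regular expression is enough here because only labels and support values are extracted; rebuilding the tree topology or handling quoted labels would call for a dedicated phylogenetics library.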

References

    1. Azzam O, Chancellor TCB. The biology, epidemiology, and management of rice tungro disease in Asia. Plant Dis. 2002;86:88–100. doi: 10.1094/PDIS.2002.86.2.88.
    2. Bombarely A, Moser M, Amrad A, Bao M, Bapaume L, Barry CS, Bliek M, Boersma MR, Borghi L, Bruggmann R, et al. Insight into the evolution of the Solanaceae from the parental genomes of Petunia hybrida. Nat Plants. 2016;2:16074. doi: 10.1038/nplants.2016.74.
    3. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
    4. Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. doi: 10.1093/bioinformatics/btp348.
    5. Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000;17:540–552. doi: 10.1093/oxfordjournals.molbev.a026334.