Virus Evol. 2020 Jan 27;6(1):vez060.
doi: 10.1093/ve/vez060. eCollection 2020 Jan.

Wide spectrum and high frequency of genomic structural variation, including transposable elements, in large double-stranded DNA viruses

Vincent Loiseau et al. Virus Evol.

Abstract

Our knowledge of the diversity and frequency of genomic structural variation segregating in populations of large double-stranded (ds) DNA viruses is limited. Here, we sequenced the genome of a baculovirus (Autographa californica multiple nucleopolyhedrovirus [AcMNPV]) purified from beet armyworm (Spodoptera exigua) larvae at depths >195,000× using both short-read (Illumina) and long-read (PacBio) technologies. Using a pipeline relying on hierarchical clustering of structural variants (SVs) detected in individual short and long reads by six variant callers, we identified a total of 1,141 SVs in AcMNPV, including 464 deletions, 443 inversions, 160 duplications, and 74 insertions. These variants are considered robust and unlikely to result from technical artifacts because they were independently detected in at least three long reads as well as at least three short reads. SVs are distributed along the entire AcMNPV genome and may involve large genomic regions (30,496 bp on average). We show that no less than 39.9 per cent of genomes carry at least one SV in AcMNPV populations, that the vast majority of SVs (75%) segregate at very low frequency (<0.01%), and that very few SVs persist after ten replication cycles, consistent with a negative impact of most SVs on AcMNPV fitness. Using short-read sequencing datasets, we then show that populations of two iridoviruses and one herpesvirus are also full of SVs, as they contain between 426 and 1,102 SVs carried by 52.4–80.1 per cent of genomes. Finally, AcMNPV long reads allowed us to identify 1,757 transposable element (TE) insertions, 895 of which are truncated and occur at one extremity of the reads. This further supports the role of baculoviruses as possible vectors of horizontal transfer of TEs. Altogether, we found that SVs, which evolve mostly under rapid dynamics of gain and loss in viral populations, represent an important feature in the biology of large dsDNA viruses.

Keywords: baculovirus; genomic structural variation; herpesvirus; iridovirus; large double-stranded DNA viruses; transposable elements.

Figures

Figure 1.
Illustration of two important steps of the hierarchical clustering of SVs.
(A) Influence of the clustering threshold value. The top panel illustrates three reads (one long PacBio read and two short Illumina reads) mapped onto overlapping regions of the viral genome. Red asterisks correspond to sequencing errors that prevent accurate mapping of long reads. ‘start’ and ‘end’ correspond to the start and end coordinates of the SV detected by SV callers (a deletion in the case of Reads 1 and 2 and a duplication for Read 3). The bottom panel shows how using multiple clustering thresholds prevents well-supported SVs from being discarded. With a low threshold, every cluster contains a single SV because no two SVs have exactly the same coordinates. Because a downstream filter of our pipeline requires SVs to be detected either by both long and short reads (in the case of the AcMNPV population sequenced using both Illumina and PacBio technologies) or by two programs (in the case of the three other large dsDNA viruses sequenced only with Illumina) to be retained, no SV is retained with this low clustering threshold. With a high clustering threshold, all SVs (two deletions and one duplication) end up in the same cluster because their coordinates are close to each other. Because another downstream filter requires all SVs within a cluster to be of the same nature for the cluster to be retained, this cluster is not considered further. With a medium threshold, the deletions detected by Reads 1 and 2 are grouped into the same cluster because their coordinates are close enough, and the duplication detected by Read 3 forms another cluster because its coordinates are too far from those of the deletions. After running the downstream filters of our pipeline, Cluster 2 is retained and one deletion is counted because it has been detected independently by long and short reads. The cluster containing the duplication is not considered further because it contains only one SV, detected by short reads only. Note that although SVs supported by only one read are represented here for the sake of simplicity, our approach only retained SVs supported by a minimum of three reads.
(B) Influence of the minimum number of reads supporting an SV. In the left panel, using three reads as the minimum number of reads required to retain SVs, ten SVs of different nature and/or supported by different numbers of reads have been detected by SV callers. Under a given clustering threshold value, these ten SVs form five clusters, only two of which (A–B and I–J) are retained by downstream filters because they contain several SVs that are all of the same nature. In the right panel, using eight reads as the minimum number of reads required to retain SVs, SV callers detect only six of the ten SVs detected in the left panel. With the same clustering threshold value as in the left panel, these SVs form four clusters, two of which (A, B and C, D) are retained by downstream filters because they contain several SVs that are all of the same nature. Using multiple minimum numbers of reads supporting SVs ensures that well-supported SVs (here the inversion in C and D) are not eliminated by downstream filters.
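The threshold effect described in panel A can be reproduced with a small sketch. This is not the authors' pipeline: the `SV` record, the linkage rule (both breakpoints within `threshold` bp, single linkage) and the toy coordinates are assumptions made for illustration. Only the two downstream filters (a cluster must hold SVs of one nature, and must be supported by both long and short reads) are taken from the caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    kind: str   # "DEL", "DUP", "INS" or "INV"
    start: int  # start coordinate on the viral genome (bp)
    end: int    # end coordinate on the viral genome (bp)
    tech: str   # "long" (PacBio) or "short" (Illumina)


def close(a, b, threshold):
    # Assumed linkage rule: both breakpoints lie within `threshold` bp.
    return abs(a.start - b.start) <= threshold and abs(a.end - b.end) <= threshold


def cluster_svs(svs, threshold):
    # Single-linkage clustering: each SV joins, and thereby merges,
    # every existing cluster that holds an SV close to it.
    clusters = []
    for sv in svs:
        linked = [c for c in clusters if any(close(sv, o, threshold) for o in c)]
        others = [c for c in clusters if not any(close(sv, o, threshold) for o in c)]
        clusters = others + [[sv] + [o for c in linked for o in c]]
    return clusters


def is_retained(cluster):
    # Downstream filters from the caption: one SV nature per cluster,
    # and support from both long and short reads.
    same_kind = len({sv.kind for sv in cluster}) == 1
    both_techs = {sv.tech for sv in cluster} >= {"long", "short"}
    return same_kind and both_techs


def retained_clusters(svs, threshold):
    return [c for c in cluster_svs(svs, threshold) if is_retained(c)]


# Toy coordinates, made up for illustration only.
reads = [
    SV("DEL", 1000, 2000, "long"),   # Read 1
    SV("DEL", 1010, 2005, "short"),  # Read 2
    SV("DUP", 1100, 2100, "short"),  # Read 3
]

for threshold in (0, 20, 200):
    n_clusters = len(cluster_svs(reads, threshold))
    n_kept = len(retained_clusters(reads, threshold))
    print(threshold, n_clusters, n_kept)
```

With these toy values, only the intermediate threshold keeps the deletion, which is why scanning several thresholds and pooling the retained clusters avoids losing a well-supported SV.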
Figure 2.
Number, size, and frequency of SVs in the four viral populations. (A) Number of detected SVs by SV type for the four viruses. No insertions were detected in the HCMV and IIV6 viral populations; insertions were only detected in the AcMNPV long reads and in the IIV31 short reads. (B) Boxplots representing the size of detected SVs by SV type for the four viral populations. (C) Frequency of viral genomes carrying SVs, shown by SV type for the four viral populations. The frequency was computed assuming that the number of SVs per viral genome follows a Poisson distribution. DEL, deletion; DUP, duplication; INS, insertion; INV, inversion.
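The Poisson assumption in panel C links the mean number of SVs per genome to the share of genomes carrying at least one SV. A minimal sketch, assuming a Poisson mean λ: the probability of zero SVs is e^(−λ), so the carrier fraction is 1 − e^(−λ). The function names are hypothetical, and how λ was estimated from the data is not stated in this section.

```python
import math


def fraction_with_sv(mean_svs_per_genome):
    # P(at least one SV) = 1 - P(zero SVs) under a Poisson model.
    return 1.0 - math.exp(-mean_svs_per_genome)


def mean_from_fraction(fraction):
    # Inverse relation: Poisson mean implied by a given carrier fraction.
    return -math.log(1.0 - fraction)


# Poisson mean implied by the 39.9 per cent of AcMNPV genomes
# reported in the abstract as carrying at least one SV.
implied_mean = mean_from_fraction(0.399)
print(round(implied_mean, 3))
```

Under this model, a carrier fraction of 39.9 per cent corresponds to roughly half an SV per genome on average.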
Figure 3.
Map of the circular AcMNPV genome illustrating all SVs present in more than 0.1 per cent of the viral population sequenced with Illumina and PacBio technologies. Each SV is illustrated by a curve linking its start and end coordinates. Histograms on top of SVs correspond to the relative frequency of each SV, with the most frequent SV involving hr4b.
Figure 4.
Number of SVs by 0.001 per cent frequency bin detected in the AcMNPV population sequenced using Illumina and PacBio technologies. Only the first 100 frequency bins are shown. The vast majority of SVs (75%) are present in viral genomes at a very low frequency (<0.01%).
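The binning behind this histogram can be sketched as follows. The function names and the toy frequencies are hypothetical and chosen only to show the mechanics; they are not values from the study.

```python
import math


def bin_frequencies(freqs_percent, bin_width=0.001, n_bins=100):
    # Count SVs per frequency bin; frequencies are given in per cent.
    # SVs falling beyond the last bin are left out, as in the figure.
    counts = [0] * n_bins
    for f in freqs_percent:
        # Rounding before flooring guards against floating-point noise
        # such as 0.003 / 0.001 evaluating to 2.9999999999999996.
        idx = math.floor(round(f / bin_width, 6))
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def share_below(freqs_percent, cutoff_percent):
    # Fraction of SVs whose frequency is strictly below the cutoff.
    return sum(1 for f in freqs_percent if f < cutoff_percent) / len(freqs_percent)


# Toy SV frequencies in per cent, made up for illustration only.
toy_freqs = [0.0005, 0.003, 0.0099, 0.25]
counts = bin_frequencies(toy_freqs)
low_share = share_below(toy_freqs, 0.01)
```

Here `counts` holds the height of each of the first 100 bars, and `low_share` is the proportion of SVs segregating below the 0.01 per cent cutoff used in the caption.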
Figure 5.
Dynamics of SVs in twenty evolved AcMNPV lines. (A) Red circles show the number of SVs detected in ten AcMNPV populations, each purified after ten infection cycles on larvae of the beet armyworm (S. exigua). The green circle shows the number of SVs detected in the parental population of AcMNPV purified from larvae of the cabbage looper moth (T. ni). The size of each circle is proportional to the number of SVs, and the frequency of viral genomes carrying an SV is given in brackets, assuming that the number of SVs per viral genome follows a Poisson distribution. The thickness of the lines linking the parental AcMNPV population to each of the ten evolved populations is proportional to the number of shared SVs (numbers in black on the lines). (B) Same as in A, except that the ten evolved AcMNPV populations were purified after ten infection cycles on larvae of the same species (T. ni) as that used to generate the parental AcMNPV population. (C) Frequency of the ten most frequent SVs that were detected among the twenty-one viral populations and were initially present in the parental AcMNPV population. Each color corresponds to one SV. The ‘G0’ population refers to the parental population. The S0–S9 populations refer to the populations evolved on S. exigua larvae. The T0–T9 populations refer to the populations evolved on T. ni larvae. Note that no SV reached a frequency of >10 per cent in any AcMNPV population. (D) Number of SVs present in the parental AcMNPV population that were also detected in one to ten evolved viral populations. Most SVs were detected in only one evolved population (seventy-four in S. exigua and forty-six in T. ni).
Figure 6.
Number of TEs integrated as full-length copies for the nine TE superfamilies found in the AcMNPV genomes. Six superfamilies belong to Class II TEs and three to Class I TEs. Most full-length inserted TE sequences (480 of 524 complete TE sequences) belong to Class II TE superfamilies, mainly to the PiggyBac superfamily.
