Virus Evol. 2025 Jul 7;11(1):veaf051.
doi: 10.1093/ve/veaf051. eCollection 2025.

Insights into diversity, host range, and evolution of iflaviruses in Lepidoptera through transcriptome mining

Devin van Valkengoed et al. Virus Evol. 2025.

Abstract

Insects are associated with a wide variety of RNA viruses, including iflaviruses, a group of positive-stranded RNA viruses that mainly infect arthropods. Whereas some iflaviruses cause severe disease in insects, numerous iflaviruses detected in healthy populations of butterflies and moths (order: Lepidoptera) cause no apparent symptoms. Compared to other hosts, only a few iflavirus genomes from lepidopteran hosts are available in public databases, and little is known about the occurrence of iflaviruses in natural and laboratory lepidopteran populations. To expand the known diversity of iflaviruses in Lepidoptera, we developed a pipeline to automatically reconstruct virus genomes from public transcriptome data. We reconstructed 1548 virus genomes from 55 different lepidopteran species, which were identified as coding-complete based on their length. To include incompletely assembled genomes, we developed a reference-based patching approach, resulting in 240 patched genomes. By including publicly available genomes, we inferred a phylogeny consisting of 139 non-redundant iflavirus genomes. Of these, 65 represent novel complete genomes, of which 39 might even belong to novel virus species. Our analysis expanded the known virus host range: highly similar viruses were found in the transcriptomes of different lepidopteran species, genera, or even families. Additionally, we found two groups of lepidopteran species that differ in the diversity of the viruses infecting them: some species were infected only by closely related viruses, whereas others were infected by highly diverse viruses from different regions of the phylogeny. Finally, we show that the evolution of one virus species, Iflavirus betaspexiguae, is impacted by recombination within the species, which is also supported by the co-occurrence of multiple strains within the data sets.
Our analysis demonstrates how data mining of publicly available sequencing data can be used at a large scale to reconstruct intra-family viral diversity, which serves as a basis for studying virus host range and evolution. Our results contain numerous novel viruses and novel virus-host associations, including viruses of relevant insect pests, highlighting the impact of iflaviruses on insect ecology and their potential as biological control agents.
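The length-based triage mentioned in the abstract (coding-complete genomes versus incompletely assembled ones that go on to patching) could be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the `Contig` type, the function `triage_by_length`, the contig names, and both thresholds are hypothetical, since the abstract does not state the cut-offs that were used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contig:
    name: str    # hypothetical contig identifier
    length: int  # assembled length in nucleotides


def triage_by_length(contigs, min_complete, min_partial):
    """Sort contigs into three bins using two length thresholds.

    Both thresholds are caller-supplied assumptions; the source does not
    report the values it applied.
    """
    if min_partial > min_complete:
        raise ValueError("min_partial must not exceed min_complete")
    bins = {"complete": [], "patch_candidate": [], "discarded": []}
    for contig in contigs:
        if contig.length >= min_complete:
            bins["complete"].append(contig.name)
        elif contig.length >= min_partial:
            bins["patch_candidate"].append(contig.name)
        else:
            bins["discarded"].append(contig.name)
    return bins


# Illustrative input: names, lengths, and thresholds are made up.
contigs = [
    Contig("contig_a", 9800),
    Contig("contig_b", 4200),
    Contig("contig_c", 600),
]
bins = triage_by_length(contigs, min_complete=9000, min_partial=3000)
print(bins)
```

Passing the thresholds explicitly, rather than hard-coding them, keeps the sketch honest about the fact that the real cut-offs are not given here.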

Keywords: data mining; iflavirus; phylogeny.


Figures

Figure 1
Workflow and data set overview. (A) Computational workflow. The phylogenetic analysis provides the numbers for the Lepidoptera data set. (B) Numbers of processed libraries and included genomes. (C) Overview of included genomes in both data sets. Two complete genomes and one patched genome are missing from the Lepidoptera data set because each falls in a cluster with a non-Lepidoptera representative: SRR11614424_complete_1 (host Parnara guttata) clusters with UHR49772.1 (host Tetragnatha nitens); SRR9171794_complete_1 (host Heliozelidae sp.) clusters with YP_009305421.1 (Moka virus, host Vespula pensylvanica); and ERR10123688_pg_nr1 (host Synanthedon andrenaeformis) clusters with AUK50716.1 (Sacbrood virus, host Apis mellifera).
Figure 2
Iflavirus phylogeny for the Lepidoptera data set. The Lepidoptera host species and family are shown in the outer ring. Note that the host species is given only for the cluster representative, and only host species and families occurring in at least three clusters are coloured. Taxa names are coloured by data type, clades are coloured by iflavirus species, and branches are coloured by bootstrap value. A completely annotated phylogeny can be found at https://itol.embl.de/tree/312010331252431721380571.
Figure 3
Host range. LCA of Lepidoptera hosts (A) within all members of a cluster (also marked as LCA in Fig. 2) and (B) for pairs across all cluster representatives, excluding patched genomes. Pairwise distance is given by 1 minus the amino acid sequence identity of the capsid proteins. The legend provides examples for the LCA of four different species (see colour legend in Fig. 2) based on their presence and absence in a pair. (C) Phylogenetic diversity (PD) of viruses infecting a particular Lepidoptera species (coloured by host species according to Fig. 2).
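The two quantities in the Figure 3 caption, the capsid distance and the host LCA, can be illustrated with a minimal sketch. It rests on several assumptions that are not in the source: LCA is read as "lowest common ancestor" of the host taxonomy, identity is computed only over alignment columns where both sequences have a residue, and the function names, the four-rank lineage tuples, and the toy sequences are hypothetical. The paper may compute both values differently.

```python
RANKS = ("order", "family", "genus", "species")


def capsid_distance(aligned_a, aligned_b):
    """Return 1 - identity for two aligned amino acid sequences.

    Assumption: columns with a gap ('-') in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 1.0 - matches / compared


def host_lca(lineages):
    """Return (rank, name) of the lowest rank shared by all lineages.

    Each lineage is a tuple ordered as in RANKS. Returns None when the
    lineages already differ at the first rank.
    """
    lca = None
    for i, rank in enumerate(RANKS):
        names = {lineage[i] for lineage in lineages}
        if len(names) != 1:
            break
        lca = (rank, names.pop())
    return lca


# Toy aligned fragments (made up, not real capsid sequences).
d = capsid_distance("MKTA", "MRTA")

# Host lineages used purely to illustrate the LCA lookup.
exigua = ("Lepidoptera", "Noctuidae", "Spodoptera", "Spodoptera exigua")
frugiperda = ("Lepidoptera", "Noctuidae", "Spodoptera", "Spodoptera frugiperda")
guttata = ("Lepidoptera", "Hesperiidae", "Parnara", "Parnara guttata")

same_genus = host_lca([exigua, frugiperda])
same_order = host_lca([exigua, guttata])
print(d, same_genus, same_order)
```

In this reading, a pair of viruses with a small capsid distance but a host LCA at family or order level would be the kind of case the caption's panel (B) is designed to expose.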
Figure 4
Genome evolution of Iflavirus betaspexiguae. The polyprotein of YP_009010984.1 and the domains predicted by InterPro are shown. All domains with an IPR ID are shown and domains without an IPR ID are only shown if they do not overlap with other domains. Positions under diversifying selection (P < 0.1, horizontal grey line marks P=0.01) are marked by circles on top of the genome, coloured by how radical the amino acid change is (BLOSUM80 score, i.e. more negative corresponds to more radical) and if the change is in only one genome (grey) or multiple (black). The changes in one genome occur in AHX00961.1 (positions 687, 688, 689, 691, 954, 955, 958, 960), in SRR11822924_complete_1 belonging to ES2 (positions 292, 1353), and the remaining positions are all in different genomes. Recombination breakpoints are listed by squares on the bottom of the genome. Numbers within the square mark the order in which they were found, i.e. number 1 is the most supported. Phylogenies based on codon alignments are shown below the breakpoints. Groups are marked by the country which submitted the data set: SE (Sweden): SRR5464078_complete_1, SRR8269436_complete_1, ES1b: SRR1050534_complete_3, ES1a: SRR1050532_complete_2, SRR1050533_complete_3, ES1: ES1a and ES1b, ES2: SRR11822922_complete_1, SRR11822924_complete_1, ES (Spain): ES1 and ES2, FR (France): SRR13488425_complete_1, NL1a: SRR7415763_complete_2, SRR12002069_complete_1, SRR12002072_complete_1, NL1b: SRR7415760_complete_2, SRR7415766_complete_2, SRR12002067_complete_1, NL1: NL1a and NL1b, NL2: SRR7415761_complete_2, SRR7415762_complete_2, SRR7415767_complete_2, SRR7415768_complete_2, SRR7415771_complete_1, SRR12002063_complete_1, SRR12002066_complete_1, SRR12002073_complete_1, NL (Netherlands): NL1 and NL2, CN (mostly China): all the remaining genomes. Expanded phylogenies can be found in Fig. S4.

