BMC Genomics. 2025 Feb 19;26(1):161.
doi: 10.1186/s12864-025-11338-x.

Gene novelty and gene family expansion in the early evolution of Lepidoptera

Asia E Hoile et al. BMC Genomics.

Abstract

Background: Almost 10% of all known animal species belong to Lepidoptera: moths and butterflies. To understand how this incredible diversity evolved, we assess the role of gene gain in driving early lepidopteran evolution. We compared the complete genomes of 115 insect species, including 99 Lepidoptera, to search for novel genes coincident with the emergence of Lepidoptera.
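One simple way to picture "novel genes coincident with the emergence of Lepidoptera" is as orthogroups that occur in lepidopteran species and in no outgroup species. The sketch below illustrates that idea on a toy presence/absence table. The species names, orthogroup IDs, and the `lineage_restricted` helper are hypothetical, and this is not claimed to be the pipeline used in the study.

```python
def lineage_restricted(presence, ingroup):
    """Return orthogroups found in at least one ingroup species
    and in no species outside the ingroup.

    presence: dict mapping orthogroup ID -> set of species with a member gene
    ingroup:  set of species names belonging to the focal clade
    """
    restricted = []
    for og, species in presence.items():
        # Non-empty and a subset of the ingroup means no outgroup member.
        if species and species <= ingroup:
            restricted.append(og)
    return sorted(restricted)


# Toy data (hypothetical species and orthogroup IDs)
lepidoptera = {"moth_a", "moth_b", "butterfly_c"}
presence = {
    "OG_toy1": {"moth_a", "moth_b", "butterfly_c"},  # all Lepidoptera, no outgroup
    "OG_toy2": {"moth_a", "fly_x"},                  # shared with an outgroup
    "OG_toy3": {"butterfly_c"},                      # a single lepidopteran species
    "OG_toy4": {"fly_x", "beetle_y"},                # outgroup only
}

novel = lineage_restricted(presence, lepidoptera)
print(novel)
```

A filter like this only separates clade-restricted orthogroups from the rest. Assigning an orthogroup to a specific node, such as the Lepidoptera node rather than a single tip like `OG_toy3`, would additionally require placing it at the most recent common ancestor of the species that carry it.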

Results: We find 217 orthogroups, or gene families, that emerged on the branch leading to Lepidoptera. Of these, 177 likely arose by gene duplication followed by extensive sequence divergence, 2 are candidates for origin by horizontal gene transfer, and 38 have no known homology outside of Lepidoptera and possibly arose via de novo gene genesis. We focus on two new gene families that are conserved across all lepidopteran species and underwent extensive duplication, suggesting important roles in lepidopteran biology. One encodes a family of sugar and ion transporter molecules, potentially involved in the evolution of diverse feeding behaviours in early Lepidoptera. The second encodes a family of unusual propeller-shaped proteins that likely originated by horizontal gene transfer from Spiroplasma bacteria; we name these the Lepidoptera propellin genes.
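As a quick check on the figures above, the three origin categories can be tallied and converted to shares of the 217 orthogroups. This is a minimal sketch using only the counts stated in the abstract; the dictionary keys are shorthand labels chosen here, not terms defined by the authors.

```python
# Counts of Lepidoptera-node orthogroups by putative mode of origin,
# taken from the abstract; the label strings are illustrative shorthand.
origin_counts = {
    "duplication and divergence": 177,
    "horizontal gene transfer": 2,
    "no known homology (possible de novo)": 38,
}

total = sum(origin_counts.values())

# Percentage share of each mode, rounded to one decimal place.
shares = {mode: round(100 * n / total, 1) for mode, n in origin_counts.items()}

print(total)  # 217
for mode, pct in shares.items():
    print(f"{mode}: {pct}%")
```

The categories sum to the reported total of 217, with roughly 81.6% attributed to duplication and divergence, 0.9% to horizontal gene transfer, and 17.5% to orthogroups with no known homology outside Lepidoptera.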

Conclusion: We provide the first insights into the role of genetic novelty in the early evolution of Lepidoptera. These results clarify the rate of gene gain during the evolution of the order and give context on the likely mechanisms of origin. We describe examples of new genes that were retained and duplicated further in all lepidopteran species, suggesting their importance in Lepidoptera evolution.

Keywords: Gene duplication; Genome evolution; HGT; Insect evolution.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Molecular phylogenetic tree of the 99 lepidopteran species from 24 families and 16 outgroup species inferred from 25 single-copy orthologues. Branches are coloured by insect order; species belonging to the named lepidopteran families are labelled with black lines on the outside of the tree
Fig. 2
A Species tree showing numbers of orthogroups gained at each phylogenetic node. Insect orders are separated by colours. Bar chart to the right of the tree displays the total number of orthogroups identified in each species. B Pie charts show the number of orthogroups originating at the Lepidoptera and Ditrysia nodes and proportions of the putative modes of new gene origin
Fig. 3
Copy number of genes gained on the ancestral node of Lepidoptera. A Heatmap (right) showing gene copy number for each orthogroup originating at the Lepidoptera node mapped to the species tree (left). Lepidoptera node is labelled with a blue circle. Orthogroups on the right-hand side of the figure have genes present in few species and may include spurious homologies. B Boxplots showing copy number variation per species in the top 6 orthogroups present in all or most lepidopteran species. Blue broken line signifies the mean copy number per species for all orthogroups. Orthogroups OG0000164 and OG0000175 have a mean copy number significantly different from the mean copy number of lepidopteran orthogroups, as signified by an asterisk (p < 0.05). C Table showing functions and features from six orthogroups deviating above the average copy number per orthogroup
Fig. 4
Origins and evolution of lepidopteran and ditrysian-specific sugar transporter genes. Orthogroups are identified as follows: a = OG0000700, b = OG0000319, c = OG0007512, d = OG0001801, e = OG0000540, f = OG0008208, g = OG0008344, h = OG0000164, i = OG000840. A Species tree on the left is coloured by taxonomic group, with the Lepidoptera and Ditrysia nodes labelled. The numbered ancestral nodes correspond to the node of origin for the orthogroups shown in the gene tree (part B). The grey bar chart (middle) shows that Lepidoptera (darker grey bars) have a higher total number of sugar transporter orthogroups than outgroup species. Copy number of lepidopteran and ditrysian-specific orthogroups varies greatly (heatmap, right): SLC22 transporters are below the light and dark brown bars (labelled a-d); SLC2 transporters are below the light and dark blue/green bars (labelled e-i). B Phylogenetic tree built using a representative sample of outgroup sugar transporters, combined with sugar transporters identified in the Lepidoptera. SLC2 transporters are highlighted in light and dark blue/green along with letters e-i, while SLC22 transporters are highlighted in light and dark brown with letters a-d. Tip colours represent the node of origin (as shown in the species tree in part A) for each orthogroup. C The four closely related SLC2 transporters were mapped to a selection of lepidopteran chromosomes (left to right: Micropterix aruncella, Tinea trinotella, Tinea semifulvella, Papilio machaon and Autographa gamma). Sugar transporters of lepidopteran origin are represented by a triangle, while those of ditrysian origin are represented by a square. All four transporter orthogroups lie in close physical proximity on the same chromosome. D Heatmap of expression patterns of nine sugar transporters of lepidopteran or ditrysian origin in Bombyx mori tissues
Fig. 5
Lepidoptera-specific genes encoding proteins with sequence identity and structural similarity to bacterial 6-bladed propeller proteins. A Gene tree of propellin and putative bacterial homologs. The Lepidoptera clade (blue) and Spiroplasma clades (red) are labelled with coloured boxes and text. All other branches represent a range of bacterial species which are shown in Supplementary Figure S3. Molecular phylogenetic analysis indicates that the propellin genes of Lepidoptera are monophyletic, that their most closely branching lineages are Spiroplasma genes, and that they form the sister group to a clade dominated by Spiroplasma genes (highlighted in red). B AlphaFold predictions suggest lepidopteran propellin proteins form 6-bladed propeller structures similar to bacterial homologues; examples shown from Macrococcus (green), Spiroplasma (red) and the lepidopteran M. sexta (blue). Additional protein structure predictions are in Supplementary Figure S4. C Molecular phylogenetic analysis indicates that the largest orthogroup of lepidopteran propellin genes divides into 6 clades, each gene (purple) located at a different chromosomal location, most of which show conserved synteny between lepidopteran species (synteny indicated by shaded purple regions). The Micropterigidae species M. aruncella only has a gene in clade 1. Marker genes are shown by various colours
