BMC Bioinformatics. 2024 Jul 11;25(1):235.
doi: 10.1186/s12859-024-05853-z.

SimSpliceEvol2: alternative splicing-aware simulation of biological sequence evolution and transcript phylogenies

Wend Yam D D Ouedraogo et al. BMC Bioinformatics.

Abstract

Background: SimSpliceEvol is a tool for simulating the evolution of eukaryotic gene sequences that integrates exon-intron structure evolution as well as the evolution of the sets of transcripts produced from genes. It takes a guide gene tree as input and generates a gene sequence with its transcripts for each node of the tree, from the root to the leaves. However, the sets of transcripts simulated at different nodes of the guide gene tree lack evolutionary connections. Consequently, SimSpliceEvol is not suitable for evaluating methods for transcript phylogeny inference or gene phylogeny inference that rely on transcript conservation.

Results: Here, we introduce SimSpliceEvol2, which, unlike the first version, incorporates an explicit model of transcript evolution to simulate alternative transcripts along the branches of a guide gene tree, and which outputs the corresponding transcript phylogenies. We offer a comprehensive software package with a graphical user interface and an updated version of the web server, ensuring easy and user-friendly access to the tool.

Conclusion: SimSpliceEvol2 generates synthetic datasets that are useful for evaluating methods and tools for spliced RNA sequence analysis, such as spliced alignment methods, methods for identifying conserved transcripts, and transcript phylogeny reconstruction methods. The web server is accessible at https://simspliceevol.cobius.usherbrooke.ca, where you can also download the standalone software. Comprehensive documentation for the software is available at the same address. For developers interested in the source code, which requires the installation of all prerequisites to run, it is provided at https://github.com/UdeS-CoBIUS/SimSpliceEvol.

Keywords: Alternative splicing; Evolution; Exon-intron structure; Simulation; Transcript phylogeny.

Conflict of interest statement

Not applicable.

Figures

Fig. 1
Illustration of the transcript evolution simulation framework. The figure depicts the phylogeny resulting from the simulated evolution of transcripts in a guide gene tree. The guide gene tree, depicted as 3 cylinders, consists of the evolution of two extant genes, Gene2 and Gene3, from an ancestral gene, Gene1. The bottom surfaces of the cylinders represent the two leaves (Gene2 and Gene3) of the guide gene tree and their ancestor (Gene1). The legend at the bottom of the figure shows the meaning of each graphical element. The exon-intron structure of each gene is displayed, as well as the exon composition of each transcript. The evolution history consists of evolutionary stages. The root nodes of the transcript phylogeny correspond to transcript gains. The values of the user input parameters are (tc_rs=0.0, tc_tl=0.1, tc_a5=0.1, tc_a3=0.1, tc_es=0.2, tc_me=0.1, and tc_ir=0.1), and the values of the constant factors (k_tc and c_s_r) are given such that k_tc×c_s_r=1, where c_s_r is the length of the branch. For instance, regarding transcripts in Gene2, the number of transcripts undergoing the intron retention event is equal to 1, which corresponds to the ceiling of k_tc×c_s_r×n×tc_ir, where n=7 represents the number of source transcripts at this particular evolutionary stage ({1#1, 2#0, 2#2, 2#3, 2#4, 2#5, 2#6})
Fig. 2
Web server screenshot. The screenshot presents two main sections of the web server. The input section, highlighted in red, allows users to set parameter values. Users can launch the program (illustrated by a green arrow) and save the query parameters for future use (illustrated by a purple arrow). The results section, highlighted in blue, displays the available data for download once it is ready, as indicated by the blue arrow
Fig. 3
SimSpliceEvol2 GUI screenshot. The graphical user interface of SimSpliceEvol2 lets users browse the file system to select the input guide tree file (indicated by the red arrow) and the output directory (indicated by the green arrow). It also lets users set parameter values (indicated by the blue arrow). Upon clicking the “generate command” button, the corresponding command line is produced (indicated by the orange arrow). Users have the option to copy this command for future use by clicking the “copy command” button. Help related to the program or its options is provided at the top of the interface (indicated by the black arrow). The outputs generated from running the simulation (indicated by the purple arrow) are shown in Fig. 4
Fig. 4
Outputs of SimSpliceEvol2 using the GUI with the default options (k_tc=5, tc_rs=1, tc_es=0.25, tc_me=0.15, tc_a5=0.15, tc_a3=0.15, tc_ir=0.15, tc_tl=0.05). (Top left) Visualization of the simulated transcript phylogeny (multiple transcript trees that form a transcript forest) through a carousel interface within the GUI (figures generated using ETE 3 [25]). (Top right) An exon alignment for all transcripts generated at leaves of the phylogeny is shown alongside the guide gene tree. (Bottom) The software outputs figures that show the multiple sequence alignment of transcripts generated at the leaves of each transcript tree (figure also generated using ETE 3 [25])
Fig. 5
Workflows of SimSpliceEvol1 and SimSpliceEvol2. The illustration compares SimSpliceEvol1 and SimSpliceEvol2 to emphasize the newly integrated transcript evolution model. Changes from SimSpliceEvol1 are indicated in red (with a red minus sign), while additions from SimSpliceEvol2 are depicted in green (with a green plus sign). Tasks retained by both methods are underscored in gray and the data flow is highlighted in blue
Fig. 6
Transcript-level conservation in SimSpliceEvol1 and SimSpliceEvol2
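
The intron-retention count worked out in the Fig. 1 caption can be checked with a few lines of Python. This is a minimal sketch of that single ceiling formula, not the code of SimSpliceEvol2 itself: the function name `count_event_transcripts` is hypothetical, and the split k_tc=5, c_s_r=0.2 is an assumption, since the caption only states that the product k_tc×c_s_r equals 1. Exact fractions are used so that a product that is mathematically an integer is not rounded up by floating-point error.

```python
import math
from fractions import Fraction


def count_event_transcripts(k_tc, c_s_r, n_source, rate):
    """Return ceil(k_tc * c_s_r * n_source * rate) using exact arithmetic.

    k_tc     -- constant factor (hypothetical value in the example below)
    c_s_r    -- branch length (hypothetical value in the example below)
    n_source -- number of source transcripts at this evolutionary stage
    rate     -- user parameter for the event, e.g. tc_ir for intron retention
    """
    # Converting through str() turns 0.1 into exactly 1/10 rather than
    # the nearest binary float.
    expected = (
        Fraction(str(k_tc)) * Fraction(str(c_s_r)) * n_source * Fraction(str(rate))
    )
    return math.ceil(expected)


# Values from the Fig. 1 caption: n = 7 source transcripts, tc_ir = 0.1,
# and k_tc * c_s_r = 1 (the individual factors 5 and 0.2 are assumed).
n_retained = count_event_transcripts(k_tc=5, c_s_r=0.2, n_source=7, rate=0.1)
print(n_retained)  # ceil(0.7) -> 1
```

With the caption's values the expected number is 0.7, so the ceiling gives one transcript, which matches the single intron retention event described for Gene2.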

References

    1. Harrow J, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012;22(9):1760–1774. doi: 10.1101/gr.135350.111.
    2. Zerbino DR, Achuthan P, Akanni W, Amode MR, Barrell D, Bhai J, et al. Ensembl 2018. Nucleic Acids Res. 2018;46(D1):D754–D761. doi: 10.1093/nar/gkx1098.
    3. Keren H, Lev-Maor G, Ast G. Alternative splicing and evolution: diversification, exon definition and function. Nat Rev Genet. 2010;11(5):345–355. doi: 10.1038/nrg2776.
    4. Guillaudeux N, Belleannée C, Blanquart S. Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog. BMC Genomics. 2022;23(1):1–14. doi: 10.1186/s12864-022-08429-4.
    5. Ma J, Wu JY, Zhu L. Detection of orthologous exons and isoforms using EGIO. Bioinformatics. 2022;38(19):4474–4480. doi: 10.1093/bioinformatics/btac548.