PLoS One. 2017 Mar 21;12(3):e0174250.
doi: 10.1371/journal.pone.0174250. eCollection 2017.

IsoSel: Protein Isoform Selector for phylogenetic reconstructions

Héloïse Philippon et al. PLoS One. 2017.

Abstract

The reliability of molecular phylogenies depends strongly on the quality of the assembled datasets. In eukaryotes, selecting only one protein isoform per genomic locus is mandatory to avoid biases linked to redundancy. Here, we present IsoSel, a tool devoted to the selection of alternative isoforms in the context of phylogenetic reconstruction. It provides a better alternative to the widely used approach of selecting the longest isoform, and it performs better than Guidance, its only available counterpart. IsoSel is publicly available at http://doua.prabi.fr/software/isosel.
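
The longest-isoform baseline mentioned in the abstract can be sketched in a few lines of Python. This is only an illustration of that baseline, not IsoSel's own code: the function name, the in-memory dictionaries, the toy sequences and the locus names are all assumptions made for the example.

```python
def longest_isoform_per_locus(sequences, locus_of):
    """Keep one isoform per locus by choosing the longest sequence.

    sequences: dict mapping sequence id -> protein sequence (hypothetical input)
    locus_of:  dict mapping sequence id -> locus identifier (hypothetical input)
    Returns a dict mapping each locus to the id of the retained isoform.
    """
    best = {}
    for seq_id, seq in sequences.items():
        # A sequence with no known locus is treated as its own locus.
        locus = locus_of.get(seq_id, seq_id)
        current = best.get(locus)
        # Strict comparison: on a length tie, the first isoform seen is kept.
        if current is None or len(seq) > len(sequences[current]):
            best[locus] = seq_id
    return best


# Toy data (invented for illustration): gene_a has two isoforms, gene_b has one.
sequences = {
    "a1": "MKTAYIAKQR",
    "a2": "MKTAYIAKQRQISFVK",
    "b1": "MSDNE",
}
locus_of = {"a1": "gene_a", "a2": "gene_a", "b1": "gene_b"}

selected = longest_isoform_per_locus(sequences, locus_of)
print(selected)
```

Length is the only criterion here, which is exactly the limitation the abstract points to: the longest isoform is not necessarily the one best suited to the alignment.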

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. IsoSel workflow.
Schematic representation of the different steps performed during an IsoSel run. T-x represents the alternative isoforms generated by the same gene x. In this example, isoforms a1 and b2 are selected for genes a and b, respectively.
Fig 2. SP score computation.
Example of score computation for four genes (a, b, c and d) producing three (a), two (b) and no (c and d) alternative isoforms.
Fig 3. Input and output files.
The minimal input required by IsoSel is an unaligned protein sequence dataset in Fasta format (example.fasta). The two output files generated contain the alignment (output.aln) and the sequence scores (output.scores, or output.DistanceScore if the -DS option is used). Optionally, the user can provide a file containing the genomic origin of the input sequences (isoforms_locus_tag.txt). In this case, an additional file is created that contains, for each locus, the sequence with the highest score (output_filtered.fasta).
Fig 4. Workflow used for testing IsoSel performance.
For a given human protein from UniProtKB, a BLASTP search is performed. The alternative isoforms detected for each set of homologs are then selected using either the longest isoform, a random choice, Guidance or IsoSel. The sets are then aligned and the corresponding gene trees are inferred by BioNJ and IQ-TREE for computing tree lengths and DL scores, respectively. For each step, the algorithms used are indicated in red.
Fig 5. Tree lengths and DL scores distributions.
Charts are proportional to the number of: A) the shortest trees; and B) the trees with the lowest DL score obtained with the different options and programs. Charts in shades of blue correspond to the different IsoSel options.
Fig 6. Maximum likelihood trees for WDR18 protein.
Isoform selection was done by selecting the longest isoform (A) and by running IsoSel with its default parameters (B). Sequences are colored according to their taxonomic classification. Green and yellow circles correspond to nodes with SH > 0.95 and SH > 0.90, respectively. The scale bar represents the average number of substitutions per site.
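
The filtering step described for Fig 3 (keeping, for each locus, the sequence with the highest score) can be sketched as follows. This is a minimal illustration under stated assumptions, not IsoSel's implementation: the on-disk formats of output.scores and isoforms_locus_tag.txt are not described here, so the sketch works on plain dictionaries, and the function name and score values are invented for the example.

```python
def best_scoring_isoform_per_locus(scores, locus_of):
    """Keep, for each locus, the id of the sequence with the highest score.

    scores:   dict mapping sequence id -> numeric score (hypothetical values)
    locus_of: dict mapping sequence id -> locus identifier (hypothetical input)
    """
    best = {}
    for seq_id, score in scores.items():
        # Sequences absent from the locus mapping are kept as their own locus.
        locus = locus_of.get(seq_id, seq_id)
        # Strict comparison: on a score tie, the first isoform seen is kept.
        if locus not in best or score > scores[best[locus]]:
            best[locus] = seq_id
    return best


# Invented scores chosen so that a1 and b2 win, as in the Fig 1 example.
scores = {"a1": 0.90, "a2": 0.70, "b1": 0.60, "b2": 0.80, "c1": 0.95}
locus_of = {"a1": "a", "a2": "a", "b1": "b", "b2": "b"}

kept = best_scoring_isoform_per_locus(scores, locus_of)
print(kept)
```

Structurally this is the same one-per-locus reduction as the longest-isoform approach; only the ranking criterion changes from sequence length to a per-sequence score.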
