2012 Jan 20;5:50.
doi: 10.1186/1756-0500-5-50.

SSE: a nucleotide and amino acid sequence analysis platform

Peter Simmonds. BMC Res Notes.

Abstract

Background: There is an increasing need to develop bioinformatic tools to organise and analyse the rapidly growing amount of nucleotide and amino acid sequence data in organisms ranging from viruses to eukaryotes.

Finding: A simple sequence editor (SSE) was developed to create an integrated environment where sequences can be aligned, annotated, classified and directly analysed by a number of built-in bioinformatic programs. SSE incorporates a sequence editor for the creation of sequence alignments, a process assisted by integrated CLUSTAL/MUSCLE alignment programs and automated removal of indels. Sequences and sequence groups can be fully annotated and classified, and passed directly to analytical programs that analyse diversity, recombination and RNA secondary structure.

Methods for analysing sequence diversity include measures of divergence and evolutionary distances, identity plots to detect regions of nucleotide or amino acid homology, reconstruction of sequence changes, mono-, di- and higher order nucleotide compositional biases and codon usage.

Association Index calculations, GroupScans, Bootscanning and TreeOrder scans perform phylogenetic analyses that reconcile group membership with tree branching orders and provide powerful methods for examining segregation of alleles and detection of recombination events. Scans of phylogeny changes across alignments, and scoring of branching order differences between trees using the Robinson-Foulds algorithm, allow effective visualisation of the sites of recombination events.

RNA secondary and tertiary structures play important roles in gene expression and RNA virus replication. For the latter, persistence of infection is additionally associated with pervasive RNA secondary structure throughout viral genomic RNA that modulates interactions with innate cell defences. SSE provides several programs to scan alignments for RNA secondary structure through folding energy thermodynamic calculations and phylogenetic methods (detection of co-variant changes, and structure conservation between divergent sequences). These analyses complement methods based on detection of sequence constraints, such as suppression of synonymous site variability.

For each program, results can be plotted in real time during analysis through an integrated graphics package, providing publication quality graphs. Results can also be directed to tabulated datafiles for import into spreadsheet or database programs for further analysis.
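To make the idea of a divergence scan concrete, the following is a minimal Python sketch of mean pairwise uncorrected p-distances computed in successive windows across an alignment. It is not SSE's implementation: the function names (`p_distance`, `divergence_scan`), the gap handling (positions with a gap in either sequence are skipped) and the toy alignment are all assumptions made for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance: fraction of compared sites that differ.

    Assumption: positions with a gap ('-') in either sequence are skipped.
    Returns None when no comparable sites remain.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return None
    return sum(x != y for x, y in pairs) / len(pairs)


def divergence_scan(seqs, window, step):
    """Mean pairwise p-distance in successive windows of an alignment.

    Returns a list of (window_start, mean_distance) tuples.
    """
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        end = start + window
        dists = [p_distance(a[start:end], b[start:end])
                 for a, b in combinations(seqs, 2)]
        dists = [d for d in dists if d is not None]
        mean = sum(dists) / len(dists) if dists else None
        results.append((start, mean))
    return results


# Hypothetical toy alignment of three equal-length sequences.
alignment = ["ACGTACGTAC", "ACGTACGAAC", "ACGTTCGAAC"]
scan = divergence_scan(alignment, window=5, step=5)
for start, mean in scan:
    print(start, mean)
```

Restricting the pairs passed to `combinations` to sequences from two different groups would give a between-group scan of the kind plotted by the program; that extension is left out here to keep the sketch short.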

Conclusions: SSE combines sequence editor functions with analytical tools in a comprehensive and user-friendly package that assists considerably in bioinformatic and evolutionary research.


Figures

Figure 1
The basic editor screen for direct alignment editing, GUI controls, context and window menus. The main editing screen showing translated nucleotide sequences of hepatitis C virus (HCV) and options available on a context menu pointing to the selected sequence block in the centre of the alignment.
Figure 2
Research menu options for sequence analysis. Menu for selection of sequence analysis (divergence, homology, composition), phylogeny (grouping, phylogeny violation detection) and structure (RNA secondary structure, covariation) methods, and sequence manipulation programs (mutation and sequence order randomization). Program access, command descriptions, brief summaries, summary instructions (buttons on right) and detailed descriptions (from a context-sensitive helpfile) are directly available from this menu.
Figure 3
Program-generated output of sequence divergence scan of uncorrected HCV amino acid p distances. Program output directly generated from integrated graphics package showing scan of mean translated amino acid sequence divergence between different groups across the HCV genome. Sequences were assigned into three different tag groups based on HCV subtype (15 sequences of HCV-1a, 27 HCV-1b sequences) or genotype (25 HCV-2 sequences).
Figure 4
Ordering of sequence groups and phylogeny violation in different regions of the enterovirus (EV) species C genome. (A) Ordering of sequences from the three poliovirus serotypes (PV1 - PV3) and other enterovirus species C serotypes (EV-C) assigned to different tag groups in phylogenetic trees constructed from 250 base sequence fragments sequentially generated across the viral genome alignment. Segregation by serotype occurs only in the VP4-VP1 (capsid encoding) region and is disrupted by recombination events elsewhere [14]. (B) Phylogenetic compatibility matrix of the same dataset showing the frequency of inter-group phylogeny violations between trees generated from different genome regions.
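Branching-order differences between trees built from different genome regions can be scored with the Robinson-Foulds metric named in the abstract. The sketch below is an illustration only, not SSE's code: it assumes trees are given as nested Python tuples of leaf names over the same leaf set, treats them as unrooted, and counts the non-trivial splits present in one tree but not the other. The function names and the example trees are hypothetical.

```python
def leaf_sets(tree, out):
    """Return the leaf set under `tree`, appending each internal node's leaf set to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    clades = []
    all_leaves = leaf_sets(tree, clades)
    ref = min(all_leaves)
    result = set()
    for clade in clades:
        side = clade if ref not in clade else all_leaves - clade
        # Skip trivial splits (a single leaf, or everything but a single leaf).
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def robinson_foulds(t1, t2):
    """Number of splits found in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


# Hypothetical trees for two genome regions over the same five sequences.
region_1 = ((("A", "B"), "C"), ("D", "E"))
region_2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(region_1, region_1))
print(robinson_foulds(region_1, region_2))
```

A distance of zero means the two regions support the same branching order; larger values indicate more conflicting splits, which is the kind of signal expected around recombination breakpoints.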

References

    1. Higgins DG, Sharp PM. CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene. 1988;73:237–244. doi: 10.1016/0378-1119(88)90330-7.
    2. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
    3. Li WH, Graur D. Fundamentals of molecular evolution. Sinauer Associates, Inc; 1991.
    4. Sharp PM, Li WH. An evolutionary perspective on synonymous codon usage in unicellular organisms. J Mol Evol. 1986;24:28–38. doi: 10.1007/BF02099948.
    5. Wright F. The 'effective number of codons' used in a gene. Gene. 1990;87:23–29. doi: 10.1016/0378-1119(90)90491-9.
