Algorithms Mol Biol. 2010 Jun 2;5:24.
doi: 10.1186/1748-7188-5-24.

AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences



Darío Guerrero et al. Algorithms Mol Biol.

Abstract

Background: Multiple sequence alignments are used to study gene or protein function, phylogenetic relationships, hypotheses of genome evolution and even gene polymorphisms. Virtually all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and the preparation of specific antibodies, yet they have received little attention. As a consequence, researchers must select them empirically. AlignMiner has been developed to fill this gap in bioinformatic analyses.

Results: AlignMiner is a Web-based application for detecting conserved and divergent regions in alignments of conserved sequences, with a particular focus on divergence. It accepts protein or nucleic acid alignments obtained with any of a variety of algorithms; the choice of alignment algorithm does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods to assess conserved/divergent regions: Entropy provides the highest number of regions with the greatest length, whereas Weighted is the most restrictive. Conserved/divergent regions can be generated with respect either to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded to the user's disk in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be used successfully to design specific polymerase chain reaction primers as well as potential antibody epitopes. Primer design is assisted by a module that displays several oligonucleotide parameters for designing primers "on the fly".
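The idea of a per-column Entropy score, and of scoring against the consensus versus a master sequence, can be illustrated with a minimal Python sketch. It assumes Shannon entropy in bits over the non-gap residues of each column, a majority-rule consensus, and a simple "fraction of sequences differing from the reference residue" score. AlignMiner's actual formulas are not given in this abstract, and all function names below are hypothetical.

```python
import math
from collections import Counter

GAP = "-"  # assumed gap character

def columns(msa: list[str]) -> list[str]:
    """Transpose an MSA (equal-length aligned strings) into column strings."""
    if len({len(s) for s in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return ["".join(col) for col in zip(*msa)]

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the non-gap residues; 0.0 means fully conserved."""
    residues = [r for r in column.upper() if r != GAP]
    if not residues:
        return 0.0
    n = len(residues)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())

def consensus_residue(column: str) -> str:
    """Majority-rule consensus: the most frequent non-gap residue (GAP if none)."""
    residues = [r for r in column.upper() if r != GAP]
    return Counter(residues).most_common(1)[0][0] if residues else GAP

def divergence_from(column: str, reference: str) -> float:
    """Fraction of sequences whose residue (gaps included) differs from `reference`."""
    column = column.upper()
    return sum(r != reference.upper() for r in column) / len(column)

# Usage on a toy nucleotide alignment (not data from the paper).
msa = ["ACGTA", "ACGTT", "ACCTG", "ACGAC"]
master = msa[0]  # hypothetical choice of master sequence
for i, col in enumerate(columns(msa)):
    print(i, round(column_entropy(col), 3),
          divergence_from(col, consensus_residue(col)),  # vs consensus
          divergence_from(col, master[i]))               # vs master sequence
```

Scoring against the consensus highlights columns where the family as a whole varies, while scoring against a master sequence highlights where that one sequence differs from the others, which is what matters when a primer or epitope must be specific to it.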

Conclusions: AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its usage will save researchers' time and ensure an objective selection of the best-possible divergent region when closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer.


Figures

Figure 1
The AlignMiner algorithm. (A) Flow diagram of the main components of the algorithm, as explained in the text; the bold boxes are detailed in B. (B) How a divergent region is obtained using a given scoring method. The "score calculation" step renders a single numeric value for each MSA column. "FFT" is a fast Fourier transform used to smooth the curve of raw scores. The original (left branch) and Fourier-transformed (right branch) curves are trimmed with their respective "cutoffs" to obtain putative SNPs and conserved/divergent regions, respectively. The bold dashed boxes are detailed in C. (C) Determination of the final cutoffs used for trimming scores and providing the validated conserved/divergent regions.
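The two branches in panel B can be sketched in Python under stated assumptions: smoothing keeps only the lowest-frequency Fourier components of the per-column score curve (a naive O(n^2) discrete Fourier transform stands in for an FFT so the sketch needs no dependencies), and regions are runs of columns at or above a cutoff supplied by the caller. How AlignMiner actually derives its final cutoffs (panel C) is not described here, so `keep`, `cutoff` and `min_len` are hypothetical parameters.

```python
import cmath

def lowpass_smooth(scores: list[float], keep: int) -> list[float]:
    """Keep only Fourier components with frequency index <= `keep`, then invert.

    Uses a naive O(n^2) DFT for clarity; a real implementation would use an FFT.
    """
    n = len(scores)
    spectrum = [
        sum(scores[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Indices k and n - k form a conjugate pair (same frequency), so filtering
        # on min(k, n - k) keeps the spectrum symmetric and the result real.
        if min(k, n - k) > keep:
            spectrum[k] = 0
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

def regions_above(values: list[float], cutoff: float, min_len: int = 1) -> list[tuple[int, int]]:
    """Half-open [start, end) runs of consecutive positions with value >= cutoff."""
    regions: list[tuple[int, int]] = []
    start = None
    for i, v in enumerate(values):
        if v >= cutoff and start is None:
            start = i
        elif v < cutoff and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    if start is not None and len(values) - start >= min_len:
        regions.append((start, len(values)))
    return regions

# Usage with made-up per-column scores (higher = more divergent, as for Entropy).
raw = [0.0, 0.1, 1.5, 1.8, 1.6, 0.2, 0.0, 0.1, 1.9, 0.0]
print("putative SNP columns:", regions_above(raw, cutoff=1.0))
print("smoothed regions:", regions_above(lowpass_smooth(raw, keep=2), cutoff=0.8, min_len=2))
```

Trimming the raw curve keeps isolated single-column peaks (the left, putative-SNP branch), while trimming the smoothed curve merges neighbouring peaks into contiguous regions (the right branch).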
Figure 2
Execution time versus number of nucleotides in the MSA, excluding delays due to the queue system. The upper panel shows the time taken as MSA length increases for a given number of sequences. The lower panel (solid line) shows the time taken as the number of sequences increases while MSA length is kept constant. In each case, the number of nucleotides is the product of MSA length and the number of sequences.
Figure 3
Distribution of the percentage of divergent regions by alignment and as a total average for nucleotide (A) or amino acid (B) sequences identified with AlignMiner. Names of the MSAs are explained in Table 1. MultAlin and M-Coffee were used to obtain the input MSAs. SEM, standard error of the mean.
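The SEM shown with the averages is the sample standard deviation divided by the square root of the number of values. A minimal Python sketch; the input percentages are illustrative, not values from the paper.

```python
import math
import statistics

def mean_and_sem(values: list[float]) -> tuple[float, float]:
    """Return (mean, standard error of the mean) using the sample standard deviation."""
    if len(values) < 2:
        raise ValueError("an SEM needs at least two values")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

# Illustrative per-alignment percentages (not data from the paper).
percentages = [12.0, 18.5, 9.0, 15.5]
mean, sem = mean_and_sem(percentages)
print(f"{mean:.2f} ± {sem:.2f} %")
```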
Figure 4
Distribution of the divergent region percentages by length for DNA (A) or protein (B) MSAs identified with AlignMiner. Names of the MSAs are explained in Table 1. MultAlin and M-Coffee were used to obtain the input MSAs. DR, divergent region; bp, base pairs; aas, amino acids.
Figure 5
Distribution of score values of the divergent regions using the three scoring methods (Entropy, Variability or Weighting) in the five protein MSAs, obtained with M-Coffee.
Figure 6
Use of AlignMiner to design several specific primer pairs for PCR amplification of the different isoforms of the AtGS1 nucleotide sequence. (A) The 5' and 3' divergent regions obtained with Entropy that were selected for primer design, with the characteristic parameters of each region. (B) Results of the in silico "PCR amplification" with BioPHP [34] using the different primer pairs. Note that the actual 3' primers are complementary to the sequences shown on the right.
Figure 7
Correlation between the most divergent amino acid sequences and the antigenicity of the AtGS1 protein MSA. (A) Similarity plot obtained using the Entropy method, with the most divergent regions highlighted. (B) Aligned sequences of the two divergent regions (underlined in black) and their scores relative to other divergent regions. (C) Localisation of each divergent region in the alignment, where (i) residues in bold are the predicted B-cell epitopes; (ii) an "e" denotes predicted solvent accessibility at that position; and (iii) red-boxed amino acids correspond to the sequence of the matching divergent region. The divergent sequences clearly overlap with the predicted epitopes and the solvent-accessible amino acids.
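Checking whether a divergent region coincides with a predicted epitope reduces to interval overlap. A minimal Python sketch, assuming both are given as half-open [start, end) alignment coordinates; the coordinates in the example are hypothetical.

```python
def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of positions shared by two half-open intervals [start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def overlapping_pairs(
    regions: list[tuple[int, int]], epitopes: list[tuple[int, int]]
) -> list[tuple[tuple[int, int], tuple[int, int], int]]:
    """(region, epitope, shared positions) for every pair that overlaps."""
    return [(r, e, overlap(r, e)) for r in regions for e in epitopes if overlap(r, e) > 0]

divergent_regions = [(40, 52), (180, 195)]
predicted_epitopes = [(45, 60), (100, 110), (178, 186)]
print(overlapping_pairs(divergent_regions, predicted_epitopes))
# prints [((40, 52), (45, 60), 7), ((180, 195), (178, 186), 6)]
```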
Figure 8
Analysis of two Pinus pinaster gene isoforms. The specific primer pair for the photosynthetic isoform is identified by "P" and that for the non-photosynthetic isoform by "N". (A) Predicted sequences and properties of the two primer pairs designed for specific identification of each isoform. (B) PCR analysis using the previously predicted primers; Table 2 gives the expected amplicon sizes for these primer pairs. The templates in the different lanes are: cDNA for the photosynthetic isoform (lanes 1), cDNA for the non-photosynthetic isoform (lanes 2), cDNA synthesised from total mRNA extracted from Pinus pinaster (lanes 3), Pinus pinaster genomic DNA (lanes 4), and negative controls without any DNA (lanes 5). Lanes M are molecular weight markers (vector pFL61 digested with Hpa II). Arrows indicate the specific amplification bands. DNA sizes are given in base pairs.
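Which primer parameters AlignMiner's primer module reports is not specified in this section. As an assumption, a minimal Python sketch of three common ones: GC content, a Wallace-rule melting temperature (Tm = 2(A+T) + 4(G+C) °C, only a rough estimate for short oligonucleotides), and a check for a 3' GC clamp. The example primer is made up.

```python
def gc_content(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)

def wallace_tm(primer: str) -> int:
    """Wallace rule: Tm (°C) = 2 * (A + T) + 4 * (G + C); rough, short oligos only."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

def has_gc_clamp(primer: str) -> bool:
    """True if the 3' terminal base is G or C."""
    return primer.upper()[-1] in "GC"

primer = "ACGGATAAAAAACCG"  # made-up 15-mer
print(f"{primer}: GC={gc_content(primer):.1f}%, Tm~{wallace_tm(primer)} °C, clamp={has_gc_clamp(primer)}")
```

Nearest-neighbour thermodynamic models give more accurate Tm values; the Wallace rule is used here only because it is short and transparent.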

References

1. Capra JA, Singh M. Predicting functionally important residues from sequence conservation. Bioinformatics. 2007;23(15):1875–1882. doi: 10.1093/bioinformatics/btm270.
2. Kim KM, Sung S, Caetano-Anollés G, Han JY, Kim H. An approach of orthology detection from homologous sequences under minimum evolution. Nucleic Acids Res. 2008;36(17):e110. doi: 10.1093/nar/gkn485.
3. Merkl R, Zwick M. H2r: Identification of evolutionary important residues by means of an entropy based analysis of multiple sequence alignments. BMC Bioinformatics. 2008;9:151. doi: 10.1186/1471-2105-9-151.
4. Kemena K, Notredame C. Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics. 2009;25(19):2455–2465. doi: 10.1093/bioinformatics/btp452.
5. Czechowski T, Bari RP, Stitt M, Scheible W, Udvardi MK. Real-time RT-PCR profiling of over 1400 Arabidopsis transcription factors: unprecedented sensitivity reveals novel root- and shoot-specific genes. Plant J. 2004;38:366–379. doi: 10.1111/j.1365-313X.2004.02051.x.