Algorithms Mol Biol. 2010 Jun 2;5:24.

doi: 10.1186/1748-7188-5-24.

AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

Darío Guerrero¹, Rocío Bautista, David P Villalobos, Francisco R Cantón, M Gonzalo Claros

PMID: 20525162
PMCID: PMC2902484
DOI: 10.1186/1748-7188-5-24

Darío Guerrero et al. Algorithms Mol Biol. 2010.

Affiliation

¹ Plataforma Andaluza de Bioinformática (Universidad de Málaga), Severo Ochoa, 34, 29590 Málaga, Spain. claros@uma.es.

Abstract

Background: Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses.

Results: AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, and the choice of alignment algorithm does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded to the user's disk in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly".
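To make the idea of a per-column score concrete, the following is a minimal sketch of a Shannon-entropy score computed for each column of an alignment. It is an illustration only: the abstract does not give AlignMiner's exact Entropy formula, so the function names, the base-2 logarithm, and the choice to treat a gap as an ordinary symbol are all assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column.

    Assumption: every character, including '-', counts as a symbol.
    A value of 0.0 means the column is fully conserved.
    """
    total = len(column)
    counts = Counter(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(msa):
    """Return one entropy score per column for equal-length aligned sequences."""
    if len({len(seq) for seq in msa}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*msa)]


# Hypothetical toy alignment, not taken from the paper.
msa = ["ACGT", "ACGA", "ACCA", "ATCA"]
profile = entropy_profile(msa)
```

In this toy alignment the first column is identical in every sequence and scores 0.0, while the third column is split evenly between two bases and scores 1.0 bit, so higher values mark more divergent columns.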

Conclusions: AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its usage will save researchers' time and ensure an objective selection of the best-possible divergent region when closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer.

Figures

**Figure 1**
**The AlignMiner algorithm**. (A) Flow diagram of the main components of the algorithm, as explained in the text; the bold boxes are detailed in B. (B) The details of how a divergent region is obtained using a given scoring method. The "score calculation" renders a single numeric value for each MSA column. "FFT" is a fast Fourier transform for smoothing the curve of raw scores. The original (left branch) and Fourier-transformed (right branch) curves are trimmed with their respective "cutoffs" in order to obtain putative SNPs and conserved/divergent regions, respectively. The bold dashed boxes are detailed in C. (C) Details of the determination of the final cutoffs used for trimming scores and providing the validated conserved/divergent regions.
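The smoothing-and-trimming step described in panel B can be sketched roughly as follows. This is not AlignMiner's implementation: it uses a plain discrete Fourier transform (the quantity an FFT computes more quickly) as a low-pass filter, and the `keep` parameter, the fixed `cutoff`, and the `min_length` filter are hypothetical stand-ins for the cutoff determination in panel C, which the caption does not specify.

```python
import cmath


def dft(values):
    """Naive discrete Fourier transform; an FFT yields the same coefficients."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, v in enumerate(values))
            for k in range(n)]


def inverse_dft(coeffs):
    """Inverse transform, returning the real part of each sample."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                for k, c in enumerate(coeffs)).real / n
            for t in range(n)]


def smooth(scores, keep=3):
    """Low-pass filter: zero every frequency index above `keep` (assumed parameter)."""
    coeffs = dft(scores)
    n = len(coeffs)
    filtered = [c if min(k, n - k) <= keep else 0j for k, c in enumerate(coeffs)]
    return inverse_dft(filtered)


def regions_above(curve, cutoff, min_length=1):
    """Return half-open (start, end) column ranges where the curve exceeds cutoff."""
    regions, start = [], None
    for i, value in enumerate(curve):
        if value > cutoff and start is None:
            start = i
        elif value <= cutoff and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    if start is not None and len(curve) - start >= min_length:
        regions.append((start, len(curve)))
    return regions


# Hypothetical raw per-column scores (higher means more divergent).
raw = [0.0, 0.1, 0.0, 0.9, 1.0, 0.8, 0.1, 0.0]
curve = smooth(raw, keep=2)
divergent = regions_above(curve, cutoff=0.5, min_length=2)
```

Applying `regions_above` to the unsmoothed scores with a stricter cutoff and `min_length=1` would correspond loosely to the left branch of the diagram, where isolated high-scoring columns are reported as putative SNPs.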

**Figure 2**
**Execution time versus number of nucleotides in the MSA, excluding delays due to the queue system**. The upper panel represents the time taken when MSA length increases for a given number of sequences. The lower panel (solid line) represents the time taken when MSA length is kept constant while the number of sequences is increased. The number of nucleotides in each case is a simple multiplication of MSA length by the number of sequences.

**Figure 3**
**Distribution of the percentage of divergent regions by alignment and as a total average for nucleotide (A) or amino acid (B) sequences identified with AlignMiner**. Names of the MSAs are explained in Table 1. MultAlin and M-Coffee were used to obtain the input MSAs. SEM, standard error of the mean.

**Figure 4**
**Distribution of the divergent region percentages by length for DNA (A) or protein (B) MSAs identified with AlignMiner**. Names of the MSAs are explained in Table 1. MultAlin and M-Coffee were used to obtain the input MSAs. DR, divergent region; bp, base pairs; aas, amino acids.

**Figure 5**
**Distribution of score values of the divergent regions using the three scoring methods** (Entropy, Variability or Weighting) **in the five protein MSAs, obtained with M-Coffee**.

**Figure 6**
**Use of AlignMiner for designing several specific primer pairs for PCR amplification of the different isoforms of the AtGS1 nucleotide sequence** (A) The 5' and 3' divergent regions obtained with Entropy that were selected for primer design including the characteristic parameters of each region. (B) Results of the *in silico* "PCR amplification" with BioPHP [34] using the different primer pairs. Note that the actual 3' primers are complementary to the sequences shown on the right.
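The note about 3' primers can be illustrated with a short sketch. It assumes the usual reading of "complementary" for a reverse primer, namely the reverse complement of the region as written 5' to 3' on the sense strand; the helper name and the example sequence are hypothetical and do not come from the figure.

```python
# Translation table for the four unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical 3' divergent region as it would appear in the alignment.
region = "GATTACA"
reverse_primer = reverse_complement(region)
```

Characters outside A, C, G and T (for example ambiguity codes or gaps) pass through the table unchanged, so a real workflow would need to handle them explicitly.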

**Figure 7**
**Correlation between the most divergent amino acid sequences and antigenicity of the AtGS1 protein MSA**. (A) Similarity plot obtained using the Entropy method; the most divergent regions are highlighted. (B) Aligned sequences for the two divergent regions together (underlined in black) and their score in relation to other divergent regions. (C) Localisation of each divergent region in the alignment where: (i) nucleotides in bold are the predicted epitopes for B-cells; (ii) an "e" denotes predicted solvent accessibility for this position; and (iii) red-boxed amino acids correspond to the sequence of the matching divergent region. The divergent sequences overlap with the predicted epitopes and the solvent-accessible amino acids.

**Figure 8**
**Analysis of two *Pinus pinaster* gene isoforms. The specific primer pair for the photosynthetic isoform is identified by a "P" and for the non-photosynthetic isoform by an "N"** (A) Predicted sequence and properties of the two primer pairs designed for specific identification of each isoform. (B) PCR analysis using the previously predicted primers. Table 2 includes the expected amplicon size using these primer pairs. The template in the different lanes is: cDNA for the photosynthetic isoform (lanes 1), cDNA for the non-photosynthetic isoform (lanes 2), cDNA synthesised from total mRNA extracted from *Pinus pinaster* (lanes 3), *Pinus pinaster* genomic DNA (lanes 4), and negative controls (lanes 5), which do not contain any DNA. Lanes M are molecular weight markers (vector pFL61 digested with *Hpa* II). Arrows indicate the specific amplification bands. DNA sizes are given in base pairs.

Cited by

Rational Design of Profile HMMs for Sensitive and Specific Sequence Detection with Case Studies Applied to Viruses, Bacteriophages, and Casposons.
Oliveira LS, Reyes A, Dutilh BE, Gruber A. Viruses. 2023 Feb 13;15(2):519. doi: 10.3390/v15020519. PMID: 36851733.
EuroPineDB: a high-coverage web database for maritime pine transcriptome.
Fernández-Pozo N, Canales J, Guerrero-Fernández D, Villalobos DP, Díaz-Moreno SM, Bautista R, Flores-Monterroso A, Guevara MÁ, Perdiguero P, Collada C, Cervera MT, Soto A, Ordás R, Cantón FR, Avila C, Cánovas FM, Claros MG. BMC Genomics. 2011 Jul 15;12:366. doi: 10.1186/1471-2164-12-366. PMID: 21762488.
ReprOlive: a database with linked data for the olive tree (Olea europaea L.) reproductive transcriptome.
Carmona R, Zafra A, Seoane P, Castro AJ, Guerrero-Fernández D, Castillo-Castillo T, Medina-García A, Cánovas FM, Aldana-Montes JF, Navas-Delgado I, Alché Jde D, Claros MG. Front Plant Sci. 2015 Aug 11;6:625. doi: 10.3389/fpls.2015.00625. eCollection 2015. PMID: 26322066.

References

1. Capra JA, Singh M. Predicting functionally important residues from sequence conservation. Bioinformatics. 2007;23(15):1875–1882. doi: 10.1093/bioinformatics/btm270.
2. Kim KM, Sung S, Caetano-Anollés G, Han JY, Kim H. An approach of orthology detection from homologous sequences under minimum evolution. Nucleic Acids Res. 2008;36(17):e110. doi: 10.1093/nar/gkn485.
3. Merkl R, Zwick M. H2r: Identification of evolutionary important residues by means of an entropy based analysis of multiple sequence alignments. BMC Bioinformatics. 2008;9:151. doi: 10.1186/1471-2105-9-151.
4. Kemena K, Notredame C. Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics. 2009;25(19):2455–2465. doi: 10.1093/bioinformatics/btp452.
5. Czechowski T, Bari RP, Stitt M, Scheible W, Udvardi MK. Real-time RT-PCR profiling of over 1400 Arabidopsis transcription factors: unprecedented sensitivity reveals novel root- and shoot-specific genes. Plant J. 2004;38:366–379. doi: 10.1111/j.1365-313X.2004.02051.x.
