Comparative Study
Nucleic Acids Res. 2006 May 17;34(9):2598-606.
doi: 10.1093/nar/gkl274. Print 2006.

Refining multiple sequence alignments with conserved core regions

Saikat Chakrabarti et al. Nucleic Acids Res. 2006.

Abstract

Accurate multiple sequence alignments of proteins are important to several areas of computational biology and provide an understanding of the phylogenetic history of domain families, as well as their identification and classification. This article presents a new algorithm, REFINER, that refines a multiple sequence alignment by iterative realignment of its individual sequences with the predetermined conserved core (block) model of a protein family. Realignment of each sequence can correct misalignments between that sequence and the rest of the profile while preserving the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement in alignment quality after refinement, as inferred from the increased alignment score and from the enhanced sensitivity of database searches that use sequence profiles derived from the refined alignments rather than the original ones. A standalone version of the program is available by ftp distribution (ftp://ftp.ncbi.nih.gov/pub/REFINER) and will be incorporated into the next release of the Cn3D structure/alignment viewer.
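
The general idea of realigning one sequence at a time against a fixed block model can be illustrated with a minimal sketch. This is not the REFINER implementation: the abstract does not specify the scoring scheme, so the sketch assumes a simple log-odds score with pseudocounts and a uniform background, and the names `block_pssm`, `segment_score`, `refine`, `max_shift` and `max_rounds` are hypothetical. Each block is shifted without gaps, and blocks are never allowed to overlap or change order, which is how the block model is preserved here.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def block_pssm(segments, pseudo=1.0):
    """Per-column log-odds scores for equal-length segments.

    Assumption: uniform background and additive pseudocounts; the
    real program may score blocks quite differently.
    """
    background = 1.0 / len(ALPHABET)
    total = len(segments) + pseudo * len(ALPHABET)
    pssm = []
    for col in range(len(segments[0])):
        counts = Counter(seg[col] for seg in segments)
        pssm.append({a: math.log(((counts[a] + pseudo) / total) / background)
                     for a in ALPHABET})
    return pssm


def segment_score(pssm, segment):
    """Sum of column scores; residues outside ALPHABET score 0."""
    return sum(col.get(res, 0.0) for col, res in zip(pssm, segment))


def refine(seqs, widths, starts, max_shift=3, max_rounds=10):
    """Leave-one-out block realignment.

    widths[b] is the width of block b; starts[b][i] is the start of
    block b in sequence i. Returns a new table of start positions.
    """
    starts = [list(row) for row in starts]  # do not mutate the input
    for _ in range(max_rounds):
        changed = False
        for i, seq in enumerate(seqs):
            for b, width in enumerate(widths):
                # Profile of this block built from all other sequences.
                others = [seqs[j][starts[b][j]:starts[b][j] + width]
                          for j in range(len(seqs)) if j != i]
                pssm = block_pssm(others)
                # Keep block order: stay between the neighbouring blocks.
                lo = starts[b - 1][i] + widths[b - 1] if b > 0 else 0
                hi = (starts[b + 1][i] - width if b + 1 < len(widths)
                      else len(seq) - width)
                cur = starts[b][i]
                candidates = [s for s in range(cur - max_shift, cur + max_shift + 1)
                              if lo <= s <= hi]
                # Best score wins; ties prefer the smallest shift.
                best = max(candidates,
                           key=lambda s: (segment_score(pssm, seq[s:s + width]),
                                          -abs(s - cur)))
                if best != cur:
                    starts[b][i] = best
                    changed = True
        if not changed:  # converged: a full pass moved nothing
            break
    return starts


seqs = ["AACWCGG", "AACWCGG", "AACWCGG", "ACWCGGG"]
initial = [[2, 2, 2, 2]]  # one block of width 3; the last sequence is off by one
refined = refine(seqs, [3], initial)
```

In this toy input the fourth sequence carries the conserved `CWC` motif one position earlier than the block places it, and the leave-one-out pass moves only that sequence's block start.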


Figures

Figure 1
Flowchart of the refinement algorithm.
Figure 2
Improvement of alignment score after refinement. Histograms of the relative improvement of alignment score (AS, Equation 2) for curated CDD alignments (a) and uncurated Pfam alignments (b), with or without block extension.
Figure 3
Comparison of refinement performance. Histograms of the relative improvement of alignment score achieved by our refinement method (REFINER) over the RF algorithm from the refinement package of Wallace et al. (19). Alignments are evaluated by both average and LE scores (19). The relative improvement of alignment score is measured as the difference between the final scores obtained with REFINER and with the RF method, divided by the final score obtained with the RF method.
Figure 4
Effect of block shift on block score. The relative improvement of score (using structure-based, curated CDD alignments) of 25% or higher per block is plotted versus the block shift. The central line in each box shows the median value, the upper and lower boundaries of each box show the upper and lower quartiles, and the vertical lines (whiskers) extend to 1.5 times the interquartile range. Outlier values are shown outside the whiskers. Values on top of each box give the percentage of data points in each block shift bin.
Figure 5
Effect of block extension on alignment score. The relative improvement of alignment score (AS, Equation 2) is shown for each bin of the block extension. Block extension is calculated as the sum of the extended columns for all blocks within a CDD alignment.
Figure 6
Quality control by testing the recovery of FIC. Alignments of FICs are compared before and after the refinement. For most of the FICs (shown by the black box), the automated refinement procedure reproduced exactly the alignment that had been obtained by careful manual curation. In addition, the majority of the changed FICs show a better score (inset, improvement+) than the score derived before the refinement.
Figure 7
Improvement of alignment after refinement. Alignments of the Bowman-Birk type proteinase inhibitor (BBI) family (CDD code: cd00023) derived (a) before and (b) after the refinement show marked improvement. Block-forming columns are displayed in capitals, and functionally important residues are boxed in yellow. One of the conserved cysteine sites, where probable misalignments are corrected in a number of sequences, is shown in red in (b) and (c). (c) Backbone representation of the structure of a hydrolase inhibitor (pdb code: 1C2A). Functionally important sites are marked in yellow and disulfide bonds are shown in green.
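
The relative improvement described in the Figure 3 caption can be written out explicitly. The symbols below are introduced here for illustration only and do not appear in the source: $S_{\mathrm{REFINER}}$ and $S_{\mathrm{RF}}$ denote the final alignment scores (average or LE) after applying REFINER and the RF method, respectively.

```latex
\[
  \text{relative improvement}
  = \frac{S_{\mathrm{REFINER}} - S_{\mathrm{RF}}}{S_{\mathrm{RF}}}
\]
```

Assuming positive scores, a positive value means REFINER reached a higher final score than RF on that alignment, and a value of zero means the two methods scored the same.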

References

    1. Bateman A., Birney E., Cerruti L., Durbin R., Etwiller L., Eddy S.R., Griffiths-Jones S., Howe K.L., Marshall M., et al. The Pfam protein families database. Nucleic Acids Res. 2002;30:276–280.
    2. Servant F., Bru C., Carrere S., Courcelle E., Gouzy J., Peyruc D., Kahn D. ProDom: Automated clustering of homologous domains. Brief. Bioinformatics. 2002;3:246–251.
    3. Letunic I., Goodstadt L., Dickens N.J., Doerks T., Schultz J., Mott R., Ciccarelli F., Copley R.R., Ponting C.P., Bork P. Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res. 2002;30:242–244.
    4. Marchler-Bauer A., Panchenko A.R., Shoemaker B.A., Thiessen P.A., Geer L.Y., Bryant S.H. CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 2002;30:281–283.
    5. Lipman D.J., Altschul S.F., Kececioglu J.D. A tool for multiple sequence alignment. Proc. Natl Acad. Sci. USA. 1989;86:4412–4415.
