Algorithms Mol Biol. 2022 May 16;17(1):10.
doi: 10.1186/s13015-022-00219-7.

Bi-alignments with affine gaps costs


Peter F Stadler et al. Algorithms Mol Biol. 2022.

Abstract

Background: Commonly, sequence and structure elements are assumed to evolve congruently, such that homologous sequence positions correspond to homologous structural features. Under this assumption, a single alignment based on sequence and structure similarity can optimize both similarities at the same time. To model incongruent evolution, where sequence and structural features diverge positionally, we recently introduced bi-alignments. This generalization of sequence- and structure-based alignments is best understood as an alignment of two distinct pairwise alignments of the same entities: one modeling sequence similarity, the other structural similarity.

Results: Optimal bi-alignments with affine gap costs (or affine shift costs) for two constituent alignments can be computed exactly in quartic space and time. Even bi-alignments with affine shift and gap costs, as well as bi-alignments with sub-additive gap costs, are optimized efficiently. Affine gap-cost bi-alignments of large proteins ([Formula: see text] aa) can be computed.
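For readers less familiar with affine costs: a gap of length k scores an opening term plus k times an extension term, so one long gap is cheaper than several short ones. The following is a minimal Python sketch of the classical three-state Gotoh recursion for a single global pairwise alignment (scores maximized), shown only to illustrate how affine gaps are handled by dynamic programming. It is not the bi-alignment algorithm, which per the abstract couples two constituent alignments and needs quartic time and space, whereas this sketch runs in O(nm). The function name and default scores are hypothetical.

```python
import math


def gotoh_score(a: str, b: str, match: float = 1.0, mismatch: float = -1.0,
                gap_open: float = -3.0, gap_extend: float = -1.0) -> float:
    """Optimal global alignment score of a and b under affine gap costs.

    A gap of length k scores gap_open + k * gap_extend. The default
    parameters are hypothetical and chosen only for illustration.
    """
    n, m = len(a), len(b)
    NEG = -math.inf
    # M[i][j]: best score of a[:i] vs b[:j] ending in an aligned pair
    # X[i][j]: best score ending in a deletion (a[i-1] against a gap)
    # Y[i][j]: best score ending in an insertion (a gap against b[j-1])
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend  # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend  # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Either open a new gap (from M or Y) or extend the current one (from X).
            X[i][j] = max(M[i - 1][j] + gap_open + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these defaults, aligning "AAAA" with "AA" prefers one gap of length 2 (score 2 - 5 = -3) over two separate gaps of length 1 (score 2 - 8 = -6), which is the behavior affine costs are meant to encode.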

Conclusion: Affine cost bi-alignments are of practical interest to study shifts of protein sequences and protein structures relative to each other.

Availability: The affine cost bi-alignment algorithm has been implemented in Python 3 and Cython. It is available as free software from https://github.com/s-will/BiAlign/releases/tag/v0.3 and as bioconda package bialign.

Keywords: Dynamic programming; Multi-tape formal grammar; Recursion; Scoring functions.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Two pairwise alignments and a bi-alignment of peptide sequences and their predicted secondary structures (helix red, turn blue, β-sheet green, coil orange). Structures are predicted with the Chou-Fasman method [13] using CFSSP [14]. To facilitate quick visual assessment of sequence alignment quality, sequence matches are shown in bold black, sequence indels in non-bold black, and mismatches in dark red. The upper alignment optimizes sequence similarity and shows the structures out of sync: the helix is moved to the left, and the last β-sheet is shifted to the right by one position. The second alignment maximizes structural similarity and thus shows little sequence similarity. The evolution of the two peptides is explained much better by a bi-alignment (third panel), which supports shift events (marked by rectangles) that can shift either sequence against its structure to the left (<) or to the right (>). The resulting shifted regions are indicated by k blue or red lines, corresponding to shifts by k positions to the left or to the right. While the shift events in this example delete and insert structure of A with respect to both sequences and the structure of B, bi-alignments equally support analogous shifts between the sequences and the second structure (which would be shown in the bottom row). In our representation, shift events are the only visible difference between the bi-alignment A in the third panel and the two alignments. Nevertheless, the representation can be mapped to our formalization of bi-alignments as alignments of two constituent alignments U and V: U is obtained from the 2nd and 3rd bi-alignment rows by removing the two all-gap columns (i.e., the first and the third-to-last column). The secondary structure alignment V coincides with the 1st and 4th rows, since no column contains only gaps in these two rows.
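The mapping described in the Fig. 1 caption, from the four displayed rows to the constituent alignments U and V, can be sketched as follows. This is a minimal Python sketch assuming, hypothetically, that each row is a plain string with '-' as the gap symbol and that rows are ordered structure A, sequence A, sequence B, structure B as in the figure. The function names and the toy rows are illustrative and are not taken from the BiAlign package.

```python
from typing import List, Tuple

GAP = "-"


def constituent(row1: str, row2: str) -> Tuple[str, str]:
    """Pairwise alignment from two bi-alignment rows, dropping all-gap columns."""
    if len(row1) != len(row2):
        raise ValueError("rows must have equal length")
    cols = [(x, y) for x, y in zip(row1, row2) if not (x == GAP and y == GAP)]
    return "".join(x for x, _ in cols), "".join(y for _, y in cols)


def split_bialignment(rows: List[str]):
    """rows = [structure A, sequence A, sequence B, structure B] (Fig. 1 layout)."""
    struct_a, seq_a, seq_b, struct_b = rows
    U = constituent(seq_a, seq_b)        # sequence alignment: rows 2 and 3
    V = constituent(struct_a, struct_b)  # structure alignment: rows 1 and 4
    return U, V


# Toy rows (made up for illustration): the first column is all-gap in the
# sequence rows and the fourth column is all-gap in the structure rows.
U, V = split_bialignment(["HHH-C", "-ACDE", "-AC-E", "HH--C"])
print(U, V)
```

On the toy rows this yields U = ("ACDE", "AC-E") and V = ("HHHC", "HH-C"): each constituent alignment keeps only the columns in which at least one of its two rows carries a letter.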
Fig. 2
Alignment (top) and bi-alignment (bottom) of the 145 N-terminal amino acids of two CYP1B1 cytochrome P450 enzymes: the extant human enzyme (Human 1B1) and the corresponding ancestral mammalian cytochrome (N98 1B1_M). See Fig. 1 for the representation of the alignments and secondary structure elements. Only the bi-alignment properly aligns the 'shifted' fifth helix and explains the structural incongruence by evolutionary shifts (two forward and two backward shifts).
Fig. 3
Shifts in a bi-alignment. The bi-alignment consists of two alignments U and V (colored horizontal boxes) of the pair of objects a and b, which are aligned with each other in two different ways, i.e., with respect to two different objective functions. Since the actual letters in a and b are irrelevant for the definition of shifts, we distinguish only letters (filled circles) and gaps (dashes). Note that a and b may be represented by different alphabets in U and V. Insertions and deletions in the alignment of alignments W, i.e., the alignment of the columns of U with the columns of V, are highlighted by darker colors and correspond to all-gap columns in either U or V. Aligned columns in W are shifts if the gap patterns in the upper pair and the lower pair differ. Colored outlines distinguish single (blue) and double (red) shifts.
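To make the shift criterion in Fig. 3 concrete, here is a minimal Python sketch. It assumes, hypothetically, that W is given as a list of entries, each holding a column of U and a column of V (each column a pair of characters for a and b, with '-' as gap), or None in place of a column where W has an insertion or deletion. Following the caption, only the gap pattern of a column matters. The sketch flags shifted columns but does not attempt to distinguish single from double shifts; all names are illustrative.

```python
from typing import List, Optional, Tuple

GAP = "-"
Column = Tuple[str, str]  # (character of a, character of b) in one alignment column


def gap_pattern(col: Column) -> Tuple[bool, bool]:
    """For a and b respectively: True if the column holds a letter, False if a gap."""
    return (col[0] != GAP, col[1] != GAP)


def shift_columns(W: List[Tuple[Optional[Column], Optional[Column]]]) -> List[int]:
    """Indices of W entries that align a U column with a V column whose gap
    patterns differ. Entries with None (indels in W) are never shifts."""
    return [k for k, (u, v) in enumerate(W)
            if u is not None and v is not None
            and gap_pattern(u) != gap_pattern(v)]


# Toy alignment of alignments (made up): entry 1 aligns a deletion column of U
# with a match column of V, so its gap patterns differ.
W = [(("A", "A"), ("H", "H")),
     (("C", "-"), ("H", "H")),
     (None, ("E", "-")),
     (("D", "D"), ("C", "C"))]
print(shift_columns(W))
```

Because letters are reduced to a letter/gap flag, U and V may use different alphabets (amino acids versus secondary structure symbols) without affecting the comparison, in line with the caption's remark.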
Fig. 4
The end column type of a bi-alignment is defined by the last column of each of the constituent pairwise alignments of a and b that is not an all-gap column.
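The end column type described in Fig. 4 can be computed by scanning each constituent alignment from the right. This is a minimal Python sketch assuming both rows of a constituent alignment are equal-length strings that may still contain all-gap columns (as in the four-row display), with '-' as gap. The type labels 'M', 'D' and 'I' and the function names are hypothetical and are not the paper's notation.

```python
from typing import Optional, Tuple

GAP = "-"


def column_type(x: str, y: str) -> str:
    """'M': letters in both a and b; 'D': letter of a over a gap; 'I': gap over a letter of b."""
    if x != GAP and y != GAP:
        return "M"
    return "D" if x != GAP else "I"


def last_type(row_a: str, row_b: str) -> Optional[str]:
    """Type of the last column of a pairwise alignment that is not all-gap."""
    for x, y in zip(reversed(row_a), reversed(row_b)):
        if x != GAP or y != GAP:
            return column_type(x, y)
    return None  # the alignment consists of all-gap columns only


def end_column_type(u_rows: Tuple[str, str],
                    v_rows: Tuple[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """End column type of a bi-alignment: one type per constituent alignment."""
    return last_type(*u_rows), last_type(*v_rows)


# Toy example (made up): U ends, after skipping an all-gap column, in a
# deletion; V ends in an insertion.
print(end_column_type(("ACD-", "AC--"), ("HHC-", "HH-C")))
```

Skipping all-gap columns is what makes the type well defined on the four-row display, where such columns appear in one constituent alignment whenever the other one has a letter at that position.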
Fig. 5
Bi-alignment of the proteins DNA Polymerase I of Escherichia (WP_016262675.1) and Xanthomonas hortorum (WP_095575020.1). We use the same representation as in Fig. 1

References

    1. Wagner GP. Homology, genes, and evolutionary innovation. Princeton: Princeton Univ. Press; 2014.
    2. Hofacker IL, Fekete M, Stadler PF. Secondary structure prediction for aligned RNA sequences. J Mol Biol. 2002;319:1059–1066. doi: 10.1016/S0022-2836(02)00308-X.
    3. Marks DS, Hopf TA, Sander C. Protein structure prediction from sequence variation. Nat Biotechnol. 2012;30:1072–1080. doi: 10.1038/nbt.2419.
    4. Chapman MA, Donaldson IJ, Gilbert J, Grafham D, Rogers J, Green AR, Göttgens B. Analysis of multiple genomic sequence alignments: a web resource, online tools, and lessons learned from analysis of mammalian SCL loci. Genome Res. 2004;14:313–318. doi: 10.1101/gr.1759004.
    5. Hiller M, Findeiß S, Lein S, Marz M, Nickel C, Rose D, Schulz C, Backofen R, Prohaska SJ, Reuter G, Stadler PF. Conserved introns reveal novel transcripts in Drosophila melanogaster. Genome Res. 2009;19:1289–1300. doi: 10.1101/gr.090050.108.
