Algorithms Mol Biol. 2022 May 16;17(1):10. doi: 10.1186/s13015-022-00219-7.

Bi-alignments with affine gaps costs

Peter F Stadler¹ ² ³ ⁴ ⁵ ⁶, Sebastian Will⁷

Affiliations

¹ Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany. studla@bioinf.uni-leipzig.de.
² Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Interdisciplinary Center for Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv), and Leipzig Research Center for Civilization Diseases, Universität Leipzig, Augustusplatz 12, 04107, Leipzig, Germany. studla@bioinf.uni-leipzig.de.
³ Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04109, Leipzig, Germany. studla@bioinf.uni-leipzig.de.
⁴ Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, 1090, Vienna, Austria. studla@bioinf.uni-leipzig.de.
⁵ Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, 111321, Bogotá, D.C., Colombia. studla@bioinf.uni-leipzig.de.
⁶ Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA. studla@bioinf.uni-leipzig.de.
⁷ AMIBio, Laboratoire d'Informatique de l'École Polytechnique (LIX), Institut Polytechnique de Paris (IP Paris), Bâtiment Turing, 1 rue d'Estienne d'Orves, 91120, Palaiseau, France.

PMID: 35578255
PMCID: PMC9109335
DOI: 10.1186/s13015-022-00219-7

Abstract

Background: Commonly, sequence and structure elements are assumed to evolve congruently, such that homologous sequence positions correspond to homologous structural features. Assuming congruent evolution, alignments based on sequence and structure similarity can therefore optimize both similarities at the same time in a single alignment. To model incongruent evolution, where sequence and structural features diverge positionally, we recently introduced bi-alignments. This generalization of sequence and structure-based alignments is best understood as alignments of two distinct pairwise alignments of the same entities: one modeling sequence similarity, the other structural similarity.

Results: Optimal bi-alignments with affine gap costs (or affine shift costs) for two constituent alignments can be computed exactly in quartic space and time. Even bi-alignments with affine shift and gap costs, as well as bi-alignments with sub-additive gap costs, can be optimized efficiently. Affine gap-cost bi-alignments of large proteins ([Formula: see text] aa) can be computed.

Conclusion: Affine cost bi-alignments are of practical interest to study shifts of protein sequences and protein structures relative to each other.

Availability: The affine cost bi-alignment algorithm has been implemented in Python 3 and Cython. It is available as free software from https://github.com/s-will/BiAlign/releases/tag/v0.3 and as bioconda package bialign.

Keywords: Dynamic programming; Multi-tape formal grammar; Recursion; Scoring functions.
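
The affine gap costs named in the Results can be illustrated on the simpler pairwise case. The following is a minimal Python sketch of the classical three-state recursion for pairwise alignment with affine gap costs (commonly attributed to Gotoh). It is not the bi-alignment algorithm of the paper, which aligns two constituent alignments and requires quartic rather than quadratic time. The function names and the cost parameters (`mismatch`, `gap_open`, `gap_ext`) are illustrative assumptions and are not taken from BiAlign.

```python
from math import inf


def gap_cost(k, gap_open=3, gap_ext=1):
    """Affine cost of one gap of length k >= 1 (hypothetical parameters)."""
    return gap_open + k * gap_ext


def affine_align_cost(a, b, mismatch=2, gap_open=3, gap_ext=1):
    """Minimum global alignment cost of strings a and b under affine gap costs.

    Three tables track how the alignment of a[:i] and b[:j] ends:
      M: a[i-1] aligned to b[j-1] (match or mismatch column)
      X: a[i-1] aligned to a gap  (deletion column)
      Y: b[j-1] aligned to a gap  (insertion column)
    """
    n, m = len(a), len(b)
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    X = [[inf] * (m + 1) for _ in range(n + 1)]
    Y = [[inf] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_cost(i, gap_open, gap_ext)  # one leading gap of length i
    for j in range(1, m + 1):
        Y[0][j] = gap_cost(j, gap_open, gap_ext)  # one leading gap of length j

    new_gap = gap_open + gap_ext  # cost of the first position of a new gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = sub + min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Extending an existing gap costs only gap_ext; opening costs more.
            X[i][j] = min(M[i - 1][j] + new_gap,
                          X[i - 1][j] + gap_ext,
                          Y[i - 1][j] + new_gap)
            Y[i][j] = min(M[i][j - 1] + new_gap,
                          Y[i][j - 1] + gap_ext,
                          X[i][j - 1] + new_gap)

    return min(M[n][m], X[n][m], Y[n][m])
```

Under these default parameters, `affine_align_cost("ACGT", "AT")` evaluates to 5, the cost of a single gap of length 2. With a non-negative opening cost, such an affine cost function is also sub-additive: one long gap never costs more than two shorter gaps of the same total length.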

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Two pairwise alignments and a bi-alignment of peptide sequences and their predicted secondary structures (helix red, turn blue, $\beta$-sheet green, coil orange). Structures are predicted according to the Chou-Fasman method [13] with CFSSP [14]. To facilitate quick visual assessment of sequence alignment quality, sequence matches are shown in bold black, sequence indels in non-bold black, and mismatches in dark red. The upper alignment optimizes sequence similarity and shows the structures out of sync: the helix is moved to the left, and the last $\beta$-sheet is shifted to the right by one position. The second alignment maximizes structural similarity and thus shows little sequence similarity. The evolution of the two peptides is explained much better by a bi-alignment (third panel), which supports shift events (marked by rectangles) that can shift either sequence against its structure to the left ($<$) or to the right ($>$). The resulting *regions of shift* are indicated, in general, by $k$ blue or red lines corresponding to shifts by $k$ positions to the left or to the right. While the shift events shown in this example delete and insert structure of A with respect to both sequences and the structure of B, shift alignments also support analogous shifts of sequences and the second structure (which would be shown in the bottom row). In our representation, shift events are the only visible difference between the bi-alignment $A$ in the third panel and the two alignments. Nevertheless, the representation can be mapped to our formalization of bi-alignments as alignments of two constituent alignments $U$ and $V$: $U$ is obtained from the 2nd and 3rd bi-alignment rows by removing the two all-gap columns (i.e., the first and the 3rd-to-last column). The secondary structure alignment $V$ coincides with the 1st and 4th rows, since there is no column that contains only gaps in these two rows.

**Fig. 2**
Alignment (top) and bi-alignment (bottom) of 145 N-terminal amino acids of two CYP1B1 cytochrome P450 enzymes: the extant human enzyme (Human 1B1) and the corresponding ancestral mammalian cytochrome (N98 1B1_M). See Fig. 1 for the representation of the alignments and secondary structure elements. Only the bi-alignment properly aligns the 'shifted' fifth helix and explains the structural incongruence by evolutionary shifts (two forward and two backward shifts).

**Fig. 3**
Shifts in a bi-alignment. The bi-alignment consists of two alignments $U$ and $V$ (colored horizontal boxes) of the pair of objects $a$ and $b$, which are aligned with each other in two different ways, i.e., w.r.t. two different objective functions. Since the actual letters in $a$ and $b$ are irrelevant for the definition of shifts, we distinguish only letters (filled circles) and gaps (dashes). Note that $a$ and $b$ may be represented by different alphabets in $U$ and $V$. Insertions and deletions in the alignment of alignments $W$, i.e., the alignment of the columns of $U$ with the columns of $V$ (highlighted by darker colors), correspond to all-gap columns in either $U$ or $V$. Aligned columns in $W$ are shifts if the gap patterns in the upper pair and the lower pair differ. Colored outlines distinguish single (blue) and double shifts (red).

**Fig. 4**
The *end column type* of a bi-alignment is defined by the last column of each of the constituent pairwise alignments of $a$ and $b$ that is not an all-gap column.

**Fig. 5**
Bi-alignment of the DNA Polymerase I proteins of *Escherichia* (WP_016262675.1) and *Xanthomonas hortorum* (WP_095575020.1). We use the same representation as in Fig. 1.

References

1. Wagner GP. Homology, genes, and evolutionary innovation. Princeton: Princeton Univ. Press; 2014.
2. Hofacker IL, Fekete M, Stadler PF. Secondary structure prediction for aligned RNA sequences. J Mol Biol. 2002;319:1059–1066. doi: 10.1016/S0022-2836(02)00308-X.
3. Marks DS, Hopf TA, Sander C. Protein structure prediction from sequence variation. Nat Biotechnol. 2012;30:1072–1080. doi: 10.1038/nbt.2419.
4. Chapman MA, Donaldson IJ, Gilbert J, Grafham D, Rogers J, Green AR, Göttgens B. Analysis of multiple genomic sequence alignments: a web resource, online tools, and lessons learned from analysis of mammalian SCL loci. Genome Res. 2004;14:313–318. doi: 10.1101/gr.1759004.
5. Hiller M, Findeiß S, Lein S, Marz M, Nickel C, Rose D, Schulz C, Backofen R, Prohaska SJ, Reuter G, Stadler PF. Conserved introns reveal novel transcripts in Drosophila melanogaster. Genome Res. 2009;19:1289–1300. doi: 10.1101/gr.090050.108.

Grants and funding

SFB 1423 / 421152132/Deutsche Forschungsgemeinschaft
