2022 Nov 2;13(11):2005.
doi: 10.3390/genes13112005.

Heuristic Pairwise Alignment in Database Environments

Panna Lipták et al. Genes (Basel).

Abstract

Biological data have gained wider recognition in recent years, yet managing and processing these data efficiently remains a challenge in many areas. A growing number of DNA sequence databases are accessible; however, most algorithms on these sequences run outside the database, in separate bioinformatics software. In this article, we propose a novel approach to the comparative analysis of sequences by defining heuristic pairwise alignment inside the database environment. The method takes advantage of the benefits provided by the database management system and exploits similarities within data sets to speed up the alignment algorithm. We work with the column-oriented MonetDB, and we further discuss the key benefits of this database system in relation to our proposed heuristic approach.
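The abstract's central idea, running sequence comparison inside the database rather than in external software, can be illustrated with a small k-mer index. The sketch below rests on assumptions throughout: it uses Python's built-in `sqlite3` module as a stand-in for MonetDB, a hypothetical table `kmers(seq_id, pos, kmer)`, and hypothetical helpers `iter_kmers`, `build_index`, and `shared_kmer_counts`. It shows how a join can find stored sequences that share k-mers with a query, one way a database could surface similarities worth exploiting. It is not the authors' schema or heuristic.

```python
import sqlite3

# Assumption: SQLite (stdlib) stands in for MonetDB; the schema is hypothetical.


def iter_kmers(seq: str, k: int):
    """Yield (position, k-mer) for every length-k window of seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_index(conn: sqlite3.Connection, sequences: dict, k: int) -> None:
    """Store one row per k-mer occurrence and index the k-mer column."""
    conn.execute("CREATE TABLE kmers (seq_id TEXT, pos INTEGER, kmer TEXT)")
    conn.executemany(
        "INSERT INTO kmers VALUES (?, ?, ?)",
        [(sid, pos, km)
         for sid, seq in sequences.items()
         for pos, km in iter_kmers(seq, k)],
    )
    conn.execute("CREATE INDEX idx_kmers_kmer ON kmers (kmer)")


def shared_kmer_counts(conn: sqlite3.Connection, query: str, k: int) -> list:
    """Return (seq_id, number of distinct shared k-mers), most similar first."""
    conn.execute("CREATE TEMP TABLE query_kmers (kmer TEXT)")
    conn.executemany(
        "INSERT INTO query_kmers VALUES (?)",
        [(km,) for _, km in iter_kmers(query, k)],
    )
    # The join keeps only stored k-mers that also occur in the query;
    # COUNT(DISTINCT ...) ignores repeated occurrences within a sequence.
    rows = conn.execute(
        """
        SELECT t.seq_id, COUNT(DISTINCT t.kmer) AS shared
        FROM kmers AS t JOIN query_kmers AS q ON t.kmer = q.kmer
        GROUP BY t.seq_id
        ORDER BY shared DESC, t.seq_id
        """
    ).fetchall()
    conn.execute("DROP TABLE query_kmers")
    return rows
```

For example, after `build_index(conn, {"s1": "ACGTACGTGA", "s2": "TTTTGGGGCC"}, k=4)` on an in-memory connection, querying `ACGTACGA` returns `[('s1', 4)]`: s1 shares ACGT, CGTA, GTAC and TACG, while s2 shares nothing and so drops out of the join. The set-oriented join is the point of the design choice; the filtering happens where the data already live.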

Keywords: bioinformatics; database systems; pairwise alignment.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Flowchart of the GLASS steps.
Figure 2
Flowchart of our heuristic algorithm steps.
Figure 3
Example of inserting subalignments of length 4 into a sequence.
Figure 4
The tables used in the heuristic pairwise alignment.
Figure 5
Standard deviation (y axis) depending on the size of the sample set (x axis) and length of the k-mers in the case of fixed gap penalty: (a) k = 5; (b) k = 7; (c) k = 10; and (d) k = 15.
Figure 6
Standard deviation (y axis) depending on the size of the sample set (x axis) and length of the k-mers in the case of affine gap penalty: (a) k = 7; (b) k = 10; (c) k = 5; and (d) k = 15.
Figure 7
Utilization rate (y axis) depending on the size of the sample set (x axis) and length of the k-mers in the case of fixed gap penalty: (a) k = 5; (b) k = 7; (c) k = 10; and (d) k = 15.
Figure 8
Utilization rate (y axis) depending on the size of the sample set (x axis) and length of the k-mers in the case of affine gap penalty: (a) k = 5; (b) k = 7; (c) k = 10; and (d) k = 15.
Figure 9
Exact match rate (y axis) depending on the size of the sample set (x axis) and length of the k-mers in the case of fixed gap penalty: (a) k = 5; (b) k = 7; (c) k = 10; and (d) k = 15.
Figure 10
Exact match rate (y axis) depending on the size of the sample set (x axis) and length of the k-mers in the case of affine gap penalty: (a) k = 5; (b) k = 7; (c) k = 10; and (d) k = 15.
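Figures 5 to 10 compare a fixed and an affine gap penalty. As a reference point, here is a minimal sketch of exact global alignment scoring under both models: Needleman-Wunsch with a constant cost per gap position (our reading of "fixed gap penalty", which is an assumption) and Gotoh's three-state recurrence for affine gaps, where a gap of length L costs `open_ + (L - 1) * extend`. The scoring values (match +1, mismatch -1, gap -2, open -3, extend -1) and the function names `subst`, `nw_linear`, and `gotoh_affine` are illustrative assumptions. This exact dynamic program is the kind of baseline a heuristic would try to speed up, not the paper's heuristic itself.

```python
NEG_INF = float("-inf")


def subst(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Illustrative substitution score (assumed values): +1 match, -1 mismatch."""
    return match if a == b else mismatch


def nw_linear(x: str, y: str, gap: int = -2) -> int:
    """Needleman-Wunsch global score; every gap position costs `gap`."""
    m = len(y)
    prev = [j * gap for j in range(m + 1)]  # row 0: prefixes of y against gaps
    for i in range(1, len(x) + 1):
        cur = [i * gap] + [0] * m           # column 0: prefix of x against gaps
        for j in range(1, m + 1):
            cur[j] = max(prev[j - 1] + subst(x[i - 1], y[j - 1]),  # (mis)match
                         prev[j] + gap,      # x[i-1] against a gap
                         cur[j - 1] + gap)   # y[j-1] against a gap
        prev = cur
    return prev[m]


def gotoh_affine(x: str, y: str, open_: int = -3, extend: int = -1) -> int:
    """Gotoh global score; a gap of length L costs open_ + (L - 1) * extend."""
    n, m = len(x), len(y)
    # M: alignment ends with x[i-1] paired with y[j-1];
    # X: ends with x[i-1] against a gap; Y: ends with y[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = open_ + (i - 1) * extend
    for j in range(1, m + 1):
        Y[0][j] = open_ + (j - 1) * extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = subst(x[i - 1], y[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs open_; continuing the same gap costs extend.
            X[i][j] = max(M[i - 1][j] + open_, X[i - 1][j] + extend,
                          Y[i - 1][j] + open_)
            Y[i][j] = max(M[i][j - 1] + open_, Y[i][j - 1] + extend,
                          X[i][j - 1] + open_)
    return max(M[n][m], X[n][m], Y[n][m])
```

Under these illustrative parameters, aligning `ACGTACGT` with `ACGT` scores -4 with the fixed penalty (four matches, four gap positions at -2 each) but -2 with the affine one (four matches, one gap of length 4 costing -3 - 3 = -6), because affine scoring makes a single long gap cheaper than the same number of independently penalized positions.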
