Bioinform Adv. 2023 Aug 12;3(1):vbad108.
doi: 10.1093/bioadv/vbad108. eCollection 2023.

Seedability: optimizing alignment parameters for sensitive sequence comparison

Lorraine A K Ayad et al. Bioinform Adv.

Abstract

Motivation: Most sequence alignment techniques make use of exact k-mer hits, called seeds, as anchors to optimize alignment speed. A large number of bioinformatics tools employing seed-based alignment techniques, such as Minimap2, use a single value of k per sequencing technology, without a strong guarantee that this is the best possible value. Given the ubiquity of sequence alignment, identifying values of k that lead to more sensitive alignments is thus an important task. To aid this, we present Seedability, a seed-based alignment framework designed for estimating an optimal seed k-mer length (as well as a minimal number of shared seeds) based on a given alignment identity threshold. In particular, we were motivated to make Minimap2 more sensitive in the pairwise alignment of short sequences.
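To make the role of k concrete, the following is a minimal sketch of counting exact k-mer hits shared by two sequences for several values of k. It is not Seedability's estimation procedure; the helper names `kmers` and `shared_seed_counts` and the toy sequences are assumptions introduced only for illustration.

```python
def kmers(seq, k):
    """Return the set of distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_seed_counts(a, b, k_values):
    """Map each k to the number of distinct k-mers occurring in both a and b."""
    return {k: len(kmers(a, k) & kmers(b, k)) for k in k_values}


# Two toy sequences of length 8 that differ by one substitution.
a = "ACGTACGT"
b = "ACGAACGT"

for k, shared in shared_seed_counts(a, b, [3, 4, 5]).items():
    print(f"k={k}: {shared} shared seed(s)")
```

On short, divergent inputs like these, the number of shared exact k-mers can drop to zero once k is large enough, in which case a seed-based aligner has nothing to anchor on. This is the kind of situation in which a smaller k may be preferable.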

Results: The experimental results herein show improved alignments of short and divergent sequences when using the parameter values determined by Seedability in comparison to the default values of Minimap2. We also show several cases of pairs of real divergent sequences, where the default parameter values of Minimap2 yield no output alignments, but the values output by Seedability produce plausible alignments.

Availability and implementation: https://github.com/lorrainea/Seedability (distributed under GPL v3.0).

Conflict of interest statement

None declared.

Figures

Figure 1.
Step 1 of estimating $e_{s_i,s_j}$.
Figure 2.
Step 2 of estimating $e_{s_i,s_j}$.
Figure 3.
(a) The average alignment identity (i.e. the total alignment identity score divided by the total number of pairs) for the 100 pairs of sequences when using the default Minimap2 (κ,w) values in comparison to the (κ,w) values determined by Seedability. (b) The number of mapped alignments when using the default Minimap2 (κ,w) values in comparison to the (κ,w) values determined by Seedability. In both panels, the preset option is: (i) the default preset option map-ont, if the average sequence length is ≥1000; or (ii) the preset option sr, if the average sequence length is <1000.
Figure 4.
The number of alignments whose alignment length is at least 90% of the original sequence length when using the default Minimap2 (κ,w) values in comparison to the (κ,w) values determined by Seedability. For the preset options, we have used: (i) the default preset option map-ont, if the average sequence length is ≥1000; or (ii) the preset option sr, if the average sequence length is <1000.
Figure 5.
The average time in ms required when using preset option map-ont for (a) Seedability to compute (t, k), (b) Minimap2 to compute an alignment using default parameter values, and (c) Minimap2 to compute an alignment using the (κ,w) values determined by Seedability.
Figure 6.
The average peak memory in MB required when using preset option map-ont for (a) Seedability to compute (t, k), (b) Minimap2 to compute an alignment using default parameter values and (c) Minimap2 to compute an alignment using the (κ,w) values determined by Seedability.
Figure 7.
Human gene versus ortholog alignment produced by Minimap2 when using (κ=4,w=3) determined by Seedability in comparison to no output alignment produced when using the default (κ,w) values of Minimap2.
Figure 8.
(a) The number of mapped sequences when using the default (κ,w) values of Minimap2. (b) The number of mapped sequences when using the (κ,w) values determined by Seedability from Table 1.
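
The preset rule and the averaging convention stated in the captions can be sketched as follows. This is a hedged illustration, not the authors' evaluation script: the function names and the file names `target.fa` and `query.fa` are hypothetical, and the command is only assembled as a list, not executed. The flags `-x`, `-k`, and `-w` are Minimap2's options for the preset, k-mer length, and window size.

```python
from statistics import mean


def choose_preset(seqs, threshold=1000):
    """Pick map-ont if the average sequence length is >= threshold, else sr."""
    return "map-ont" if mean(len(s) for s in seqs) >= threshold else "sr"


def minimap2_command(target, query, preset, k=None, w=None):
    """Assemble a Minimap2 command; k and w override the preset only if given."""
    cmd = ["minimap2", "-x", preset]
    if k is not None:
        cmd += ["-k", str(k)]
    if w is not None:
        cmd += ["-w", str(w)]
    return cmd + [target, query]


def average_identity(identities, n_pairs):
    """Total identity over ALL pairs, so unmapped pairs count as zero."""
    return sum(identities) / n_pairs


# Hypothetical short pair, using the (k=4, w=3) values quoted for Figure 7.
preset = choose_preset(["ACGT" * 50, "ACGA" * 60])
print(" ".join(minimap2_command("target.fa", "query.fa", preset, k=4, w=3)))
```

Dividing by the total number of pairs rather than by the number of mapped pairs means that a parameter setting producing no alignment for a pair is penalized in the average.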
