J Comput Biol. 2023 Feb;30(2):149-160.
doi: 10.1089/cmb.2021.0520. Epub 2022 Aug 8.

Computing Maximal Covers for Protein Sequences

G Brian Golding et al. J Comput Biol. 2023 Feb.

Abstract

A partial cover of a string or sequence of length n, which we model as an array x = x[1..n], is a repeating substring u of x such that "many" positions in x lie within occurrences of u. A maximal cover u* (introduced in 2018 by Mhaskar and Smyth as optimal cover) is a partial cover that, over all partial covers u, maximizes the number of positions covered. Applying data structures also introduced by Mhaskar and Smyth, our software MAXCOVER for the first time enables efficient computation of u* for any x; in particular, as described here, for protein sequences of Arabidopsis, Caenorhabditis elegans, Drosophila melanogaster, and humans. In this protein context, we also compare an extended version of MAXCOVER with existing software (MUMmer's repeat-match) for the closely related task of computing non-extendible repeating substrings (a.k.a. maximal repeats). In practice, MAXCOVER is an order of magnitude faster than MUMmer, with much lower space requirements, while producing more compact output that, nevertheless, yields a more exact and user-friendly specification of the repeats.
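
To make the definitions above concrete, the following is a minimal brute-force sketch of a maximal cover, not the MAXCOVER algorithm and not the data structures of Mhaskar and Smyth. It simply tries every repeating substring u of x and counts the positions of x that lie within some occurrence of u. The function names `covered_positions` and `maximal_cover_naive` are hypothetical, and breaking ties in favour of the longer substring is an assumption made here for illustration only. The running time is roughly cubic in n, so the sketch is suitable only for short strings.

```python
def covered_positions(x: str, u: str) -> int:
    """Count positions of x that lie within at least one occurrence of u."""
    covered = 0
    last_end = 0  # exclusive end of the region already counted
    start = x.find(u)
    while start != -1:
        end = start + len(u)
        # Occurrences are visited left to right, so only the part of this
        # occurrence beyond last_end is new.
        covered += end - max(start, last_end)
        last_end = end
        start = x.find(u, start + 1)  # step by 1 to allow overlaps
    return covered


def maximal_cover_naive(x: str) -> tuple[str, int]:
    """Return (u, positions covered) for a repeating substring u of x that
    covers the most positions. Ties go to the longer u (an assumption)."""
    best_u, best_cov = "", 0
    seen = set()
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n + 1):
            u = x[i:j]
            if u in seen:
                continue
            seen.add(u)
            # A partial cover must repeat: require a second occurrence.
            if x.find(u, x.find(u) + 1) == -1:
                continue
            cov = covered_positions(x, u)
            if cov > best_cov or (cov == best_cov and len(u) > len(best_u)):
                best_u, best_cov = u, cov
    return best_u, best_cov


if __name__ == "__main__":
    print(maximal_cover_naive("abcabcab"))
```

For x = "abcabcab", the substring "abcab" occurs at two overlapping positions and together these occurrences cover all 8 positions, so the sketch reports it as the maximal cover. A string with no repeating substring, such as "abc", yields the empty result.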

Keywords: MAXCOVER; MUMmer; protein; repeats; string covers.
