Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2010 Jan 4:5:7.
doi: 10.1186/1748-7188-5-7.

Linear-time protein 3-D structure searching with insertions and deletions

Affiliations

Linear-time protein 3-D structure searching with insertions and deletions

Tetsuo Shibuya et al. Algorithms Mol Biol. .

Abstract

Background: Two biomolecular 3-D structures are said to be similar if the RMSD (root mean square deviation) between the two molecules' sequences of 3-D coordinates is less than or equal to some given constant bound. Tools for searching for similar structures in biomolecular 3-D structure databases are becoming increasingly important in the structural biology of the post-genomic era.

Results: We consider an important, fundamental problem of reporting all substructures in a 3-D structure database of chain molecules (such as proteins) which are similar to a given query 3-D structure, with consideration of indels (i.e., insertions and deletions). This problem has been believed to be very difficult but its exact computational complexity has not been known. In this paper, we first prove that the problem in unbounded dimensions is NP-hard. We then propose a new algorithm that dramatically improves the average-case time complexity of the problem in 3-D in case the number of indels k is bounded by a constant. Our algorithm solves the above problem for a query of size m and a database of size N in average-case O(N) time, whereas the time complexity of the previously best algorithm was O(Nm(k+1)).

Conclusions: Our results show that although the problem of searching for similar structures in a database based on the RMSD measure with indels is NP-hard in the case of unbounded dimensions, it can be solved in 3-D by a simple average-case linear time algorithm when the number of indels is bounded by a constant.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Ungapped substructures. There are at least 2k + 2 pairs of subsequence structures formula image and formula image such that such that formula image = Qj and formula image is a substructure of P.

References

    1. Aung Z, Tan KL. Rapid retrieval of protein structures from databases. Drug Discovery Today. 2007;12:732–739. doi: 10.1016/j.drudis.2007.07.014. - DOI - PubMed
    1. Eidhammer I, Jonassen I, Taylor WR. Structure comparison and structure patterns. J Computational Biology. 2000;7(5):685–716. doi: 10.1089/106652701446152. - DOI - PubMed
    1. Gerstein M. Integrative database analysis in structural genomics. Nat Struct Biol. 2000. pp. 960–963. - DOI - PubMed
    1. Hasegawa H, Holm L. Advances and pitfalls of protein structural alignment. Current Opinion in Structural Biology. 2009;19:341–348. doi: 10.1016/j.sbi.2009.04.003. - DOI - PubMed
    1. Koehl P. Protein structure similarities. Current Opinion in Structural Biology. 2001;11:348–353. doi: 10.1016/S0959-440X(00)00214-1. - DOI - PubMed

LinkOut - more resources