Bioinformatics. 2009 Jun 15;25(12):i305-12.

doi: 10.1093/bioinformatics/btp220.

A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery

Lei Xie¹, Li Xie, Philip E Bourne

PMID: 19478004
PMCID: PMC2687974
DOI: 10.1093/bioinformatics/btp220

Lei Xie et al. Bioinformatics. 2009.

Affiliation

¹ San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA. lxie@sdsc.edu

Abstract

Functional relationships between proteins that do not share global structural similarity can be established by detecting similarity between their ligand-binding sites. For a large-scale comparison, it is critical to assess the statistical significance of this similarity accurately and efficiently. Here, we report an efficient statistical model that supports local, sequence-order-independent ligand-binding-site similarity searching. Most existing statistical models take into account only the matching vertices between two sites that are defined by a fixed number of points. In reality, the boundary of the binding site is either unknown or dependent on the bound ligand, which limits these approaches. To address these shortcomings and to perform binding-site mapping on a genome-wide scale, we developed a sequence-order independent profile-profile alignment (SOIPPA) algorithm that can detect local similarity between binding sites that are not known a priori. The SOIPPA scoring integrates geometric, evolutionary and physical information into a unified framework. However, this integration makes it challenging to assess the statistical significance of the similarity, because the conventional probability model, which is based on fixed-point matching, cannot be applied. We find that scores for binding-site matching by SOIPPA follow an extreme value distribution (EVD). Benchmark studies show that the EVD model is at least two orders of magnitude faster and more accurate than the non-parametric statistical method used in the previous SOIPPA version. Efficient statistical analysis makes it possible to apply SOIPPA to genome-based drug discovery. Consequently, we have applied the approach to the structural genome of Mycobacterium tuberculosis to construct a protein-ligand interaction network. The network reveals highly connected proteins, which represent suitable targets for promiscuous drugs.

Figures

**Fig. 1.**
Fitting of the square of the SOIPPA raw scores to an extreme value distribution (EVD) for alignment lengths N of 5, 15, 25 and 35, respectively. The EVD is determined by two parameters μ and σ, which are estimated from linear regression of the rearrangement of Equations 4 and 5 (see text and Fig. 2) as S² = μ + σ(−ln(−ln(1 − P))).
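The regression described in this caption can be illustrated with a minimal sketch. It assumes that P is the upper-tail probability of a Gumbel-type EVD, which is the reading under which S² = μ + σ(−ln(−ln(1 − P))) holds. The function name `fit_evd_by_regression`, the mid-point plotting position used for the empirical probabilities, and the synthetic input are illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np


def fit_evd_by_regression(squared_scores):
    """Estimate (mu, sigma) by regressing sorted values on -ln(-ln(1 - P))."""
    x = np.sort(np.asarray(squared_scores, dtype=float))
    n = x.size
    # Empirical CDF at each sorted value; the mid-point plotting
    # position (i - 0.5) / n is an assumption made for this sketch.
    cdf = (np.arange(1, n + 1) - 0.5) / n
    upper_tail_p = 1.0 - cdf
    # Linearised EVD: x = mu + sigma * y, with y = -ln(-ln(1 - P)).
    y = -np.log(-np.log(1.0 - upper_tail_p))
    sigma, mu = np.polyfit(y, x, 1)  # slope first, then intercept
    return mu, sigma


# Synthetic stand-in for squared raw scores at one alignment length
# (hypothetical values, not taken from the paper).
rng = np.random.default_rng(0)
synthetic = rng.gumbel(loc=3.0, scale=1.5, size=20000)
mu_hat, sigma_hat = fit_evd_by_regression(synthetic)
```

On data that really follow an EVD, the fitted slope and intercept should land close to the generating scale and location; a visibly curved plot of `x` against `y` would suggest the EVD assumption does not hold.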

**Fig. 2.**
The derived parameters μ and σ that determine a unique extreme value distribution (EVD) for a specific alignment length can be fitted to a quadratic function based on the logarithm of alignment length.
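One way to read this caption is that μ and σ need to be fitted at only a few alignment lengths and can then be interpolated for any length N. The sketch below fits a quadratic in ln N and converts a squared score into an upper-tail probability under a Gumbel-type EVD. The coefficient values are invented placeholders and the function names are hypothetical; only the alignment lengths 5, 15, 25 and 35 come from the Fig. 1 caption.

```python
import numpy as np


def fit_quadratic_in_log_length(lengths, values):
    """Fit values ~ a*(ln N)^2 + b*(ln N) + c; returns [a, b, c]."""
    return np.polyfit(np.log(lengths), values, 2)


def evd_p_value(squared_score, length, mu_coeffs, sigma_coeffs):
    """Upper-tail probability P = 1 - exp(-exp(-z)), z = (S^2 - mu) / sigma."""
    log_n = np.log(length)
    mu = np.polyval(mu_coeffs, log_n)
    sigma = np.polyval(sigma_coeffs, log_n)
    z = (squared_score - mu) / sigma
    return -np.expm1(-np.exp(-z))  # numerically stable 1 - exp(-exp(-z))


lengths = np.array([5, 15, 25, 35])
# Placeholder per-length parameters generated from made-up quadratics;
# real values would come from per-length EVD fits.
true_mu_coeffs = [0.5, 1.0, 2.0]
true_sigma_coeffs = [0.1, 0.2, 1.0]
mu_values = np.polyval(true_mu_coeffs, np.log(lengths))
sigma_values = np.polyval(true_sigma_coeffs, np.log(lengths))

mu_coeffs = fit_quadratic_in_log_length(lengths, mu_values)
sigma_coeffs = fit_quadratic_in_log_length(lengths, sigma_values)

# P-values for two hypothetical squared scores at alignment length 20.
p_low = evd_p_value(8.0, 20, mu_coeffs, sigma_coeffs)
p_high = evd_p_value(20.0, 20, mu_coeffs, sigma_coeffs)
```

Because the parameters are closed-form functions of N, each P-value costs only a few arithmetic operations, which is consistent with the speed advantage the abstract attributes to the parametric model.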

**Fig. 3.**
Computational time for 5000 randomly selected non-redundant chains searched against two structures with chain lengths of 564 (red triangle) and 166 (black diamond), respectively.

**Fig. 4.**
Percentage of false positive rate versus true positive rate for the original SOIPPA algorithm (Xie and Bourne, 2008) (solid) and the improved SMAP implementation (dashed with circles) using (a) BLOSUM45 and (b) McLachlan substitution matrices. Details of the benchmark are given in the Methods section.

**Fig. 5.**
Predicted protein–ligand interaction network of *M. tuberculosis*. Proteins that are predicted to have similar binding sites are connected. Squares represent the top 18 most connected proteins.
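The notion of "most connected proteins" in this caption corresponds to ranking nodes by degree. As a minimal sketch, assuming the network is supplied as a list of protein pairs already judged to have similar binding sites, the ranking could look like the following. The protein labels and the function name `rank_by_degree` are hypothetical, and how pairs are selected (for example, by a P-value cutoff) is left outside the sketch.

```python
from collections import Counter


def rank_by_degree(similar_pairs, top_k):
    """Return the top_k (protein, degree) tuples of an undirected network."""
    # Treat edges as unordered and drop duplicates and self-pairs.
    edges = {frozenset(pair) for pair in similar_pairs if pair[0] != pair[1]}
    degree = Counter()
    for edge in edges:
        for protein in edge:
            degree[protein] += 1
    # Sort by descending degree, then by name for a deterministic order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical pairs; ("C", "A") duplicates ("A", "C") and is counted once.
pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "A")]
top_two = rank_by_degree(pairs, top_k=2)
```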

Cited by

Protein function annotation by local binding site surface similarity.
Spitzer R, Cleves AE, Varela R, Jain AN. Proteins. 2014 Apr;82(4):679-94. doi: 10.1002/prot.24450. Epub 2013 Nov 22. PMID: 24166661.
Identification of distant drug off-targets by direct superposition of binding pocket surfaces.
Schumann M, Armen RS. PLoS One. 2013 Dec 31;8(12):e83533. doi: 10.1371/journal.pone.0083533. eCollection 2013. PMID: 24391782.
Rational discovery of dual-indication multi-target PDE/Kinase inhibitor for precision anti-cancer therapy using structural systems pharmacology.
Lim H, He D, Qiu Y, Krawczuk P, Sun X, Xie L. PLoS Comput Biol. 2019 Jun 17;15(6):e1006619. doi: 10.1371/journal.pcbi.1006619. eCollection 2019 Jun. PMID: 31206508.
VASP-E: specificity annotation with a volumetric analysis of electrostatic isopotentials.
Chen BY. PLoS Comput Biol. 2014 Aug 28;10(8):e1003792. doi: 10.1371/journal.pcbi.1003792. eCollection 2014 Aug. PMID: 25166865.
eMatchSite: sequence order-independent structure alignments of ligand binding pockets in protein models.
Brylinski M. PLoS Comput Biol. 2014 Sep 18;10(9):e1003829. doi: 10.1371/journal.pcbi.1003829. eCollection 2014 Sep. PMID: 25232727.
