Review
Comb Chem High Throughput Screen. 2008 Sep;11(8):677-85.
doi: 10.2174/138620708785739899.

Machine learning for in silico virtual screening and chemical genomics: new strategies

Free PMC article
Jean-Philippe Vert et al. Comb Chem High Throughput Screen. 2008 Sep.

Abstract

Support vector machines and kernel methods belong to the same class of machine learning algorithms that has recently become prominent in both computational biology and chemistry, although both fields have largely ignored each other. These methods are based on a sound mathematical and computationally efficient framework that implicitly embeds the data of interest, respectively proteins and small molecules, in high-dimensional feature spaces where various classification or regression tasks can be performed with linear algorithms. In this review, we present the main ideas underlying these approaches, survey how both the "biological" and the "chemical" spaces have been separately constructed using the same mathematical framework and tricks, and suggest different avenues to unify both spaces for the purpose of in silico chemogenomics.

Figures

Fig. (1)
Defining a kernel over a space X, such as the space of all small molecules or the space of all proteins, is equivalent to embedding X in a vector space F of finite or infinite dimension through a mapping Φ: X → F. The kernel between two points in X is equal to the inner product of their images in F, as shown in (1).
Fig. (2)
We can define the distance between two objects x1 and x2, such as two small molecules or proteins, as the Euclidean distance between their images Φ(x1) and Φ(x2). If the mapping Φ is defined by a valid kernel k, then this distance can be computed easily without computing Φ(x1) and Φ(x2), as shown in (2). This kernel trick can be extended to a variety of linear algorithms that only manipulate the data through inner products.
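The distance computation described in this caption follows from expanding the squared norm: ||Φ(x1) − Φ(x2)||^2 = k(x1, x1) − 2·k(x1, x2) + k(x2, x2). The sketch below is a hedged illustration with hypothetical names (`kernel_distance`, `linear_kernel`, `gaussian_kernel`) and uses plain numeric vectors in place of molecules or proteins. The Gaussian kernel is included as an assumed example because its feature space is infinite-dimensional, so the distance can only be obtained through kernel evaluations.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(x, y):
    """k(x, y) = x . y, for which phi is the identity map."""
    return dot(x, y)


def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); phi is infinite-dimensional."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance(k, x1, x2):
    """Distance between phi(x1) and phi(x2) using only kernel evaluations."""
    sq = k(x1, x1) - 2.0 * k(x1, x2) + k(x2, x2)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(sq, 0.0))


a = (0.0, 0.0)
b = (3.0, 4.0)

d_linear = kernel_distance(linear_kernel, a, b)      # ordinary Euclidean distance
d_gauss = kernel_distance(gaussian_kernel, a, b)     # distance in the Gaussian feature space

print(d_linear, d_gauss)
```

With the linear kernel the result reduces to the usual Euclidean distance between the inputs, which gives a quick sanity check. Any other valid kernel can be passed as `k` without changing `kernel_distance`.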
