Review
PLoS Comput Biol. 2007 Jun;3(6):e116.
doi: 10.1371/journal.pcbi.0030116.

Machine learning and its applications to biology

Adi L Tarca et al. PLoS Comput Biol. 2007 Jun.
No abstract available

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Binary Decision Tree
The left panel shows the data for a two-class decision problem with dimensionality p = 2. Points known to belong to classes 1 and 2 are displayed as filled circles and squares, respectively. The decision boundary is shown as the thick blue line in the left panel. The triangle designates a new point, z, to be classified. The right panel shows the decision tree derived for this dataset, in which the new point z is assigned to class 2 (squares). The regions of the input space covered by nodes I and IV of the tree are represented by the dashed areas at the top and bottom of the left panel, respectively.
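A binary decision tree of this kind can be sketched as a sequence of axis-aligned threshold tests on the two coordinates. The thresholds, the shape of the tree, and the coordinates of z below are hypothetical values chosen for illustration; they are not read from the figure.

```python
# Minimal sketch of a two-feature binary decision tree.
# All thresholds are made up for illustration, not taken from Figure 1.

def classify(point):
    """Follow axis-aligned splits and return the predicted class (1 or 2)."""
    x1, x2 = point
    if x2 > 0.7:        # first split: top band of the input space
        return 1
    if x1 <= 0.4:       # second split: left part of the remaining region
        return 2
    if x2 <= 0.2:       # third split: bottom band of the input space
        return 1
    return 2            # everything else


z = (0.3, 0.5)          # hypothetical new point to classify
print("z is assigned to class", classify(z))
```

Each root-to-leaf path corresponds to one rectangular region of the input space, which is why the decision boundary of such a tree is made of horizontal and vertical segments.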
Figure 2. A Schematic Representation of a Feed-Forward Three-Layered Neural Network
Two-dimensional data points (p = 2) are classified into K = 2 known classes. The sigmoid hidden and output units are shown as white circles containing an S-like red curve.
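The forward pass of such a network can be sketched in a few lines. This is a minimal illustration assuming three hidden units and arbitrary weights; neither the number of hidden units nor the weight values come from the figure, and no training is performed.

```python
import math


def sigmoid(a):
    """Logistic sigmoid, the S-shaped activation used by each unit."""
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Propagate one input vector through a hidden layer and an output layer."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [
        sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
        for row, b in zip(w_out, b_out)
    ]


# Hypothetical weights: p = 2 inputs, 3 hidden units, K = 2 outputs.
W_HIDDEN = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
B_HIDDEN = [0.1, -0.2, 0.0]
W_OUT = [[1.0, -1.0, 0.5], [-1.0, 1.0, -0.5]]
B_OUT = [0.0, 0.0]

outputs = forward([0.5, -1.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
predicted_class = outputs.index(max(outputs)) + 1   # classes numbered 1..K
print(outputs, predicted_class)
```

Taking the output unit with the largest activation as the predicted class is one common convention and is assumed here.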
Figure 3. Support Vector Machines Class Boundaries
Two-dimensional data points belonging to two different classes (circles and squares) are shown in the left panel. The right panel shows the maximum-margin decision boundary implemented by the SVMs. Samples along the dashed lines are called SVs.
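For a linear boundary w·x + b = 0 scaled so that the closest samples satisfy y(w·x + b) = 1, the SVs are the samples lying exactly on the two margin lines, and the margin width is 2/||w||. The sketch below assumes w and b are already given (nothing is fitted) and uses a small made-up dataset.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def support_vectors(w, b, points, labels, tol=1e-9):
    """Return the samples lying on the margin lines y * (w.x + b) = 1."""
    return [
        x for x, y in zip(points, labels)
        if abs(y * decision_value(w, b, x) - 1.0) <= tol
    ]


# Hypothetical data and boundary (the vertical line x1 = 2).
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 1.0), (5.0, 2.0)]
labels = [-1, -1, 1, 1]
w, b = (1.0, 0.0), -2.0

svs = support_vectors(w, b, points, labels)
margin_width = 2.0 / math.sqrt(sum(wi * wi for wi in w))
print(svs, margin_width)
```

In this toy example the two closest points of opposite classes are 2 units apart, so no separating line can have a wider margin than the one shown.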
Figure 4. Heat Map of the ALL Data after Filtering
Class membership is indicated by a magenta (NEG) or blue (BCR/ABL) stripe at the top of the plot region. Rows correspond to data features (genes), while columns correspond to data points (samples). Hierarchical clustering is applied simultaneously to both rows (genes) and columns (samples) of the expression matrix to organize the display.
Figure 5. Two Views of the Partition Obtained by PAM
Left: PC display; right: silhouette display. The ellipses plotted on the left are cluster-specific minimum volume ellipsoids for the data projected into the PC plane. These should be regarded as two-dimensional representations of the robust approximate variance–covariance matrix for the projected clusters. The silhouette display comprises a single horizontal segment for each observation, ordered by cluster and by object-specific silhouette value within a cluster. Large average silhouette values for a cluster indicate good separation of most cluster members from members of other clusters; negative silhouette values for objects indicate instances of indecisiveness or error of the given partition.
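The silhouette value of an object is s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below assumes Euclidean distance, at least two clusters, and a made-up partition; the dissimilarity used for the figure may differ.

```python
import math


def silhouette_values(points, labels):
    """Per-object silhouette values for a given partition (Euclidean distance)."""
    clusters = sorted(set(labels))
    values = []
    for i, p in enumerate(points):

        def mean_dist(cluster):
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == cluster and j != i]
            return sum(d) / len(d) if d else None

        a = mean_dist(labels[i])
        if a is None:                 # singleton cluster: value set to 0 by convention
            values.append(0.0)
            continue
        b = min(mean_dist(c) for c in clusters if c != labels[i])
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Hypothetical data: the third point sits near cluster 0 but is labeled 1.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1, 1]
for lab, s in zip(labels, silhouette_values(points, labels)):
    print(lab, round(s, 3))
```

The misassigned third point receives a negative value, which is the kind of "indecisiveness or error" the silhouette display is meant to expose.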
Figure 6. A PCA Plot
The 79 samples of the ALL dataset are projected on the first three PCs derived from the 50 original features. The blue and magenta colors are used to denote the known membership of the samples in the two classes, NEG and BCR/ABL, respectively. Note that PCA is an unsupervised data projection method, since the class membership is not required to compute the PCs.
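That PCA needs no class labels can be seen from a minimal sketch: the first PC is the leading eigenvector of the sample covariance matrix, approximated here by power iteration. The data are made up, the function name is hypothetical, and a real analysis would normally use a linear algebra library rather than this hand-rolled loop.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) of the first PC; class labels are never used."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[j] * c[k] for c in centered) / (n - 1) for k in range(p)]
           for j in range(p)]
    # Power iteration from an arbitrary starting vector
    # (assumes the data are not constant and the start is not orthogonal to the PC).
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Hypothetical samples lying along the diagonal x2 = x1.
samples = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
direction, scores = first_principal_component(samples)
print(direction, scores)
```

Class membership would enter only afterwards, for example to color the projected points as in the figure.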
Figure 7. Rendering of a Conditional Tree
The figure is obtained with the ctree function of the party package.
Figure 8. Display of Four Two-Gene Classifiers
Top left: CART with the minsplit tuning parameter set to 4; top right: a single-layer feed-forward neural network with eight units; bottom left: k = 3 nearest neighbors; bottom right: the default SVM from the e1071 package. The planarPlot function of the MLInterfaces package can be used to construct such displays. If the expression level of a given sample falls into the magenta-colored area, the sample is predicted to have NEG status; if it falls into the blue-colored area, the sample is predicted to have BCR/ABL status.
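Of the four classifiers, the k = 3 nearest-neighbor rule is the simplest to sketch: a sample takes the majority label of its three closest training samples. The two-gene expression values below are invented for illustration and Euclidean distance is assumed; this is not the MLInterfaces implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical expression levels of two genes for six training samples.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3),
                (3.0, 3.1), (3.2, 2.9), (2.8, 3.3)]
train_labels = ["NEG", "NEG", "NEG", "BCR/ABL", "BCR/ABL", "BCR/ABL"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (3.0, 3.0)))
```

Evaluating such a function over a grid of expression values and coloring each grid point by the predicted label would give a decision-region display of the kind shown in the figure.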
Figure 9. Display of Relative Variable Importance as Computed in a Gradient Boosting Machine Run

