Metab Eng. 2000 Jul;2(3):228-38.
doi: 10.1006/mben.2000.0155.

Mining of biological data II: assessing data structure and class homogeneity by cluster analysis

R T Kamimura et al. Metab Eng. 2000 Jul.

Abstract

An important step in data analysis is class assignment, which is usually done on the basis of a macroscopic phenotypic or bioprocess characteristic, such as high vs. low growth, healthy vs. diseased state, or high vs. low productivity. Unfortunately, such an assignment may lump together samples that prove dissimilar when described by a more detailed phenotypic or bioprocess characterization, giving rise to models of lower quality and predictive power. In this paper we present a clustering algorithm for data preprocessing that identifies fundamentally similar lots on the basis of the extent of similarity among the system variables. The algorithm combines aspects of cluster analysis and principal component analysis by applying agglomerative clustering methods to the first principal component of the system data matrix. As part of a rational strategy for developing empirical models, this technique selects the lots (samples) that are most appropriate for inclusion in a training set by analyzing multivariate data homogeneity. Samples with similar data structures are identified and grouped together into distinct clusters. This knowledge is used in the formation of potential training sets. Additionally, this technique can identify atypical lots, i.e., samples that are not simply outliers but that exhibit the general properties of one class while having been assigned to the other. The method is presented along with examples from its application to fermentation data sets.
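The general idea described in the abstract (agglomerative clustering applied to first-principal-component scores, followed by a check of class labels within each cluster) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the scaling, the linkage rule, the number of clusters, or how atypical lots are flagged, so autoscaling, average linkage, a fixed cluster count, and a majority-label rule are all assumptions here. The function names (`first_pc_scores`, `agglomerate`, `flag_atypical`) and the toy data are hypothetical. The code uses only the Python standard library; in practice a numerical library would normally replace the hand-written power iteration.

```python
import math
from collections import Counter


def first_pc_scores(data, iters=500):
    """Scores of each row on the first principal component (autoscaled data)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    # Sample standard deviation; fall back to 1.0 for constant columns.
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1)) or 1.0
        for j in range(p)
    ]
    z = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]
    # Power iteration for the leading eigenvector (assumes the start vector
    # is not orthogonal to it and that the leading eigenvalue is distinct).
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(p)) for r in z]


def agglomerate(scores, n_clusters):
    """Naive average-linkage agglomerative clustering of 1-D scores."""
    clusters = [[i] for i in range(len(scores))]

    def linkage(a, b):
        # Mean absolute difference over all cross-cluster pairs.
        return sum(abs(scores[i] - scores[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


def flag_atypical(clusters, labels):
    """Indices whose assigned class differs from their cluster's majority class."""
    flagged = []
    for members in clusters:
        majority, _ = Counter(labels[i] for i in members).most_common(1)[0]
        flagged.extend(i for i in members if labels[i] != majority)
    return sorted(flagged)


# Hypothetical toy data: 6 lots x 3 process variables (not from the paper).
lots = [
    [10.0, 5.0, 1.0],
    [10.5, 5.2, 1.1],
    [9.8, 4.9, 0.9],
    [2.0, 1.0, 3.0],
    [2.2, 1.1, 3.2],
    [1.9, 0.9, 2.9],
]
# Lot 5 resembles the "low" lots but carries a "high" class assignment.
labels = ["high", "high", "high", "low", "low", "high"]

scores = first_pc_scores(lots)
clusters = agglomerate(scores, n_clusters=2)
atypical = flag_atypical(clusters, labels)
print(clusters, atypical)
```

In this toy case the lots fall into two clusters along the first principal component, and the lot whose class label disagrees with the rest of its cluster is the one flagged. A training set could then be drawn from lots whose labels agree with their cluster, which is one possible reading of the selection step the abstract describes.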
