BMC Bioinformatics. 2005 Mar 17;6:58.
doi: 10.1186/1471-2105-6-58.

Towards precise classification of cancers based on robust gene functional expression profiles


Zheng Guo et al. BMC Bioinformatics. 2005;6:58.

Abstract

Background: Developing robust and efficient methods for analyzing and interpreting high-dimensional gene expression profiles remains a focus of computational biology. Accumulated experimental evidence supports the assumption that genes are expressed and perform their functions in a modular fashion in cells. There is therefore room for timely and relevant computational algorithms that use robust functional expression profiles to classify complex human diseases precisely at the modular level.

Results: Inspired by the insight that genes act together as modules to carry out highly integrated cellular functions, we define a low-dimensional functional expression profile for data reduction. After annotating each gene to the functional categories of a suitable gene function classification system (Gene Ontology in this study), we identify the functional categories enriched with differentially expressed genes. For each functional category, or functional module, we compute one or more summary measures of the raw expression values of its annotated genes to capture the overall activity level of the module. In this way, the expression values of the genes within a functional module are treated as a single integrative data point that replaces the multiple values of the individual genes. We compare the classification performance of decision trees built on functional expression profiles with that of decision trees built on conventional gene expression profiles, using four publicly available datasets. The results indicate that the reduced functional expression profiles achieve precise classification of tumour types and improve interpretation.
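The two data-reduction steps described above (keep only modules enriched with differentially expressed genes, then replace each module's many gene values with one summary value per sample) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-sided hypergeometric enrichment test, the 0.05 cut-off without multiple-testing correction, the use of the median as the summary measure (the figures refer to a median-based profile) and all function names are hypothetical choices made for illustration.

```python
import math
from statistics import median


def enrichment_p_value(k_de, n_module, n_de_total, n_total):
    """One-sided hypergeometric p-value P(X >= k_de).

    X counts differentially expressed (DE) genes among n_module genes drawn
    without replacement from n_total genes, of which n_de_total are DE.
    (Assumed test; the abstract does not name the enrichment statistic.)
    """
    tail = sum(
        math.comb(n_de_total, k) * math.comb(n_total - n_de_total, n_module - k)
        for k in range(k_de, min(n_module, n_de_total) + 1)
    )
    return tail / math.comb(n_total, n_module)


def functional_expression_profile(expr, modules, de_genes, alpha=0.05):
    """Median-based functional expression profile (FEP) of enriched modules.

    expr:     gene -> list of per-sample expression values (None = missing)
    modules:  module id (e.g. a GO term) -> set of annotated genes
    de_genes: set of differentially expressed genes
    Returns module id -> list of per-sample medians, for enriched modules only.
    """
    universe = set(expr)
    n_de_total = len(de_genes & universe)
    n_samples = len(next(iter(expr.values())))
    fep = {}
    for module, annotated in modules.items():
        genes = annotated & universe  # ignore genes absent from the array
        if not genes:
            continue
        k_de = len(genes & de_genes)
        p = enrichment_p_value(k_de, len(genes), n_de_total, len(universe))
        if p >= alpha:  # keep only modules enriched with DE genes (assumed cut-off)
            continue
        profile = []
        for j in range(n_samples):
            values = [expr[g][j] for g in genes if expr[g][j] is not None]
            # One summary value per sample replaces the module's many gene values.
            profile.append(median(values) if values else None)
        fep[module] = profile
    return fep
```

The median is used here because it is less sensitive than the mean to a single noisy probe; other summary measures could be swapped in at the marked line.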

Conclusion: This modular approach is demonstrated to be a powerful alternative for analyzing high-dimensional microarray data, and it is robust to the high measurement noise and intrinsic biological variance inherent in such data. Furthermore, efficient integration with current biological knowledge facilitates interpretation of the molecular mechanisms underlying complex human diseases at the modular level.


Figures

Figure 1
Training classification rules for four cancer types based on functional expression profiles of 114 modules. A – Decision tree trained on the median-based functional expression profile (FEP) of the NCI60 dataset. The internal nodes of the tree are labelled with functional modules from Gene Ontology, and the leaf nodes give the predicted cancer types. The numbers in each leaf node are the total number of samples it contains over the number of incorrectly predicted samples. B – Functional expression profiles of the three GO modules identified by the decision tree, shown as a colour spectrum of their median values. Each row corresponds to a GO module and each column to the functional expression of one cell line. The cell line classes are given at the top: renal cancer (RE), colon cancer (CO), leukaemia (LE) and melanoma (ME). Missing or null values are shown in black, positive values in red and negative values in green. C – Numbers of genes annotated and differentially expressed in the three identified modules.
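To make the leaf annotation in panel A concrete (total samples over incorrectly predicted samples), the sketch below finds the best single split on one module's FEP values and summarizes each resulting leaf in that form. It is a hypothetical illustration, not the authors' tree-induction procedure: a CART-style weighted Gini impurity with midpoint thresholds is swapped in because the abstract does not name the algorithm, samples with missing values are simply skipped, and the function names are invented for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def leaf_summary(labels):
    """Majority class and a 'total/incorrect' count, like the tree leaves."""
    majority, n_majority = Counter(labels).most_common(1)[0]
    return majority, f"{len(labels)}/{len(labels) - n_majority}"


def best_split(values, labels):
    """Best threshold on one module's FEP values by weighted Gini impurity.

    Samples with a missing (None) value are skipped (assumption).
    Returns (impurity, threshold, left_leaf, right_leaf), or None if no split exists.
    """
    pairs = sorted((v, y) for v, y in zip(values, labels) if v is not None)
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or impurity < best[0]:
            best = (impurity, threshold, leaf_summary(left), leaf_summary(right))
    return best
```

A full tree repeats this search over all modules at each node and recurses into the two children until the leaves are pure or too small.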
Figure 2
Comparison of different gene expression measures for classification of cancer types in terms of accuracy (A), precision (B) and recall (C).
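The three criteria compared in this figure can be computed from true and predicted class labels as below. This is a minimal sketch assuming overall accuracy and per-class (one-vs-rest) precision and recall; the abstract does not state how the per-class values were aggregated, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Overall accuracy plus per-class (one-vs-rest) precision and recall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)  # correctly predicted as c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly predicted as c
        fn = sum(t == c and p != c for t, p in pairs)  # class c samples missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = {"precision": precision, "recall": recall}
    return accuracy, per_class
```

For example, with true labels RE, RE, CO, LE and predictions RE, CO, CO, LE, accuracy is 0.75, RE has precision 1.0 and recall 0.5, and CO has precision 0.5 and recall 1.0.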
Figure 3
Training classification rules for lymphoma subtypes based on functional expression profiles of 44 GO modules. A – Decision tree trained on the median-based functional expression profile (FEP) of the lymphoma dataset. The internal nodes of the tree are labelled with functional modules from Gene Ontology, and the leaf nodes give the predicted lymphoma subtypes. The numbers in each leaf node are the total number of samples it contains over the number of incorrectly predicted samples. B – Functional expression profiles of the three GO modules identified by the decision tree, shown as a colour spectrum of their median values. Each row corresponds to a GO module and each column to the functional expression of one sample. The sample classes are given at the top: diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL), chronic lymphocytic leukaemia (CLL) and normal samples (NORMAL). Missing or null values are shown in black, positive values in red and negative values in green. C – Numbers of genes annotated and differentially expressed in the three identified modules.
Figure 4
Comparison of different gene expression measures for classification of lymphoma tissues in terms of accuracy (A), precision (B) and recall (C).
