BMC Bioinformatics. 2010 Sep 23;11:477.
doi: 10.1186/1471-2105-11-477.

A bi-ordering approach to linking gene expression with clinical annotations in gastric cancer

Fan Shi et al. BMC Bioinformatics.

Abstract

Background: In the study of cancer genomics, gene expression microarrays, which measure thousands of genes in a single assay, provide abundant information for the investigation of interesting genes or biological pathways. However, in order to analyze the large number of noisy measurements in microarrays, effective and efficient bioinformatics techniques are needed to identify the associations between genes and relevant phenotypes. Moreover, systematic tests are needed to validate the statistical and biological significance of those discoveries.

Results: We develop a robust and efficient method for exploratory analysis of microarray data that produces a number of different orderings (rankings) of both genes and samples, reflecting correlation among those genes and samples. Because the core algorithm is closely related to biclustering, we first compare its performance with several existing biclustering algorithms on two real datasets, one for gastric cancer and one for lymphoma. We then show on the gastric cancer data that the sample orderings generated by our method are highly statistically significant with respect to the histological classification of samples, as assessed by the Jonckheere trend test, while the gene modules are biologically significant with respect to biological processes from the Gene Ontology. In particular, some of the gene modules associated with biclusters are closely linked to gastric cancer tumorigenesis as reported in previous literature, while others are potentially novel discoveries.

Conclusion: We have developed an effective and efficient method, Bi-Ordering Analysis (BOA), to detect informative patterns in gene expression microarrays by ranking genes and samples. In addition, a number of evaluation metrics were applied to assess both the statistical and biological significance of the resulting bi-orderings. The methodology was validated on gastric cancer and lymphoma datasets.

Figures

Figure 1
Heat map of super-bicluster 7. Heat map for the prototype of the most prominent super-bicluster, SBC7, generated by the BOA algorithm for the gastric cancer data. The vertical axis shows the 515 most significant genes ordered by f(g) in Algorithm 1 and cut by θG = 5.0, while the horizontal axis shows all samples ordered by h(s) and cut by θS = 4.5. The yellow vertical line in the middle of the figure marks the boundary between the samples in the bicluster (left side) and the others (right side). The bicluster samples are enriched with the CG subtype with a p-value of 4.32 × 10^-10 in terms of the SCS metric, or enriched with a combination of {normal, CG, IM} subtypes with a p-value of 4.03 × 10^-14 in terms of the MCS metric. Moreover, we observe a strong gradation from the least malignant samples (normal and CG), through an intermediate phenotype (IM), to the malignant samples (combined intestinal, diffuse and mixed gastric cancers). Two phenotypes with only one sample each, squamous and adenosquamous, are annotated with black and white, respectively, but are not shown in the legend. The probability of obtaining such an ordering, or a better one, by chance was estimated at 5.35 × 10^-22 using Jonckheere's test.
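As an illustration of the trend test cited in this caption, the following is a minimal Python sketch of a Jonckheere-Terpstra test using the normal approximation, without a tie correction to the variance. It is not the authors' implementation: the function name `jonckheere_trend` and the toy rank data are hypothetical. Each group holds the positions of one phenotype's samples in an ordering, and the groups are listed in the hypothesized order of increasing malignancy.

```python
import math


def jonckheere_trend(groups):
    """One-sided Jonckheere-Terpstra test for an increasing trend.

    groups: list of lists of numbers, given in the hypothesized order.
    Returns (J, z, p) using the normal approximation (no tie correction).
    """
    # J counts, over every pair of groups a < b, the pairs (x, y) with x < y;
    # ties contribute one half.
    j_stat = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        j_stat += 1.0
                    elif x == y:
                        j_stat += 0.5

    sizes = [len(g) for g in groups]
    n = sum(sizes)
    # Mean and variance of J under the null hypothesis of no trend.
    mean = (n * n - sum(m * m for m in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(m * m * (2 * m + 3) for m in sizes)) / 72.0
    z = (j_stat - mean) / math.sqrt(var)
    # Upper-tail probability of a standard normal at z.
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return j_stat, z, p


# Hypothetical sample positions for three phenotypes, perfectly ordered.
toy_positions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
j_stat, z, p = jonckheere_trend(toy_positions)
# For this toy input J = 27 and z = 3.0.
```

A small p-value indicates that the phenotype groups appear along the ordering in the hypothesized sequence more consistently than chance would explain. For real data with many tied values, a tie-corrected variance would be more appropriate than this simplified form.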
Figure 2
Saturation metrics for gastric cancer dataset. Gastric cancer benchmark results for five biclustering algorithms. We plot the number of unique biclusters (solid lines) and super-biclusters (broken lines) with a p-value below the threshold indicated on the x-axis. Each algorithm is represented with a unique color, as shown in the legend. For BOA, ISA and Gibbs, the super-bicluster results (broken lines) use the same color as the corresponding bicluster results. Note that Gibbs produces exactly the same lines for biclusters and super-biclusters because of how that algorithm works. We used the SCS (left subfigure) and MCS (right subfigure) metrics to calculate the p-values. We applied 1000 random initializations for BOA and ISA, and the parameter settings follow the suggestions in these studies.
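The curves described in this caption can be read as cumulative counts. The sketch below, in Python, shows one way such a curve might be computed from a list of per-bicluster p-values. It assumes the biclusters have already been deduplicated, and the function name `saturation_curve` and the example values are hypothetical rather than taken from the paper.

```python
def saturation_curve(p_values, thresholds):
    """Count, for each threshold, how many p-values fall strictly below it.

    p_values: one p-value per unique bicluster (assumed already deduplicated).
    thresholds: the x-axis values at which the curve is evaluated.
    """
    return [sum(1 for p in p_values if p < t) for t in thresholds]


# Hypothetical p-values for five biclusters and four thresholds.
example_p_values = [1e-12, 3e-8, 2e-5, 0.04, 0.3]
example_thresholds = [1e-10, 1e-6, 1e-3, 0.05]
counts = saturation_curve(example_p_values, example_thresholds)
# counts is [1, 2, 3, 4]
```

Plotting `counts` against `example_thresholds` for each algorithm would give one line per algorithm; a line that rises earlier means more biclusters reach a given significance level.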
Figure 3
Saturation metrics for lymphoma dataset. Lymphoma dataset benchmark results for five biclustering algorithms. The experimental settings and figure elements are the same as in the gastric cancer experiments.
