Sci Rep. 2016 Jul 29:6:30672.
doi: 10.1038/srep30672.

Discovering Pair-wise Synergies in Microarray Data


Yuan Chen et al. Sci Rep. 2016.

Abstract

Informative gene selection can have important implications for improving cancer diagnosis and identifying new drug targets. Individual-gene-ranking methods ignore interactions between genes. Furthermore, popular pair-wise gene evaluation methods, e.g. TSP and TSG, cannot discover pair-wise interactions. Several efforts to discover pair-wise synergy have been based on information-theoretic approaches, such as EMBP and FeatKNN. However, the methods they use to estimate mutual information, e.g. binarization, histogram-based and KNN estimators, depend on known data or domain characteristics. Recently, Reshef et al. proposed the maximal information coefficient (MIC), a general measure that captures a wide range of associations between two variables. An extension from MIC(X; Y) to MIC(X1; X2; Y) is therefore desirable. We developed an approximation algorithm for estimating MIC(X1; X2; Y), where Y is a discrete variable, and used it to detect pair-wise synergy in simulated and cancer microarray data. The results indicate that MIC(X1; X2; Y) also has the property of generality. It can discover synergic genes that are undetectable by reference feature selection methods such as MIC(X; Y) and TSG, and these synergic genes can distinguish different phenotypes. Finally, the biological relevance of these synergic genes is validated with GO annotation and the OUgene database.
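To make the notion of pair-wise synergy concrete, the sketch below uses a common information-theoretic definition, I(X1, X2; Y) − I(X1; Y) − I(X2; Y), with a simple plug-in estimator on discrete data. This is an illustrative assumption and not the MIC(X1; X2; Y) approximation algorithm described in the abstract; the function names `mutual_information` and `synergy` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def synergy(x1, x2, y):
    """Information the pair carries about y beyond the two single variables."""
    pair = list(zip(x1, x2))  # treat (x1, x2) as one joint variable
    return (
        mutual_information(pair, y)
        - mutual_information(x1, y)
        - mutual_information(x2, y)
    )


# Toy example (not from the paper): y is the XOR of two binary variables,
# so neither variable alone is informative but the pair determines y.
x1 = [0, 0, 1, 1] * 25
x2 = [0, 1, 0, 1] * 25
y = [a ^ b for a, b in zip(x1, x2)]

print(mutual_information(x1, y), mutual_information(x2, y), synergy(x1, x2, y))
```

In this toy case each single-variable mutual information is zero while the pair carries one full bit, which is the kind of pattern an individual-gene ranking would miss.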


Figures

Figure 1. Synergic pairs generated by a function.
Y = |X1 – X2| (n = 200). Y is binarized at its median. Red points: positive samples. Green points: negative samples.
Figure 2. Examples of scatter plots of discretization for gene expression.
(A,B) are real-world gene expression values from the prostate dataset and the yeast dataset; the values of the HTB1 gene are binarized at 0. (C,D) are simulation datasets from Y = 4·X^2 and Y = sin(4·π·X); Y is binarized at 0.5 and 0, respectively. Red points: positive samples. Green points: negative samples.
Figure 3. Schematic of obtaining the superclump partition for three variables.
Points with the same color belong to the same superclump.
Figure 4. Y completely determined by the synergy between X1 and X2.
X1, X2 ∈ [10, 30]; [formula image] and [formula image] result from the binarization vectors of X1 and X2, respectively. Y = [formula image] (n = 1000). Green and red dots represent Y = 1 and Y = 0, respectively.
Figure 5. Ten noiseless functions with Y = f(X1, X2).
Y is binarized at its median; green and red dots represent Y = 1 and Y = 0, respectively.
Figure 6. Overlaps among the Top200s selected by MIC(X; Y), MRMR, SVM-RFE and TSG in the Prostate dataset.
Figure 7. Overlaps among the Top200s selected by MIC(X; Y), MRMR, SVM-RFE and TSG in the DLBCL dataset.
Figure 8. Overlaps among the Top200s selected by MIC(X; Y), MRMR, SVM-RFE and TSG in the Lung dataset.
Figure 9. Overlaps between the Top200 selected by MIC(X1; X2; Y) and the Top200s selected by MIC(X; Y), MRMR, SVM-RFE and TSG in the Prostate dataset.
Figure 10. Overlaps between the Top200 selected by MIC(X1; X2; Y) and the Top200s selected by MIC(X; Y), MRMR, SVM-RFE and TSG in the DLBCL dataset.
Figure 11. Overlaps between the Top200 selected by MIC(X1; X2; Y) and the Top200s selected by MIC(X; Y), MRMR, SVM-RFE and TSG in the Lung dataset.
Figure 12. Prediction accuracy of five feature selection methods combined with the SVC classifier over three datasets.
Figure 13. GO annotations for the Top200s selected by different methods in the Prostate dataset.
A deeper color at a point indicates that the term is covered by more genes. Terms whose total gene count across all methods is less than 25 have been removed.
Figure 14. Three representative patterns of pair-wise synergy identified by the MIC(X1; X2; Y) method.
(A–E) are from real-world datasets; (F–H) are the corresponding hypothetical extreme examples.
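The construction described in the Figure 1 caption (Y = |X1 – X2|, n = 200, Y binarized at its median) can be sketched as follows. The caption does not state how X1 and X2 are sampled, so drawing them uniformly from [0, 1] with a fixed seed is an assumption made here purely for illustration.

```python
import random
from statistics import median

random.seed(0)  # fixed seed for repeatability (our choice, not from the paper)
n = 200

# Assumption: X1 and X2 are uniform on [0, 1]; the caption gives no range.
x1 = [random.random() for _ in range(n)]
x2 = [random.random() for _ in range(n)]

# Y depends on the pair jointly through the absolute difference.
y = [abs(a - b) for a, b in zip(x1, x2)]

# Binarize Y at its median: 1 = positive sample, 0 = negative sample.
threshold = median(y)
labels = [1 if v > threshold else 0 for v in y]

n_pos = sum(labels)
n_neg = n - n_pos
print(n_pos, n_neg)
```

Binarizing at the median splits the samples into two classes of roughly equal size, so any class separation seen in a scatter plot of X1 against X2 reflects the joint relationship and not class imbalance.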


