Sci Rep. 2016 Jul 29;6:30672.

doi: 10.1038/srep30672.

Discovering Pair-wise Synergies in Microarray Data

Yuan Chen^{1,2}, Dan Cao³, Jun Gao^{4,5}, Zheming Yuan^{1,2}

Affiliations

¹ Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, Hunan, 410128, China.
² Hunan Provincial Key Laboratory for Germplasm Innovation and Utilization of Crop, Hunan Agricultural University, Changsha, Hunan, 410128, China.
³ Orient Science & Technology College of Hunan Agricultural University, Changsha, Hunan, 410128, China.
⁴ College of Resources & Environment, Hunan Agricultural University, Changsha, Hunan, 410128, China.
⁵ Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, Arkansas, 72205, USA.

PMID: 27470995
PMCID: PMC4965793
DOI: 10.1038/srep30672

Yuan Chen et al. Sci Rep. 2016.

Abstract

Informative gene selection has important implications for improving cancer diagnosis and identifying new drug targets. Individual-gene-ranking methods ignore interactions between genes. Furthermore, popular pair-wise gene evaluation methods, e.g. TSP and TSG, cannot discover pair-wise interactions. Several information-theoretic efforts to discover pair-wise synergy have been made, such as EMBP and FeatKNN. However, the methods they employ to estimate mutual information, e.g. binarization, histogram-based and KNN estimators, depend on known data or domain characteristics. Recently, Reshef et al. proposed the maximal information coefficient (MIC), a measure with the property of generality that captures a wide range of associations between two variables. An extension from MIC(X; Y) to MIC(X1; X2; Y) is therefore desired. We developed an approximation algorithm for estimating MIC(X1; X2; Y), where Y is a discrete variable. MIC(X1; X2; Y) is employed to detect pair-wise synergy in simulated and cancer microarray data. The results indicate that MIC(X1; X2; Y) also has the property of generality. It can discover synergic genes that are undetectable by reference feature selection methods such as MIC(X; Y) and TSG. Synergic genes can distinguish different phenotypes. Finally, the biological relevance of these synergic genes is validated with GO annotation and the OUgene database.
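To make the notion of pair-wise synergy concrete, the following is a minimal sketch and not the authors' MIC(X1; X2; Y) approximation algorithm. It simulates the Y = |X1 – X2| example (n = 200, Y binarized at its median) and swaps in a simple equal-frequency histogram estimator of mutual information. Three things here are assumptions not stated in the abstract: sampling X1 and X2 uniformly on [0, 1], using 4 bins per variable, and scoring synergy as I(X1, X2; Y) − I(X1; Y) − I(X2; Y), a commonly used definition. The helper names `mutual_information` and `equal_frequency_bins` are hypothetical.

```python
import numpy as np

def mutual_information(labels_a, labels_b):
    """Plug-in mutual information (in bits) between two 1-D discrete label arrays."""
    a_vals, a_idx = np.unique(labels_a, return_inverse=True)
    b_vals, b_idx = np.unique(labels_b, return_inverse=True)
    # Build the joint contingency table, then normalize it to probabilities.
    joint = np.zeros((a_vals.size, b_vals.size))
    np.add.at(joint, (a_idx, b_idx), 1.0)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)   # marginal of A, shape (A, 1)
    pb = joint.sum(axis=0, keepdims=True)   # marginal of B, shape (1, B)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])))

def equal_frequency_bins(x, n_bins):
    """Assign each value to one of n_bins equal-frequency bins (0..n_bins-1)."""
    ranks = np.argsort(np.argsort(x))
    return (ranks * n_bins) // x.size

# Simulated data (assumption: X1, X2 uniform on [0, 1]).
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.uniform(0.0, 1.0, n)
d = np.abs(x1 - x2)
y = (d > np.median(d)).astype(int)          # Y binarized at its median

n_bins = 4                                   # assumption: 4 bins per variable
b1 = equal_frequency_bins(x1, n_bins)
b2 = equal_frequency_bins(x2, n_bins)
pair = b1 * n_bins + b2                      # one label per (X1 bin, X2 bin) cell

i1 = mutual_information(b1, y)               # I(X1; Y)
i2 = mutual_information(b2, y)               # I(X2; Y)
i12 = mutual_information(pair, y)            # I(X1, X2; Y)
synergy = i12 - i1 - i2                      # assumed synergy score

print(f"I(X1;Y)={i1:.3f}  I(X2;Y)={i2:.3f}  I(X1,X2;Y)={i12:.3f}  synergy={synergy:.3f}")
```

In this simulation each variable alone carries little information about Y while the pair carries much more, which is the pattern an individual-gene ranking would miss. The fixed binning is exactly the kind of data-dependent choice the abstract criticizes in histogram-based estimators, so the sketch illustrates the problem setting rather than the proposed solution.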

Figures

**Figure 1. Synergic pairs conducted by function.**
Y = |X1 – X2| (n = 200). Y is binarized at its median. Red point: positive sample. Green point: negative sample.

**Figure 2. Examples of scatter plots of discretization for gene expression.**
(**A,B**) are real-world gene expression values for the prostate dataset and the yeast dataset; the values of the *HTB1* gene are binarized at 0. (**C,D**) are simulation datasets from Y = 4·X² and Y = sin(4·π·X); Y is binarized at 0.5 and 0, respectively. Red point: positive sample. Green point: negative sample.

**Figure 3. Schematic of getting superclumps partition for three variables.**
The points with the same color belong to the same superclump.

**Figure 4. Y completely determined by the *synergy* between X₁ and X₂.**
X₁ and X₂∈[10, 30], and result from binarization vector of X₁ and X₂, respectively. Y = (n = 1000). Green and red dots represent Y = 1 and Y = 0, respectively.

**Figure 5. Ten noiseless functions with Y = f (X₁, X₂).**
Y is binarized at its median; green and red dots represent Y = 1 and Y = 0, respectively.

**Figure 6. Overlaps among the Top200s selected by *MIC*(X; Y), MRMR, SVM-RFE and TSG in the Prostate dataset.**

**Figure 7. Overlaps among the Top200s selected by *MIC*(X; Y), MRMR, SVM-RFE and TSG in the DLBCL dataset.**

**Figure 8. Overlaps among the Top200s selected by *MIC*(X; Y), MRMR, SVM-RFE and TSG in the Lung dataset.**

**Figure 9. Overlaps between the Top200 selected by *MIC*(X₁; X₂; Y) and the Top200s selected by *MIC*(X; Y), MRMR, SVM-RFE and TSG in the Prostate dataset.**

**Figure 10. Overlaps between the Top200 selected by *MIC*(X₁; X₂; Y) and the Top200s selected by *MIC*(X; Y), MRMR, SVM-RFE and TSG in the DLBCL dataset.**

**Figure 11. Overlaps between the Top200 selected by *MIC*(X₁; X₂; Y) and the Top200s selected by *MIC*(X; Y), MRMR, SVM-RFE and TSG in the Lung dataset.**

**Figure 12. Prediction accuracy of five feature selection methods combined with SVC Classifier over three datasets.**

**Figure 13. GO annotations for the Top200s selected by different methods in the Prostate dataset.**
A deeper color at a point indicates that the term covers more genes. Terms for which the total number of genes across all methods is less than 25 were removed.

**Figure 14. Three representative patterns of pair-wise synergy identified by the *MIC*(X₁; X₂; Y) method.**
(**A–E**) are from real-world datasets, (**F–H**) are the corresponding hypothetical extreme examples.

Cited by

Xing P, Chen Y, Gao J, Bai L, Yuan Z. A fast approach to detect gene-gene synergy. Sci Rep. 2017 Nov 27;7(1):16437. doi: 10.1038/s41598-017-16748-w. PMID: 29180805.
Foroughi Pour A, Pietrzak M, Dalton LA, Rempała GA. High dimensional model representation of log-likelihood ratio: binary classification with expression data. BMC Bioinformatics. 2020 Apr 25;21(1):156. doi: 10.1186/s12859-020-3486-x. PMID: 32334509.
Li C, Gao Z, Su B, Xu G, Lin X. Data analysis methods for defining biomarkers from omics data. Anal Bioanal Chem. 2022 Jan;414(1):235-250. doi: 10.1007/s00216-021-03813-7. Epub 2021 Dec 24. PMID: 34951658. Review.

