BMC Bioinformatics. 2008 Mar 27:9:172.

doi: 10.1186/1471-2105-9-172.

Predicting cancer involvement of genes from heterogeneous data

Ramon Aragues¹, Chris Sander, Baldo Oliva

Affiliation

¹ Structural Bioinformatics Lab, (GRIB), Universitat Pompeu Fabra-IMIM, Barcelona Research Park of Biomedicine (PRBB), 08003-Barcelona, Catalonia, Spain. ramon.aragues@upf.edu

PMID: 18371197
PMCID: PMC2330045
DOI: 10.1186/1471-2105-9-172

Ramon Aragues et al. BMC Bioinformatics. 2008.

Abstract

Background: Systematic approaches for identifying proteins involved in different types of cancer are needed. Experimental techniques such as microarrays are being used to characterize cancer, but validating their results can be a laborious task. Computational approaches are used to prioritize between genes putatively involved in cancer, usually based on further analyzing experimental data.

Results: We implemented a systematic method using the PIANA software that predicts cancer involvement of genes by integrating heterogeneous datasets. Specifically, we produced lists of genes likely to be involved in cancer by relying on: (i) protein-protein interactions; (ii) differential expression data; and (iii) structural and functional properties of cancer genes. The integrative approach that combines multiple sources of data obtained positive predictive values ranging from 23% (on a list of 811 genes) to 73% (on a list of 22 genes), outperforming the use of any of the data sources alone. We analyzed a list of 20 cancer gene predictions and found that most of them have recently been linked to cancer in the literature.

Conclusion: Our approach to identifying and prioritizing candidate cancer genes can be used to produce lists of genes likely to be involved in cancer. Our results suggest that differential expression studies yielding high numbers of candidate cancer genes can be filtered using protein interaction networks.

Figures

**Figure 1**
**Calculating the Cancer Linker Degree (CLD) of a protein**. The Cancer Linker Degree (CLD) of a protein is defined as the absolute number of partners of the protein that are known to be involved in cancer. The procedure followed to calculate the CLD of a protein consists of 3 steps: 1) setting the known cancer genes as seeds; 2) retrieving the direct interaction partners for the known cancer genes; and 3) calculating the CLD of each protein (i.e. the number of known cancer genes to which it is connected). In the example provided, we observe that proteins with high CLD are more likely to be cancer gene products than proteins with low CLD.
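
The three steps in the caption above can be sketched in a few lines of Python. This is a minimal illustration, not the PIANA implementation: the function name `cancer_linker_degree`, the edge-list input format, the choice to ignore self-interactions, and the placeholder identifiers (`SEED_A`, `P1`, ...) are all assumptions made for the example.

```python
from collections import defaultdict


def cancer_linker_degree(interactions, known_cancer_genes):
    """Return {protein: CLD}, where CLD is the number of direct
    interaction partners that are known cancer genes."""
    seeds = set(known_cancer_genes)  # step 1: known cancer genes as seeds
    partners = defaultdict(set)
    for a, b in interactions:  # step 2: collect direct interaction partners
        if a == b:
            continue  # assumption: self-interactions do not count
        partners[a].add(b)
        partners[b].add(a)
    # step 3: count how many partners of each protein are seeds
    return {protein: len(nbrs & seeds) for protein, nbrs in partners.items()}


# Hypothetical toy network; identifiers are placeholders, not real genes.
toy_interactions = [
    ("SEED_A", "P1"),
    ("SEED_B", "P1"),
    ("P1", "P2"),
    ("P2", "P3"),
    ("SEED_B", "P4"),
]
toy_seeds = {"SEED_A", "SEED_B"}
cld = cancer_linker_degree(toy_interactions, toy_seeds)
print(cld["P1"], cld["P4"], cld["P2"])
```

Storing partners as sets means a duplicated edge in the input does not inflate the count.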

**Figure 2**
**Positive predictive value and sensitivity when predicting cancer genes based on the cancer linker degree of proteins**. The positive predictive value and sensitivity shown are for accumulative cancer linker degrees (CLD) (i.e. cancer linker degree 5 represents proteins with CLD ≥ 5). The average protein in the data set is represented by CLD 0.
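
As a rough illustration of how the two measures behave under an accumulative threshold, the sketch below uses the standard definitions (positive predictive value = hits / candidates, sensitivity = hits / known cancer genes). The function name, the score dictionary, and the toy values are assumptions for the example and are not taken from the paper's data.

```python
import math


def ppv_and_sensitivity(scores, known_positives, threshold):
    """Treat every gene with score >= threshold as a candidate and
    return (positive predictive value, sensitivity)."""
    positives = set(known_positives)
    candidates = {gene for gene, score in scores.items() if score >= threshold}
    hits = len(candidates & positives)
    ppv = hits / len(candidates) if candidates else math.nan
    sensitivity = hits / len(positives) if positives else math.nan
    return ppv, sensitivity


# Hypothetical CLD values and known cancer genes (placeholders only).
toy_scores = {"G1": 6, "G2": 5, "G3": 2, "G4": 0, "G5": 7, "G6": 1}
toy_known = {"G1", "G3", "G5"}

for t in (0, 2, 5):
    ppv, sens = ppv_and_sensitivity(toy_scores, toy_known, t)
    print(f"CLD >= {t}: PPV={ppv:.2f} sensitivity={sens:.2f}")
```

With a threshold of 0 every gene is a candidate, so the positive predictive value equals the fraction of known cancer genes in the data set, which matches the caption's description of CLD 0 as the average protein.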

**Figure 3**
**Positive predictive value and sensitivity when predicting cancer genes based on differential expression data**. The positive predictive value and sensitivity are shown for 12 cancer types and genes over- or under-expressed in at least 1, 2 and 5 cancer types.

**Figure 4**
**Positive predictive value and sensitivity when predicting cancer genes based on their probability of being a cancer gene according to structural, functional and evolutionary properties (SF-Probability)**. The positive predictive value and sensitivity shown are for accumulative SF-Probabilities (i.e. SF-Probability 0.7 represents genes with SF-Probability ≥ 0.7). The average gene in the data set is represented by SF-Probability ≥ 0. SF-Probabilities were obtained from [37].

**Figure 5**
The average number of cancer types in which genes appear differentially expressed (A) and the probability of being a cancer gene according to structural, functional and evolutionary properties (B) are plotted as a function of the cancer linker degree (CLD) of the gene products. A) The average numbers of cancer types shown are for an accumulative CLD (i.e. CLD 5 represents proteins with CLD ≥ 5). The average protein in the dataset is represented by CLD 0. Known cancer genes appear differentially expressed in an average of 2.8 cancer types. B) The average SF-Probabilities shown are for an accumulative CLD (i.e. CLD 5 represents proteins with CLD ≥ 5). The average protein in the dataset is represented by CLD 0. Known cancer genes had an average SF-Probability of 0.41.

**Figure 6**
**Contour maps for positive predictive value and sensitivity obtained when varying the thresholds applied by the integrative approach**. In each of the following images, the x-axis is the SF-Probability threshold and the y-axis is the cancer linker degree (CLD) threshold. For a given restriction on the number of cancer types in which a gene must be differentially expressed in order to be considered a candidate (no restriction, at least two cancer types and at least five cancer types), the positive predictive value and sensitivity are provided for each combination of CLD and SF-Probability. Positive predictive values and sensitivities are shown using colored contour maps, from red (i.e. 0) to turquoise (i.e. 0.7 for positive predictive value and 0.3 for sensitivity). For example, requiring a gene to be differentially expressed in at least two cancer types, with a CLD threshold of 6 and an SF-Probability threshold of 0.4, gives a positive predictive value of 0.4 at a sensitivity of 0.05.

**Figure 7**
**Positive predictive value calculated for diverse overlaps of cancer gene candidates**. The criteria applied were the following: (i) cancer linker degree ≥ 5; (ii) differentially expressed in at least four cancer types; and (iii) SF-Probability ≥ 0.6. The Venn diagram shows the total number of candidates, the number of hits (i.e. known cancer genes among the candidates) and the positive predictive value for each overlap case. For example, the positive predictive value when solely applying an SF-Probability threshold of 0.6 was 14%. In contrast, when combining the SF-Probability with a cancer linker degree threshold of 5, the positive predictive value was 37% (59 hits for a total of 158 candidates).
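
The per-overlap figures in such a Venn diagram can be tabulated as in the sketch below, which also checks the arithmetic quoted in the caption (59 hits out of 158 candidates is about 37%). The function `overlap_ppv`, the criterion labels, and the toy gene sets are hypothetical and only illustrate the bookkeeping; each overlap is taken to be the intersection of the candidate sets involved.

```python
from itertools import combinations


def overlap_ppv(candidate_sets, known_cancer_genes):
    """For every non-empty combination of criteria, intersect the
    candidate sets and return {combo: (n_candidates, n_hits, ppv)}."""
    known = set(known_cancer_genes)
    names = sorted(candidate_sets)
    results = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            overlap = set.intersection(*(set(candidate_sets[n]) for n in combo))
            hits = len(overlap & known)
            ppv = hits / len(overlap) if overlap else None
            results[combo] = (len(overlap), hits, ppv)
    return results


# Hypothetical candidate lists, one per criterion (placeholder gene names).
toy_sets = {
    "cld": {"G1", "G2", "G3", "G4"},
    "expression": {"G2", "G3", "G5"},
    "sf": {"G1", "G2", "G6", "G7"},
}
toy_known = {"G1", "G2", "G5"}
table = overlap_ppv(toy_sets, toy_known)

# Arithmetic from the caption: 59 hits among 158 candidates.
caption_ppv = 59 / 158
print(f"{caption_ppv:.1%}")
```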

**Figure 8**
**Procedure followed to predict cancer gene candidates**. First, a cancer protein interaction network is built from the list of known cancer genes. Second, expression data from different cancer types is mapped onto the network. Third, probabilities of being a cancer gene based on structural, functional and evolutionary properties are retrieved for proteins in the network. Fourth, cancer genes are predicted based on the thresholds provided by the user for each type of data.
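
The final, user-thresholded step of this procedure might look like the sketch below. It assumes the three per-gene quantities have already been computed and stored in dictionaries; the `Thresholds` class, the function `predict_candidates`, and the toy values are hypothetical. The default thresholds simply reuse the criteria quoted in the Figure 7 caption and are not presented as the software's defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Example values taken from the criteria listed for Figure 7.
    min_cld: int = 5
    min_cancer_types: int = 4
    min_sf_probability: float = 0.6


def predict_candidates(cld, cancer_type_counts, sf_probability,
                       thresholds=Thresholds()):
    """Return the sorted genes that pass all three thresholds.
    Genes missing from a data source are treated as failing it (assumption)."""
    return sorted(
        gene
        for gene, degree in cld.items()
        if degree >= thresholds.min_cld
        and cancer_type_counts.get(gene, 0) >= thresholds.min_cancer_types
        and sf_probability.get(gene, 0.0) >= thresholds.min_sf_probability
    )


# Hypothetical per-gene values (placeholders, not data from the paper).
toy_cld = {"G1": 7, "G2": 5, "G3": 9, "G4": 1}
toy_counts = {"G1": 4, "G2": 6, "G3": 2, "G4": 8}
toy_sf = {"G1": 0.8, "G2": 0.5, "G3": 0.9, "G4": 0.7}

strict = predict_candidates(toy_cld, toy_counts, toy_sf)
relaxed = predict_candidates(
    toy_cld, toy_counts, toy_sf,
    Thresholds(min_cld=5, min_cancer_types=2, min_sf_probability=0.4),
)
print(strict, relaxed)
```

Relaxing the thresholds lengthens the candidate list, which is the trade-off between list size and positive predictive value that the abstract reports.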

Cited by

Associations of SNPs located at candidate genes to bovine growth traits, prioritized with an interaction networks construction approach.
Paredes-Sánchez FA, Sifuentes-Rincón AM, Segura Cabrera A, García Pérez CA, Parra Bracamonte GM, Ambriz Morales P. BMC Genet. 2015 Jul 22;16:91. doi: 10.1186/s12863-015-0247-3. PMID: 26198337 Free PMC article.

Biana: a software framework for compiling biological interactions and analyzing networks.
Garcia-Garcia J, Guney E, Aragues R, Planas-Iglesias J, Oliva B. BMC Bioinformatics. 2010 Jan 27;11:56. doi: 10.1186/1471-2105-11-56. PMID: 20105306 Free PMC article.

A systematic in silico mining of the mechanistic implications and therapeutic potentials of estrogen receptor (ER)-α in breast cancer.
Li X, Sun R, Chen W, Lu B, Li X, Wang Z, Bao J. PLoS One. 2014 Mar 10;9(3):e91894. doi: 10.1371/journal.pone.0091894. eCollection 2014. PMID: 24614816 Free PMC article.

Prediction of cancer proteins by integrating protein interaction, domain frequency, and domain interaction data using machine learning algorithms.
Huang CH, Peng HS, Ng KL. Biomed Res Int. 2015;2015:312047. doi: 10.1155/2015/312047. Epub 2015 Mar 17. PMID: 25866773 Free PMC article.

Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification.
Wang SL, Li XL, Fang J. BMC Bioinformatics. 2012 Jul 25;13:178. doi: 10.1186/1471-2105-13-178. PMID: 22830977 Free PMC article.
