Multiple signatures of a disease in potential biomarker space: Getting the signatures consensus and identification of novel biomarkers

BMC Genomics. 2015;16 Suppl 7(Suppl 7):S2. doi: 10.1186/1471-2164-16-S7-S2. Epub 2015 Jun 11.

Ghim Siong Ow, Vladimir A Kuznetsov

PMID: 26100469
PMCID: PMC4474413
DOI: 10.1186/1471-2164-16-S7-S2

Abstract

Background: The lack of consensus among reported gene signature subsets (GSSs) in multi-gene biomarker discovery studies is a recurring concern for researchers and clinicians. Consequently, it discourages larger-scale prospective studies, prevents the translation of such knowledge into practical clinical settings and ultimately hinders progress in biomarker-based disease classification, prognosis and prediction.

Methods: We define all "gene identificators" (gIDs) as constituents of the entire potential disease biomarker space. For each gID in a GSS of interest (the "tested GSS", tGSS), our method counts the empirical frequency of gID co-occurrences/overlaps in other reference GSSs (rGSSs) and compares it with the expected frequency generated by a randomized sampling procedure. Comparing the empirical frequency distribution (EFD) with the expected background frequency distribution (BFD) allows the gIDs within the tGSS to be dichotomized into statistically novel (SN) and statistically common (SC) gIDs.
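The counting and randomized-sampling steps described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy gene universe and the threshold rule (the smallest co-occurrence count at which the empirical frequency exceeds the simulated background) are all assumptions made for the example.

```python
import random
from collections import Counter

def cooccurrence_counts(tgss, rgss_list):
    """For each gID in the tested signature, count the reference signatures containing it."""
    return {g: sum(g in r for r in rgss_list) for g in tgss}

def frequency_distribution(counts, max_k):
    """Number of tested gIDs found in exactly k reference signatures, for k = 0..max_k."""
    tally = Counter(counts.values())
    return [tally.get(k, 0) for k in range(max_k + 1)]

def background_distribution(tgss, rgss_sizes, universe, n_sim=1000, seed=0):
    """Mean frequency distribution when every reference signature is replaced by a
    random sample (without replacement) of the same size from the gene universe."""
    rng = random.Random(seed)
    pool = sorted(universe)  # random.sample needs a sequence, not a set
    max_k = len(rgss_sizes)
    totals = [0.0] * (max_k + 1)
    for _ in range(n_sim):
        random_sets = [set(rng.sample(pool, size)) for size in rgss_sizes]
        counts = cooccurrence_counts(tgss, random_sets)
        for k, n in enumerate(frequency_distribution(counts, max_k)):
            totals[k] += n
    return [t / n_sim for t in totals]

def dichotomize(counts, efd, bfd):
    """Hypothetical rule: threshold = smallest k >= 1 with EFD[k] > BFD[k];
    gIDs at or above the threshold are called SC, the rest SN."""
    threshold = next((k for k in range(1, len(efd)) if efd[k] > bfd[k]), None)
    if threshold is None:
        return threshold, set(), set(counts)
    common = {g for g, c in counts.items() if c >= threshold}
    return threshold, common, set(counts) - common

# Toy data (invented for illustration only).
universe = {f"g{i}" for i in range(200)}
tgss = {"g0", "g1", "g2", "g3", "g4"}
rgss_list = [{"g0", "g1", "g10", "g11"}, {"g0", "g20", "g21"}, {"g0", "g1", "g30"}]

counts = cooccurrence_counts(tgss, rgss_list)
efd = frequency_distribution(counts, len(rgss_list))
bfd = background_distribution(tgss, [len(r) for r in rgss_list], universe, n_sim=500)
threshold, sc_genes, sn_genes = dichotomize(counts, efd, bfd)
```

In this toy case the genes seen in two or more reference signatures end up in `sc_genes`, and the genes never seen elsewhere end up in `sn_genes`. The actual dichotomization threshold in the paper is derived statistically; the rule above only stands in for it.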

Results: We identify SN or SC biomarkers for tGSSs obtained from previous studies of high-grade serous ovarian cancer (HG-SOC) and breast cancer (BC). For each tGSS, the EFD of gID co-occurrences/overlaps with other rGSSs is characterized by a scale- and context-dependent Pareto-like frequency distribution function. Our results indicate that, while there is little overlap between our tGSS and any individual rGSS, comparison of the EFD with the BFD suggests that beyond a confidence threshold, tested gIDs become more common in rGSSs than expected. This validates the use of our tGSS as individual or combined prognostic factors. Our method identifies SN and SC genes of a 36-gene prognostic signature that stratify HG-SOC patients into subgroups with low, intermediate or high risk of the disease outcome. Using 70 BC rGSSs, the method also predicted SN and SC BC prognostic genes from the tested obesity and IGF1 pathway GSSs.

Conclusions: Our method provides a strategy that identifies or predicts, within a tGSS of interest, gID subsets that are either SN or SC when compared to other rGSSs. Practically, our results suggest that the IGF1 signature genes are more strongly associated with the 70 BC rGSSs than the obesity-associated signature genes are. Furthermore, both SC and SN genes in both signatures could be considered prospective prognostic biomarkers of BC that stratify patients into low- or high-risk groups for cancer development.


Figures

**Figure 1**
**Definition of novel or common biomarkers**. (A) Traditional definition of novel or common biomarkers. (B) Statistical definition of novel or common biomarkers. An additional vertical dimension provides a statistical measure of whether a signature gene is considered "novel".

**Figure 2**
**Schema of gene list comparison with other defined sets**. (A) Actual observations of gene list overlap between a single list of interest (AS₀) and other defined sets. (B) Observations of gene list overlap in a simulation where the other defined gene sets are randomly and independently sampled without replacement. AS and RS denote actual and random sets respectively. O_m and RO_m denote overlap segments and random overlap segments respectively. The blue solid circle represents our gene list of interest (AS₀). The green oval, red rectangle and yellow triangle represent three other defined sets of genes with sizes |AS_{i = 1}|, |AS_{i = 2}|, |AS_{i = 3}| respectively.

**Figure 3**
**Family of null frequency distributions of expected co-occurrences of our signature genes with other signatures**. The horizontal axis represents the number of samples that contain the gene from our signature of interest. The dotted lines represent the fitted Weibull curves, whereas the dashed lines represent the fitted sigmoid curves.

**Figure 4**
**Actual and expected frequency distribution of gene overlap from a query signature with other reference signatures**. Comparison of genes from (A) the 36-gene ovarian cancer prognostic gene signature, (B) the tumor breast obesity gene signature and (C) the tumor breast IGF1 gene signature, with other reference gene signatures for that disease. (D) Comparison of the actual frequency distributions generated from tumor breast obesity (from B) and tumor breast IGF1 (from C). The expected frequency distributions were generated by performing N simulations, where N is 100, 1000 or 10000. The y-axis is log10 transformed. p1 denotes the two-sided p-value from the Kolmogorov-Smirnov statistic, which tests whether the actual and expected (for N = 100) distributions are similar. p2 denotes the p-value that represents the significance of that threshold in dichotomizing statistically novel or common biomarkers from a GSS of interest.
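As a rough illustration of the p1 comparison mentioned in the Figure 4 caption, the two-sample Kolmogorov-Smirnov statistic can be computed as below. This is a sketch under stated assumptions: the inputs are taken to be per-gene co-occurrence counts (actual versus simulated), the p-value uses the standard large-sample approximation with a small-sample correction factor, and the function names are hypothetical. For heavily tied, discrete counts the p-value is only approximate.

```python
import math

def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both CDFs past every value <= x so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n_a, n_b, terms=100):
    """Approximate two-sided p-value from the asymptotic Kolmogorov distribution."""
    n_eff = n_a * n_b / (n_a + n_b)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        # The alternating series converges poorly here and the true value is ~1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Toy co-occurrence counts (invented): actual vs. one simulated background.
actual = [3, 2, 0, 0, 0]
expected = [0, 0, 0, 1, 0]
d = ks_statistic(actual, expected)
p1 = ks_pvalue(d, len(actual), len(expected))
```

A small p1 would indicate that the actual and expected distributions differ; with samples this small the approximation is crude and is shown only to make the mechanics concrete.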

**Figure 5**
**Classification of high-grade serous ovarian cancer patients**. The patients diagnosed with high-grade serous ovarian cancer were classified using a data-driven method for statistically novel biomarkers (A) ***FZD1*** and (B) ***HGF*** and common biomarkers (C) ***COL3A1*** and (D) ***EDNRA***. Log-rank tests were used to assess the statistical significance of the survival difference between the two patient subgroups. Expr: expression.

**Figure 6**
**Classification of breast cancer patients for both the Stockholm and Uppsala patient cohorts**. The patients diagnosed with breast cancer were classified using a data-driven method for statistically novel biomarkers (A) ***PIK3C3*** and (B) ***APPBP2*** and common biomarkers (C) ***IL6ST*** and (D) ***DUSP6***. Top panel: Stockholm breast cancer patient cohort. Bottom panel: Uppsala breast cancer patient cohort. Log-rank tests were used to assess the statistical significance of the survival difference between the two patient subgroups. Expr: expression.
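The log-rank comparison of two patient subgroups referred to in Figures 5 and 6 can be sketched in plain Python as below. This is a generic two-group log-rank test, not the authors' pipeline: the function name and the toy survival times are assumptions, and the data-driven split of patients by gene expression is not reproduced here.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. events_* are 1 for an observed event, 0 for censoring.
    Returns (chi-square statistic with 1 degree of freedom, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for s, _, g in data if s >= t and g == 0)
        n_b = sum(1 for s, _, g in data if s >= t and g == 1)
        # Events occurring exactly at time t, per group.
        d_a = sum(1 for s, e, g in data if s == t and e and g == 0)
        d_b = sum(1 for s, e, g in data if s == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Toy survival times in months (invented): a "high-risk" and a "low-risk" subgroup.
high_risk_times, high_risk_events = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
low_risk_times, low_risk_events = [6, 7, 8, 9, 10], [1, 1, 1, 1, 1]
chi2, p_value = logrank_test(high_risk_times, high_risk_events,
                             low_risk_times, low_risk_events)
```

In practice the two subgroups would come from splitting patients by the expression of a single gene (for example above or below a cutoff), and the resulting p-value is the one reported on each survival panel.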


Cited by

Recommendations for the Application of Sex and Gender Medicine in Preclinical, Epidemiological and Clinical Research.
Cattaneo A, Bellenghi M, Ferroni E, Mangia C, Marconi M, Rizza P, Borghini A, Martini L, Luciani MN, Ortona E, Carè A, Appetecchia M, Ministry Of Health-Gender Medicine Team. J Pers Med. 2024 Aug 27;14(9):908. doi: 10.3390/jpm14090908. PMID: 39338162. Review.
Genome and transcriptome delineation of two major oncogenic pathways governing invasive ductal breast cancer development.
Aswad L, Yenamandra SP, Ow GS, Grinchuk O, Ivshina AV, Kuznetsov VA. Oncotarget. 2015 Nov 3;6(34):36652-74. doi: 10.18632/oncotarget.5543. PMID: 26474389.
Education, collaboration, and innovation: intelligent biology and medicine in the era of big data.
Ruan J, Jin V, Huang Y, Xu H, Edwards JS, Chen Y, Zhao Z. BMC Genomics. 2015;16 Suppl 7(Suppl 7):S1. doi: 10.1186/1471-2164-16-S7-S1. Epub 2015 Jun 11. PMID: 26099197.
Identification of disease modules using higher-order network structure.
Singh P, Kuder H, Ritz A. Bioinform Adv. 2023 Oct 4;3(1):vbad140. doi: 10.1093/bioadv/vbad140. eCollection 2023. PMID: 37860106.
Circulating miR-16-5p, miR-92a-3p, and miR-451a in Plasma from Lung Cancer Patients: Potential Application in Early Detection and a Regulatory Role in Tumorigenesis Pathways.
Reis PP, Drigo SA, Carvalho RF, Lopez Lapa RM, Felix TF, Patel D, Cheng D, Pintilie M, Liu G, Tsao MS. Cancers (Basel). 2020 Jul 27;12(8):2071. doi: 10.3390/cancers12082071. PMID: 32726984.
