PLoS One. 2013 Sep 16;8(9):e74250.
doi: 10.1371/journal.pone.0074250. eCollection 2013.

SurvExpress: an online biomarker validation tool and database for cancer gene expression data using survival analysis


Raul Aguirre-Gamboa et al. PLoS One.

Abstract

Validation of multi-gene biomarkers for clinical outcomes is one of the most important issues in cancer prognosis. An important source of information for virtual validation is the large number of available cancer datasets. Nevertheless, assessing the prognostic performance of a gene expression signature across datasets is difficult for biologists and physicians and time-consuming for statisticians and bioinformaticians. Therefore, to facilitate performance comparison and validation of survival biomarkers for cancer outcomes, we developed SurvExpress, a cancer-wide gene expression database with clinical outcomes and a web-based tool that provides survival analysis and risk assessment of cancer datasets. The main input of SurvExpress is simply the biomarker gene list. We generated a cancer database collecting more than 20,000 samples and 130 datasets with censored clinical information, covering tumors from more than 20 tissues. We implemented a web interface to perform biomarker validation and comparison in this database, where a multivariate survival analysis can be completed in about one minute. We show the utility and simplicity of SurvExpress in two biomarker applications, for breast and lung cancer. Compared to other tools, SurvExpress is the largest, most versatile, and quickest free tool available. SurvExpress can be accessed at http://bioinformatica.mty.itesm.mx/SurvExpress (a tutorial is included). The website was implemented in JSP, JavaScript, MySQL, and R.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Overview of the SurvExpress web tool.
Panel A shows a schematic diagram of the SurvExpress workflow, and Panel B shows snapshots of the interfaces with the required input fields tagged. On the first Input web page, the user pastes the list of genes (tag 1; gene symbols, Entrez gene identifiers, and other identifiers are accepted) and chooses the dataset from around 140 available datasets (tags 2 and 3). SurvExpress validates and searches the genes and dataset, then shows the Analysis web page, where the user selects the censored outcome (tag 4) and visualizes the results (bottom right, expanded in Figure 2). The whole process can be completed in less than one minute for a sensible number of genes.
Figure 2. Common outputs of the SurvExpress Results page.
This figure shows the results from a breast cancer meta-base included in SurvExpress. Panel A shows the Kaplan-Meier curves for the risk groups, the concordance index, and the p-value of the log-rank test for equality of survival curves. Panel B shows the available clinical information related to risk group, prognostic index, and outcome data. Panel C shows a heat map of the gene expression values. Panel D shows a box plot across risk groups, including the p-value for the difference using a t-test (or an F-test for more than two groups). Panel E shows the relation between risk groups and prognostic index. Panel F shows fragments of the tables summarizing the Cox fitting and the prognostic indexes. Details are provided in the SurvExpress tutorial.
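The relation between the Cox fitting, the prognostic index, and the risk groups in Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prognostic index is taken to be the usual Cox linear predictor (sum of beta coefficient times expression over the genes), and the split at the cohort median is an assumption, since this page does not say how SurvExpress defines its groups. Gene names, coefficients, and expression values are hypothetical.

```python
from statistics import median


def prognostic_index(expression, betas):
    """Cox linear predictor: sum of beta_g * expression_g over the biomarker genes."""
    return sum(betas[gene] * expression[gene] for gene in betas)


def split_risk_groups(pi_values):
    """Label a sample 'High' if its index is above the cohort median, else 'Low'.

    The median cut-off is an assumption made for this sketch.
    """
    cut = median(pi_values)
    return ["High" if pi > cut else "Low" for pi in pi_values]


# Hypothetical Cox coefficients and expression values, for illustration only.
betas = {"GENE_A": 0.8, "GENE_B": -0.5}
samples = [
    {"GENE_A": 1.0, "GENE_B": 2.0},
    {"GENE_A": 3.0, "GENE_B": 1.0},
    {"GENE_A": 2.0, "GENE_B": 4.0},
    {"GENE_A": 4.0, "GENE_B": 0.0},
]

pi_values = [prognostic_index(s, betas) for s in samples]
groups = split_risk_groups(pi_values)
```

With these made-up numbers, the second and fourth samples fall in the high-risk group because their indexes lie above the cohort median.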
Figure 3. Kaplan-Meier curves and performance of the OncoTypeDX biomarker in four datasets.
Censored samples are shown as “+” marks. The horizontal axis represents time to event. Dataset, outcome event, time scale, concordance index (CI), and p-value of the log-rank test are shown. Red and green curves denote high- and low-risk groups, respectively. The red and green numbers below the horizontal axis give the number of individuals in the corresponding risk group who have not presented the event at each time. The number of individuals, the number of censored individuals, and the CI of each risk group are shown in the top-right insets.
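For readers unfamiliar with how such curves and the numbers below the axis arise, here is a minimal sketch of the standard Kaplan-Meier product-limit estimator in plain Python. It is not SurvExpress code (the site itself was implemented with R, among other tools), and the follow-up times and event flags are hypothetical.

```python
def kaplan_meier(times, events):
    """Return (time, n_at_risk, n_events, survival) for each distinct event time.

    times  : follow-up time of each individual
    events : 1 if the event was observed, 0 if the individual was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Count everyone whose follow-up ends exactly at time t.
        n_events = 0
        n_censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, n_at_risk, n_events, survival))
        # Individuals with an event or censored at t leave the risk set afterwards.
        n_at_risk -= n_events + n_censored
    return curve


# Hypothetical follow-up data for one risk group.
times = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored individuals never lower the curve directly; they only shrink the number at risk for later steps, which is why the "+" marks sit on flat segments.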
Figure 4. Kaplan-Meier curves and performance of the OncoTypeDX biomarker in the breast cancer Ivshina dataset across three tumor grades.
Legends as in Figure 3.
Figure 5. Performance and representation of the two NSCLC biomarkers.
Kaplan-Meier curves as in Figure 3. The heat map shows the expression of each gene (rows) across samples (columns) in the risk groups. Low expression is shown in shades of green and high expression in shades of red. The corresponding beta coefficients from the Cox fitting are shown. Two stars (**) mark genes whose fitting p-value is <0.05, one star (*) marks marginally significant genes with p-value <0.10, and genes with p-value >0.1 have no stars. Box plots compare gene expression between risk groups using a t-test.
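The star convention in this caption amounts to a small threshold rule, sketched below in Python. The function name is hypothetical, and the caption does not say what happens at exactly p = 0.10; this sketch assumes no stars in that case.

```python
def significance_stars(p_value):
    """Map a Cox fitting p-value to the star annotation described in the caption."""
    if p_value < 0.05:
        return "**"  # significant
    if p_value < 0.10:
        return "*"   # marginally significant
    return ""        # assumption: p == 0.10 is treated like p > 0.10


labels = [significance_stars(p) for p in (0.01, 0.07, 0.5)]
```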
Figure 6. Comparison of Kaplan-Meier curves of the two NSCLC biomarkers in three representative lung cancer databases.
Legends as in Figure 3.
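The concordance index (CI) reported in the legends gives a single number for comparing biomarkers across datasets. A minimal sketch follows, assuming Harrell's C as the estimator; this page does not state which estimator SurvExpress uses, and the data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk score.

    A pair (i, j) is comparable when i has an observed event strictly before
    j's follow-up time. It is concordant when i also has the higher risk score;
    tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored individual cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: the third individual is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [3.0, 1.0, 2.0, 0.5]
c_index = concordance_index(times, events, risk_scores)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering; here four of the five comparable pairs are ordered correctly.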

