BMC Bioinformatics. 2008 Jan 21:9:30.
doi: 10.1186/1471-2105-9-30.

SignS: a parallelized, open-source, freely available, web-based tool for gene selection and molecular signatures for survival and censored data

Ramon Diaz-Uriarte. BMC Bioinformatics.

Abstract

Background: Censored data are increasingly common in many microarray studies that attempt to relate gene expression to patient survival. Several new methods have been proposed in the last two years. Most of these methods, however, are not available to biomedical researchers, leading to many re-implementations from scratch of ad-hoc, and suboptimal, approaches with survival data.

Results: We have developed SignS (Signatures for Survival data), an open-source, freely available, web-based tool and R package for gene selection, building molecular signatures, and prediction with survival data. SignS implements four methods which, according to existing reviews, perform well and, being of a very different nature, offer complementary approaches. We use parallel computing via MPI, leading to large decreases in user waiting time. Cross-validation is used to assess predictive performance and stability of solutions, the latter an issue of increasing concern given that there are often several solutions with similar predictive performance. Biological interpretation of results is enhanced because genes and signatures in models can be sent to other freely available on-line tools for examination of PubMed references, GO terms, and KEGG and Reactome pathways of selected genes.

Conclusion: SignS is the first web-based tool for survival analysis of expression data, and one of the very few with biomedical researchers as target users. SignS is also one of the few bioinformatics web-based applications to extensively use parallelization, including fault tolerance and crash recovery. Because of its combination of implemented methods, use of parallel computing, code availability, and links to additional databases, SignS is a unique tool, and will be of immediate relevance to biomedical researchers, biostatisticians and bioinformaticians.

Figures

Figure 1
Fold increase in speed (ratio of user wall times) of R code from code changes to sequential code (in a) and parallelization (a and b). a) Timings from functions "gdcvpl" (original code) and its equivalent "tauBestP" (SignS), which use cross-validation to find the best parameters. b, c, d) Timings from analyses that include cross-validation of the final model. Numbers on top of points: user wall times in seconds. Benchmarks were obtained on an otherwise idle cluster with 30 nodes, each with two dual-core AMD Opteron 2.2 GHz CPUs and 6 GB RAM, running Debian GNU/Linux and a stock 2.6.8 kernel, version 7.1.2 of LAM/MPI and version 2.1.4 (patched) of R. DLBCL data set from [4]; when the number of arrays, n, is ≤ 160 and the number of genes, p, is ≤ 7399, we use the first n arrays and the first p genes of the data set. For number of genes p > 7399, we expand the data set by creating new genes from the previous (real) ones with Gaussian noise added.
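The way the benchmark data sets are sized in the Figure 1 caption can be illustrated with a minimal sketch. This is not the author's code: the function name `subset_or_expand`, the `noise_sd` parameter, and the rule for choosing which real gene a synthetic gene is derived from (cycling through the real genes in order) are all assumptions, since the caption only states that new genes are created from real ones with Gaussian noise added.

```python
import random


def subset_or_expand(data, n, p, noise_sd=0.1, seed=0):
    """Return the first n arrays, each with exactly p gene values.

    data: list of arrays (rows), each a list of expression values.
    If p is at most the number of real genes, the first p genes are kept.
    Otherwise, extra genes are synthesized as noisy copies of real genes.
    noise_sd and the cycling rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    real_p = len(data[0])
    rows = []
    for row in data[:n]:
        new_row = list(row[:p])  # first p real genes (all of them if p > real_p)
        for j in range(real_p, p):
            source = row[j % real_p]  # assumed: cycle through the real genes
            new_row.append(source + rng.gauss(0.0, noise_sd))
        rows.append(new_row)
    return rows


# Toy matrix standing in for the expression data: 3 arrays x 3 genes.
toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
smaller = subset_or_expand(toy, n=2, p=2)  # plain subsetting
larger = subset_or_expand(toy, n=2, p=5)   # two synthetic genes per array
```

With the real data, `real_p` would be 7399 and `n` at most 160; the toy matrix above is only a placeholder.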
Figure 2
User wall time of the web-based application. User wall time as a function of the number of simultaneous users for two different (and real) data sets, obtained from [4]. To increase the realism of simultaneous accesses, there is a delay of 5 seconds between simultaneous accesses, as might be expected, for example, from a classroom demonstration (i.e., when simulating 10 simultaneous users, the cluster is actually receiving new connections over a 10 * 5 second period, with one new connection every 5 seconds). Shown are box-plots of user wall times from several runs: 5 runs for 1 and 5 users, 10 runs for 10 users and 15 runs for 15 users. Hardware and software are the same as in Figure 1.
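The staggered-access scheme described in the Figure 2 caption could be reproduced along the following lines. This is a hedged sketch, not the benchmark harness used for the figure: `run_staggered` and the `job` callable are hypothetical names, and `job` stands in for whatever submits an analysis to the web application and waits for the result.

```python
import threading
import time


def run_staggered(job, n_users, delay_s=5.0):
    """Start n_users jobs, one every delay_s seconds, and time each one.

    job: callable taking the user index; a placeholder for one user's
    submission to the web application (an assumption of this sketch).
    Returns the per-user wall times in seconds, in launch order.
    """
    wall_times = [None] * n_users

    def worker(i):
        start = time.perf_counter()
        job(i)
        wall_times[i] = time.perf_counter() - start

    threads = []
    for i in range(n_users):
        t = threading.Thread(target=worker, args=(i,))
        t.start()
        threads.append(t)
        if i < n_users - 1:
            time.sleep(delay_s)  # next "user" connects delay_s later
    for t in threads:
        t.join()
    return wall_times


# Example with a dummy job and a short delay so the sketch runs quickly.
times = run_staggered(lambda i: time.sleep(0.01), n_users=3, delay_s=0.01)
```

Repeating such a run several times for each number of users would give the samples summarized by the box-plots.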

References

    1. Dave SS, Wright G, Tan B, Rosenwald A, Gascoyne RD, Chan WC, Fisher RI, Braziel RM, Rimsza LM, Grogan TM, Miller TP, LeBlanc M, Greiner TC, Weisenburger DD, Lynch JC, Vose J, Armitage JO, Smeland EB, Kvaloy S, Holte H, Delabie J, Connors JM, Lansdorp PM, Ouyang Q, Lister TA, Davies AJ, Norton AJ, Muller-Hermelink HK, Ott G, Campo E, Montserrat E, Wilson WH, Jaffe ES, Simon R, Yang L, Powell J, Zhao H, Goldschmidt N, Chiorazzi M, Staudt LM. Prediction of survival in follicular lymphoma based on molecular features of tumor-infiltrating immune cells. N Engl J Med. 2004;351:2159–2169.
    2. Gui J, Li H. Threshold gradient descent method for censored data regression with applications in pharmacogenomics. Pac Symp Biocomput. 2005:272–283.
    3. Hothorn T, Bühlmann P, Dudoit S, Molinaro A, van der Laan MJ. Survival Ensembles. Biostatistics. 2006;7:355–373.
    4. Bair E, Tibshirani R. Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol. 2004;2
    5. Bair E, Hastie T, Paul D, Tibshirani R. Prediction by Supervised Principal Components. Journal of the American Statistical Association. 2006;101:119–137.
