2019 Nov 1;35(22):4830-4833.
doi: 10.1093/bioinformatics/btz485.

SIGN: similarity identification in gene expression

Seyed Ali Madani Tonekaboni et al. Bioinformatics.

Abstract

Motivation: High-throughput molecular profiles of human cells have been used in predictive computational approaches for stratification of healthy and malignant phenotypes and identification of their biological states. In this regard, pathway activities have been used as biological features in unsupervised and supervised learning schemes.

Results: We developed SIGN (Similarity Identification in Gene expressioN), a flexible open-source R package that facilitates the use of pathway activities and their expression patterns to identify similarities between biological samples. We defined a new measure, the transcriptional similarity coefficient (TSC), which captures the similarity of gene expression patterns in biological pathways between samples, rather than quantifying overall pathway activity. To demonstrate the utility of SIGN in biomedical research, we show that SIGN discriminates subtypes of breast tumors and patients with good or poor overall survival. SIGN outperforms the best models in the DREAM challenge in predicting the survival of breast cancer patients using data from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). In summary, the biomedical research community can use SIGN as a new tool for interrogating pathway activity and gene expression patterns in unsupervised and supervised learning schemes, with the aim of improving prognostic risk estimation for cancer patients.
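
To illustrate the distinction between comparing expression patterns and comparing overall activity, here is a minimal Python sketch. It is not the SIGN implementation and does not reproduce the transcriptional similarity coefficient, whose formula is not given in this abstract. As a stand-in, it uses the Pearson correlation of expression values across the genes of each pathway. The function names, gene identifiers, and pathway names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def pathway_pattern_similarity(sample_a, sample_b, pathways):
    """Per-pathway pattern similarity between two samples.

    ASSUMPTION: Pearson correlation stands in for the paper's coefficient.
    sample_a, sample_b: dict mapping gene name -> expression value.
    pathways: dict mapping pathway name -> list of gene names.
    Pathways with fewer than two shared genes are skipped.
    """
    scores = {}
    for name, genes in pathways.items():
        shared = [g for g in genes if g in sample_a and g in sample_b]
        if len(shared) < 2:
            continue
        scores[name] = pearson(
            [sample_a[g] for g in shared],
            [sample_b[g] for g in shared],
        )
    return scores


# Toy data (hypothetical genes G1..G4 and pathways P1, P2).
sample_a = {"G1": 1.0, "G2": 2.0, "G3": 3.0, "G4": 5.0}
sample_b = {"G1": 2.0, "G2": 4.0, "G3": 6.0, "G4": 1.0}
pathways = {"P1": ["G1", "G2", "G3"], "P2": ["G3", "G4"]}

scores = pathway_pattern_similarity(sample_a, sample_b, pathways)
print(scores)
```

In the toy data, sample_b has twice the expression of sample_a across pathway P1, so the overall activity differs while the pattern is identical, and the correlation is 1. In P2 the two samples rank the genes in opposite order, which gives a correlation of -1. A vector of such per-pathway scores could then serve as input features for clustering or a survival model.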

Availability and implementation: An open-source R package is available (https://cran.r-project.org/web/packages/SIGN/).

Figures

Fig. 1.
Schematic representation of Similarity Identification in Gene expressioN (SIGN) and the transcriptional similarity coefficient (TSC). (A) Gene Ontology (GO) terms (BP: biological processes; MF: molecular functions; CC: cellular components), or other pathway datasets, are used to identify similarity between samples. (B) The TSC is used to identify similarity in the expression pattern of the genes within a given pathway between two sets of samples. (C) The collection of pathway TSCs between two sets of samples is used in unsupervised or supervised learning schemes.
Fig. 2.
Patient stratification performance for the discovery and validation cohorts in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). (A) Clustering of tumor samples in the METABRIC discovery cohort using ESR1 and ERBB2 gene signatures (Gendoo et al., 2016) and TSC as the measure of distance between samples. (B) Survival of patients under different treatment regimens in the METABRIC discovery cohort. (C) Performance of SIGN on the METABRIC discovery cohort stratified by treatment (log-rank test). (D) Performance of SIGN in predicting survival of patients under Chemotherapy+Hormonal therapy (Chemo/Hormo) in the METABRIC discovery cohort. (E) Performance of SIGN on the METABRIC validation cohort. (F) Performance of SIGN relative to the top 10 ranked methods in the DREAM challenge (Bilal et al., 2013) in predicting patient survival in the METABRIC validation cohort using the discovery cohort. The baseline is a Cox model of clinical features.

References

    1. Barbie D.A. et al. (2009) Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature, 462, 108–112.
    2. Bilal E. et al. (2013) Improving breast cancer survival analysis through competition-based multidimensional modeling. PLoS Comput. Biol., 9, e1003047.
    3. Bindra R.S. et al. (2005) Alterations in DNA repair gene expression under hypoxia: elucidating the mechanisms of hypoxia-induced genetic instability. Ann. N.Y. Acad. Sci., 1059, 184–195.
    4. Campbell P.D., Marlow F.L. (2013) Temporal and tissue specific gene expression patterns of the zebrafish kinesin-1 heavy chain family, kif5s, during development. Gene Expr. Patterns, 13, 271–279.
    5. Chatterjee R., Vinson C. (2012) CpG methylation recruits sequence specific transcription factors essential for tissue specific gene expression. Biochim. Biophys. Acta, 1819, 763–770.
