Bioinformatics. 2015 Jun 15;31(12):i293-302.
doi: 10.1093/bioinformatics/btv253.

Dynamic changes of RNA-sequencing expression for precision medicine: N-of-1-pathways Mahalanobis distance within pathways of single subjects predicts breast cancer survival

A Grant Schissler et al. Bioinformatics. 2015.

Abstract

Motivation: The conventional approach to personalized medicine relies on molecular data analytics across multiple patients. The path to precision medicine lies with molecular data analytics that can discover interpretable single-subject signals (N-of-1). We developed a global framework, N-of-1-pathways, for a mechanistic-anchored approach to single-subject gene expression data analysis. We previously employed a metric that could prioritize the statistical significance of a deregulated pathway in single subjects; however, it lacked quantitative interpretability (e.g. the equivalent of a gene expression fold-change).

Results: In this study, we extend our previous approach by applying the statistical Mahalanobis distance (MD) to quantify personal pathway-level deregulation. We demonstrate that this approach, N-of-1-pathways Paired Samples MD (N-OF-1-PATHWAYS-MD), detects deregulated pathways (empirical simulations) without inflating the false-positive rate (a study with biological replicates). Finally, we establish that N-OF-1-PATHWAYS-MD scores are biologically significant, clinically relevant and predictive of breast cancer survival (P < 0.05, n = 80 invasive carcinoma; TCGA RNA-sequences).

Conclusion: N-of-1-pathways MD provides a practical approach towards precision medicine. The method generates the magnitude and the biological significance of personal deregulated pathways, with results derived solely from the patient's transcriptome. These pathways offer opportunities for deriving clinically actionable decisions that have the potential to complement the clinical interpretability of personal acquired or inherited DNA polymorphisms and mutations. In addition, the method offers an opportunity for application to diseases in which DNA changes may not be relevant, and thus expands the 'interpretable omics' of single subjects (e.g. personalome).

Availability and implementation: http://www.lussierlab.net/publications/N-of-1-pathways.

Figures

Fig. 1.
Method overview of N-of-1-pathways Mahalanobis Distance. (A) The input is the gene expression of a single patient's paired samples (e.g. tumor versus normal tissue), filtered into a priori defined genesets (e.g. Gene Ontology Biological Processes: GO-BP pathways). (B) Calculation I is visualized by the bivariate relationship between normal and tumor gene expression values for a given geneset (e.g. GO-BP pathway). The vertical, signed Mahalanobis distance (MD), dj, is computed from each jth point (gene) to the diagonal line representing equal expression. (C) Calculation II: The mean MD represents the pathway-level deregulation from normal to tumor expression, where a negative value indicates down-regulation and a positive value indicates up-regulation. The gene indices are randomly resampled and the ‘average MD score’ is recomputed via bootstrapping (Chernick, 2008) to determine pathways with strong evidence of deregulation. (D) Calculation III: The bootstrap distribution of ‘average MD scores’. (E) The process results in a pathway-level quantification of deregulation that serves as a Clinically Relevant Metric (CRM).
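Calculations I to III in the Fig. 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the covariance matrix is estimated from the (normal, tumor) pairs of the pathway's own genes, that the distance is measured vertically from each point (x, y) to the diagonal point (x, x), and that a 95% percentile interval summarizes the bootstrap distribution. The function names `signed_md_to_diagonal` and `bootstrap_mean_md` are hypothetical, and only the Python standard library is used.

```python
import math
import random
from statistics import mean


def signed_md_to_diagonal(normal, tumor):
    """Signed, vertical Mahalanobis distance of each gene to the line y = x.

    Assumption: the 2x2 covariance matrix is estimated from the pathway's
    own (normal, tumor) pairs; the paper may estimate it differently.
    """
    n = len(normal)
    if n != len(tumor) or n < 3:
        raise ValueError("need two equal-length vectors with at least 3 genes")
    mx, my = mean(normal), mean(tumor)
    # Sample covariance matrix S = [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in normal) / (n - 1)
    syy = sum((y - my) ** 2 for y in tumor) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(normal, tumor)) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in zip(normal, tumor):
        dy = y - x  # vertical offset from the diagonal point (x, x)
        # For the offset (0, dy), the quadratic form with S^-1 is sxx * dy^2 / det.
        q = sxx * dy * dy / det
        # Positive sign: tumor above normal (up-regulation); negative: below.
        distances.append(math.copysign(math.sqrt(q), dy))
    return distances


def bootstrap_mean_md(normal, tumor, n_boot=2000, seed=0):
    """Mean signed MD of a pathway and a 95% percentile bootstrap interval."""
    d = signed_md_to_diagonal(normal, tumor)
    rng = random.Random(seed)
    # Resample gene indices with replacement and recompute the average MD.
    boot = sorted(mean(rng.choices(d, k=len(d))) for _ in range(n_boot))
    lower = boot[int(0.025 * n_boot)]
    upper = boot[int(0.975 * n_boot) - 1]
    return mean(d), (lower, upper)
```

Under these assumptions, a pathway whose bootstrap interval excludes zero would be the kind of pathway the caption describes as showing strong evidence of deregulation; the sign of the mean gives the direction.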
Fig. 2.
Simulation study reveals that N-of-1-pathways MD powerfully detects artificially deregulated pathways. Each point represents one size of a simulated pathway generated by randomly selecting n genes, of which a ratio r is deregulated (Table 1 Dataset I, Section 2.6). The expression of this ratio r of genes is artificially increased by a k-fold change (k = 1.5, 2, 4) in a simulated pathway generated from biological replicates. We then separately applied the N-of-1-pathways-Wilcoxon (bottom) and N-of-1-pathways-MD (top) methods to determine whether the truly deregulated pathway is detected. We repeated the process 5000 times at each combination of (n, k, r) to estimate the false negative error rate (Wilcoxon P values were Bonferroni adjusted with a 1% threshold). AAC, area above the curve, quantifies the proportion of simulated pathway combinations with a false negative error rate below 0.20 (the black curve labeled 0.20 is the reference for this measure). A higher AAC indicates a greater number of scenarios with at least 80% power to detect deregulated pathways. N-of-1-pathways-MD outperforms N-of-1-pathways-Wilcoxon at every fold-change, requiring fewer genes in the pathway and a smaller ratio of deregulated genes. Notably, the simulated false positive rate (0.0% deregulated genes; rate along the horizontal axis) is smaller for MD than for Wilcoxon, averaging 0.14% and 0.94%, respectively. This rate can also be interpreted as the simulated rate of discovery when two non-tumor samples are paired. Legend: Sim. = simulated; AAC = area above curve.
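The simulation design in the Fig. 2 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: expression values are assumed to be on the unlogged scale so that a k-fold change is a multiplication, `rep_a` and `rep_b` stand for two biological replicate expression vectors over the same genes, and `detector` is a hypothetical callable that returns True when a method flags the pathway as deregulated.

```python
import random


def simulate_pathway(rep_a, rep_b, n, r, k, rng):
    """Build one simulated pathway of n genes with a ratio r increased k-fold."""
    idx = rng.sample(range(len(rep_a)), n)  # randomly select n genes
    baseline = [rep_a[i] for i in idx]
    case = [rep_b[i] for i in idx]
    n_dereg = round(r * n)
    # idx is already in random order, so the first n_dereg genes
    # form a random subset of the pathway.
    for j in range(n_dereg):
        case[j] *= k
    return baseline, case


def false_negative_rate(detector, rep_a, rep_b, n, r, k, n_sim=5000, seed=0):
    """Fraction of simulated deregulated pathways that the detector misses."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(n_sim):
        baseline, case = simulate_pathway(rep_a, rep_b, n, r, k, rng)
        if not detector(baseline, case):
            misses += 1
    return misses / n_sim
```

Calling `false_negative_rate` over a grid of (n, k, r) values would give the kind of surface the figure summarizes; with r = 0 the same loop estimates the fraction of non-detections when no gene is deregulated, whose complement is the simulated false positive rate.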
Fig. 3.
Evaluation of the false-positive rate of N-of-1-pathways MD compared to the Wilcoxon method. Pairs of biological replicates from breast cancer cell lines were used (Table 1 dataset III). 3228 GO-BP genesets were tested for each pair of biological replicates to find falsely deregulated pathways using both the N-of-1-pathways MD and Wilcoxon methods (Wilcoxon P values are Bonferroni adjusted and a 1% threshold is applied). Thin black lines are 95% pointwise Agresti-Coull intervals for the proportion of false positives; bar heights are the percentage of falsely identified deregulated pathways. N-of-1-pathways MD performs as well as or better than Wilcoxon. Technical replicates showed similar results using GEO20194 (data not shown).
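The Agresti-Coull interval named in the Fig. 3 caption has a simple closed form, sketched below with the standard library. The function name `agresti_coull` is hypothetical, and clipping the bounds to [0, 1] is a common convention rather than something stated in the caption.

```python
import math
from statistics import NormalDist


def agresti_coull(successes, trials, confidence=0.95):
    """Agresti-Coull confidence interval for a binomial proportion."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n_adj = trials + z ** 2                      # adjusted number of trials
    p_adj = (successes + z ** 2 / 2.0) / n_adj   # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    # Clip to the valid range of a proportion.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)
```

For example, `agresti_coull(x, 3228)` would give the interval for x falsely deregulated pathways among the 3228 genesets tested for one replicate pair, where x is whatever count the method produced.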
Fig. 4.
N-of-1-pathways MD GO-BP clinical importance metrics predict breast cancer patient survival. N-of-1-pathways MD was applied to n = 80 invasive breast carcinoma patients (TCGA_BRCA, RNA-seq, Table 1 dataset II), resulting in 3225 clinical importance metrics. Every patient has an N-of-1-pathways MD score for each of the identified deregulated pathways (2130 pathways identified in at least one patient), and we performed PCA and unsupervised clustering on these scores. As shown in the figure, unsupervised PAM clustering reveals distinct Kaplan–Meier survival curves (log-rank test P < 0.05). Additionally, the identified pathways can be used to discover a fully specified classifier for good versus poor prognosis (Supplementary Table S5). Reducing dimensionality further, we constructed the clusters based on only the top 10 scored pathways and produced distinct survival curves (Supplementary Figure S7). When compared to gene expression, N-of-1-pathways performed similarly (Supplementary Figure S1). We found that pathway-level scores relate to pathologically determined stage (Wilcoxon P = 0.02 for the first principal component of the MD scores; data not shown), but did not identify receptor subtypes (not significant; principal components 1–5 verified; data not shown).
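The log-rank comparison of two patient clusters mentioned in the Fig. 4 caption can be sketched as below. This covers only the test, not the PCA or PAM clustering steps, and it is a textbook two-group log-rank statistic rather than the authors' code. The function name `logrank_test` is hypothetical; events are coded 1 for an observed death and 0 for censoring.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        risk_a = sum(1 for u, _, g in data if u >= t and g == 0)
        risk_b = sum(1 for u, _, g in data if u >= t and g == 1)
        n = risk_a + risk_b
        deaths = sum(1 for u, e, _ in data if u == t and e)
        deaths_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_a += deaths_a
        expected_a += deaths * risk_a / n
        if n > 1:
            # Hypergeometric variance of the group-A death count at time t.
            variance += (deaths * (risk_a / n) * (risk_b / n)
                         * (n - deaths) / (n - 1))
    if variance == 0.0:
        raise ValueError("no informative event times")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Applied to the survival times of the patients in each cluster, a P value below 0.05 corresponds to the "distinct Kaplan–Meier survival curves" criterion quoted in the caption.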
Fig. 5.
N-of-1-pathways representation (star plot) of individual GO-BPs of patients with diametrically extreme phenotypes. The top 15 most discriminating GO-BP terms were identified between the two groups of patients with diametrically extreme phenotypes (death of disease in less than 2.5 years, n = 5; at least 4 years of disease-free survival, n = 9; Section 2.10). (A) GO terms manually curated into interpretable categories. (B) The legend of the star plots: each edge corresponds to one GO term, and each star reflects a single patient’s deregulation as measured by the MD CRM for each pathway. (C) A sample of eight patients’ star plots (four from each extreme). The white zone represents upregulated pathways (given the N-of-1-pathways direction of deregulation), while the gray zone represents downregulation. The circle separating the gray and white areas represents no deregulation (MD CRM = 0).

References

    1. Ashburner M., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.
    2. Brown L.D., et al. (2001) Interval estimation for a binomial proportion. Stat. Sci., 16, 101–117.
    3. Chen J., et al. (2010) Protein interaction network underpins concordant prognosis among heterogeneous breast cancer signatures. J. Biomed. Informatics, 43, 385–396.
    4. Chernick M.R. (2008) Bootstrap Methods: A Guide for Practitioners and Researchers. John Wiley & Sons, Hoboken, New Jersey.
    5. Dillies M.-A., et al. (2013) A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief. Bioinform., 14, 671–683.
