SLEPR: a sample-level enrichment-based pathway ranking method -- seeking biological themes through pathway-level consistency

Ming Yi¹, Robert M Stephens

PMID: 18818771
PMCID: PMC2546449
DOI: 10.1371/journal.pone.0003288

Ming Yi et al. PLoS One. 2008 Sep 26;3(9):e3288. doi: 10.1371/journal.pone.0003288.

Affiliation

¹ Advanced Biomedical Computing Center, Advanced Technology Program, SAIC-Frederick Inc, NCI-Frederick, Frederick, MD, USA.

Abstract

Analysis of microarray and other high-throughput (HTP) data often begins with the identification of genes consistently up- or down-regulated across samples as the first step in extracting biological meaning. This gene-level paradigm can be limited by valid sample fluctuations and biological complexities. In this report, we describe a novel method, SLEPR, which eliminates this limitation by relying on pathway-level consistencies. Our method first selects the sample-level differentiated genes from each individual sample, capturing genes missed by other analysis methods; it then ascertains the enrichment levels of associated pathways from each of those lists, and finally ranks annotated pathways based on the consistency of enrichment levels of individual samples from both sample classes. As a proof of concept, we used this method to analyze three public microarray datasets, with a direct comparison to the GSEA method, one of the most popular pathway-level analysis methods in the field. We found that our method not only reproduced the earlier observations, with significant improvements in depth of coverage for validated or expected biological themes, but also produced additional insights that make biological sense. This new method extends existing analysis approaches and facilitates the integration of different types of HTP data.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Schematic overview of the SLEPR method (see the Materials and Methods section for more details).**
The goal of the SLEPR method is to use the sample-level differentiated genes of each sample to capture sample-level specificity in gene-level variance, and then to use the functional enrichment levels of these gene lists to evaluate pathway-level data consistency between the contrasted classes in a study: Inclusion/Target class versus Exclusion/Background class (e.g., NGT versus DM2+IGT in the human type 2 diabetes mellitus (DM2) study [23]). Step 1 assigns the samples to the Inclusion class (I) and the Exclusion class (E). Step 2 considers, for each gene or feature in the study (i.e., G1, G2, G3…Gn), the data distribution and uses the median and MADe of the data in the class E samples to set the cutoff for sample-level differentiation, so each gene Gi has its own cutoff. In Step 3, gene Gi is selected as a sample-level differentiated gene for a sample if its value in that sample lies beyond the cutoff; every sample, from both class I and class E, thus has its own sample-level differentiated gene list (L1, L2, L3…). In Step 4, to determine the functional enrichment levels in any *a priori* defined gene sets, pathways, or functional categories (e.g., GO terms) for each of the sample-level differentiated lists, a batch computation of Fisher's exact test-based enrichment analysis is performed. The results are merged automatically into a matrix (e.g., a Stanford format file) holding, for each term (T1, T2, T3, …Tm), the enrichment score of every sample from classes I and E, where each score is the Fisher's exact test p-value transformed as −log₁₀(p-value).
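Steps 2 to 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the caption does not state: MADe is taken as 1.483 × MAD (a common robust estimate of the standard deviation), the cutoff is taken as median ± `k` × MADe with a hypothetical multiplier `k`, and Fisher's exact test is taken as one-sided (over-representation). All function names are illustrative.

```python
import math
import statistics

MADE_SCALE = 1.483  # assumption: MADe = 1.483 * MAD, a common robust SD estimate


def made_cutoffs(class_e_values, k=2.0):
    """Lower/upper cutoff for one gene from its class E data (Step 2).

    k is a hypothetical multiplier; the caption does not give its value.
    """
    med = statistics.median(class_e_values)
    mad = statistics.median([abs(v - med) for v in class_e_values])
    made = MADE_SCALE * mad
    return med - k * made, med + k * made


def differentiated_gene_lists(expr, class_e_samples, k=2.0):
    """Per-sample sets of sample-level differentiated genes (Step 3).

    expr maps gene -> {sample: value}; every sample gets a (possibly empty) set.
    """
    lists = {s: set() for by_sample in expr.values() for s in by_sample}
    for gene, by_sample in expr.items():
        low, high = made_cutoffs([by_sample[s] for s in class_e_samples], k)
        for sample, value in by_sample.items():
            if value < low or value > high:  # beyond this gene's own cutoff
                lists[sample].add(gene)
    return lists


def fisher_enrichment_p(list_hits, list_size, pop_hits, pop_size):
    """One-sided Fisher's exact test p-value, P(X >= list_hits) (Step 4).

    Computed as the upper tail of the hypergeometric distribution.
    """
    total = math.comb(pop_size, list_size)
    tail = sum(
        math.comb(pop_hits, x) * math.comb(pop_size - pop_hits, list_size - x)
        for x in range(list_hits, min(list_size, pop_hits) + 1)
    )
    return tail / total


def enrichment_score(p_value):
    """Transform a p-value into -log10(p-value)."""
    return -math.log10(p_value)
```

Calling `differentiated_gene_lists` once and then `fisher_enrichment_p` for every (sample list, term) pair would fill a term-by-sample matrix of enrichment scores of the kind described in Step 4.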
To determine whether a gene set, pathway, or functional category (e.g., GO term) is significant in terms of how consistently it is enriched across samples, a pathway ranking algorithm is applied to the enrichment score matrix to obtain pathway ranking scores, which take into account both the positive contribution of class I and the negative contribution of class E from the individual sample-level enrichment levels (see details in the Materials and Methods section) (Step 5). To determine the statistical significance of the actual ranking of a gene set or GO term in the contrasted classes (I versus E), the entire procedure (Steps 1 to 5) is repeated 1000 times or more, each time permuting the class labels of the selected samples (Step 6). The pathway ranking scores of each term from each permutation are pooled together and used to build the empirical distribution of pathway ranking scores. The permutation p-value for each term is calculated as the fraction of random trials that yield permuted pathway ranking scores higher than the actual score from the original sample assignments.
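The permutation procedure of Step 6 can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm: `rank_terms` is a hypothetical callable standing for Steps 1 to 5 (class labels in, one ranking score per term out), and `mean_difference_ranker` is only a placeholder scoring rule (mean class I score minus mean class E score) used to make the example runnable. The actual ranking formula is given in the Materials and Methods section and is not reproduced here.

```python
import random


def permutation_pvalues(labels, rank_terms, n_perm=1000, seed=0):
    """Permutation p-value per term (Step 6).

    labels maps sample -> "I" or "E"; rank_terms(labels) -> {term: score}
    stands for Steps 1 to 5 and is supplied by the caller.
    """
    rng = random.Random(seed)
    actual = rank_terms(labels)
    samples = list(labels)
    values = [labels[s] for s in samples]
    pooled = []  # ranking scores of all terms from all permutations
    for _ in range(n_perm):
        shuffled = values[:]
        rng.shuffle(shuffled)  # permute class labels across samples
        pooled.extend(rank_terms(dict(zip(samples, shuffled))).values())
    # Fraction of pooled permuted scores higher than the actual score
    return {
        term: sum(p > score for p in pooled) / len(pooled)
        for term, score in actual.items()
    }


def mean_difference_ranker(scores):
    """Placeholder ranker (NOT the SLEPR formula): mean(I) - mean(E) per term.

    scores maps term -> {sample: enrichment score}.
    """
    def rank_terms(labels):
        out = {}
        for term, by_sample in scores.items():
            i_vals = [v for s, v in by_sample.items() if labels[s] == "I"]
            e_vals = [v for s, v in by_sample.items() if labels[s] == "E"]
            out[term] = sum(i_vals) / len(i_vals) - sum(e_vals) / len(e_vals)
        return out
    return rank_terms
```

Because the placeholder ranker works on a fixed enrichment matrix, it does not recompute the class E cutoffs under each permutation; a ranker that reruns Steps 1 to 5 in full can be passed in through the same `rank_terms` argument.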

**Figure 2. Heatmap of enrichment scores in all samples from NGT versus IGT and DM2 for the top 17 ranked terms of the SLEPR result listed in Table S2.**
The enrichment scores, which were in general derived from Fisher's exact test p-values using the formula −log₁₀(p-value), were floored to 0 if ListHits<2 or p-value>0.05. The rows of the heatmap are the ranked terms in the same order as in Table S2 (the top 7 are shown in Table 1), from top to bottom with the higher ranks at the top. The gradient of red in the heatmap indicates the enrichment level.
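The flooring rule in this caption can be written as a small Python function. It is a minimal sketch: the function and parameter names are illustrative, and only the thresholds (ListHits<2, p-value>0.05) come from the caption.

```python
import math


def floored_enrichment_score(p_value, list_hits, min_hits=2, alpha=0.05):
    """Return -log10(p-value), floored to 0 when the term has too few hits
    in the list (list_hits < min_hits) or is not significant (p_value > alpha)."""
    if list_hits < min_hits or p_value > alpha:
        return 0.0
    return -math.log10(p_value)
```

Under this rule, only terms with at least two hits and a p-value of 0.05 or less contribute a non-zero (red) cell to the heatmap.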

**Figure 3. Heatmap of enrichment scores of sample-level differentiated genes of all samples in the human GNF tissue dataset for the top 8 ranked GO biological process terms shown in Table 7.**
The enrichment scores, which were in general derived from Fisher's exact test p-values using the formula −log₁₀(p-value), were floored to 0 if ListHits<2 or p-value>0.05. The rows of the heatmap are the terms and the columns are the tissue samples from the dataset. The gradient of red in the heatmap indicates the enrichment level.

References

1. Eisen MB, Spellman PT, Brown PO, Botstein D. Clustering analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95:14863–14868.
2. Hartigan JA, Wong MA. A k-means clustering algorithm. Applied Statistics. 1979;28:100–108.
3. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, et al. Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA. 1999;96:2907–2912.
4. Yi M, Horton JD, Cohen JC, Hobbs HH, Stephens RM. WholePathwayScope: a comprehensive pathway-based analysis tool for high-throughput data. BMC Bioinformatics. 2006;7:30.
5. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to ionizing radiation response. Proc Natl Acad Sci USA. 2001;98:5116–5121.
