Genome Med. 2013 Nov 29;5(11):103.
doi: 10.1186/gm509. eCollection 2013.

Comparison of methods to identify aberrant expression patterns in individual patients: augmenting our toolkit for precision medicine

Daniel Bottomly et al. Genome Med. 2013.

Abstract

Background: Patient-specific aberrant expression patterns, in conjunction with functional screening assays, can guide elucidation of the cancer genome architecture and identification of therapeutic targets. Since most statistical methods for expression analysis focus on differences between experimental groups, the performance of approaches for patient-specific expression analyses is currently less well characterized. To our knowledge, no comparison of methods for identifying genes that are dysregulated relative to a single sample in a given set of experimental samples has been performed.

Methods: We systematically evaluated several methods for their suitability to detect patient-specific events, including variations on the nearest-neighbor-based outlying degree method, as well as the Zscore and a robust variant (the Rscore). The methods were assessed using both simulations and expression data from a cohort of pediatric acute B lymphoblastic leukemia patients.
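The three families of scores named above can be sketched for a single gene as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give formulas, so it assumes the Zscore uses the mean and standard deviation across samples, the robust variant uses the median and the scaled median absolute deviation (the constant 1.4826 is the usual normal-consistency factor), and the outlying degree is the sum of distances from a sample to its k nearest neighbors. All function names and the example values are hypothetical.

```python
import statistics

def zscore(values, i):
    """Zscore of sample i for one gene, using the mean and SD of all samples."""
    return (values[i] - statistics.mean(values)) / statistics.stdev(values)

def rscore(values, i):
    """Robust variant (assumed form): median and scaled MAD replace mean and SD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return (values[i] - med) / (1.4826 * mad)

def outlying_degree(values, i, k):
    """Assumed form of the outlying degree: sum of distances from sample i
    to its k nearest neighbors among the other samples."""
    dists = sorted(abs(values[i] - v) for j, v in enumerate(values) if j != i)
    return sum(dists[:k])

# Hypothetical expression values for one gene across six samples;
# the last sample is the aberrant one.
values = [7.1, 6.9, 7.0, 7.2, 6.8, 12.0]
scores = {
    "zscore": zscore(values, 5),
    "rscore": rscore(values, 5),
    "od": outlying_degree(values, 5, k=3),
}
```

Under these assumptions the aberrant sample inflates the standard deviation used by the Zscore, whereas the median-based and neighbor-based scores are less affected by it.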

Results: We first assessed power and false discovery rates using simulations. Even under optimal conditions, high effect sizes (>4 unit differences) were necessary for any method to achieve acceptable power (>0.9), although high false discovery rates (>0.1) were pervasive across simulation conditions. Next, we introduced a technical factor into the simulation and found that performance was reduced for all methods, and that using weights with the outlying degree could provide performance gains depending on the number of samples and genes affected by the technical factor. In our use case, which highlights the integration of functional assays and aberrant expression in a patient cohort (the identification of gene dysregulation events associated with the targets from an siRNA screen), we demonstrated that both the outlying degree and the Zscore can successfully identify genes dysregulated in one patient sample. However, only the outlying degree can identify genes dysregulated across several patient samples.
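How power and false discovery rate are estimated in a simulation of this kind can be sketched as below. This is an illustrative toy, not the study's simulation design: it assumes normally distributed expression (mean seven, standard deviation one), spikes a fixed effect into one sample for a known subset of genes, and calls a gene aberrant when the absolute Zscore of that sample exceeds a cutoff. The function name, the cutoff of 2.5, and the gene and sample counts are all hypothetical choices.

```python
import random
import statistics

def simulate_power_fdr(n_genes=2000, n_samples=12, n_spiked=100,
                       effect=5.0, threshold=2.5, seed=0):
    """Estimate power and false discovery rate for Zscore-based calls
    in sample 0 (all parameter values are illustrative assumptions)."""
    rng = random.Random(seed)
    true_pos = false_pos = 0
    for gene in range(n_genes):
        values = [rng.gauss(7.0, 1.0) for _ in range(n_samples)]
        spiked = gene < n_spiked          # first n_spiked genes carry the effect
        if spiked:
            values[0] += effect
        z = (values[0] - statistics.mean(values)) / statistics.stdev(values)
        called = abs(z) > threshold
        if called and spiked:
            true_pos += 1
        elif called:
            false_pos += 1
    power = true_pos / n_spiked           # fraction of spiked genes recovered
    n_called = true_pos + false_pos
    fdr = false_pos / n_called if n_called else 0.0  # fraction of calls that are false
    return power, fdr

power_high, fdr_high = simulate_power_fdr(effect=5.0)
power_low, fdr_low = simulate_power_fdr(effect=1.0)
```

Comparing the two runs shows how the estimates respond to effect size; swapping in a different score for the Zscore line would allow a like-for-like comparison of methods under the same simulated data.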

Conclusion: Our results show that outlying degree methods may be a useful alternative to the Zscore or Rscore in a personalized medicine context, especially in small- to medium-sized expression datasets (between 10 and 50 samples) with moderate to high sample-to-sample variability. From these results, we provide guidelines for the detection of aberrant expression in a precision medicine context.


Figures

Figure 1
The outlying degree outperforms other methods in both high and low variability simulated datasets. (A) Expression data were simulated from two distributions (a normal distribution with a mean of seven and a standard deviation of one, and a t-distribution with the non-centrality parameter set to seven and fifteen degrees of freedom) that were at the extremes of what would typically be observed in microarray data, with the distribution of hypothetical patient data situated somewhere in the middle. (B, C) The outlying degree (k = 9) significantly outperformed both the Zscore and Rscore methods in terms of power and false discovery rate for all combinations of effect size and distribution type. However, all the methods were only effective when encountering high effect sizes (four to five) with low variability (normal distribution). The grey areas indicate 0.95 confidence intervals. Note that for the false discovery rate, the estimates were very stable and the grey area is not readily observable. OD, outlying degree.
Figure 2
The weighted outlying degree can attenuate the effect of sample-specific technical variability. (A) An example of a simulated dataset from the normal distribution with a technical factor affecting 2,500 of the 10,000 genes of sample one, making it divergent. The size of the effect is a two-unit decrease. (B, C) Power and false discovery rate estimates for the methods, based on simulations similar to (A) in which either 2,500 or 7,500 genes of one or three samples were affected. The effect size was kept at five units. The WODb method outperforms the others, at least for the case where the number of divergent samples was equal to three. The grey areas indicate 0.95 confidence intervals. Note that for the false discovery rate, the estimates were very stable and the grey area is not readily observable. OD, outlying degree method; WODa, weighted outlying degree with weighting performed after nearest neighbor computations; WODb, weighted outlying degree with weighting performed before nearest neighbor computations.
Figure 3
The outlying degree is more robust to variability across samples than the Zscore in experimental data. (A) The top five genes for both the Zscore and outlying degree methods were found for sample 09206. For comparison purposes, we plotted the distribution of the expression levels of the 12 patient samples for the top five ranked genes from each method. The Zscore ranked higher those genes where a single outlier was found with the remainder of the samples tightly grouped together, whereas the outlying degree (k = 6) ranked higher those genes with large differences while tolerating more expression variability between the samples. Exon-level summaries of the genes ranked highest in (A) (Rank 1) are shown for both (B) the Zscore and (C) the outlying degree methods.
