Am J Hum Genet. 2018 Dec 6;103(6):907-917.
doi: 10.1016/j.ajhg.2018.10.025. Epub 2018 Nov 29.

OUTRIDER: A Statistical Method for Detecting Aberrantly Expressed Genes in RNA Sequencing Data

Felix Brechtmann et al. Am J Hum Genet.

Abstract

RNA sequencing (RNA-seq) is gaining popularity as a complementary assay to genome sequencing for precisely identifying the molecular causes of rare disorders. A powerful approach is to identify aberrant gene expression levels as potential pathogenic events. However, existing methods for detecting aberrant read counts in RNA-seq data either lack assessments of statistical significance, so that establishing cutoffs is arbitrary, or rely on subjective manual corrections for confounders. Here, we describe OUTRIDER (Outlier in RNA-Seq Finder), an algorithm developed to address these issues. The algorithm uses an autoencoder to model read-count expectations according to the gene covariation resulting from technical, environmental, or common genetic variations. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. The model is automatically fitted to achieve the best recall of artificially corrupted data. Precision-recall analyses using simulated outlier read counts demonstrated the importance of controlling for covariation and significance-based thresholds. OUTRIDER is open source and includes functions for filtering out genes not expressed in a dataset, for identifying outlier samples with too many aberrantly expressed genes, and for detecting aberrant gene expression on the basis of false-discovery-rate-adjusted p values. Overall, OUTRIDER provides an end-to-end solution for identifying aberrantly expressed genes and is suitable for use by rare-disease diagnostic platforms.
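The significance step described in the abstract can be illustrated with a small sketch. It assumes the expected count `mu` for a gene in a sample has already been predicted (for example by the autoencoder) and that a gene-specific dispersion `theta` is known; both names are illustrative and not taken from the source. The two-sided p value convention and the use of Benjamini-Hochberg for the false-discovery-rate adjustment are also assumptions, since the abstract does not specify either. This is not the OUTRIDER implementation.

```python
import math

def nb_log_pmf(k, mu, theta):
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion (size) theta, so that variance = mu + mu**2 / theta."""
    return (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
            + theta * math.log(theta / (theta + mu))
            + k * math.log(mu / (theta + mu)))

def nb_cdf(k, mu, theta):
    """P(X <= k), summed directly; adequate for a sketch, slow for huge counts."""
    if k < 0:
        return 0.0
    total = sum(math.exp(nb_log_pmf(i, mu, theta)) for i in range(k + 1))
    return min(1.0, total)

def two_sided_p(k, mu, theta):
    """Assumed convention: twice the smaller tail probability, capped at 1."""
    lower = nb_cdf(k, mu, theta)            # P(X <= k)
    upper = 1.0 - nb_cdf(k - 1, mu, theta)  # P(X >= k)
    return 2.0 * min(0.5, lower, upper)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: one gene with expected count 100 and dispersion 10,
# observed in four samples.
observed = [100, 95, 1000, 110]
pvals = [two_sided_p(k, mu=100.0, theta=10.0) for k in observed]
padj = benjamini_hochberg(pvals)
outliers = [k for k, p in zip(observed, padj) if p < 0.05]
```

In this toy input only the count of 1000 is far enough from the expectation to remain significant after adjustment; in practice the expectation would differ per sample.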

Keywords: RNA sequencing; aberrant gene expression; normalization; outlier detection; rare disease.

Figures

Figure 1
OUTRIDER Overview.
(A) Context-dependent outlier detection. The algorithm identifies gene expression outliers whose read counts are significantly aberrant given the covariations typically observed across genes in an RNA-seq dataset. This is illustrated by a read count (left panel, fifth column, second row from the bottom) that is exceptionally high in the context of correlated samples (left six samples) but not in absolute terms for this given gene. To capture commonly seen biological and technical contexts, an autoencoder models covariations in an unsupervised fashion and predicts read-count expectations. Comparing the earlier mentioned read count with these context-dependent expectations reveals that it is exceptionally high (right panel). The lower panels illustrate the distribution of read counts before and after controlling for covariations for the relevant gene. The red dotted lines depict significance cutoffs.
(B) Schema showing the differences in the experimental designs for differential expression analyses and outlier detection analyses; relevant analysis packages are mentioned.
Figure 2
Using the NB Distribution for Significance Assessment. Normalized RNA-seq read counts plotted against their rank (A and C) and quantile-quantile plots of observed p values against expected p values with 95% confidence bands (B and D); outliers are shown in red (FDR < 0.05). Shown are data for TRIM33 (MIM: 605769) with no detected expression outlier (A and B) and data for SLC39A4 (MIM: 607059) with two expression outliers (C and D).
Figure 3
RNA-Seq Expression-Outlier Detection. (A and B) Quantile-quantile plots for the GTEx (A) and Kremer datasets (B). Observed p values are plotted against the expected p values for three different methods. The diagonal marks the expected distribution under the null hypothesis with 95% confidence bands (gray). (C and D) Number of aberrant genes (FDR < 0.05) per sample for the data shown in (A) and (B) (C and D, respectively). The dashed line represents the abnormal sample cutoff (>0.5% aberrantly expressed genes). (E and F) p values versus Z scores for a representative abnormal sample in PEER (E) and the same sample in OUTRIDER (F). Genes with significantly aberrant read counts are marked in red.
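The abnormal-sample cutoff mentioned in the Figure 3 caption (more than 0.5% of genes aberrantly expressed at FDR < 0.05) can be sketched as a simple per-sample count. The function name and the input layout (a dictionary mapping sample IDs to lists of adjusted p values) are hypothetical and chosen only for illustration.

```python
def flag_abnormal_samples(padj_by_sample, fdr=0.05, max_fraction=0.005):
    """Return {sample: True/False}, True when the fraction of genes with
    adjusted p value below `fdr` exceeds `max_fraction` (0.5% by default)."""
    flagged = {}
    for sample, padj in padj_by_sample.items():
        n_aberrant = sum(1 for p in padj if p < fdr)
        flagged[sample] = n_aberrant / len(padj) > max_fraction
    return flagged

# Hypothetical data: 1000 genes per sample.
padj_by_sample = {
    "sample_a": [0.001] * 6 + [0.9] * 994,  # 0.6% aberrant genes
    "sample_b": [0.001] * 4 + [0.9] * 996,  # 0.4% aberrant genes
}
result = flag_abnormal_samples(padj_by_sample)
```

Samples flagged this way would typically be inspected or excluded before interpreting individual gene-level outliers.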
Figure 4
Outlier-Detection Benchmark. The proportion of simulated outliers among reported outliers (precision) plotted against the proportion of reported simulated outliers among all simulated outliers (recall) for increasing p values up to FDR < 0.05 (OUTRIDER) or decreasing absolute Z scores (PCA and PEER). Plots are provided for four simulated amplitudes (by row with simulated absolute Z scores of 2, 3, 4, and 6 from top to bottom) and for three simulation scenarios (by column from left to right: aberrantly high and low counts, aberrantly high counts, and aberrantly low counts). The read counts were controlled for gene covariation with OUTRIDER (green), PCA (orange), or PEER (blue). The ranking of outliers was bootstrapped to yield 95% confidence bands.
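The precision and recall definitions in the Figure 4 caption can be made concrete with a short sketch that walks down a ranking of calls. The function name and inputs are hypothetical; scores are assumed to be ordered so that smaller means more extreme (p values directly, or negated absolute Z scores for the PCA and PEER rankings). Bootstrapping for confidence bands is omitted.

```python
def precision_recall_curve(scores, is_simulated_outlier):
    """Return a list of (recall, precision) points, one per reporting cutoff.

    scores: smaller value = reported earlier (e.g. p values, or -abs(z)).
    is_simulated_outlier: booleans marking the injected outliers.
    """
    total_true = sum(1 for flag in is_simulated_outlier if flag)
    if total_true == 0:
        raise ValueError("need at least one simulated outlier")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    curve = []
    true_pos = 0
    for n_reported, i in enumerate(order, start=1):
        if is_simulated_outlier[i]:
            true_pos += 1
        recall = true_pos / total_true      # reported simulated / all simulated
        precision = true_pos / n_reported   # simulated among reported
        curve.append((recall, precision))
    return curve

# Hypothetical example with four read counts, two of them simulated outliers.
curve = precision_recall_curve(
    scores=[0.001, 0.2, 0.01, 0.5],
    is_simulated_outlier=[True, False, False, True],
)
```

Each point corresponds to reporting one more read count as an outlier, which is how a curve like those in the figure is traced from a single ranking.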
Figure 5
OUTRIDER SNP Enrichment. Enrichment of rare (MAF < 0.05) moderate- and high-impact variants (according to the VEP) computed on genes found to be aberrantly expressed by OUTRIDER is plotted against enrichments computed on genes found to be aberrantly expressed by PCA or PEER for all GTEx tissues with three different p value cutoffs.
