Bioinformatics. 2015 Apr 15;31(8):1235-42.
doi: 10.1093/bioinformatics/btu802. Epub 2014 Dec 4.

QuASAR: quantitative allele-specific analysis of reads

Chris T Harvey et al. Bioinformatics. 2015.

Abstract

Motivation: Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred from the RNA-seq reads directly. However, there are currently no methods that jointly infer genotypes and conduct ASE inference while considering uncertainty in the genotype calls.
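
The conventional test described above (counting reads per allele at a heterozygous site and testing a 1:1 ratio) can be sketched as an exact two-sided binomial test. This is a minimal illustration of that baseline only, not QuASAR's method; the function name and the tie tolerance are assumptions made for the example.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Exact two-sided binomial P value for the null of a fixed allelic ratio.

    Hypothetical helper for illustration; it sums the probability of every
    outcome that is no more likely than the observed reference-read count.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence against the null
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so symmetric outcomes count as ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


if __name__ == "__main__":
    print(binomial_two_sided_p(10, 10))  # balanced counts
    print(binomial_two_sided_p(20, 0))   # fully imbalanced counts
```

Because this test treats the site as certainly heterozygous and the reads as binomially distributed, it ignores both genotype uncertainty and over-dispersion, which are the two gaps the abstract goes on to address.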

Results: We present QuASAR, quantitative allele-specific analysis of reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls, while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available.

Availability and implementation: http://github.com/piquelab/QuASAR.

Contact: fluca@wayne.edu or rpique@wayne.edu

Supplementary information: Supplementary Material is available at Bioinformatics online.

Figures

Fig. 1.
Reference allele frequency from reads overlapping SNPs. (Left) Each dot represents an SNP covered by at least 15 RNA-seq reads. The y-axis represents the fraction of RNA-seq reads that match the reference allele (observed ρ̂_l). The x-axis represents the order of the SNP position in a chromosome. (Right) Histogram showing the distribution of ρ̂_l values across the genome. The three modes (ρ ∈ {1, 0.5, 0}) correspond, respectively, to the three possible genotypes: homozygous reference (RR), heterozygous under no ASE (RA), and homozygous alternate (AA).
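
To illustrate how the three modes in Fig. 1 relate to genotypes, the sketch below scores each genotype with a binomial likelihood whose expected reference fraction is 1 − ε (RR), 0.5 (RA) or ε (AA), where ε is a base-call error rate. This is a simplified single-site illustration under assumed values (ε = 0.01, uniform priors), not the model fitted by QuASAR; the function name and defaults are hypothetical.

```python
from math import comb


def genotype_posteriors(ref_reads, alt_reads, error_rate=0.01,
                        priors=(1 / 3, 1 / 3, 1 / 3)):
    """Posterior probability of RR, RA and AA at one SNP from read counts.

    Assumes reads are independent binomial draws and that error_rate lies
    strictly between 0 and 1. Intended for modest read counts.
    """
    n = ref_reads + alt_reads
    expected_ref_fraction = {"RR": 1 - error_rate, "RA": 0.5, "AA": error_rate}
    weights = {}
    for (genotype, rho), prior in zip(expected_ref_fraction.items(), priors):
        likelihood = comb(n, ref_reads) * rho**ref_reads * (1 - rho) ** alt_reads
        weights[genotype] = prior * likelihood
    total = sum(weights.values())
    return {genotype: w / total for genotype, w in weights.items()}


if __name__ == "__main__":
    for counts in [(15, 0), (8, 7), (0, 15)]:
        print(counts, genotype_posteriors(*counts))
```

Keeping the full posterior, rather than a hard genotype call, is what allows a downstream ASE test to weigh the possibility that an apparently imbalanced site is in fact homozygous.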
Fig. 2.
Empirical power in detecting heterozygous SNPs as a function of sequencing depth. Each point represents a single input dataset to QuASAR: either as a single experiment replicate and time point (red dot), combining multiple time points (2 = green, 3 = blue, 6 = purple) or combining replicates (1 = dot, 2 = triangle). The x-axis represents the total number of RNA-seq reads in the fastq input files. The y-axis represents the log10 of the total number of SNPs that are determined to be heterozygous
Fig. 3.
Empirical power in detecting ASE as a function of the number of heterozygous SNPs detected. Each point represents a single input dataset to QuASAR as in Figure 2. The x-axis represents the total number of SNPs that are determined to be heterozygous. The y-axis represents the log10 of the number of SNPs that have a significant P value for ASE at 10% FDR
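
The caption counts SNPs significant at 10% FDR but does not say which FDR procedure was used. As one common choice, assumed here purely for illustration, the Benjamini-Hochberg step-up procedure can be sketched as follows; the function name is hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return a list of booleans marking P values significant at the given FDR.

    Benjamini-Hochberg step-up: find the largest rank k such that the k-th
    smallest P value is <= fdr * k / m, then reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


if __name__ == "__main__":
    example = [0.001, 0.5, 0.02, 0.9, 0.04]  # made-up P values
    print(benjamini_hochberg(example))
```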
Fig. 4.
QQ plot comparing the P value distributions of three alternative methods for determining ASE. The x-axis shows the log10 quantiles of the P values expected from the null distribution. The y-axis shows the log10 quantiles of the P values computed from the real data using three different methods: (i) binomial (black) assumes no overdispersion (M = ∞); (ii) beta-binomial (green) considers overdispersion but does not consider uncertainty in the genotype; and (iii) QuASAR (blue) uses the beta-binomial distribution and uncertainty in the genotype calls. In all three cases, the same set of SNPs is considered. The area shaded in gray indicates a 95% confidence band for the null distribution.
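
The difference between the binomial and beta-binomial models compared in Fig. 4 can be sketched numerically. The example assumes that M is the concentration parameter of a beta-binomial with mean 0.5 (so a = b = M/2), under which large M approaches the binomial; this parameterization and the function names are assumptions for illustration.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, mean=0.5, concentration=50.0):
    """P(k reference reads out of n) under a mean/concentration beta-binomial."""
    a = mean * concentration
    b = (1 - mean) * concentration
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def binomial_pmf(k, n, p=0.5):
    """P(k reference reads out of n) under a plain binomial."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    n, k = 20, 18  # made-up counts: 18 of 20 reads match the reference
    print("binomial:", binomial_pmf(k, n))
    print("beta-binomial, M=10:", beta_binomial_pmf(k, n, concentration=10.0))
    print("beta-binomial, M=1e6:", beta_binomial_pmf(k, n, concentration=1e6))
```

With a small M the beta-binomial puts more probability on extreme allelic ratios, so the same imbalance yields a less extreme P value than under the binomial, which is consistent with the inflation of binomial P values that a QQ plot of this kind is designed to reveal.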
