Nat Commun. 2016 Sep 8;7:12817.
doi: 10.1038/ncomms12817.

Rare variant phasing and haplotypic expression from RNA sequencing with phASER



Stephane E Castel et al. Nat Commun.

Abstract

Haplotype phasing of genetic variants is important for clinical interpretation of the genome, population genetic analysis and functional genomic analysis of allelic activity. Here we present phASER, an accurate approach for phasing variants that are overlapped by sequencing reads, including those from RNA sequencing (RNA-seq), which often span multiple exons due to splicing. Using diverse RNA-seq data, we demonstrate that this approach phases rare variants more accurately than population-based phasing and can phase variants in the same gene up to hundreds of kilobases apart, which is not possible with DNA sequencing (DNA-seq) reads. We show that, in medical genetic studies, this improves the resolution of compound heterozygotes. Additionally, phASER provides measures of haplotypic expression that increase power and accuracy in studies of allelic expression. In summary, phasing using RNA-seq and phASER is accurate and improves studies where rare variant haplotypes or allelic expression are needed.
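The abstract refers to haplotypic expression, that is, allele-specific read counts aggregated over a whole phased haplotype rather than a single variant. The Python sketch below illustrates one way such a measure, and a test for allelic imbalance on it, could be computed; it is not phASER's implementation. The read and phase representations, the function names (`haplotype_counts`, `binomial_two_sided`, `benjamini_hochberg`) and the rule that each read is counted at most once per block, with reads that conflict with the phase discarded, are all illustrative assumptions. The exact binomial test against 0.5 and the Benjamini-Hochberg FDR adjustment correspond to the "binomial test, FDR<0.05" named in the Figure 2c caption.

```python
from math import comb


def haplotype_counts(reads, phase):
    """Count reads supporting haplotype A vs haplotype B in one phased block.

    reads: list of dicts {variant_id: observed_allele}, allele 0 = ref, 1 = alt
    phase: dict {variant_id: allele carried on haplotype A}
    Assumption (illustrative): each read counts at most once, even if it
    overlaps several variants; reads whose alleles disagree with the phase
    (some match A, some match B) are skipped.
    """
    hap_a = hap_b = 0
    for read in reads:
        votes = set()
        for vid, allele in read.items():
            if vid in phase:  # ignore variants outside this block
                votes.add("A" if allele == phase[vid] else "B")
        if votes == {"A"}:
            hap_a += 1
        elif votes == {"B"}:
            hap_b += 1
    return hap_a, hap_b


def binomial_two_sided(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    # P(X <= tail) under Binomial(n, 0.5); doubled because p = 0.5 is symmetric.
    p_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_tail)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, with `phase = {"v1": 1, "v2": 0}` and reads `[{"v1": 1, "v2": 0}, {"v1": 1}, {"v2": 1}, {"v1": 0, "v2": 0}]`, `haplotype_counts` returns `(2, 1)`: the first two reads support haplotype A, the third supports B, and the fourth is discarded as internally inconsistent. Counting per read rather than per variant avoids counting the same molecule twice when it spans several heterozygous sites.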


Figures

Figure 1. Read-backed haplotype phasing that incorporates RNA-seq using phASER.
(a) phASER produces accurate variant phasing by combining DNA and RNA read-backed phasing and integrating it with population phasing. Because of splicing, RNA-seq reads often span exons and UTRs, allowing read-backed phasing over long ranges, while high-coverage exome and whole-genome sequencing can phase closely spaced variants. For each group of read-connected variants, a local haplotype is produced by testing all possible phase configurations and selecting the configuration with the most support (Supplementary Fig. 1). When population data are available, local haplotype blocks can be phased relative to one another by anchoring the phase to common variants, where the population phase is likely to be correct. (b) Concordance of read-backed phasing (across sequencing assays) and of population phasing with phasing by transmission in the Illumina NA12878 Platinum Genome, as a function of variant minor allele frequency (MAF). Concordance is defined per variant as the percentage of variant-variant phase events that are correct compared with the known transmission phase. (c) Percentage of phased variants that can be phased at or beyond increasing genomic distances using WES, WGS, and paired-end 75 and 250 bp RNA-seq data in two tissues (whole blood and LCLs) from four GTEx individuals. Solid lines represent the means and dotted lines the standard error. (d,e) Contribution of read-backed phasing to the phasing of rare (MAF≤1%) coding variants (d) and all rare variants (e) across sequencing assays and GTEx RNA-seq tissue types for four individuals. Values shown are the mean percentage of rare variants within an individual that can be assigned a genome-wide phase using phase anchoring. Error bars show the standard error. The fold increase in the number of rare variants that can be phased when combined RNA-seq libraries are added to DNA-seq is indicated.
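Panel (a) describes the local haplotype step: variants linked by reads are grouped, every phase configuration of a group is tested, and the best-supported configuration is kept. The Python sketch below is a minimal illustration under stated assumptions, not phASER's actual scoring (which Supplementary Fig. 1 describes). Here "support" is assumed to be the number of reads whose alleles all fall on one haplotype, and the functions `connected_groups`, `best_local_haplotype` and `pairwise_concordance` are hypothetical. The last function shows one reading of the per-variant concordance defined for panel (b), treating a "variant-variant phase event" as the in-phase/out-of-phase relation between a pair of variants.

```python
from itertools import product


def connected_groups(reads):
    """Group variants linked, directly or transitively, by shared reads (union-find).

    reads: list of dicts {variant_id: observed_allele (0 = ref, 1 = alt)}
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for read in reads:
        vids = list(read)
        for v in vids:
            find(v)
        for v in vids[1:]:
            parent[find(v)] = find(vids[0])  # union with the read's first variant
    groups = {}
    for v in parent:
        groups.setdefault(find(v), []).append(v)
    return [sorted(g) for g in groups.values()]


def best_local_haplotype(group, reads):
    """Exhaustively score every phase configuration of one read-connected group.

    A configuration gives the allele (0/1) on haplotype A at each variant;
    haplotype B carries the complementary allele. The first variant is fixed
    to 0 on A because swapping A and B describes the same phase.
    Assumed support: number of reads whose alleles all match A or all match B.
    """
    index = {v: i for i, v in enumerate(group)}
    # Reads covering fewer than two group variants carry no phase information.
    relevant = [r for r in reads if sum(v in index for v in r) >= 2]
    best, best_support = None, -1
    for rest in product((0, 1), repeat=len(group) - 1):
        config = (0,) + rest
        support = 0
        for read in relevant:
            obs = [(index[v], a) for v, a in read.items() if v in index]
            on_a = all(config[i] == a for i, a in obs)
            on_b = all(config[i] != a for i, a in obs)
            if on_a or on_b:
                support += 1
        if support > best_support:  # ties keep the first configuration seen
            best, best_support = config, support
    return dict(zip(group, best)), best_support


def pairwise_concordance(predicted, truth):
    """Per-variant percentage of pairwise phase relations that match the truth.

    predicted, truth: dicts {variant_id: allele on haplotype A}.
    Two variants are "in phase" if they carry the same allele label on A;
    this relation does not change if the haplotype labels are swapped.
    """
    shared = [v for v in predicted if v in truth]
    result = {}
    for v in shared:
        others = [w for w in shared if w != v]
        if not others:
            continue
        correct = sum(
            (predicted[v] == predicted[w]) == (truth[v] == truth[w]) for w in others
        )
        result[v] = 100.0 * correct / len(others)
    return result
```

With the reads `[{"v1": 0, "v2": 1}, {"v2": 1, "v3": 1}, {"v1": 1, "v2": 0}, {"v1": 0, "v2": 0}]`, all three variants form one group, and the configuration `{"v1": 0, "v2": 1, "v3": 1}` is selected with support 3 (only the last read disagrees). Exhaustive search grows as 2^(n-1) with group size, so as written it is only practical for small groups; the caption does not say how phASER handles large ones.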
Figure 2. Application of RNA-seq based haplotype phasing to studies of functional variants and allelic expression analysis.
(a) Instances of compound heterozygosity involving rare (MAF<0.01) loss-of-function (L), probably damaging (D) or possibly damaging (P) coding variants, called using phase data generated by phASER from RNA-seq reads, exome-seq reads, or both, for 345 European individuals from 1,000 Genomes with Geuvadis LCL RNA-seq data. The fold increases in the number of compound heterozygotes resolved when RNA-seq data are included are indicated. (b) Example application of phASER to prioritize rare (alternative AF<0.01 in 1,000 Genomes) recessive alleles in a medical genetics study that includes both WES and RNA-seq from a clinically relevant tissue. Boxplots show the number of heterozygous alleles per individual after each of these successive filtering steps: CADD Phred score ≥15; expressed in fibroblast RNA-seq data; phased with read-backed phasing; and involved in either a trans or a cis interaction with another deleterious variant (CADD ≥15), using RNA and exome data (RNA+WES) or exome data alone (WES). The fold increases from including RNA-seq data are indicated. (c) Difference in the percentage of individuals with significant allelic imbalance (binomial test, FDR<0.05) for each gene with a known heterozygous cis expression quantitative trait locus (eQTL), calculated either by summing all single-variant read counts across haplotypes using population phasing or by summing phASER haplotype blocks phased relative to each other with phase anchoring (Supplementary Fig. 7). Genes in which the percentage of individuals with significant allelic imbalance increases when single-variant counts are summed are coloured red (false positives); those with a decrease (false negatives) are coloured blue. The bar plot above indicates the percentage of the 1,118 genes with measured allelic expression that fall into each category.
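Panels (a) and (b) depend on deciding whether two deleterious heterozygous variants in the same gene lie on the same haplotype (cis) or on opposite haplotypes (trans, a candidate compound heterozygote). The Python sketch below shows that decision under illustrative assumptions: each variant carries a hypothetical haplotype-block ID and a flag saying whether its alternative allele sits on haplotype A of that block. The `HetVariant` type and `classify_pairs` function are invented for this example and are not phASER output fields; the CADD ≥15 default is taken from panel (b).

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass
class HetVariant:
    """Hypothetical record for one heterozygous variant in one individual."""
    variant_id: str
    gene: str
    cadd_phred: float
    block: Optional[str]  # haplotype block ID; None if the variant is unphased
    alt_on_hap_a: bool    # within its block, is the alt allele on haplotype A?


def classify_pairs(variants, min_cadd=15.0):
    """Label each pair of deleterious hets in the same gene as cis, trans or unresolved."""
    by_gene = {}
    for v in variants:
        if v.cadd_phred >= min_cadd:  # keep only putatively deleterious variants
            by_gene.setdefault(v.gene, []).append(v)
    calls = []
    for gene, vs in by_gene.items():
        for a, b in combinations(vs, 2):
            if a.block is None or a.block != b.block:
                # Phase is only defined relative to variants in the same block.
                label = "unresolved"
            elif a.alt_on_hap_a == b.alt_on_hap_a:
                label = "cis"    # both alt alleles on the same haplotype
            else:
                label = "trans"  # candidate compound heterozygote
            calls.append((gene, a.variant_id, b.variant_id, label))
    return calls
```

Pairs in different blocks are left unresolved in this sketch. If phase anchoring has already placed blocks on a common genome-wide phase, those variants could be given a shared block ID before calling `classify_pairs`, which is where longer-range RNA-seq phasing would convert unresolved pairs into cis or trans calls.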

