[Preprint]. 2025 Dec 18:2025.12.16.25341855.
doi: 10.64898/2025.12.16.25341855.

Improved Identification of Large-effect Rare Genetic Variants using Haplotype Aggregated Allele-specific Expression Data

Kaushik Ram Ganapathy et al. medRxiv.

Abstract

Allele-specific expression (ASE) outlier detection is a powerful tool for identifying genes affected by large-effect rare genetic regulatory variants, but it suffers from data sparsity and noisy signal in low-count genes. Genome phasing can be used to aggregate ASE signal along haplotypes, alleviating both sparsity and noise. Yet statistical tools for using haplotype-level ASE data in rare variant interpretation are lacking. Here, we present ANEVA-h to quantify the amount of genetic variation in gene expression from haplotype-level ASE data in a population, enabling more accurate and comprehensive detection of regulatory effects. We apply ANEVA-h to GTEx project data, along with a compatible dosage outlier test, and show an over 2-fold increase in the number of testable genes, a reduction in spurious outlier calls, and improved enrichment for rare high-impact variants. In clinical cohorts of neuromuscular and congenital heart disease, ANEVA-h enhances gene prioritization and identifies candidate diagnoses missed by DROP-MAE and ANEVA. Finally, we analyze globally diverse populations to characterize the impact of ancestry background in the reference and test populations. We provide the tools and data necessary to facilitate integration of haplotype-level ASE outlier testing into rare variant interpretation pipelines.
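To make the aggregation idea concrete, the following is a minimal sketch and not the ANEVA-h implementation. It assumes that each phased heterozygous SNP contributes a pair of read counts (haplotype 1, haplotype 2), that variant-level analysis uses the single best-covered SNP per gene, and that a gene is testable when its total count exceeds 10 (the threshold given in the Figure 1 caption). The function names and the toy counts are hypothetical. Summing per-SNP counts can count a read twice when it overlaps two SNPs; a real pipeline would count each read once.

```python
from collections import defaultdict

MIN_TOTAL = 10  # "total expression count > 10" from the Figure 1 caption


def aggregate_by_haplotype(snp_counts):
    """Sum per-SNP allelic counts into per-gene haplotype counts.

    snp_counts: iterable of (gene, hap1_reads, hap2_reads), one entry per
    phased heterozygous SNP (hypothetical input layout).
    """
    totals = defaultdict(lambda: [0, 0])
    for gene, hap1, hap2 in snp_counts:
        totals[gene][0] += hap1
        totals[gene][1] += hap2
    return {gene: (t[0], t[1]) for gene, t in totals.items()}


def best_variant_total(snp_counts, gene):
    """Total reads at the best-covered SNP of a gene (assumed variant-level choice)."""
    return max(h1 + h2 for g, h1, h2 in snp_counts if g == gene)


def expression_gain(snp_counts, gene):
    """Ratio of haplotype-level to variant-level total counts for one gene."""
    hap1, hap2 = aggregate_by_haplotype(snp_counts)[gene]
    return (hap1 + hap2) / best_variant_total(snp_counts, gene)


# Toy data: GENE_A has two low-coverage SNPs, GENE_B has one SNP.
snps = [("GENE_A", 4, 2), ("GENE_A", 3, 5), ("GENE_B", 6, 6)]
hap = aggregate_by_haplotype(snps)

for gene in sorted(hap):
    variant_ok = best_variant_total(snps, gene) > MIN_TOTAL
    haplotype_ok = sum(hap[gene]) > MIN_TOTAL
    print(gene, hap[gene], expression_gain(snps, gene), variant_ok, haplotype_ok)
```

In this toy input, GENE_A fails the count threshold at either SNP alone (6 and 8 reads) but passes once the two SNPs are pooled along the haplotype (14 reads), which is the sparsity effect the abstract describes. GENE_B has a single SNP, so aggregation changes nothing and its gain is 1.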

Conflict of interest statement

P.M. was supported by the National Institutes of Health under award number R01GM140287. T.L. is an advisor to and owns equity in Variant Bio. A.T. is a co-founder and equity shareholder of GeneXwell Inc. and an advisor to InsideTracker.

Figures

Figure 1. Allele-Specific Expression (ASE) Analysis and Haplotype Aggregation
(A) Schematic demonstrating haplotype aggregation of ASE reads in an RNA-seq library using SNPs in a phased haplotype block [21]. (B) Average number of protein-coding genes per sample with detectable ASE (total expression count > 10) exclusively at the variant level, exclusively at the haplotype level, and common to both, across 49 GTEx tissues. Error bars represent the 95% confidence intervals (CI) of the average number of genes for each tissue. (C) Expression gain of ASE, calculated as the ratio of haplotype-level to variant-level total counts, as a function of the number of available ASE SNPs among commonly available protein-coding genes. Error bars represent the 95% CI of the mean gain among 49 GTEx tissues.
Figure 2. ANEVA-h provides accurate estimates of genetically regulated variation in gene expression in GTEx project data.
(A) Distribution of genes with VG estimates per GTEx tissue. (B) Increase in the number of genes with VG estimates as a function of GTEx tissue sample size. (C) Total genes with VG estimates across 1–49 GTEx tissues. (D) Average percentage of genes with VG estimates by median TPM (normalized to bins). Error bars represent the 95% CI of the mean among 49 GTEx tissues. (E-F) Distribution of the average number of DOT-tested genes (E) and DOT outliers (q-value ≤ 0.05) per sample (F).
Figure 3. Improvements in Rare Variant Enrichment from Haplotype Aggregation in GTEx
(A) Relative risk estimates for rare variants (MAF < 0.1%) stratified by Variant Effect Predictor (VEP) consequence, comparing five models (variant-level ANEVA, haplotype-level ANEVA-h, DROP-MAE, Beta-Binomial, and Binomial) in GTEx v8 adipose subcutaneous tissue. (B-C) Rare variant relative risk stratified by statistical significance (FDR-corrected) (B) and VEP outlier classification (C) in variant-level and haplotype-level data. (D) Number of false-positive (F.P.) prone genes excluded per tissue in variant-level and haplotype-level ASE analyses across 49 GTEx tissues.
Figure 4. ANEVA-h Improves Diagnostic Utility in Rare Inherited Muscle Disease Cases
(A-E) Analysis in muscular disease cohorts: (A) Number of genes tested per sample with whole-exome sequencing (WES); (B) Number of genes tested per sample with whole-genome sequencing (WGS); (C) Positive MDM genes identified per sample; (D) ANEVA-DOT test positivity rates; (E) Number of true-positive cases captured across the variant-level, haplotype-level, and DROP-MAE pipelines.
Figure 5. ANEVA-h Identifies Ancestry-Specific Variability in VG Estimates
(A) Hierarchical clustering of ANEVA-h VG confidence intervals across commonly tested genes from 25 population groups. Population codes correspond to the standard abbreviations used by the 1000 Genomes Project. (B) Median number of ANEVA-DOT outliers across all tested ASE population-VG combinations.

References

    1. Cleary S. and Seoighe C., Perspectives on allele-specific expression. Annual Review of Biomedical Data Science, 2021. 4: p. 101–122.
    2. Lappalainen T., et al., Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 2013. 501(7468): p. 506–511.
    3. Yépez V.A., et al., Clinical implementation of RNA sequencing for Mendelian disease diagnostics. Genome Medicine, 2022. 14(1): p. 38.
    4. Byron S.A., et al., Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nature Reviews Genetics, 2016. 17(5): p. 257–271.
    5. Mohammadi P., et al., Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change. Genome Research, 2017. 27(11): p. 1872–1884.
