Bioinformatics. 2022 May 13;38(10):2773-2780.
doi: 10.1093/bioinformatics/btac212.

Airpart: interpretable statistical models for analyzing allelic imbalance in single-cell datasets

Wancen Mu et al. Bioinformatics. 2022.

Abstract

Motivation: Allelic expression analysis aids in detection of cis-regulatory mechanisms of genetic variation, which produce allelic imbalance (AI) in heterozygotes. Measuring AI in bulk data lacking time or spatial resolution has the limitation that cell-type-specific (CTS), spatial- or time-dependent AI signals may be dampened or not detected.
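To make the dampening effect concrete, the following is a minimal sketch using invented read counts for a single gene in two hypothetical cell types. The counts, the names `cell_type_A` and `cell_type_B`, and the helper `allelic_ratio` are illustrative assumptions and do not come from the paper or the airpart package.

```python
# Hypothetical allele-specific read counts for one gene, given as
# (alternate_allele_reads, total_reads). The numbers are invented.
counts = {
    "cell_type_A": (80, 100),  # strong imbalance toward the alternate allele
    "cell_type_B": (20, 100),  # strong imbalance toward the reference allele
}


def allelic_ratio(alt_reads, total_reads):
    """Fraction of reads that carry the alternate allele."""
    return alt_reads / total_reads


# Allelic ratio within each cell type.
per_type = {name: allelic_ratio(a, t) for name, (a, t) in counts.items()}

# A bulk measurement pools reads from all cells before taking the ratio.
bulk_alt = sum(a for a, _ in counts.values())
bulk_total = sum(t for _, t in counts.values())
bulk_ratio = allelic_ratio(bulk_alt, bulk_total)

print(per_type)    # ratios of 0.8 and 0.2
print(bulk_ratio)  # 0.5, which looks balanced
```

In this toy case the two opposite cell-type-specific signals cancel exactly in the pooled counts, so a bulk analysis would see no imbalance at all.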

Results: We introduce a statistical method, airpart, for identifying differential CTS AI from single-cell RNA-sequencing data, or dynamic AI from other spatially or time-resolved datasets. airpart outputs discrete partitions of data, pointing to groups of genes and cells under common mechanisms of cis-genetic regulation. To account for low counts in single-cell data, our method uses a Generalized Fused Lasso with Binomial likelihood for partitioning groups of cells by AI signal, and a hierarchical Bayesian model for AI statistical inference. In simulation, airpart accurately detected partitions of cell types by their AI and had lower Root Mean Square Error (RMSE) of allelic ratio estimates than existing methods. In real data, airpart identified differential allelic imbalance patterns across cell states and could be used to define trends of AI signal over spatial or time axes.
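The idea of partitioning cell groups with a fused lasso under a binomial likelihood can be illustrated with a generic penalized objective. The sketch below is not airpart's implementation or API: the function names, the all-pairs penalty, the choice to penalize differences on the ratio scale, and the counts are all assumptions made for illustration.

```python
import math
from itertools import combinations


def binomial_nll(alt_reads, total_reads, p):
    """Binomial negative log-likelihood, dropping the constant coefficient."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # keep log() finite at the boundaries
    return -(alt_reads * math.log(p) + (total_reads - alt_reads) * math.log(1 - p))


def fused_objective(counts, ratios, lam):
    """Data fit plus a penalty on pairwise differences between group ratios.

    counts: list of (alt_reads, total_reads), one entry per cell group
    ratios: candidate allelic ratio for each cell group
    lam:    penalty weight; larger values favor merging groups
    """
    nll = sum(binomial_nll(a, t, p) for (a, t), p in zip(counts, ratios))
    penalty = sum(abs(pi - pj) for pi, pj in combinations(ratios, 2))
    return nll + lam * penalty


# Invented counts for three cell groups.
counts = [(8, 10), (7, 10), (2, 10)]
separate = [0.8, 0.7, 0.2]   # every group keeps its own ratio
merged = [0.75, 0.75, 0.2]   # first two groups share one ratio

for lam in (0.0, 5.0):
    print(lam, fused_objective(counts, separate, lam), fused_objective(counts, merged, lam))
```

With no penalty, the separate estimates fit the data best. With a sufficiently large penalty, the candidate that merges the first two groups scores lower, which is how a fused penalty yields a discrete partition of cell groups.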

Availability and implementation: The airpart package is available as an R/Bioconductor package at https://bioconductor.org/packages/airpart.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Overview of the airpart framework. airpart takes as input allele-specific read counts, quantified upstream of our method. Known cell annotations or cell clusters derived from total counts are also part of the input to airpart. Following QC steps, genes are clustered based on their allelic signal over cells. Then, during the modeling step, a partition of the cell groups is generated, as shown in the heatmap, either by a Generalized Fused Lasso (GFL) or a non-parametric method. Coefficients estimated for this gene cluster using the GFL inform the prior of the hierarchical Bayesian model. Finally, airpart outputs estimates of the allelic ratio for each gene and cell group, as well as an s-value or adjusted P-value for the AI and differential AI (DAI) tests, respectively. Multiple visualizations of input data, gene clustering and fitted parameters are available as functions within the airpart software.
Fig. 2.
Performance comparison of airpart variants and scDALI on simulation datasets. (A) Boxplot of partition accuracy among three variants of airpart. The y-axis shows the ARI over 200 iterations. cnt, the higher mean count; n, number of genes within a gene cluster. (B) Boxplot of RMSE per gene for estimation of the allelic ratio for n = 40 cells over 400 iterations. Each gene has an underlying U-shape pattern described in Section 2.6. (C) Boxplot showing the performance of airpart without the cell-type grouping step and of scDALI on each cell type at DAI = 0.2. The highlighted dots inside the boxes represent the simulated allelic ratios.
Fig. 3.
Evaluation of airpart on two scRNA-seq experiments. (A) Violin plot of estimated allelic ratio on Larsson’s dataset, with n indicating the number of cells. Color represents different partition groups. (B) Forest plot for Larsson’s dataset, showing the top 40 genes with the smallest s-value. The dotted line denotes allelic ratio = 0.5. (C) Step plot and heatmap of results for Deng’s dataset. This gene cluster partitioned cell types into five groups, denoted by highlighted dots in the step plot. (D–G) Selected genes displaying the airpart fitted model on Gutierrez-Arcelus’s data: (D) decreasing trend, (E) increasing trend, (F) up-down pattern and (G) down-up pattern.

Cited by

  • Disentangling sex-dependent effects of APOE on diverse trajectories of cognitive decline in Alzheimer's disease.
    Ma H, Shi Z, Kim M, Liu B, Smith PJ, Liu Y, Wu G; Alzheimer's Disease Neuroimaging Initiative (ADNI). Ma H, et al. Neuroimage. 2024 Apr 15;292:120609. doi: 10.1016/j.neuroimage.2024.120609. Epub 2024 Apr 12. Neuroimage. 2024. PMID: 38614371 Free PMC article.
  • Modelling capture efficiency of single-cell RNA-sequencing data improves inference of transcriptome-wide burst kinetics.
    Tang W, Jørgensen ACS, Marguerat S, Thomas P, Shahrezaei V. Tang W, et al. Bioinformatics. 2023 Jul 1;39(7):btad395. doi: 10.1093/bioinformatics/btad395. Bioinformatics. 2023. PMID: 37354494 Free PMC article.
  • Single-cell genomics and regulatory networks for 388 human brains.
    Emani PS, Liu JJ, Clarke D, Jensen M, Warrell J, Gupta C, Meng R, Lee CY, Xu S, Dursun C, Lou S, Chen Y, Chu Z, Galeev T, Hwang A, Li Y, Ni P, Zhou X; PsychENCODE Consortium; Bakken TE, Bendl J, Bicks L, Chatterjee T, Cheng L, Cheng Y, Dai Y, Duan Z, Flaherty M, Fullard JF, Gancz M, Garrido-Martín D, Gaynor-Gillett S, Grundman J, Hawken N, Henry E, Hoffman GE, Huang A, Jiang Y, Jin T, Jorstad NL, Kawaguchi R, Khullar S, Liu J, Liu J, Liu S, Ma S, Margolis M, Mazariegos S, Moore J, Moran JR, Nguyen E, Phalke N, Pjanic M, Pratt H, Quintero D, Rajagopalan AS, Riesenmy TR, Shedd N, Shi M, Spector M, Terwilliger R, Travaglini KJ, Wamsley B, Wang G, Xia Y, Xiao S, Yang AC, Zheng S, Gandal MJ, Lee D, Lein ES, Roussos P, Sestan N, Weng Z, White KP, Won H, Girgenti MJ, Zhang J, Wang D, Geschwind D, Gerstein M. Emani PS, et al. bioRxiv [Preprint]. 2024 Mar 30:2024.03.18.585576. doi: 10.1101/2024.03.18.585576. bioRxiv. 2024. Update in: Science. 2024 May 24;384(6698):eadi5199. doi: 10.1126/science.adi5199. PMID: 38562822 Free PMC article. Updated. Preprint.
  • Computational methods for allele-specific expression in single cells.
    Qi G, Battle A. Qi G, et al. Trends Genet. 2024 Nov;40(11):939-949. doi: 10.1016/j.tig.2024.07.003. Epub 2024 Aug 10. Trends Genet. 2024. PMID: 39127549 Review.
  • Single-cell genomics and regulatory networks for 388 human brains.
    Emani PS, Liu JJ, Clarke D, Jensen M, Warrell J, Gupta C, Meng R, Lee CY, Xu S, Dursun C, Lou S, Chen Y, Chu Z, Galeev T, Hwang A, Li Y, Ni P, Zhou X; PsychENCODE Consortium‡; Bakken TE, Bendl J, Bicks L, Chatterjee T, Cheng L, Cheng Y, Dai Y, Duan Z, Flaherty M, Fullard JF, Gancz M, Garrido-Martín D, Gaynor-Gillett S, Grundman J, Hawken N, Henry E, Hoffman GE, Huang A, Jiang Y, Jin T, Jorstad NL, Kawaguchi R, Khullar S, Liu J, Liu J, Liu S, Ma S, Margolis M, Mazariegos S, Moore J, Moran JR, Nguyen E, Phalke N, Pjanic M, Pratt H, Quintero D, Rajagopalan AS, Riesenmy TR, Shedd N, Shi M, Spector M, Terwilliger R, Travaglini KJ, Wamsley B, Wang G, Xia Y, Xiao S, Yang AC, Zheng S, Gandal MJ, Lee D, Lein ES, Roussos P, Sestan N, Weng Z, White KP, Won H, Girgenti MJ, Zhang J, Wang D, Geschwind D, Gerstein M; PsychENCODE Consortium. Emani PS, et al. Science. 2024 May 24;384(6698):eadi5199. doi: 10.1126/science.adi5199. Epub 2024 May 24. Science. 2024. PMID: 38781369 Free PMC article.

