J Am Stat Assoc. 2022;117(537):38-51.
doi: 10.1080/01621459.2021.1933495. Epub 2021 Jul 21.

Prioritizing Autism Risk Genes using Personalized Graphical Models Estimated from Single Cell RNA-seq Data

Jianyu Liu et al. J Am Stat Assoc. 2022.

Abstract

Hundreds of autism risk genes have been reported recently, mainly based on genetic studies in which these risk genes carry more de novo mutations in autism subjects than in healthy controls. However, because autism is a complex disease, it is likely associated with more risk genes, many of which may not be identifiable through de novo mutations. We hypothesize that more autism risk genes can be identified through their connections with known autism risk genes in personalized gene-gene interaction graphs. We estimate such personalized graphs using single cell RNA sequencing (scRNA-seq) data while appropriately modeling the cell dependence and possible zero-inflation in the scRNA-seq data. The sample size, which is the number of cells per individual, ranges from 891 to 1,241 in our case study using scRNA-seq data from autism subjects and controls. We consider 1,500 genes in our analysis. Since the number of genes is larger than or comparable to the sample size, we perform penalized estimation. We score each gene's relevance by applying a simple graph kernel smoothing method to each personalized graph. The molecular functions of the top-scored genes are related to autism. For example, the candidate gene RYR2, which encodes the protein ryanodine receptor 2, is involved in neurotransmission, a process that is impaired in ASD patients. While our method provides a systematic and unbiased approach to prioritizing autism risk genes, the relevance of these genes needs to be further validated in functional studies.

Keywords: Cell dependence; Hurdle model; Poisson-LogNormal distribution; Zero-inflation.
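
To make the zero-inflation question concrete, the sketch below computes the probability of a zero count under a plain Poisson-LogNormal (PLN) model, which can be compared with the observed fraction of zeros for a gene. It assumes the textbook parameterization Y | Z ~ Poisson(exp(Z)) with Z ~ Normal(mu, sigma^2); the paper's own model (cell dependence, hurdle component, any scaling factors) is not given on this page and may differ. The function name `pln_zero_prob` and the parameter `n_nodes` are illustrative.

```python
import numpy as np

def pln_zero_prob(mu, sigma, n_nodes=60):
    """P(Y = 0) when Y | Z ~ Poisson(exp(Z)) and Z ~ Normal(mu, sigma^2).

    Assumed textbook PLN parameterization, not necessarily the paper's.
    """
    # Gauss-Hermite nodes/weights approximate integrals of exp(-x^2) * f(x).
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables so the rule integrates against Normal(mu, sigma^2).
    z = mu + np.sqrt(2.0) * sigma * nodes
    # P(Y = 0 | Z) = exp(-exp(Z)); average it over the Normal distribution.
    return float(np.sum(weights * np.exp(-np.exp(z))) / np.sqrt(np.pi))

# Expected zero proportion for one hypothetical gene.
expected_zero = pln_zero_prob(mu=0.5, sigma=1.0)
```

An observed zero proportion well above this expected value would suggest zero-inflation beyond what the PLN model alone explains.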

Figures

Fig. 1
Left: Histograms of the expression proportions of the top 1500 genes which are expressed in most cells in the Velmeshev dataset (individual number 5387) and the Gierahn dataset. Middle: The actual zero proportions versus the expected zero proportions under the fitted PLN model. Right: Histograms of the P-values of Pearson correlation tests for all cell pairs.
Fig. 2
Performance of different graph estimation methods under the non-zero-inflated setting (15). (a) Banded graph: μ1 = ⋯ = μp = 2, c = 2, 157 edges; (b) Hub graph: μ1 = ⋯ = μp = 2, c = 2, 72 edges; (c) Random graph: μ1 = ⋯ = μp = 3, c = 1/2, 79 edges. The x-axes and y-axes represent FPR and TPR, respectively. The sparsity of the graph estimated by each method varies with its tuning parameter.
Fig. 3
Performance of different graph estimation methods under the zero-inflated setting (16). (a) Banded graph: μ1 = ⋯ = μp = 3, c = 2, γ0 = 1, γ1 = 0.5, 157 edges; (b) Hub graph: μ1 = ⋯ = μp = 3, c = 2, γ0 = 0, γ1 = 0.5, 72 edges; (c) Random graph: μ1 = ⋯ = μp = 3, c = 0.5, γ0 = 0.5, γ1 = 0.5, 79 edges. The x-axes and y-axes represent FPR and TPR, respectively. The sparsity of the graph estimated by each method varies with its tuning parameter.
Fig. 4
Accuracy evaluation of graph estimation with real scRNA-seq datasets, based on the “catalysis precedes” relationship benchmark (left: dataset from Velmeshev et al. (2019); right: dataset from Gierahn et al. (2017)).
Fig. 5
Prioritizing novel autism risk genes using the gene-gene interaction graph estimated from scRNA-seq data of 1,241 neuron cells of individual 5278 (Velmeshev et al., 2019) together with autism risk genes from the Simons Foundation Autism Research Initiative (SFARI). (A) SFARI risk score after neighborhood aggregation over 1-step neighbors (SFARIscore_1s) or 2-step neighbors (SFARIscore_2s). (B) Scatter plot of SFARI risk scores after 1-step versus 2-step neighborhood aggregation. (C-F) Four genes with degree larger than 5 and SFARIscore_1s greater than or equal to 1.
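
The 1-step and 2-step neighborhood aggregation named in Fig. 5 can be illustrated with a small sketch. It assumes the simplest possible rule: a gene's aggregated score is the sum of the known risk scores of all genes reachable within k edges, excluding the gene itself. The exact kernel and weights used in the paper are not stated on this page, so this is a stand-in for illustration only; the function name `neighborhood_scores` and the toy graph are hypothetical.

```python
import numpy as np

def neighborhood_scores(adj, seed, steps=1):
    """Sum seed scores over all nodes reachable within `steps` edges (self excluded).

    Assumed aggregation rule for illustration; not the paper's stated kernel.
    """
    adj = np.asarray(adj, dtype=bool)
    n = adj.shape[0]
    reach = np.eye(n, dtype=bool)  # nodes reachable in 0 steps
    for _ in range(steps):
        # Extend reachability by one more edge.
        reach = reach | ((reach.astype(int) @ adj.astype(int)) > 0)
    np.fill_diagonal(reach, False)  # a gene does not count its own score
    return reach.astype(float) @ np.asarray(seed, dtype=float)

# Toy path graph 0 - 1 - 2 - 3; only gene 0 carries a known risk score.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
seed = np.array([1.0, 0.0, 0.0, 0.0])
one_step = neighborhood_scores(adj, seed, steps=1)  # only gene 1 is scored
two_step = neighborhood_scores(adj, seed, steps=2)  # genes 1 and 2 are scored
```

In this toy graph, widening the neighborhood from one step to two lets the known score of gene 0 reach gene 2 as well, which is the kind of difference panel (B) of Fig. 5 compares.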
