BMC Genomics. 2024 Oct 4;25(1):930.
doi: 10.1186/s12864-024-10855-5.

Leveraging explainable deep learning methodologies to elucidate the biological underpinnings of Huntington's disease using single-cell RNA sequencing data

Shichen Gao et al. BMC Genomics.

Abstract

Background: Huntington's disease (HD) is a hereditary neurological disorder caused by mutations in HTT, leading to neuronal degeneration. Traditionally, HD is associated with the misfolding and aggregation of mutant huntingtin due to an extended polyglutamine domain encoded by an expanded CAG tract. Recent research has also highlighted the role of global transcriptional dysregulation in HD pathology, but understanding the intricate relationship between mRNA expression and HD at the cellular level remains challenging. Our study aimed to elucidate the underlying mechanisms of HD pathology using single-cell sequencing data.

Results: We used single-cell RNA sequencing analysis to determine differential gene expression patterns between healthy and HD cells. HD cells were effectively modeled using a residual neural network (ResNet), which outperformed traditional and convolutional neural networks and achieved an F1 score of 96.53% on the test set. Using the SHapley Additive exPlanations (SHAP) algorithm, we identified genes influencing HD prediction and revealed their roles in HD pathobiology, such as in the regulation of cellular iron metabolism and mitochondrial function. SHAP analysis also revealed low-abundance genes that were overlooked by traditional differential expression analysis, emphasizing its effectiveness in identifying biologically relevant genes for distinguishing between healthy and HD cells. Overall, the integration of single-cell RNA sequencing data and deep learning models provides valuable insights into HD pathology.
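As a reminder of what the reported F1 score measures, the following is a minimal sketch of the metric for a binary healthy-versus-HD classifier. The function name `f1_score` and the toy label vectors are illustrative assumptions, not the authors' evaluation code or data.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, not from the study): 1 = HD cell, 0 = healthy cell.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it drops sharply when either one is low, which makes it more informative than accuracy when the two classes are imbalanced.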

Conclusion: We developed a model capable of analyzing HD at the single-cell transcriptomic level.

Keywords: Deep residual model; Huntington’s disease; Single-cell RNA-seq; Unveiling potentially key genes.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Graphical presentation of the residual neural network structure, a deep learning architecture for analyzing single-cell gene expression data that distinguishes between normal and Huntington’s disease samples. It consists of input layers, performer layers with CNN blocks, and residual blocks containing linear layers, activations, and convolutions. The network classifies each sample as either normal or Huntington’s disease. Residual connections help train deeper networks effectively
Fig. 2
t-SNE visualization of the integrated single-cell RNA sequencing data. The left panel depicts the distribution of healthy and Huntington’s disease samples within the t-SNE plot, while the right panel illustrates the distribution of various cell types in the same t-SNE plot
Fig. 3
Enrichment in biological processes for marker genes in each cluster and associated p-values were generated using clusterProfiler. Each point in the plot corresponds to a specific biological process, and its size indicates the count of genes associated with that process. Each point is color-coded according to the significance level (p.adjust) involved in that particular biological process, with darker red points representing more statistically significant enrichments. The x-axis labels represent different cell types or neuron subpopulations, including D1 and D2 + spiny neurons, D1 + cholinergic striatal neurons, D1 + spiny neurons, immature neurons, mature neurons, NPCs (Neural Progenitor Cells), and NSCs (Neural Stem Cells). The y-axis lists the enriched biological processes. For a complete list of GO enrichment results, see Supplemental Table S1
Fig. 4
The workflow of an interpretable deep learning framework includes data preprocessing, model training, model evaluation, and model interpretation
Fig. 5
Performance comparison of the three artificial neural network architectures (residual neural network, standard neural network, convolutional neural network). Figures a to c show the F1 score curves for the three architectures over epochs on the training set (blue curve) and validation set (orange curve). Figure d displays the ROC curves on the validation set, with the blue solid line representing the ROC curve of the residual neural network, the green dashed line representing the ROC curve of the convolutional neural network, and the orange dashed line representing the ROC curve of the standard neural network. Figure e illustrates the accuracy of the three architectures in predicting labels for Huntington and Normal on the same dataset
Fig. 6
Performance of the residual neural network across three datasets. Figures a to c show the accuracy curves over epochs for the iPSC-derived neurons dataset, astrocytes dataset, and neurons dataset, respectively, with the blue curve representing performance on the training set and the orange curve representing performance on the validation set. Figures d to f display the BCELoss curves over epochs for the iPSC-derived neurons dataset, astrocytes dataset, and neurons dataset, respectively, with the blue curve representing performance on the training set and the orange curve representing performance on the validation set
Fig. 7
SHAP summary plot showing 25 features for the trained residual neural network across the three datasets. The SHAP contributions for each data point are summed for each computed gene score
Fig. 8
The left panel shows the Gene Ontology pathway enrichment analysis of the SHAP genes. The x-axis represents fold enrichment, indicating how much more likely the given biological process is represented among the genes of interest compared to the background population. The y-axis shows the biological processes themselves, sorted in decreasing order of significance. Each circle in the chart corresponds to a specific biological process, with the size of the circle representing the number of genes involved in each process, ranging from 100 to 350 genes. The color of the circles corresponds to the -log10(FDR) values, which measure the statistical significance of the enrichment. Higher values indicate lower FDR, meaning greater confidence in the observed enrichment. The right panel displays a volcano plot showing differentially expressed genes between HD and healthy cells. Blue circles represent genes with significantly downregulated expression, red circles represent genes with significantly upregulated expression, and gray circles denote genes with nonsignificant differences. The significance threshold was set at P < 0.05
Fig. 9
Feature plot of cells that are positive for each of the four high-ranking SHAP genes, respectively, in the t-SNE projection
Fig. 10
The SHAP value to feature value plot for the RPS6 gene, with the respective other gene (FTH1, MALAT1, MT-CO1, etc.) values color-coded. Positive SHAP values indicate an association with the high expression class, while negative SHAP values indicate an association with the low expression class of genes
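
The SHAP values shown in Figs. 7 to 10 are estimates of Shapley values, which split a model's prediction for one cell among its input features. The sketch below computes exact Shapley values by enumerating all feature orderings for a tiny toy model. It is an illustration of the underlying idea only: the scoring function `toy_score`, the feature names `gene_a`, `gene_b`, `gene_c`, and all numeric values are hypothetical, and the authors' pipeline (which applies SHAP to a trained residual network) is not reproduced here.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features not yet added to a coalition keep their baseline value.
    The cost grows factorially with the number of features, so this is
    only practical for toy inputs.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]            # add this feature to the coalition
            updated = predict(current)
            totals[name] += updated - previous  # its marginal contribution
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(expr):
    """Hypothetical score with one interaction term (not a trained model)."""
    return (0.5 * expr["gene_a"]
            - 0.3 * expr["gene_b"]
            + 0.2 * expr["gene_a"] * expr["gene_c"])


# Hypothetical expression values for one cell and an all-zero baseline.
cell = {"gene_a": 2.0, "gene_b": 1.0, "gene_c": 3.0}
baseline = {"gene_a": 0.0, "gene_b": 0.0, "gene_c": 0.0}

phi = shapley_values(toy_score, cell, baseline)
for gene, value in phi.items():
    print(f"{gene}: {value:+.3f}")
```

The attributions sum to the difference between the score for the cell and the score for the baseline, and the interaction term is shared equally between the two features involved. A positive value means the feature pushed the score up relative to the baseline, and a negative value means it pushed the score down.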
