Bioinformatics. 2024 Feb 1;40(2):btae067.
doi: 10.1093/bioinformatics/btae067.

Phenotype prediction from single-cell RNA-seq data using attention-based neural networks

Yuzhen Mao et al.

Abstract

Motivation: A patient's disease phenotype can be driven and determined by specific groups of cells whose marker genes are either unknown or detectable only at a late stage using conventional bulk assays such as RNA-Seq. Recent advances in single-cell RNA sequencing (scRNA-seq) enable gene expression profiling at single-cell resolution and therefore have the potential to identify the cells driving the disease phenotype even when those cells are few. However, most existing methods rely heavily on accurate cell type detection, and the number of available annotated samples is usually too small for training deep learning predictive models.

Results: Here, we propose ScRAT, a method for phenotype prediction from scRNA-seq data. To train ScRAT with a limited number of samples of different phenotypes, such as coronavirus disease (COVID) and non-COVID, ScRAT first applies a mixup module to increase the number of training samples. A multi-head attention mechanism is then employed to learn the most informative cells for each phenotype without relying on a given cell type annotation. Using three public COVID datasets, we show that ScRAT outperforms other phenotype prediction methods. Its performance edge over competitors increases as the number of training samples decreases, indicating the efficacy of our sample mixup. Critical cell types detected from high-attention cells also support novel findings in the original papers and the recent literature. This suggests that ScRAT overcomes the challenges of missing marker genes and limited sample numbers, with great potential for revealing novel molecular mechanisms and/or therapies.
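The sample mixup described above can be sketched as follows. This is a minimal illustration of the mixup idea applied at the sample level (interpolating cell-by-gene expression matrices and their phenotype labels), not the authors' exact implementation; the function name, subsampling scheme, and Beta-distributed mixing coefficient are assumptions based on standard mixup practice.

```python
import numpy as np

def sample_mixup(cells_a, cells_b, label_a, label_b, alpha=1.0, rng=None):
    """Mix two scRNA-seq samples into one synthetic training sample.

    cells_a, cells_b: expression matrices of shape (n_cells, n_genes)
    label_a, label_b: one-hot phenotype labels (e.g. COVID vs non-COVID)
    """
    if rng is None:
        rng = np.random.default_rng()
    # Mixing coefficient drawn from Beta(alpha, alpha), as in standard mixup.
    lam = rng.beta(alpha, alpha)
    # Subsample the same number of cells from each sample so the
    # expression profiles can be interpolated cell-by-cell.
    n = min(len(cells_a), len(cells_b))
    ia = rng.choice(len(cells_a), n, replace=False)
    ib = rng.choice(len(cells_b), n, replace=False)
    mixed_cells = lam * cells_a[ia] + (1.0 - lam) * cells_b[ib]
    # Labels are interpolated with the same coefficient, yielding a soft label.
    mixed_label = lam * np.asarray(label_a) + (1.0 - lam) * np.asarray(label_b)
    return mixed_cells, mixed_label
```

Mixing pairs of annotated samples in this way expands a small training set with plausible intermediate samples, which is why the benefit grows as the number of real training samples shrinks.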

Availability and implementation: The code of our proposed method ScRAT is published at https://github.com/yuzhenmao/ScRAT.


Conflict of interest statement

None declared.

Figures

Figure 1.
An overview of ScRAT, which consists of three main modules: Sample Mixup, Attention Layer, and Phenotype Classifier. It takes a scRNA-seq sample (a set of cells) as input, and outputs the predicted phenotype for the input sample.
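The attention layer's role of weighting cells before classification can be sketched as a single-head attention pooling step. This is an assumption-laden simplification (one head, a fixed query vector, no learned projections) meant only to show how attention weights over cells produce both a pooled sample representation and a per-cell importance score from which high-attention cells can be read off.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(cell_embeddings, query):
    """Pool a sample's cells into one vector via attention weights.

    cell_embeddings: (n_cells, d) embeddings, one row per cell
    query: (d,) query vector (learned in a real model; fixed here)
    Returns the pooled (d,) vector and the (n_cells,) attention weights.
    """
    d = len(query)
    # Scaled dot-product scores, one per cell.
    scores = cell_embeddings @ query / np.sqrt(d)
    weights = softmax(scores)
    # High-attention cells dominate the pooled sample representation,
    # which a downstream classifier maps to a phenotype.
    pooled = weights @ cell_embeddings
    return pooled, weights
```

The per-cell weights are what makes the approach interpretable: ranking cells by attention identifies the cells, and hence cell types, most informative for the predicted phenotype.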
Figure 2.
Comparison of different methods on four different tasks. For each task, we report the prediction results of all methods as AUC ± 95% confidence intervals for 10 different training ratios. ScRAT outperforms the other methods in all settings, followed by vanilla attention (the P-value of the t-test between ScRAT and vanilla attention is below 0.01 in all but the SC4-Severity task at Training Ratio = 9%). The performance edge of ScRAT over vanilla attention increases as the training ratio decreases, especially for the Combat datasets. See Supplementary Figs S3 and S4 for more information.
