Front Comput Sci (Berl). 2023;17(3):173902.
doi: 10.1007/s11704-022-2011-y. Epub 2022 Oct 26.

AE-TPGG: a novel autoencoder-based approach for single-cell RNA-seq data imputation and dimensionality reduction

Shuchang Zhao et al. Front Comput Sci (Berl). 2023.

Abstract

Single-cell RNA sequencing (scRNA-seq) technology has become an effective tool for high-throughput transcriptomic studies. It circumvents the averaging artifacts of bulk RNA-seq technology and yields new perspectives on the cellular diversity of populations that appear superficially homogeneous. Although various sequencing techniques have reduced the amplification bias and improved the capture efficiency associated with the low amount of starting material, technical noise and biological variation are inevitably introduced into the experimental process, resulting in frequent dropout events that greatly hinder downstream analysis. Considering the bimodal expression pattern and the right-skewed distribution of normalized scRNA-seq data, we propose a customized autoencoder based on a two-part-generalized-gamma distribution (AE-TPGG) for scRNA-seq data analysis. It accounts for the mixed discrete-continuous nature of scRNA-seq data using a two-part model and uses the generalized gamma (GG) distribution to fit the positive, right-skewed continuous data. The adopted autoencoder enables AE-TPGG to capture the inherent relationships between genes. In addition to producing a low-dimensional representation, the AE-TPGG model also provides denoised imputation according to the statistical characteristics of gene expression. Results on real datasets demonstrate that our proposed model is competitive with current imputation methods and improves a diverse set of typical scRNA-seq data analyses.
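
To make the two-part idea concrete, the following is a minimal sketch of a two-part generalized gamma log-likelihood for a single expression value. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the parameterization, so the sketch assumes Stacy's form of the GG density, f(x) = (p / a^d) x^(d-1) exp(-(x/a)^p) / Gamma(d/p), and a point mass `pi_zero` at zero. The function and parameter names (`gg_logpdf`, `tpgg_loglik`, `pi_zero`, `scale`, `shape_d`, `shape_p`) are hypothetical. In an autoencoder setting these parameters would presumably be produced per gene by the decoder, which is not shown here.

```python
import math


def gg_logpdf(x, scale, shape_d, shape_p):
    """Log-density of the generalized gamma distribution at x > 0.

    Assumed parameterization (Stacy's form, not taken from the paper):
    f(x) = (p / a**d) * x**(d - 1) * exp(-(x / a)**p) / Gamma(d / p)
    with a = scale, d = shape_d, p = shape_p, all strictly positive.
    """
    if x <= 0.0:
        raise ValueError("gg_logpdf is defined only for x > 0")
    return (
        math.log(shape_p)
        - shape_d * math.log(scale)
        + (shape_d - 1.0) * math.log(x)
        - (x / scale) ** shape_p
        - math.lgamma(shape_d / shape_p)
    )


def tpgg_loglik(x, pi_zero, scale, shape_d, shape_p):
    """Log-likelihood of one value under a two-part model.

    Part 1: a point mass pi_zero at exactly zero (the discrete part).
    Part 2: with probability 1 - pi_zero, a positive value drawn from
    the generalized gamma distribution (the continuous part).
    """
    if x < 0.0:
        raise ValueError("expression values must be non-negative")
    if x == 0.0:
        return math.log(pi_zero)
    return math.log1p(-pi_zero) + gg_logpdf(x, scale, shape_d, shape_p)


# Hypothetical normalized expression values for one gene across five cells.
values = [0.0, 0.0, 1.2, 3.4, 0.7]
total = sum(tpgg_loglik(v, 0.4, 1.5, 2.0, 1.0) for v in values)
```

With `shape_p = 1` the assumed GG density reduces to an ordinary gamma distribution, and with `shape_d = shape_p = 1` to an exponential distribution, which gives a quick sanity check on the formula.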

Electronic supplementary material: Supplementary material is available in the online version of this article at 10.1007/s11704-022-2011-y.

Keywords: TPGG; autoencoder; data imputation; dimensionality reduction; scRNA-seq.
