Comput Struct Biotechnol J. 2021 Apr 11;19:2018-2026.
doi: 10.1016/j.csbj.2021.04.021. eCollection 2021.

DRscDB: A single-cell RNA-seq resource for data mining and data comparison across species

Yanhui Hu et al. Comput Struct Biotechnol J.

Abstract

With the advent of single-cell RNA sequencing (scRNA-seq) technologies, there has been a spike in studies involving scRNA-seq of several tissues across diverse species including Drosophila. Although a few databases exist for users to query genes of interest within the scRNA-seq studies, search tools that enable users to find orthologous genes and their cell type-specific expression patterns across species are limited. Here, we built a new search database, DRscDB (https://www.flyrnai.org/tools/single_cell/web/), to address this need. DRscDB serves as a comprehensive repository for published scRNA-seq datasets for Drosophila and relevant datasets from human and other model organisms. DRscDB is based on manual curation of Drosophila scRNA-seq studies of various tissue types and their corresponding analogous tissues in vertebrates including zebrafish, mouse, and human. Of note, our search database provides most of the literature-derived marker genes, thus preserving the original analysis of the published scRNA-seq datasets. Finally, DRscDB serves as a web-based user interface that allows users to mine gene expression data from scRNA-seq studies and perform cell cluster enrichment analyses pertaining to various scRNA-seq studies, both within and across species.

Keywords: Cross-species analysis; Data mining; Model organisms; single-cell RNA-seq.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1
Curation and processing of scRNA-seq datasets from the literature. DRscDB is built based on curation of published scRNA-seq literature. During the curation process, curators extract the information about experimental design, sample information, and marker genes from each publication, and organize the information in a standard template. Data wranglers retrieve the data files (cell expression matrix and metadata file) from GEO and calculate the expression statistics of each gene at the cluster level (Supplementary Fig. S1). Subsequently, data files and annotation files are processed by a software engineer for database upload.
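The cluster-level statistics described in this caption (percent of cells expressing a gene and average expression per cluster) can be illustrated with a minimal Python sketch. It is not the DRscDB pipeline: the function name `cluster_stats`, the toy cell IDs and values, and the rule that a cell "expresses" a gene when its value is greater than zero are all assumptions made for illustration.

```python
from collections import defaultdict

def cluster_stats(expression, cluster_of):
    """Summarize one gene's expression per cluster.

    expression: dict mapping cell ID -> expression value for one gene
    cluster_of: dict mapping cell ID -> cluster label
    Returns a dict: cluster -> (percent of cells expressing, mean expression).
    Assumption: a cell "expresses" the gene when its value is > 0.
    """
    values_by_cluster = defaultdict(list)
    for cell, value in expression.items():
        values_by_cluster[cluster_of[cell]].append(value)

    stats = {}
    for cluster, values in values_by_cluster.items():
        n_expressing = sum(1 for v in values if v > 0)
        percent = 100.0 * n_expressing / len(values)
        mean = sum(values) / len(values)
        stats[cluster] = (percent, mean)
    return stats

# Hypothetical toy data: six cells in two clusters, one gene.
toy_expression = {"c1": 0.0, "c2": 2.0, "c3": 4.0, "c4": 0.0, "c5": 0.0, "c6": 3.0}
toy_clusters = {"c1": "A", "c2": "A", "c3": "A", "c4": "B", "c5": "B", "c6": "B"}
toy_stats = cluster_stats(toy_expression, toy_clusters)
```

In this toy example, two of the three cells in cluster A express the gene, so the two values stored for A are roughly 66.7 percent and a mean of 2.0. These are the kinds of quantities that a dot plot typically encodes as dot size and color.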
Fig. 2
Use of DRscDB for data mining. At the DRscDB search page, a user can enter a gene of interest with or without specifying the tissue of interest, and results are summarized in a table format listing the number of datasets expressing the gene of interest as well as the orthologous genes. Next, the user can find more detailed information such as the relevant clusters expressing the gene of interest. The statistics about the percent of cells expressing the gene, as well as the average expression level, can be visualized by dot plot, bar graph, or heatmap. If a gene is identified as one of the marker genes for any of the clusters, the statistics of fold enrichment as well as P value are also displayed by bar graph.
Fig. 3
Use of DRscDB for enrichment analysis. At the DRscDB enrichment analysis page, a user can input a list of genes and find the clusters for which the input genes are significantly enriched among the top 100 marker genes. On the same page, a user can also enter multiple gene lists and compare each input gene list (for example, 15 lists) with every cluster of a selected study (for example, 9 clusters). The enrichment results are visualized as a heatmap, in this example a 9x15 matrix, with columns representing the input gene lists and rows representing the clusters from the selected study. The darkness of the color represents similarity (-log10 P value or fold enrichment).
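The comparison of gene lists against cluster markers can be sketched in Python as below. The caption does not say which statistical test DRscDB uses, so the one-sided hypergeometric test here is an assumption, as are the function names, the toy gene identifiers, and the background size of 10 genes. The sketch also assumes non-empty gene lists and marker sets.

```python
from math import comb, log10

def hypergeom_pvalue(overlap, list_size, marker_size, background_size):
    """P(X >= overlap) when list_size genes are drawn from a background
    of background_size genes, marker_size of which are cluster markers."""
    upper = min(list_size, marker_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(marker_size, k) * comb(background_size - marker_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def fold_enrichment(overlap, list_size, marker_size, background_size):
    """Observed fraction of markers in the list over the expected fraction."""
    return (overlap / list_size) / (marker_size / background_size)

def enrichment_matrix(gene_lists, cluster_markers, background_size):
    """Rows are clusters, columns are input gene lists; each cell holds
    -log10(P) for the overlap between that list and that cluster's markers."""
    matrix = []
    for markers in cluster_markers.values():
        row = []
        for genes in gene_lists.values():
            overlap = len(genes & markers)
            p = hypergeom_pvalue(overlap, len(genes), len(markers), background_size)
            row.append(-log10(max(p, 1e-300)))  # guard against log10(0)
        matrix.append(row)
    return matrix

# Hypothetical toy input: 2 clusters x 2 gene lists, background of 10 genes.
toy_markers = {"cluster1": {"g1", "g2", "g3"}, "cluster2": {"g4", "g5", "g6"}}
toy_lists = {"listA": {"g1", "g2", "g3"}, "listB": {"g7", "g8", "g9"}}
toy_matrix = enrichment_matrix(toy_lists, toy_markers, background_size=10)
```

With 9 clusters and 15 input lists, the same function would return the 9x15 layout described in the caption. A real analysis would use the top 100 marker genes per cluster and an appropriate background gene count, and would typically correct for multiple testing.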
Fig. 4
Unsupervised hierarchical clustering of enrichment results comparing top markers from publications on the Drosophila immune system. Clustering of the top 100 marker genes per cluster from one published immune dataset with those from 2 other published immune datasets. The results reveal that similar cell types from these three immune datasets tend to cluster together, which suggests that DRscDB can be used to assign cell types in newly generated datasets.
Fig. 5
DRscDB facilitates comparison of cell clusters across datasets and species. A. Comparison of the top 10 marker genes per cluster derived from Drosophila or mosquito blood scRNA-seq datasets. B. Comparison of the top 20 marker genes per cluster from the Drosophila gut study by Hung et al., 2020 with published human intestinal cell clusters.
