Front Immunol. 2021 Sep 27:12:737332.
doi: 10.3389/fimmu.2021.737332. eCollection 2021.

Full-Length Transcriptome: A Reliable Alternative for Single-Cell RNA-Seq Analysis in the Spleen of Teleost Without Reference Genome

Lixing Huang et al. Front Immunol. 2021.

Abstract

Fish are considered an excellent model for clarifying the evolution and regulatory mechanisms of vertebrate immunity. However, knowledge of distinct immune cell populations in fish is still limited, and further development of techniques for identifying fish immune cell populations and their functions is required. Single-cell RNA-seq (scRNA-seq) has provided a new approach for effective, in-depth identification and characterization of cell subpopulations. Current approaches to scRNA-seq data analysis usually rely on comparison with a reference genome and hence are not suited to samples without a reference genome, a situation that is currently very common in fish research. Here, we present an alternative, i.e., scRNA-seq data analysis with a full-length transcriptome as the reference, and evaluate this approach on samples from Epinephelus coioides, a teleost without a published genome. We show that it reconstructs most of the transcripts present in the scRNA-seq data well, achieving a sensitivity equivalent to that of approaches relying on genome alignments of related species. Based on cell heterogeneity and known markers, we characterized four cell types: T cells, B cells, monocytes/macrophages (Mo/MΦ) and non-specific cytotoxic cells (NCC). Further analysis indicated the presence of two subsets of Mo/MΦ, M1 and M2 type, as well as four subsets of B cells, i.e., mature B cells, immature B cells, pre B cells and early-pre B cells. Our research provides new clues for understanding the biological characteristics, development and function of immune cell populations in teleosts. Furthermore, our approach provides a reliable alternative for scRNA-seq data analysis in teleosts for which no reference genome is currently available.

Keywords: full-length transcriptome; immune cell population; infection; scRNA-seq; teleost.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Overall workflow diagram of cell classification and single-cell sequencing analysis.
Figure 2
Overall workflow diagram of (A) Sequel full-length transcriptome cDNA library construction and (B) full-length transcriptome data analysis.
Figure 3
Read number and length distribution after full-length transcriptome sequencing. (A) Subread length distribution: the x-axis represents the length of subreads, and the y-axis the number of subreads. (B) CCS length distribution: the x-axis represents the length of the reads; the left y-axis belongs to the histogram and gives the number of reads whose length falls within a given range; the right y-axis belongs to the curve and gives the number of reads whose length is greater than a given value. (C) CCS passes distribution: the x-axis represents the number of full passes, and the y-axis the number of CCS sequences with the corresponding number of full passes. (D) Consensus sequence length distribution: the x-axis represents the length of the consensus sequence, the left y-axis the number of sequences of that length, and the right y-axis the number of sequences whose length is greater than a given value. (E) Mean quality distribution of the consensus sequences: the x-axis represents the quality values of high- and low-quality sequences, and the y-axis the number of consensus sequences with the respective quality values. (F) Isoform length distribution: the x-axis represents the length of isoforms; the left y-axis belongs to the histogram and gives the number of isoforms whose length falls within a given range; the right y-axis belongs to the curve and gives the number of isoforms whose length is greater than a given value.
Figure 4
Functional annotations of the full-length transcripts with Nr, SwissProt, GO and KEGG. (A) Venn analysis of the annotation results of four databases: Nr, SwissProt, KEGG and COG/KOG. (B) Species distribution (only the top 10 species are shown): after comparing isoform sequences with the Nr database by BlastX, the best hit (lowest E value) for each isoform in the Nr database was taken as its homologous sequence and used to assign the species. The number of homologous sequences from each species was then counted. (C) GO function classification chart. (D) Distribution of the KEGG pathways.
Figure 5
Prediction of the coding sequences and transcription factors. (A) The length of 3’ UTR. (B) The length of 5’ UTR. (C) TF family distribution (top 10).
Figure 6
Cell sorting based on the full-length transcriptome and the E. lanceolatus genome. (A, B) tSNE nonlinear clustering was used to visualize the classification of E. coioides spleen cell populations based on the full-length transcriptome (A) and the whole E. lanceolatus genome (B). (C, D) Histogram of the number of up-regulated genes in each sub-cluster based on the full-length transcriptome (C) and the E. lanceolatus genome (D). (E, F) Heatmap of the top 5 up-regulated genes from each cluster, used as marker genes, based on the full-length transcriptome (E) and the E. lanceolatus genome (F). Each column represents a cell, and each row represents a gene. The expression levels of genes in different cells are indicated by color: yellow indicates higher expression and purple indicates lower expression.
Figure 7
Categorization of cell types based on the full-length transcriptome and the whole E. lanceolatus genome. Bubble plots of expression of the 5 marker cell types in all clusters based on the full-length transcriptome (A–E) and the E. lanceolatus genome (F–J). The X-axis shows the name of the marker gene and the Y-axis the name of the cell subpopulation; the size of the bubble represents the ratio of the summed expression of the marker gene in a given subgroup to its summed expression in all cells; the color of the bubble represents the average expression abundance of the marker gene in the cell subgroup; the redder the bubble, the higher the average expression of the marker gene in the respective subgroup. (K, L) Identification of 5 cell subpopulations based on marker molecules. The results based on the full-length transcriptome and the E. lanceolatus genome are displayed in (K) and (L), respectively. The 5 cell populations are represented by different colors (B cell: orange, T cell: blue, Mo/MΦ: green, NCC: red). (M, N) Heatmap of the top 5 up-regulated genes from each cell subpopulation, used as marker genes, based on the full-length transcriptome (M) and the E. lanceolatus genome (N).
Figure 8
Identification of Mo/MΦ and B cell subpopulations. (A) Bubble plot of marker gene expression in Mo/MΦ clusters. The X-axis shows the name of the marker gene and the Y-axis the name of the Mo/MΦ subpopulation; the size of the bubble represents the ratio of the summed marker gene expression in a given subpopulation to its summed expression in all Mo/MΦ cells; the color of the bubble represents the average expression abundance of the marker gene in the Mo/MΦ subpopulation; the redder the bubble, the higher the average expression level of the marker gene in the Mo/MΦ subpopulation. (B) Identification of the 2 Mo/MΦ subpopulations based on marker molecules. Cell subpopulations are represented by different colors (M1 type Mo/MΦ: blue, M2 type Mo/MΦ: red). (C) Bubble plot of marker gene expression in all B cell clusters. The X-axis shows the name of the marker gene and the Y-axis the name of the B cell subpopulation; the size of the bubble represents the ratio of the summed marker gene expression in a given B cell subpopulation to its summed expression in all B cells; the color of the bubble represents the average expression abundance of the marker gene in the B cell subpopulation; the redder the bubble, the higher the average expression level of the marker gene in the B cell subpopulation. (D) Identification of 4 B cell subpopulations based on the identified marker genes. The 4 cell subpopulations are represented by different colors (Mature-B cell: blue, pre B cell: orange, Immature-B cell: red, early-pre B cell: green). (E) Pseudo-time analysis of B cell subpopulations. Each subpanel corresponds to a previously identified subpopulation as shown in (D). (F) Bubble plot of marker gene expression in mature B cell clusters. The X-axis shows the name of the marker gene and the Y-axis the name of the mature B cell subpopulation; the size of the bubble represents the ratio of the summed marker gene expression in a given mature B cell subpopulation to its summed expression in all mature B cells; the color of the bubble represents the average expression abundance of the marker gene in the mature B cell subpopulation; the redder the bubble, the higher the average expression level of the marker gene in the mature B cell subpopulation. (G) Identification of 3 mature B cell subpopulations based on the identified marker genes. The 3 cell subpopulations are represented by different colors (IgM+: blue, IgD+: red, IgZ+: green).
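
The two bubble-plot quantities described in the captions of Figures 7 and 8 can be illustrated with a minimal sketch. It is not the authors' code: the function name `bubble_stats`, the input dictionaries, and the toy values are hypothetical and serve only to show how bubble size (share of the marker gene's summed expression) and bubble color (mean expression within the subpopulation) relate to per-cell expression values.

```python
from collections import defaultdict


def bubble_stats(expression, cluster_of):
    """Compute bubble size and color per cluster for one marker gene.

    expression: dict mapping cell id -> expression value (hypothetical input)
    cluster_of: dict mapping cell id -> cluster label (hypothetical input)
    Returns a dict mapping cluster label -> (size, color), where
      size  = summed expression in the cluster / summed expression in all cells
      color = mean expression over the cells of the cluster
    """
    total = sum(expression.values())
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cell, value in expression.items():
        label = cluster_of[cell]
        sums[label] += value
        counts[label] += 1

    stats = {}
    for label in sums:
        # Guard against a marker gene with no expression in any cell.
        size = sums[label] / total if total > 0 else 0.0
        color = sums[label] / counts[label]
        stats[label] = (size, color)
    return stats


# Toy data (invented for illustration, not taken from the study).
expression = {"c1": 4.0, "c2": 2.0, "c3": 0.0, "c4": 2.0}
cluster_of = {"c1": "B cell", "c2": "B cell", "c3": "T cell", "c4": "T cell"}
stats = bubble_stats(expression, cluster_of)
print(stats["B cell"])  # (0.75, 3.0)
print(stats["T cell"])  # (0.25, 1.0)
```

In this toy case the B cell cluster accounts for three quarters of the marker's summed expression, so it would get the larger bubble, and its higher mean expression would give it the redder color.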
