PeerJ. 2020 Jun 9;8:e9320.
doi: 10.7717/peerj.9320. eCollection 2020.

SMRT sequencing of the full-length transcriptome of the white-backed planthopper Sogatella furcifera

Jing Chen et al. PeerJ.

Abstract

The white-backed planthopper Sogatella furcifera is an economically important rice pest distributed throughout Asia. It damages rice crops by sucking phloem sap, resulting in stunted growth and plant virus transmission. We aimed to obtain the full-length transcriptome data of S. furcifera using PacBio single-molecule real-time (SMRT) sequencing. Total RNA extracted from S. furcifera at various developmental stages (egg, larval, and adult stages) was mixed and used to generate a full-length transcriptome for SMRT sequencing. Long non-coding RNA (lncRNA) identification, full-length coding sequence prediction, full-length non-chimeric (FLNC) read detection, simple sequence repeat (SSR) analysis, transcription factor detection, and transcript functional annotation were performed. A total of 12,514,449 subreads (15.64 Gbp, clean reads) were generated, including 630,447 circular consensus sequences and 388,348 FLNC reads. Transcript cluster analysis of the FLNC reads revealed 251,109 consensus reads including 29,700 high-quality reads. Additionally, 100,360 SSRs and 121,395 coding sequences were identified using SSR analysis and ANGEL software, respectively. Furthermore, 44,324 lncRNAs were annotated using four tools and 1,288 transcription factors were identified. In total, 95,495 transcripts were functionally annotated based on searches of seven different databases. To the best of our knowledge, this is the first study of the full-length transcriptome of the white-backed planthopper obtained using SMRT sequencing. The acquired transcriptome data can facilitate further studies on the ecological and viral-host interactions of this agricultural pest.

Keywords: Full-length transcriptome; SMRT sequencing; Sogatella furcifera.

Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Read length distribution of SMRT sequencing.
(A) Distribution of the number and length of 12,514,449 subread sequences. (B) Distribution of the number and length of 388,348 FLNC sequences. (C) Distribution of the number and length of 251,109 consensus isoforms.
Figure 2. SSR density of different types of SSRs.
(i) SSR motif unit: number of bases in the repeating unit. (ii) “Mono-”: repeat unit of a single base. (iii) “Di-”: two bases. (iv) “Tri-”: three bases. (v) “Tetra-”: four bases. (vi) “Penta-”: five bases. (vii) “Hexa-”: six bases. The number of repetitions is indicated by color, as given in the legend.
Figure 3. Transcript family distribution of TFs.
The different transcript families were plotted along the x-axis, while the number of transcription factors was plotted along the y-axis.
Figure 4. Length distribution of CDSs.
The length of the predicted CDSs was plotted along the x-axis, while the number of CDS transcripts was plotted along the left y-axis. The yellow line, which represents the percentage of CDS length, was plotted along the right y-axis.
Figure 5. Candidate lncRNAs identified using CNCI, Pfam, Plek, and CPC.
Non-overlapping areas indicate the number of lncRNAs identified by a single tool. Overlapping areas indicate the total number of lncRNAs identified by several tools.
Figure 6. Functional annotation of the corrected isoforms.
(A) Functional annotation of transcripts in all databases. GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; Nr, Non-Redundant Protein Database; COG, Cluster of Orthologous Groups of proteins. (B) Species of the highest-scoring blastp match in Nr. (C) KEGG pathway assignment of transcripts. (D) COG annotation of transcript sequences. (E) Distribution of GO terms for all annotated transcripts in the cellular component, biological process, and molecular function categories.
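
The SSR classes named in the Figure 2 legend (mono- through hexa-nucleotide repeat units) can be illustrated with a minimal Python sketch. This is not the software or the parameter set used in the study: the function names `is_primitive` and `find_ssrs`, the minimum repeat counts in `MIN_REPEATS`, and the toy sequence are all assumptions chosen for illustration, and compound or overlapping repeats are not handled.

```python
# Illustrative sketch only; thresholds below are hypothetical, not the paper's.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
CLASS_NAMES = {1: "Mono-", 2: "Di-", 3: "Tri-", 4: "Tetra-", 5: "Penta-", 6: "Hexa-"}


def is_primitive(motif):
    """Return True if the motif is not itself a repeat of a shorter unit."""
    length = len(motif)
    for d in range(1, length):
        if length % d == 0 and motif[:d] * (length // d) == motif:
            return False
    return True


def find_ssrs(seq):
    """Scan a nucleotide sequence for perfect tandem repeats.

    Returns a list of (class name, motif, repeat count, 0-based start)
    tuples, ordered by unit length and then by position.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, 7):  # unit lengths 1 (mono-) to 6 (hexa-)
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip ambiguous bases and motifs such as "AA" or "ATAT",
            # which belong to a shorter unit class.
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= MIN_REPEATS[k]:
                hits.append((CLASS_NAMES[k], motif, n, i))
                i += n * k  # jump past the repeat just recorded
            else:
                i += 1
    return hits


if __name__ == "__main__":
    toy = "GG" + "A" * 12 + "CC" + "AT" * 6 + "TT" + "CAG" * 5 + "T"
    for ssr_class, motif, count, start in find_ssrs(toy):
        print(ssr_class, motif, count, start)
```

On the toy sequence, the scan reports one mono-, one di-, and one tri-nucleotide repeat. Requiring a primitive motif keeps a run such as twelve A bases from also being counted as a di- or tri-nucleotide repeat.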
