2017 Oct 17;18(1):796.
doi: 10.1186/s12864-017-4147-y.

An analysis of 67 RNA-seq datasets from various tissues at different stages of a model insect, Manduca sexta

Xiaolong Cao et al. BMC Genomics.

Abstract

Background: Manduca sexta is a large lepidopteran insect widely used as a model to study the biochemistry of insect physiological processes. As a part of its genome project, over 50 cDNA libraries have been analyzed to profile gene expression in different tissues and life stages. While the RNA-seq data were used to study genes related to cuticle structure, chitin metabolism and immunity, a vast amount of the information has not yet been mined for understanding the basic molecular biology of this model insect. In fact, the basic features of these data, such as the composition of the RNA-seq reads and the lists of library-correlated genes, are unclear. More broadly, clear-cut tempospatial expression data are rare for insects, the largest group of animals, including Drosophila and mosquitoes, mainly because of their small size.

Results: We obtained the transcriptome data, analyzed the raw reads in relation to the assembled genome, and generated heatmaps for clustered genes. Library characteristics (tissues, stages), number of mapped bases, and sequencing methods affected the observed percentages of genome transcription. While up to 40% of the reads were not mapped to the genome in the initial Cufflinks gene modeling, we identified the causes of the mapping failure and reduced the number of non-mappable reads to <8%. Similarities between libraries, measured using library-correlated genes, clearly identified differences among tissues or life stages. We calculated gene expression levels and analyzed the most abundantly expressed genes in the libraries. Furthermore, we analyzed tissue-specific gene expression and identified 18 groups of genes with distinct expression patterns.

Conclusion: We performed a thorough analysis of the 67 RNA-seq datasets to characterize new genomic features of M. sexta. Integrated knowledge of gene functions and expression features will facilitate future functional studies in this biochemical model insect.

Keywords: Insect genome; Tobacco hornworm; Transcriptome.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
M. sexta life cycle and the 67 Illumina RNA-seq datasets. Bars in the circle represent the life stages of M. sexta, with lengths proportional to the duration of each stage in insects raised on artificial diet as previously described [1]. Color-coded library IDs (1–67) are placed outside the circle at the corresponding developmental stage. The first part of each library name (on the right) indicates the tissue source: head, fat body, whole body, midgut, Malpighian tubule, muscle, testis, ovary, or antenna. The second part indicates the major stage of the insect, i.e. embryo (E), 1st to 5th instar larvae (L1–L5), pupae (P), and adults (A). In the third part, “D” stands for day, “h” for hour, “preW” for pre-wandering, “W” for wandering, “M” for male, and “F” for female. An “S” at the end of a library name indicates single-end sequencing; its absence indicates paired-end sequencing. The cDNA libraries represent the following tissues and stages: head (H) [1. 2nd (instar) L (larvae), D1 (day 1); 2. 3rd L, D1; 3. 4th L, 12 h (hour); 4. 4th L, late; 5. 5th L, D0.5; 6. 5th L, D2; 7. 5th L, preW (pre-wandering); 8. P (pupae), late; 9. A (adults), D1; 10. A, D3; 11. A, D7], fat body [12. 4th L, late; 13. 5th L, D1; 14. 5th L, preW; 15. 5th L, W (wandering); 16. P, D1–3; 17. P, D15–18; 18. A, D1–3; 19. A, D7–9], whole body (W) [20. E (embryo), 3 h; 21. E, late; 22. 1st L; 23. 2nd L; 24. 3rd L], midgut [25. 2nd L; 26. 3rd L; 27. 4th L, 0 h; 28. 4th L, 12 h; 29. 4th L, late; 30. 5th L, 1–3 h; 31. 5th L, 24 h; 32. 5th L, preW; 33–34. 5th L, W; 35. P, D1; 36. P, D15–18; 37. A, D3–5], Malpighian tubule [38. 5th L, preW; 39. A, D1; 40. A, D3], muscle [41. 4th L, late; 42–43. 5th L, 12 h; 44–45. 5th L, preW; 46–47. 5th L, W], testis [48. P, D3; 49. P, D15–18; 50. A, D1–3], ovary [51. P, D15–18; 52. A, D1], head [53–56. A, D1, F (female); 57–60. A, D1, M (male)], antenna [61–63. 5th L; 64–66. A, F; 67. A, M]
Fig. 2
Overview of the 67 cDNA libraries. a Total read numbers in the libraries. As defined in Fig. 1, bar colors represent the tissue sources of libraries 1–67. Black and cyan IDs indicate libraries sequenced by paired- and single-end methods, respectively. b Upper boundaries represent the percentages of total reads remaining after trimming (green) and after mapping by STAR (yellow) and TopHat (blue), with the total reads (grey) in each library set at 100%. The library names and their color codes are the same as in Fig. 1. c and d Box-plots of the numbers and percentages of reads surviving trimming in categories P (paired-end, 33 of the first 52 libraries), S (single-end, 19 of libraries 1–52), H (head, single-end, 53–60), and A (antenna, single-end, 61–67). e Percentages of trimming-survived reads mapped to the genome using STAR and TopHat in the four categories. f Percentages of TopHat-mapped reads corresponding to mitochondrial (blue), protein-coding (white), noncoding (green), and rRNA (red) genes. g and h Box-plots of percentages of trimming-survived reads mapped to mitochondrial and rRNA genes in categories P, S, H and A. The first 52 libraries were sequenced as a part of the genome project [28], the next 8 were for detecting sex-biased gene expression in brain [14], and the last 7 were used to study chemosensory receptor gene expression [27]
Fig. 3
Features of gene transcription revealed by alignment of reads in the cDNA libraries. a Relationship between aligned bases (x-axis) and percentages (y-axis) of the genome overlaid with reads using TopHat. Each colored symbol represents one library, with their library IDs labeled (Fig. 1). Squares for paired-end libraries; circles for single-end ones. The dashed lines are linear regressions of the data from the paired- and single-end libraries. b Box-plot of percentages of the mapped genome in library categories P (paired-end, 33), S (single-end, 19), H (head, single-end, 8) and A (antenna, single-end, 7) (Fig. 2). c Heatmap of z-scores in each group of base range. BPKM values were used for sorting into 19 groups. Group 1 has the highest BPKM values (ranks 1–400); Groups 2 to 19 correspond to BPKMs ranked 401–800, 801–1600, 1601–3200, …, $400 \times 2^{n} + 1$ to $400 \times 2^{n+1}$, where n equals 0 to 17. The heatmap is colored based on the z-score of average BPKM in each group. Libraries with black and cyan IDs were determined by paired- and single-end sequencing, respectively. d Percentage of aligned bases for each BPKM group in the total aligned bases for a specific library. The library names and their color codes are described in Fig. 1
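The rank-based grouping described for Fig. 3c can be illustrated with a short Python sketch. It assumes, from the listed ranges 401–800, 801–1600 and 1601–3200, that each group after the first doubles in size; the function names `rank_groups` and `group_of_rank` are hypothetical and are not taken from the authors' pipeline.

```python
def rank_groups(n_groups=19, first_size=400):
    """Return 1-based, inclusive (start, end) rank bounds for each group.

    Group 1 covers ranks 1..first_size. Each later group is assumed to
    cover ranks first_size * 2**n + 1 .. first_size * 2**(n + 1),
    with n running from 0 upward.
    """
    bounds = [(1, first_size)]
    for n in range(n_groups - 1):
        start = first_size * 2**n + 1
        end = first_size * 2**(n + 1)
        bounds.append((start, end))
    return bounds


def group_of_rank(rank, bounds):
    """Return the 1-based group number containing a rank, or None."""
    for number, (start, end) in enumerate(bounds, start=1):
        if start <= rank <= end:
            return number
    return None


bounds = rank_groups()
for number, (start, end) in enumerate(bounds[:4], start=1):
    print(f"Group {number}: ranks {start}-{end}")
```

Because the groups double in size, 19 groups are enough to span ranks from 1 up to 400 × 2^18.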
Fig. 4
Features of the unmapped reads with BLASTN hits in the 67 libraries. a Relationship between ratios of STAR-unmapped reads (x-axis) and percentages of the total unmapped reads with BLASTN match (y-axis) for all the RNA-seq libraries. Each colored symbol represents one library, with their library IDs labeled (Fig. 1). Squares for paired-end libraries; circles for single-end ones. b Distribution (left y-axis) of unmapped reads with hits in the 7 categories in different colors. The black line shows the total number of unmapped reads (right y-axis) in a library. c Box-plot of percentages of rRNA reads in total unmapped reads with BLASTN hits in library categories P, S, H and A (Fig. 2). The library IDs, names, and color codes are the same as in Fig. 1
Fig. 5
Pairwise comparison of the 67 cDNA libraries and number of library-correlated genes. a Mapping scores of library pairs. The value in each cell is log2(mapping score); a value above 4 (i.e. mapping score > 16) indicates that the two libraries are closely similar or related. b Number of correlated genes in each library, with grey bars indicating those with FPKM values >100
Fig. 6
Expression profiles of 69 highly expressed genes in the 67 cDNA libraries. A non-redundant collection of the three most highly expressed genes in each library is listed on the right. Their expression patterns are organized according to the results of cluster analysis (left). Their log2(FPKM + 1) values, representing mRNA levels, are shown in a rainbow color gradient in the heatmap. Library names (top), IDs (bottom), and color codes are described in Fig. 1
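As a rough illustration of how a non-redundant set of top genes per library can be assembled and log-transformed for a heatmap like Fig. 6, consider the Python sketch below. The gene names, library names and FPKM values are invented toy data, and `top_genes_union` and `log2_fpkm` are hypothetical helpers rather than the authors' code.

```python
import math


def top_genes_union(fpkm_by_library, k=3):
    """Collect the k highest-FPKM genes of every library without duplicates.

    fpkm_by_library maps library name -> {gene name: FPKM}.
    Genes are returned in order of first appearance.
    """
    selected = []
    seen = set()
    for fpkm in fpkm_by_library.values():
        ranked = sorted(fpkm, key=fpkm.get, reverse=True)[:k]
        for gene in ranked:
            if gene not in seen:
                seen.add(gene)
                selected.append(gene)
    return selected


def log2_fpkm(fpkm):
    """Transform an FPKM value as log2(FPKM + 1), so that 0 maps to 0."""
    return math.log2(fpkm + 1)


# Toy data (assumed for illustration only)
toy = {
    "lib1": {"g1": 100.0, "g2": 50.0, "g3": 10.0, "g4": 1.0},
    "lib2": {"g1": 5.0, "g4": 80.0, "g5": 60.0, "g2": 0.0},
}
genes = top_genes_union(toy)
matrix = {g: [log2_fpkm(toy[lib].get(g, 0.0)) for lib in toy] for g in genes}
```

With 67 libraries, three genes per library would give at most 201 genes; the 69 reported in the caption is what remains after duplicates shared between libraries are removed.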
Fig. 7
Library-specific expression of different genes in OGS2.0. Z-scores for highly expressed genes were calculated from FPKM values. Genes were clustered based on z-scores and divided manually into groups based on their expression patterns. Significantly enriched GO terms (p < 0.05) for different clusters are labeled on the right, with GO numbers in red, green and blue representing Biological process, Cellular component and Molecular function, respectively
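A per-gene z-score of the kind used in Fig. 7 can be sketched in Python as follows. The caption does not say whether FPKM values were log-transformed first or which standard deviation estimator was used, so this sketch assumes raw FPKM values and the population standard deviation; `z_scores` is a hypothetical helper.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return z-scores of one gene's expression values across libraries.

    Uses the population standard deviation (an assumption). A gene with
    constant expression has zero variance and gets all-zero scores.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy FPKM values of one gene in three libraries (invented)
scores = z_scores([1.0, 2.0, 3.0])
```

Scaling each gene in this way places genes with very different absolute expression levels on a common scale, so clustering reflects the pattern across libraries rather than overall abundance.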

References

    1. Reinecke JP, Buckner J, Grugel S. Life cycle of laboratory-reared tobacco hornworms, Manduca sexta, a study of development and behavior, using time-lapse cinematography. Biol Bull. 1980;158(1):129–140. doi: 10.2307/1540764.
    2. Dittmer NT, Tetreau G, Cao X, Jiang H, Wang P, Kanost MR. Annotation and expression analysis of cuticular proteins from the tobacco hornworm, Manduca sexta. Insect Biochem Mol Biol. 2015;62:100–113. doi: 10.1016/j.ibmb.2014.12.010.
    3. Tetreau G, Cao X, Chen YR, Muthukrishnan S, Jiang H, Blissard GW, Kanost MR, Wang P. Overview of chitin metabolism enzymes in Manduca sexta: identification, domain organization, phylogenetic analysis and gene expression. Insect Biochem Mol Biol. 2015;62:114–126. doi: 10.1016/j.ibmb.2015.01.006.
    4. Hiruma K, Riddiford LM. Developmental expression of mRNAs for epidermal and fat body proteins and hormonally regulated transcription factors in the tobacco hornworm, Manduca sexta. J Insect Physiol. 2010;56(10):1390–1395. doi: 10.1016/j.jinsphys.2010.03.029.
    5. Martin JP, Beyerlein A, Dacks AM, Reisenman CE, Riffell JA, Lei H, Hildebrand JG. The neurobiology of insect olfaction: sensory processing in a comparative context. Prog Neurobiol. 2011;95(3):427–447. doi: 10.1016/j.pneurobio.2011.09.007.