Genome Res. 2008 Jan;18(1):172-7.
doi: 10.1101/gr.6984908. Epub 2007 Nov 21.

Gene expression profiling by massively parallel sequencing

Tatiana Teixeira Torres et al. Genome Res. 2008 Jan.

Abstract

Massively parallel sequencing holds great promise for expression profiling, as it combines the high throughput of SAGE with the accuracy of EST sequencing. Until now, however, very little information has been available on whether current technology meets the requirements of expression profiling. Here, we evaluate the potential of 454 sequencing technology for expression profiling in Drosophila melanogaster. We show that short (< ∼80 bp) and long (> ∼300-400 bp) cDNA fragments are under-represented in 454 sequence reads. Nevertheless, sequencing 3′ cDNA fragments generated by nebulization could be used to overcome this length bias of the 454 technology. For fragments within the 80- to 300-bp range, gene expression measurements generated by restriction analysis and by nebulization showed correlations similar to those reported for replicated microarray experiments (0.83-0.91). Moreover, 97% of the cDNA fragments could be unambiguously mapped to the genomic DNA, demonstrating the advantage of longer sequence reads. Our analyses suggest that the 454 technology has great potential for expression profiling, and the high mapping accuracy indicates that it should be possible to compare expression profiles across species.
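The comparison described in the abstract keeps only genes whose fragments fall in the 80- to 300-bp window and then correlates the expression measurements from the two fragmentation protocols. The Python sketch below illustrates that step under stated assumptions: per-gene read counts and a per-gene 3′ fragment length are already available as dictionaries, and the names `correlate_in_window` and `pearson` are hypothetical. The abstract does not say whether counts were transformed (for example, log-scaled) before the correlation was computed, so this version uses them as given with a plain Pearson coefficient.

```python
import math

# Length window in which 454 reads were found to be representative (from the abstract).
MIN_LEN, MAX_LEN = 80, 300


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlate_in_window(frag_len, counts_a, counts_b, lo=MIN_LEN, hi=MAX_LEN):
    """Correlate per-gene counts from two protocols (hypothetical input layout:
    {gene_id: value}), keeping only genes whose 3' fragment length lies in
    [lo, hi] and that were measured by both protocols.
    Returns (r, number_of_genes_used)."""
    genes = sorted(
        g for g, length in frag_len.items()
        if lo <= length <= hi and g in counts_a and g in counts_b
    )
    xs = [counts_a[g] for g in genes]
    ys = [counts_b[g] for g in genes]
    return pearson(xs, ys), len(genes)
```

Filtering before correlating matters here because fragments outside the window are under-sampled by 454 sequencing, so their counts would reflect the length bias rather than expression.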

Figures

Figure 1.
Overview of the methods used to generate 3′ cDNA fragments. Double-stranded cDNA was fragmented using one of two strategies: restriction enzyme treatment or nebulization (step 3). The 3′ fragments were recovered (step 4), ligated to specific linkers (step 5), and then sequenced using 454 sequencing technology (step 6).
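The restriction-enzyme route, and the in silico digestion of annotated transcripts used as its reference, can be sketched as locating the 3′-most recognition site in each transcript and measuring the distance to the 3′ end. The Python sketch below assumes the recognition site CATG (the NlaIII site commonly used in SAGE) purely for illustration, since the enzymes are not named here; it also ignores the exact cut offset within the site and any poly(A) tail. The function names and the bin width are hypothetical.

```python
from collections import Counter

# Hypothetical recognition site chosen for illustration (NlaIII, CATG).
SITE = "CATG"


def three_prime_fragment_length(transcript, site=SITE):
    """Length from the start of the 3'-most recognition site to the 3' end of
    the transcript, or None if the site does not occur. Case-insensitive."""
    pos = transcript.upper().rfind(site.upper())
    if pos == -1:
        return None
    return len(transcript) - pos


def digest_all(transcripts, site=SITE):
    """In silico digestion of {transcript_id: sequence}; transcripts without a
    site are dropped because they yield no 3' restriction fragment."""
    result = {}
    for tid, seq in transcripts.items():
        length = three_prime_fragment_length(seq, site)
        if length is not None:
            result[tid] = length
    return result


def length_histogram(lengths, width=20):
    """Count fragment lengths in fixed-width classes, keyed by class start."""
    counts = Counter((length // width) * width for length in lengths)
    return dict(sorted(counts.items()))
```

A histogram built this way from annotated transcripts gives the expected length distribution against which the lengths observed in 454 reads can be compared.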
Figure 2.
Under-representation of short and long 3′ cDNA fragments in 454 sequencing reads. The frequency distribution of 3′ cDNA fragment lengths obtained from in silico digestion of all D. melanogaster transcripts (release 5.1) is shown in gray. The black line shows the frequency distribution of 3′ cDNA fragment lengths obtained from 454 sequencing reads. Irrespective of the actual counts obtained by 454 sequencing, each transcript was counted only once. Because the two datasets are on different scales, the number of fragments in each class was centered on its mean and divided by the root mean square (Becker et al. 1988), so that after scaling both samples had a mean of zero and a standard deviation of one. Regardless of which restriction enzyme was used, short (< ∼80 bp) and long (> ∼300 bp) fragments were markedly under-represented.
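The scaling in the caption cites the S language reference, which suggests the `scale()` function: it centers each column and then divides by the root mean square of the centered values, computed with an n - 1 denominator, which equals the sample standard deviation. The Python sketch below reproduces that computation under the assumption that default centering and scaling were used; `scale_histograms` is a hypothetical helper that aligns two length histograms on the same classes (missing classes counted as 0) before scaling each one.

```python
import math


def scale(values):
    """Center values on their mean, then divide by the root mean square of the
    centered values with an n - 1 denominator (the sample standard deviation).
    The result has mean 0 and standard deviation 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    rms = math.sqrt(sum(c * c for c in centered) / (n - 1))
    if rms == 0:
        raise ValueError("cannot scale a constant series")
    return [c / rms for c in centered]


def scale_histograms(hist_a, hist_b):
    """Align two {class_start: count} histograms on the union of their classes
    and scale each independently so they can be plotted on one axis."""
    classes = sorted(set(hist_a) | set(hist_b))
    a = scale([hist_a.get(k, 0) for k in classes])
    b = scale([hist_b.get(k, 0) for k in classes])
    return classes, a, b
```

Scaling each series independently removes the difference in total counts between the in silico and the 454 datasets, so only the shapes of the two length distributions are compared.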
Figure 3.
Length distribution of 3′ cDNA fragments after nebulization across different size classes of full-length transcripts (as inferred from the available genome annotation). The bold line indicates the median. The lower and upper hinges give the 25% and 75% quantiles, respectively. Whiskers (dashed lines) extend to the maximum and minimum sizes. Outliers are not shown.
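The per-class summary behind such a box plot can be sketched as grouping fragment lengths by the length class of their full-length transcript and computing the minimum, quartiles, median, and maximum. In the Python sketch below, the 500-bp class width is a hypothetical choice, since the caption does not give the class boundaries, and the quartiles use `statistics.quantiles` with `method="inclusive"` (linear interpolation between data points); the quantile convention used for the figure is not stated.

```python
import statistics
from collections import defaultdict


def summarize_by_class(pairs, class_width=500):
    """pairs: iterable of (transcript_length, fragment_length).
    Groups fragment lengths by transcript-length class (hypothetical width)
    and returns {class_start: (min, q1, median, q3, max)}."""
    groups = defaultdict(list)
    for transcript_len, fragment_len in pairs:
        groups[(transcript_len // class_width) * class_width].append(fragment_len)
    summary = {}
    for cls, values in sorted(groups.items()):
        if len(values) < 2:
            continue  # skip classes too small for meaningful quartiles
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[cls] = (min(values), q1, median, q3, max(values))
    return summary
```

Reporting the minimum and maximum as whisker ends matches the caption's description rather than the 1.5 × IQR convention that many plotting tools use by default.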
Figure 4.
Cumulative distribution of the difference in BLAST bit scores between the best and second-best hits. The dashed, dotted, and solid lines show the cumulative distributions for queries of 20, 50, and 100 bp, respectively. The plots are based on the 454 sequencing reads that provided at least 100 bp of sequence. The 5′-most 20, 50, and 100 bp of each read were used as BLAST queries against the D. melanogaster genomic sequence, without filtering regions of low complexity.
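The analysis in this figure can be sketched in three steps: take the 5′-most 20, 50, and 100 bp of each read that is at least 100 bp long, collect the bit scores of the hits for each query, and build the empirical cumulative distribution of the best minus second-best score. The Python sketch below assumes BLAST tabular output in the `-outfmt 6` column order, where the 12th column is the bit score. The query naming scheme (`readid_k`) and the treatment of a query with a single hit (second-best score taken as 0) are assumptions, not details from the caption.

```python
from collections import defaultdict


def five_prime_queries(reads, lengths=(20, 50, 100)):
    """Yield (query_id, prefix) for each read of at least max(lengths) bp.
    Query IDs use a hypothetical '<read_id>_<k>' naming scheme."""
    need = max(lengths)
    for read_id, seq in reads.items():
        if len(seq) >= need:
            for k in lengths:
                yield f"{read_id}_{k}", seq[:k]


def best_minus_second(tabular_lines):
    """Parse BLAST tabular lines (bit score in column 12) and return
    {query_id: best_bitscore - second_best_bitscore}.
    Assumption: a query with only one hit gets a second-best score of 0."""
    scores = defaultdict(list)
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        scores[fields[0]].append(float(fields[11]))
    diffs = {}
    for query_id, s in scores.items():
        s.sort(reverse=True)
        diffs[query_id] = s[0] - (s[1] if len(s) > 1 else 0.0)
    return diffs


def ecdf(values):
    """Empirical CDF: sorted values and the fraction of values <= each one."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]
```

A large difference means the best genomic hit clearly outscores any alternative, which is one way to express the unambiguous mapping that longer reads allow.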

References

    1. Bainbridge M.N., Warren R.L., Hirst M., Romanuik T., Zeng T., Go A., Delaney A., Griffith M., Hickenbotham M., Magrini V., et al. Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach. BMC Genomics. 2006;7:246. doi: 10.1186/1471-2164-7-246.
    2. Becker R.A., Chambers J.M., Wilks A.R. The new S language: A programming environment for data analysis and graphics. Wadsworth & Brooks/Cole; Pacific Grove, CA: 1988.
    3. Chen J., Lee S., Zhou G., Wang S.M. High-throughput GLGI procedure for converting a large number of serial analysis of gene expression tag sequences into 3′ complementary DNAs. Genes Chromosomes Cancer. 2002;33:252–261.
    4. Chen J., Rattray M. Analysis of tag-position bias in MPSS technology. BMC Genomics. 2006;7:77. doi: 10.1186/1471-2164-7-77.
    5. Emrich S.J., Barbazuk W.B., Li L., Schnable P.S. Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res. 2007;17:69–73.
