Genome Res. 2007 Jan;17(1):108-16.
doi: 10.1101/gr.5488207. Epub 2006 Nov 29.

Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines

Jaswinder Khattra et al. Genome Res. 2007 Jan.

Abstract

We describe the details of a serial analysis of gene expression (SAGE) library construction and analysis platform that has enabled the generation of >298 high-quality SAGE libraries and >30 million SAGE tags primarily from sub-microgram amounts of total RNA purified from samples acquired by microdissection. Several RNA isolation methods were used to handle the diversity of samples processed, and various measures were applied to minimize ditag PCR carryover contamination. Modifications in the SAGE protocol resulted in improved cloning and DNA sequencing efficiencies. Bioinformatic measures to automatically assess DNA sequencing results were implemented to analyze the integrity of ditag structure, linker or cross-species ditag contamination, and yield of high-quality tags per sequence read. Our analysis of singleton tag errors resulted in a method for correcting such errors to statistically determine tag accuracy. From the libraries generated, we produced an essentially complete mapping of reliable 21-base-pair tags to the mouse reference genome sequence for a meta-library of approximately 5 million tags. Our analyses led us to reject the commonly held notion that duplicate ditags are artifacts. Rather than the usual practice of discarding such tags, we conclude that they should be retained to avoid introducing bias into the results and thereby maintain the quantitative nature of the data, which is a major theoretical advantage of SAGE as a tool for global transcriptional profiling.

Figures

Figure 1.
Overview of library construction, broken down by organism (A) and by library construction method (B). Numbers of libraries are indicated.
Figure 2.
Distribution of potential cross-species inter-library contamination. The bars indicate the number of libraries (y-axis) contained within bins composed of libraries with contamination frequencies that fall within the ranges shown for each bin (x-axis). Percentages indicate the proportion of sequences within a library that match tag sequences we deduced to be exclusive to other species. This is a sensitive and probably maximal estimate of cross-species contamination. Five percent of the libraries assayed have contamination frequencies of ≥1%. These are the libraries we classified as potentially contaminated (red bars).
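The contamination measure described in this caption can be illustrated with a minimal sketch. It assumes a library is available as a mapping from tag sequence to count and that a set of tags deduced to be exclusive to other species has already been built; the function names and the toy data are hypothetical and are not taken from the authors' pipeline. Only the 1% cutoff comes from the caption.

```python
from typing import Dict, Set


def contamination_frequency(tag_counts: Dict[str, int],
                            foreign_exclusive_tags: Set[str]) -> float:
    """Proportion of a library's tag observations that match tags
    deduced to be exclusive to other species (hypothetical helper)."""
    total = sum(tag_counts.values())
    if total == 0:
        return 0.0
    foreign = sum(count for tag, count in tag_counts.items()
                  if tag in foreign_exclusive_tags)
    return foreign / total


def is_potentially_contaminated(frequency: float,
                                threshold: float = 0.01) -> bool:
    """Flag a library at or above the 1% cutoff stated in the caption."""
    return frequency >= threshold


# Toy data, invented for illustration only.
library = {"AAAA": 98, "CCCC": 2}
foreign_tags = {"CCCC"}
freq = contamination_frequency(library, foreign_tags)
flagged = is_potentially_contaminated(freq)
```

Because every match to a foreign-exclusive tag is counted, a measure of this kind behaves as an upper bound, in line with the caption's description of a "probably maximal estimate".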
Figure 3.
Box-and-whisker plots indicating the proportion of error tags observed in libraries constructed using LongSAGE, SAGELite, and PCR-SAGE. Error categories, on the x-axis, refer to the error attributed only to sequencing (for example, “LongSAGE Sequencing”) or to all contributing sources of error, including sequencing (for example, “LongSAGE Total”). Boxes encompass the lower and upper quartiles. (Horizontal line drawn through each box) Median error value for each category, (red crosses and circles) possible outliers. Comparing the three “Total” categories to one another and the three “Sequencing” categories to one another indicates that LongSAGE data exhibit lower proportions of error tags than the other methods. PCR-SAGE and SAGELite were the methods of choice when RNA quantities were limiting. Both approaches involve additional amplification steps, compared with the LongSAGE method. It is possible that this additional amplification contributes to the observed increased proportion of error tags in these libraries.
Figure 4.
Coverage of the mouse Reference Sequence (RefSeq) data set. The proportion of entries in the mouse RefSeq database represented by a LongSAGE tag in a deeply sampled LongSAGE library (sm104 kidney, >800,000 tags; blue diamonds) and in a meta-library of Mouse Atlas tags (>11,000,000 tags; red squares) is shown. Plotted on the y-axis is the proportion of RefSeq covered. The number of tags is plotted on the x-axis. (A) With sequential sampling, performed computationally, both data sets exhibit rapid coverage of a subset of RefSeq as tag count increases. As expected, the meta-library exhibits superior coverage. (B) The same data, but with an expanded scale along the x-axis to better display the kidney library data.
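The computational sequential sampling mentioned in this caption could look roughly like the sketch below. It assumes the tags are available as an ordered list and that a tag-to-RefSeq mapping already exists; `coverage_curve`, its parameters, and the accession-like identifiers are hypothetical and only illustrate the idea, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Set, Tuple


def coverage_curve(tags: Sequence[str],
                   tag_to_refseq: Dict[str, Set[str]],
                   n_refseq: int,
                   step: int) -> List[Tuple[int, float]]:
    """Walk through tags in order and record, every `step` tags (and at
    the end), the fraction of RefSeq entries hit by at least one tag."""
    covered: Set[str] = set()
    curve: List[Tuple[int, float]] = []
    for i, tag in enumerate(tags, start=1):
        # Tags with no RefSeq mapping contribute nothing to coverage.
        covered.update(tag_to_refseq.get(tag, ()))
        if i % step == 0 or i == len(tags):
            curve.append((i, len(covered) / n_refseq))
    return curve


# Toy data, invented for illustration only.
tags = ["t1", "t2", "t1", "t3"]
mapping = {"t1": {"REF_A"}, "t2": {"REF_B"}, "t3": set()}
curve = coverage_curve(tags, mapping, n_refseq=4, step=2)
```

Each point on the returned curve pairs a tag count (x-axis) with the proportion of RefSeq covered (y-axis), which is the shape of data plotted in both panels.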
