Next-generation tag sequencing for cancer gene expression profiling

A Sorana Morrissy¹, Ryan D Morin, Allen Delaney, Thomas Zeng, Helen McDonald, Steven Jones, Yongjun Zhao, Martin Hirst, Marco A Marra

PMID: 19541910
PMCID: PMC2765282
DOI: 10.1101/gr.094482.109

A Sorana Morrissy et al. Genome Res. 2009 Oct.

Genome Res. 2009 Oct;19(10):1825-35.

doi: 10.1101/gr.094482.109. Epub 2009 Jun 18.

Affiliation

¹ Genome Sciences Centre, Vancouver, British Columbia, Canada.

Abstract

We describe a new method, Tag-seq, which employs ultra high-throughput sequencing of 21 base pair cDNA tags for sensitive and cost-effective gene expression profiling. We compared Tag-seq data to LongSAGE data and observed improved representation of several classes of rare transcripts, including transcription factors, antisense transcripts, and intronic sequences, the latter possibly representing novel exons or genes. We observed increases in the diversity, abundance, and dynamic range of such rare transcripts and took advantage of the greater dynamic range of expression to identify, in cancers and normal libraries, altered expression ratios of alternative transcript isoforms. The strand-specific information of Tag-seq reads further allowed us to detect altered expression ratios of sense and antisense (S-AS) transcripts between cancer and normal libraries. S-AS transcripts were enriched in known cancer genes, while transcript isoforms were enriched in miRNA targeting sites. We found that transcript abundance had a stronger GC-bias in LongSAGE than Tag-seq, such that AT-rich tags were less abundant than GC-rich tags in LongSAGE. Tag-seq also performed better in gene discovery, identifying >98% of genes detected by LongSAGE and profiling a distinct subset of the transcriptome characterized by AT-rich genes, which was expressed at levels below those detectable by LongSAGE. Overall, Tag-seq is sensitive to rare transcripts, has less sequence composition bias relative to LongSAGE, and allows differential expression analysis for a greater range of transcripts, including transcripts encoding important regulatory molecules.

Figures

**Figure 1.**
Outline of Tag-seq library generation. Each mRNA (brown) undergoes double-stranded cDNA synthesis on oligo(dT) beads, which capture polyadenylated RNA. The cDNA (gold) is digested with the NlaIII anchoring restriction enzyme (vertical red arrows), leaving a 4-bp overhang (GTAC). Only cDNA fragments anchored to oligo(dT) beads are retained. Adapter A (green) is ligated to the overhang and adds a recognition site for the Type IIS tagging enzyme MmeI. Following MmeI digestion (red vertical arrow), a second adapter (Adapter B, blue) is ligated to the resulting 2-bp overhang. PCR primers (horizontal red arrows) annealing to adapters A and B are used to enrich tags. Cluster generation and sequencing (horizontal brown arrow) are performed on the Illumina cluster station and analyzer. The resulting image files are processed to extract the read sequences, and 21-bp SAGE tags are then extracted from the reads. Each tag consists of the 4-bp NlaIII recognition site and 17 bp of unique sequence, for a total of 21 bases that can be mapped back to the original mRNA (brown).
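The tag structure described in this caption (a 4-bp NlaIII site followed by 17 bp of adjacent sequence) can be illustrated with a minimal in silico sketch. It assumes the NlaIII recognition sequence is CATG on the sense strand and that the retained tag sits at the 3'-most site of the transcript, as is conventional for SAGE-type protocols. The function name `extract_virtual_tag` is hypothetical and is not part of the authors' pipeline.

```python
from typing import Optional

NLAIII_SITE = "CATG"  # NlaIII recognition sequence (assumed sense-strand orientation)
TAG_LENGTH = 21       # 4-bp site + 17 bp of adjacent sequence


def extract_virtual_tag(mrna: str) -> Optional[str]:
    """Return the 21-bp tag starting at the 3'-most NlaIII site.

    Returns None when the transcript has no site, or when fewer than
    17 bases follow the 3'-most site.
    """
    seq = mrna.upper()
    pos = seq.rfind(NLAIII_SITE)  # 3'-most occurrence of CATG
    if pos == -1 or pos + TAG_LENGTH > len(seq):
        return None
    return seq[pos:pos + TAG_LENGTH]
```

For example, a transcript with two CATG sites yields only the tag anchored at the site closer to the poly(A) tail, which mirrors the retention of bead-anchored fragments after NlaIII digestion.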

**Figure 2.**
Average number (A) and proportion (B) of Ensembl genes unambiguously identified in Tag-seq and LongSAGE libraries as a function of sampling depth. Error bars represent the SD of the average number of identified genes in 77 LongSAGE libraries and 35 Tag-seq libraries. The largest LongSAGE libraries were ∼300,000 tags, while the largest Tag-seq libraries were ∼10 million tags.
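Gene detection as a function of sampling depth can be approximated by random subsampling of a library. The sketch below is one simple way to do this, not the authors' procedure. The inputs `tag_counts` (tag sequence to observed count) and `tag_to_gene` (unambiguous tag-to-gene assignments) are hypothetical data structures introduced for illustration.

```python
import random


def genes_detected_at_depth(tag_counts, tag_to_gene, depth, seed=0):
    """Subsample `depth` tags without replacement and count distinct genes.

    Tags absent from `tag_to_gene` (unmapped or ambiguous) are ignored.
    """
    # Expand the count table into one entry per sequenced tag.
    pool = [tag for tag, n in tag_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds library size")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sample = rng.sample(pool, depth)
    return len({tag_to_gene[t] for t in sample if t in tag_to_gene})
```

Repeating the call over a range of depths and over many libraries, then averaging, would give a curve of the kind the caption describes.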

**Figure 3.**
GC-bias of Tag-seq and LongSAGE libraries was calculated in units of the number of SDs by which the observed bias differed from the expected bias (see text). Positive units represent libraries with more AT-rich tag sequences than expected (AT-bias), while negative units represent libraries with more GC-rich tag sequences than expected (GC-bias). Calculated bias is shown for all quality filtered Tag-seq and all LongSAGE tag sequences, at increasing thresholds of tag expression (x-axis).
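Expressing bias "in units of SDs" amounts to a standardized score. The sketch below shows that arithmetic only. How the expected value and its SD were derived is given in the paper's text and is not reproduced here, so `expected_mean` and `expected_sd` are assumed inputs. Computing the AT fraction over distinct tag sequences at or above a count threshold is likewise an assumption made for illustration.

```python
def at_fraction(tag_counts, min_count=1):
    """AT fraction over distinct tag sequences with count >= min_count."""
    tags = [t.upper() for t, n in tag_counts.items() if n >= min_count]
    total_bases = sum(len(t) for t in tags)
    if total_bases == 0:
        raise ValueError("no tags pass the expression threshold")
    at_bases = sum(base in "AT" for t in tags for base in t)
    return at_bases / total_bases


def bias_in_sd_units(observed, expected_mean, expected_sd):
    """Standardized bias: positive means more AT-rich than expected."""
    return (observed - expected_mean) / expected_sd
```

With this sign convention, a library whose tags are more AT-rich than expected scores positive and a GC-biased library scores negative, matching the caption.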

**Figure 4.**
GC-content biases in Tag-seq and LongSAGE technical replicate libraries. (A) Comparison of the GC-content and average count of tag sequences found either in common or by each of the Tag-seq and LongSAGE replicate libraries. (B) Pearson correlations were calculated for tags binned by GC-content. Bins are labeled with the range of the observed GC-content, and the number of binned tags (x-axis). (C) Average expression of tag sequences in each GC-content bin was calculated for both Tag-seq and LongSAGE, and the log of each average was plotted. An asterisk (*) denotes bins between which the expression of tag sequences was significantly different (measured using a t-test, P < 0.01).
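The per-bin correlation in panel B can be sketched as follows: tags shared by two replicates are grouped by GC-content, and a Pearson correlation of their counts is computed within each group. The number of bins, the minimum bin size, and the function names are assumptions for illustration. The bin boundaries actually used in the figure are not stated in this caption.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def correlation_by_gc_bin(rep_a, rep_b, n_bins=5, min_tags=3):
    """Map GC-content bin index -> Pearson r for tags present in both replicates."""
    bins = defaultdict(list)
    for tag in rep_a.keys() & rep_b.keys():
        gc_count = sum(base in "GC" for base in tag.upper())
        # Integer arithmetic avoids floating-point edge cases at bin borders.
        index = min(gc_count * n_bins // len(tag), n_bins - 1)
        bins[index].append((rep_a[tag], rep_b[tag]))
    result = {}
    for index, pairs in sorted(bins.items()):
        if len(pairs) < min_tags:
            continue  # too few tags for a meaningful correlation
        xs, ys = zip(*pairs)
        result[index] = pearson(xs, ys)
    return result
```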

**Figure 5.**
The proportion of the average number of genes detected by tags in LongSAGE and Tag-seq libraries is shown at a series of expression thresholds (tags per million). Bars represent the proportion of the average number of genes with intronic tags (A), antisense tags (B), and DNA-binding domains (transcription factors) (C) in Tag-seq and LongSAGE libraries.
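Expression thresholds in tags per million require normalizing raw counts by library size. The sketch below shows that normalization and a simplified version of the thresholded proportion: among genes at or above a threshold, the fraction that belongs to a class of interest (for example, genes with intronic tags). The inputs `gene_counts` and `flagged_genes` are hypothetical, and the paper's averaging across libraries is not reproduced.

```python
def tags_per_million(gene_counts):
    """Scale raw per-gene tag counts to tags per million."""
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {gene: count * 1_000_000 / total for gene, count in gene_counts.items()}


def proportion_flagged(gene_counts, flagged_genes, thresholds):
    """For each threshold, the fraction of expressed genes that are flagged."""
    tpm = tags_per_million(gene_counts)
    proportions = {}
    for threshold in thresholds:
        expressed = {gene for gene, value in tpm.items() if value >= threshold}
        if expressed:
            proportions[threshold] = len(expressed & set(flagged_genes)) / len(expressed)
        else:
            proportions[threshold] = 0.0
    return proportions
```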

**Figure 6.**
Detection of exonic, intronic, and antisense tags in the Tag-seq and LongSAGE hESC replicates. Tag sequences from the Tag-seq technical replicate, the in silico derived sub_Tag-seq, and the LongSAGE replicate were mapped to the introns, exons, and antisense strands of Ensembl genes. The proportions of distinct tag sequences (A) and tag abundance (B) are reported relative to all mapped quality-filtered tags. Average tag counts (±SD) are reported for all tag sequences found in common between the three libraries (C).
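Panels A and B report two different proportions for each mapping category: one over distinct tag sequences and one over total tag abundance. The distinction can be made concrete with a short sketch. The mapping `tag_category` (tag sequence to "exon", "intron", or "antisense") is a hypothetical input, and unmapped tags are simply excluded from both denominators.

```python
from collections import Counter


def category_proportions(tag_counts, tag_category):
    """Return (distinct-sequence proportions, abundance proportions) by category."""
    distinct = Counter()
    abundance = Counter()
    for tag, count in tag_counts.items():
        category = tag_category.get(tag)
        if category is None:
            continue  # tag did not map to any annotated category
        distinct[category] += 1       # each tag sequence counted once
        abundance[category] += count  # weighted by how often it was observed
    n_distinct = sum(distinct.values())
    n_abundance = sum(abundance.values())
    return (
        {c: n / n_distinct for c, n in distinct.items()},
        {c: n / n_abundance for c, n in abundance.items()},
    )
```

A category made up of many rare tags, such as intronic or antisense tags, can account for a large share of distinct sequences while contributing only a small share of total abundance.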
