PLoS One. 2012;7(10):e48096.
doi: 10.1371/journal.pone.0048096. Epub 2012 Oct 25.

Combining next-generation sequencing and microarray technology into a transcriptomics approach for the non-model organism Chironomus riparius

Marino Marinković et al. PLoS One. 2012.

Abstract

Whole-transcriptome gene-expression analyses are commonly performed in species that have a sequenced genome and for which microarrays are commercially available. To do such analyses in species with no or limited genome data, i.e. non-model organisms, the necessary transcriptomics resources, i.e. an annotated transcriptome and a validated gene-expression microarray, must first be developed. The aim of the present study was to establish an advanced approach for developing transcriptomics resources for non-model organisms by combining next-generation sequencing (NGS) and microarray technology. We applied our approach to the non-biting midge Chironomus riparius, an ecologically relevant species that is widely used in sediment ecotoxicity testing. We sampled extensively, covering all C. riparius developmental stages as well as toxicant-exposed larvae, and obtained 1.5 M NGS reads totalling 501 Mbp from a normalized cDNA library. Using the NGS data, we developed transcriptomics resources in several steps. First, we designed 844 k probes directly on the NGS reads, as well as 76 k probes targeting expressed sequence tags of related species. These probes were tested for their affinity to C. riparius DNA and mRNA by performing two biological experiments with a 1 M probe-selection microarray that contained the entire probe library. Subsequently, the 1.5 M NGS reads were assembled into 23,709 isotigs and 135,082 singletons, which were associated with ~55 k and ~61 k gene ontology terms, respectively, and which together corresponded to 22,593 unique protein accessions. An algorithm was developed that took the assembly and the probe affinities to DNA and mRNA into account, which resulted in 59 k highly reliable probes that uniquely targeted 95% of the isotigs and 18% of the singletons. In conclusion, our approach allowed the development of high-quality transcriptomics resources for C. riparius, and is applicable to any non-model organism.
It is expected that these resources will advance ecotoxicity testing with C. riparius, as whole-transcriptome gene-expression analyses are now possible with this species.
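
The figures quoted in the abstract imply a few derived quantities that are easy to check. The sketch below is only arithmetic on the rounded values given above (for example "1.5 M" and "95%"), so the results are approximate; the variable and function names are illustrative and do not come from the study.

```python
# Figures quoted in the abstract; rounded values such as "1.5 M" are approximate.
reads = 1.5e6          # NGS reads
total_bp = 501e6       # total sequenced bases (501 Mbp)
probes_ngs = 844_000   # probes designed directly on the NGS reads
probes_est = 76_000    # probes targeting ESTs of related species
isotigs = 23_709
singletons = 135_082


def summarize():
    """Return quantities derived from the abstract's rounded figures."""
    return {
        "mean_read_length_bp": total_bp / reads,
        "probe_library_size": probes_ngs + probes_est,
        "isotigs_targeted": round(0.95 * isotigs),
        "singletons_targeted": round(0.18 * singletons),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.0f}")
```

On these numbers the mean read length is about 334 bp, the probe library holds 920 k probes (which fits on the 1 M probe-selection microarray), and the final 59 k probes target roughly 22.5 k isotigs and 24.3 k singletons.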

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Strategy to obtain non-model organism transcriptomics resources.
NGS: Next-generation sequencing; ESTs: Expressed Sequence Tags; aCGH: array-based Comparative Genomic Hybridization; GE: Gene Expression; GO: Gene Ontology; EC: Enzyme Commission numbers. * adapted from http://extension.missouri.edu/explorepdf/agguides/pests/g07402.pdf.
Figure 2. Array-based comparative genomic hybridization (aCGH) experiment.
(A) Box-and-whisker plot summarizing the obtained log2 signal intensity distributions for the four indicated probe collections, with the light grey boxes representing the C. riparius aCGH signal and the dark grey boxes the A. gambiae aCGH signal. (B) MA-plot of the aCGH data. The dots with the different shades of grey represent the entire probe library (with a GC-content below 50%). The three defined signal-intensity parameters are indicated by the dashed blue line and the captions I, II, III. The three categories containing the selected probes are indicated by different shades of grey and the letters A, B and C. The red dots are the negative control probes and the green dots the positive control (A. gambiae EST) probes.
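
For readers unfamiliar with MA-plots, the coordinates of each dot follow the standard definition: M is the log2 ratio of the two signals and A is their mean log2 intensity. The sketch below is a minimal illustration of that definition only. Which signal the study used as the numerator is not stated in this caption, so the argument order here is an assumption, and the function name is hypothetical.

```python
import math


def ma_values(signal_a, signal_b):
    """Return (M, A) for one probe from two positive raw intensities.

    M = log2(signal_a) - log2(signal_b)          (log2 ratio)
    A = 0.5 * (log2(signal_a) + log2(signal_b))  (mean log2 intensity)
    The choice of numerator (signal_a) is an assumption for illustration.
    """
    log_a = math.log2(signal_a)
    log_b = math.log2(signal_b)
    return log_a - log_b, 0.5 * (log_a + log_b)


if __name__ == "__main__":
    # A probe four times brighter in the first sample than in the second.
    print(ma_values(1024, 256))  # (2.0, 9.0)
```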
Figure 3. Array-based gene expression (aGE) experiment.
(A) Schematic representation of the two mRNA linear amplification protocols. The coloured bar represents the mRNA with the 3′ polyA tail indicated by the stretch of A’s. The arrows represent the amplified cDNA products obtained for the regular procedure and the modified procedure, with the length of the arrows indicating the length of the synthesized cDNAs. (B) MA-plot of the aGE data. The light grey dots represent all aCGH-selected probes. The three coloured regions are expected to contain probes targeting transcripts at the 3′ side (blue), probes targeting the middle of the transcripts (red) and probes targeting the 5′ side as well as probes with no target transcripts (green). (C) Density plot showing the relative position of the three probe populations on the isotigs. The colours of the lines correspond to the colours used in panels A and B. The black line represents a random selection of probes that covers, as expected, the isotigs evenly over their entire length.
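
The relative positions summarized in panel C can be pictured with a small calculation. The sketch below is a guess at one reasonable way to compute them, not the authors' method: it places a probe by its midpoint, scales by isotig length so that 0 is one end of the isotig and 1 the other, and bins the values into a coarse histogram as a stand-in for a density curve. The function names, the 60-nt probe length and the example coordinates are all hypothetical.

```python
def relative_position(probe_start, probe_length, isotig_length):
    """Return the probe midpoint as a fraction (0..1) of the isotig length."""
    midpoint = probe_start + probe_length / 2
    return midpoint / isotig_length


def histogram(positions, bins=10):
    """Count relative positions per equal-width bin; 1.0 falls in the last bin."""
    counts = [0] * bins
    for p in positions:
        index = min(int(p * bins), bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 60-nt probes on a hypothetical 600-nt isotig.
    starts = [0, 270, 540]
    positions = [relative_position(s, 60, 600) for s in starts]
    print(positions)
    print(histogram(positions))
```

A probe population concentrated near one end of the transcripts would pile up in the first or last bins, whereas a random selection would spread roughly evenly, as the caption describes for the black line.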
Figure 4. Taxonomic distribution of the best blastx hits matching C. riparius transcripts.
Distribution of the best blastx hits that were matched to the isotigs (black) and the singletons (light grey) according to their taxonomic origin. (A) All transcripts (isotigs n = 16,824; singletons n = 24,129) that were matched to a BLASTX hit. (B) Transcripts (isotigs n = 16,537; singletons n = 4,7539) that were matched to a BLASTX hit and that are targeted by the final aGE microarray.
Figure 5. Gene Ontology (GO) terms obtained for C. riparius transcripts.
The data represent the distribution of the annotated isotigs (black) and the annotated singletons (light grey) over the various level-2 GO terms. Each bar represents the number of annotated transcripts associated with the specified level-2 GO term as a percentage of the total number of annotated transcripts belonging to the higher-ranked GO category, i.e. cellular component (isotigs n = 8,380; singletons n = 9,277), molecular function (isotigs n = 10,663; singletons n = 11,359) and biological process (isotigs n = 6,249; singletons n = 7,343).
