Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2021 Apr 27;181(1):68-89.
doi: 10.1093/toxsci/kfab009.

High-Throughput Transcriptomics Platform for Screening Environmental Chemicals

Affiliations

High-Throughput Transcriptomics Platform for Screening Environmental Chemicals

Joshua A Harrill et al. Toxicol Sci. .

Abstract

New approach methodologies (NAMs) that efficiently provide information about chemical hazard without using whole animals are needed to accelerate the pace of chemical risk assessments. Technological advancements in gene expression assays have made in vitro high-throughput transcriptomics (HTTr) a feasible option for NAMs-based hazard characterization of environmental chemicals. In this study, we evaluated the Templated Oligo with Sequencing Readout (TempO-Seq) assay for HTTr concentration-response screening of a small set of chemicals in the human-derived MCF7 cell model. Our experimental design included a variety of reference samples and reference chemical treatments in order to objectively evaluate TempO-Seq assay performance. To facilitate analysis of these data, we developed a robust and scalable bioinformatics pipeline using open-source tools. We also developed a novel gene expression signature-based concentration-response modeling approach and compared the results to a previously implemented workflow for concentration-response analysis of transcriptomics data using BMDExpress. Analysis of reference samples and reference chemical treatments demonstrated highly reproducible differential gene expression signatures. In addition, we found that aggregating signals from individual genes into gene signatures prior to concentration-response modeling yielded in vitro transcriptional biological pathway altering concentrations (BPACs) that were closely aligned with previous ToxCast high-throughput screening assays. Often these identified signatures were associated with the known molecular target of the chemicals in our test set as the most sensitive components of the overall transcriptional response. This work has resulted in a novel and scalable in vitro HTTr workflow that is suitable for high-throughput hazard evaluation of environmental chemicals.

Keywords: TempO-Seq; computational toxicology; high-throughput screening; transcriptomics.

PubMed Disclaimer

Conflict of interest statement

Conflict of Interest

The authors declare no conflict of interest. This manuscript has been reviewed by the Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, and approved for publication. Approval does not signify that the contents reflect the view of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.

Figures

Figure 1.
Figure 1.. Overview of high-throughput transcriptomics workflow.
(A) Diagram of initial plating of test and reference chemicals on a TempO-Seq dose plate, followed by randomization of chemical exposures to test plates. The first column of each test plate is not loaded with cells and is reserved for dispensing of QC samples. Grey wells on the test plate are reserved for internal positive controls added and verified by the contractor. (B) Diagram of bioinformatics workflow. Light yellow boxes indicate raw data received from contractor after targeted RNA-Seq assay completion. Light green circles indicate steps performed using existing open-source methods. Light blue circles indicate novel methods developed as part of this work. Bioinformatic analysis is generally split into two phases. Raw data processing up to probe-level count matrix and samplelevel QC metrics is performed across entire data set. Subsequent analysis is performed separately for each chemical against plate-matched vehicle controls. Intermediate processing results are stored in a database layer to facilitate analysis.
Figure 2.
Figure 2.. Quality assessment of high-throughput transcriptomics data.
(A-E) Distributions of all sample-level QC metrics, split by sample type. Dashed lines indicate thresholds for masking samples from further analysis. (F) Proportion of samples passing all QC thresholds by type. Blank = Lysis buffer negative controls containing no cellular material; QC Sample = samples prepared in larger batches and added to each plate prior to conduct of TempO-Seq assay; DMSO Control = vehicle control for all other wells; Ref Chem = single dose reference chemical treatments; Test Sample = wells treated with a test chemical.
Figure 3.
Figure 3.. Reproducibility of high-throughput transcriptomics data.
(A-B) Pairwise correlations of log2 CPM values by treatment group. All correlations were calculated between individual samples of the same treatment group as indicated. The BLTSA:BLDMSO group shows the correlation between samples from different treatment conditions. (C-D) Density distribution of the D-statistic by treatment group. (E) Correlation of L2FC (orange bars) and ssGSEA signature scores (blue bars) in each of the three test plates to median L2FC and ssGSEA signature scores across all test plates. L2FC values and ssGSEA scores for each treatment group were calculated relative to the bulk lysate DMSO (for bulk lysate TSA treatments) or DMSO (for Genistein, Sirolimus, or Trichostatin A treatments) controls as described in the methods. BLDMSO = bulk lysate DMSO control; BLTSA = bulk lysate Trichostatin A; DMSO = DMSO control; GEN = Genistein; SIRO = Sirolimus; TSA = Trichostatin A.
Figure 4.
Figure 4.. Signature set enrichment of reference chemical treatments.
(A) Distributions of absolute ssGSEA signature scores, calculated from the DESeq2 moderated log2 fold changes across the three test plates, for specific molecular target signatures across each treatment group. (B) Table of the top 5 highest ranked signatures by absolute signature score for Genistein (red), Sirolimus (green), or Trichostatin A (blue). GEN = Genistein; SIRO = Sirolimus; TSA = Trichostatin A; NULL = 1000 simulated chemicals derived from the null distribution (see methods).
Figure 5.
Figure 5.. Transcriptional perturbations produced by chemical treatments.
(A) DEG accumulation scores for each test chemical (rows) at each concentration (columns). The color of each cell indicates the number of probes that were significantly differentially expressed at or below the corresponding concentration. Chemical concentrations that were masked from transcriptomic analysis due to cytotoxicity are shown as black with a red ‘X’ on the heatmap. (B) Distribution of gene response effect sizes. For each chemical, the maximum absolute L2FC value was computed for each probe across all concentrations and the distribution for all probes is represented as a boxplot. Blue boxplots indicate the distribution using “raw” L2FCs computed directly from mean log2(CPM) values. Red boxplots indicate the distributions of moderated L2FCs returned by DESeq2 analysis of raw counts across all plates.
Figure 6.
Figure 6.. Concentration-dependent transcriptomic perturbations of estrogen receptor target genes.
(A) Heatmaps showing the log2-fold change (log2fc) values for genes in an example signature (CMAP fulvestrant 1e−08 1417 100. The naming convention for the CMAP signatures include the chemical name, the concentration in molar units, a sequential index and the number of genes.). The left-hand heatmap shows the most significantly down-regulated 100 genes and the right-hand panel shows the 100 most significantly up-regulated genes for this fulvestrant CMAP sample. Each horizontal block shows the results for the 8 separate concentrations of a given chemical where concentration increases from top to bottom. The first 4 chemicals are ER agonists and the final 3 are ER antagonists. Chemical abbreviations are: BPA: bisphenol A, BPB: bisphenol B, 4NP: 4-nonylphenol, branched, 4CP: 4-cumylphenol, 4hT: 4-hydroxytamoxifen, Fulv: fulvestrant, Clom: clomiphene citrate (1:1). (B) Signature-level concentration-response data for the same chemicals for this signature. Each panel shows the data points with 95% confidence intervals based on the fitting model error estimate, the winning concentration-response curve, the noise band (gray band spanning zero), the BMD (green vertical line) and its95% confidence interval (green box).The Y-axis is in arbitrary response units. Note that the response for the agonists is negative and for the antagonists is positive. All chemicals except 4-cumylphenol have a continuous hitcall > 0.9.
Figure 7.
Figure 7.. Assigning putative molecular targets based on connectivity with CMAP chemicals and signatures. BMD distribution histograms for example chemicals.
Each active pathway is represented by an element of the histogram at the corresponding BMD value. Colors in the bar chart indicate the signature target classes: green = estrogen, beige = thyroid, blue = CYP P450, tan = ion channel, purple = HMGCR/cholesterol, orange = mitochondria, red = cell stress, yellow = PPAR, black = random, gray = other. The color of the rectangle in the top right indicates the target class of the chemical. Of note are (1) the burst of activity at high concentrations; (2) most of the stress or random activity being observed at high concentrations in a burst; (3) the estrogenic chemicals showing estrogenic pathway activity at the lower concentrations (specificity). The number of signatures with continuous hitcalls >0.5 and number of signatures tested is listed in each panel.
Figure 8.
Figure 8.. Comparison of transcriptomic and HTS-derived BPACs.
The black triangle and confidence intervals correspond to BPACSig and associated upper and lower 95% confidence bounds of BPACSig respectively; yellow diamonds correspond to BPACBMDX; red diamonds correspond to BPACHTS; green up and down triangles indicate potencies from the EPA ER Pathway Model (https://www.epa.gov/endocrine-disruption/endocrine-disruptor-screening-programedsp-estrogen-receptor-bioactivity). The names of chemicals are colored red if the BPACHTS is within the BPACSig confidence limits; colored black if the BPACHTS is lower than BPACSig; and colored blue if the BPACHTS is above the BPACSig. The BPACHTS is the lower 5th percentile of the active AC50 values for assays that passed a series of quality filters (Paul Friedman et al. 2020).

References

    1. Adiconis X, Borges-Rivera D, Satija R, DeLuca DS, Busby MA, Berlin AM, Sivachenko A, Thompson DA, Wysoker A, Fennell T et al. 2013. Comparative analysis of rna sequencing methods for degraded or low-input samples. Nat Methods. 10(7):623–629. - PMC - PubMed
    1. Akaike H 1974. New look at statistical-model identification. Ieee T Automat Contr. Ac19(6):716–723.
    1. Arcaro A, Wymann MP. 1993. Wortmannin is a potent phosphatidylinositol 3-kinase inhibitor: The role of phosphatidylinositol 3,4,5-trisphosphate in neutrophil responses. Biochem J. 296 ( Pt 2):297–301. - PMC - PubMed
    1. Bal-Price A, Hogberg HT, Crofton KM, Daneshian M, FitzGerald RE, Fritsche E, Heinonen T, Hougaard Bennekou S, Klima S, Piersma AH et al. 2018. Recommendation on test readiness criteria for new approach methods in toxicology: Exemplified for developmental neurotoxicity. ALTEX. 35(3):306–352. - PMC - PubMed
    1. Banga S, Patil GP, Taillie C. 2002. Direct calculation of likelihood-based benchmark dose levels for quantitative responses. Environ Ecol Stat. 9(3):295–315.

Publication types