Review
Chem Res Toxicol. 2021 Feb 15;34(2):189-216.
doi: 10.1021/acs.chemrestox.0c00264. Epub 2020 Nov 3.

The Tox21 10K Compound Library: Collaborative Chemistry Advancing Toxicology

Ann M Richard et al. Chem Res Toxicol.

Abstract

Since 2009, the Tox21 project has screened ∼8500 chemicals in more than 70 high-throughput assays, generating upward of 100 million data points, with all data publicly available through partner websites at the United States Environmental Protection Agency (EPA), National Center for Advancing Translational Sciences (NCATS), and National Toxicology Program (NTP). Underpinning this public effort is the largest compound library ever constructed specifically for improving understanding of the chemical basis of toxicity across research and regulatory domains. Each Tox21 federal partner brought specialized resources and capabilities to the partnership, including three approximately equal-sized compound libraries. All Tox21 data generated to date have resulted from a confluence of ideas, technologies, and expertise used to design, screen, and analyze the Tox21 10K library. The different programmatic objectives of the partners led to three distinct, overlapping compound libraries that, when combined, not only covered a diversity of chemical structures, use-categories, and properties but also incorporated many types of compound replicates. The history of development of the Tox21 "10K" chemical library and data workflows implemented to ensure quality chemical annotations and allow for various reproducibility assessments are described. Cheminformatics profiling demonstrates how the three partner libraries complement one another to expand the reach of each individual library, as reflected in coverage of regulatory lists, predicted toxicity end points, and physicochemical properties. ToxPrint chemotypes (CTs) and enrichment approaches further demonstrate how the combined partner libraries amplify structure-activity patterns that would otherwise not be detected. 
Finally, CT enrichments are used to probe global patterns of activity in combined ToxCast and Tox21 activity data sets relative to test-set size and chemical versus biological end point diversity, illustrating the power of CT approaches to discern patterns in chemical-activity data sets. These results support a central premise of the Tox21 program: A collaborative merging of programmatically distinct compound libraries would yield greater rewards than could be achieved separately.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Approximate timeline for constructing the full Tox21 compound library. Plates A, B, and C (and revised/reprocured plates A′ and B′) refer to construction of 1536-well plates containing up to 1408 compounds each, for each of the three component partner libraries, Tox21_EPA, Tox21_NTP, and Tox21_NCATS.
Figure 2
Schematic of the EPA’s full Tox21 partner library plate set contribution consisting of 10 copies of three distinct 4 × 384-well stock plate sets (denoted A, B, and C), where each 384-well plate set (A, B, or C) was processed onto two triplicate sets of 1536-well plates (denoted 1–3, differing by shifted overlay pattern), and three sets of triplicate 1536-well plates (A, B, and C) comprised a full partner contribution to the active Tox21 screening library. All remaining plate copies were stored frozen at −80 °C until needed, and one full library plate set (A, B, and C) in 384-well format was reserved for analytical QC analyses.
Figure 3
Tox21 partner plate set contributions to the full Tox21 library indicating the approximate chemical × assay overlap totals of the Tox21 Phase I NTP HTS Plate A (2007) and a portion of the EPA’s ToxCast library (2009) available at the start of Tox21 Phase II screening.
Figure 4
Tox21 chemical testing and data processing workflow, from initial DSSTox structure curation to plating and screening the Tox21 library at NCATS and to data distribution from the NCATS Tripod website (https://tripod.nih.gov/tox21) and PubChem (https://pubchem.ncbi.nlm.nih.gov/), with separate pipelined data analyses by both the EPA and NTP and data distribution through the EPA’s ToxCast website (https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data), CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard), and NTP’s CEBS website (ftp://anonftp.niehs.nih.gov/ntp-cebs/datatype/Tox21/).
Figure 5
Various chemical representations of the full Tox21 compound library and the corresponding totals of unique identifiers at each level, ranging from Tox21_ID (stock solutions), to generic substances (DTXSID), structures (DTXCID), QSAR-ready structures, and the subset of ToxPrint chemical features (https://toxprint.org/) represented one or more times in the full Tox21 structure library (out of 729 total possible).
Figure 6
A representation of the unique and overlapping DSSTox substance content in the three Tox21 partner libraries, with library totals indicated in parentheses, and total overlapping content across each of the three libraries indicated. A single unique occurrence of the 88 replicate compound set is included in the EPA’s Tox21 partner library totals for comparison purposes.
Figure 7
Maximum percent of hits (hit%) across Tox21 assays within a stereofamily (green bars) relative to the difference in hit% (diff) within the stereofamily (orange bars) for a total of 67 sets of stereoisomer chemicals.
Figure 8
Plot of the largest enrichment of hits (difference in hit%) for either the HCl salt (red bars) or parent chemical (blue bars) relative to the other in the pair, that is, difference in % hits, across Tox21 assays for a total of 58 pairs of chemicals having >3% maximum hit rate.
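The comparison in this caption can be illustrated with a minimal sketch. It assumes that hit% means the percentage of tested assays in which a chemical was called active, which the caption does not spell out; the function names, the pair labels, and the toy counts are hypothetical and are not taken from the Tox21 data.

```python
def hit_percent(n_active, n_tested):
    """Percent of tested assays in which a chemical was called active (assumed definition)."""
    if n_tested == 0:
        raise ValueError("chemical was not tested in any assay")
    return 100.0 * n_active / n_tested


def salt_parent_differences(pairs, min_max_hit=3.0):
    """Return (name, diff) for pairs whose larger hit% exceeds min_max_hit.

    diff > 0 means the HCl salt has the higher hit%; diff < 0 means the parent does.
    """
    results = []
    for name, (salt_active, salt_tested), (parent_active, parent_tested) in pairs:
        salt = hit_percent(salt_active, salt_tested)
        parent = hit_percent(parent_active, parent_tested)
        # Keep only pairs with a maximum hit rate above the threshold (>3% in the caption).
        if max(salt, parent) > min_max_hit:
            results.append((name, salt - parent))
    return results


# Hypothetical toy data: (pair name, (salt active, salt tested), (parent active, parent tested))
toy_pairs = [
    ("pair_1", (10, 50), (5, 50)),  # salt 20%, parent 10%
    ("pair_2", (1, 50), (0, 50)),   # salt 2%, parent 0%: removed by the 3% filter
    ("pair_3", (2, 40), (6, 40)),   # salt 5%, parent 15%
]

for name, diff in salt_parent_differences(toy_pairs):
    print(name, diff)
```

A positive difference would correspond to a red (HCl salt) bar and a negative one to a blue (parent) bar under this reading.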
Figure 9
Comparison of chemical list coverages across the three non-overlapping Tox21 partner libraries: the full Tox21 EPA library (green, 4078 total), the portion of the Tox21 NTP library not overlapping with EPA (pink, 1861 total), and the portion of the Tox21 NCATS library not overlapping with either EPA or NTP (purple, 3005 total). Chemical lists include DRUGS (DrugBank, + Tox21_NCATS library + 134 EPA ToxCast donated pharma), OPPIN pesticides (EPA’s Pesticide Program Information Network list), TSCA environmental and industrial chemicals (EPA’s Toxic Substances Control Act list), REACH (NORMAN Network’s list of REACH chemicals for use in suspect screening), and COSMOS+ (partially curated COSMOS DB, cosmetic ingredients and personal care products list, and European Food Safety Authority’s (EFSA) OpenFoodTox list).
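The three non-overlapping sections described in this caption amount to successive set differences. The sketch below shows that partitioning on hypothetical placeholder identifiers; the function name and the toy sets are assumptions for illustration, and only the three section totals come from the caption.

```python
def partition_libraries(epa, ntp, ncats):
    """Split three overlapping libraries into three disjoint sections.

    Section 1: the full EPA library.
    Section 2: NTP substances not already in EPA.
    Section 3: NCATS substances in neither EPA nor NTP.
    """
    epa_part = set(epa)
    ntp_part = set(ntp) - epa_part
    ncats_part = set(ncats) - epa_part - set(ntp)
    return epa_part, ntp_part, ncats_part


# Hypothetical placeholder identifiers, not real DSSTox substance IDs.
epa = {"S1", "S2", "S3", "S4"}
ntp = {"S3", "S4", "S5"}
ncats = {"S1", "S5", "S6", "S7"}

epa_part, ntp_part, ncats_part = partition_libraries(epa, ntp, ncats)
print(sorted(epa_part), sorted(ntp_part), sorted(ncats_part))

# Section totals quoted in the caption; because the sections are disjoint,
# they can be added directly.
caption_totals = {"EPA": 4078, "NTP_not_EPA": 1861, "NCATS_not_EPA_or_NTP": 3005}
print(sum(caption_totals.values()))  # 8944
```

Because the sections are disjoint by construction, every substance is counted exactly once, in the first library (in EPA, NTP, NCATS order) that contains it.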
Figure 10
Total Tox21 library coverage of various lists, where the bar color is unique to the list (and matches colors in Figure 9), the total bar length indicates the total number of compounds in the full list, and the light blue portion of the bar indicates the number of list chemicals included in the Tox21 library. Lists include: DRUGS (DrugBank, + Tox21_NCATS library + 134 EPA ToxCast donated pharma), OPPIN pesticides (EPA’s Pesticide Program Information Network list), TSCA environmental and industrial chemicals (EPA’s Toxic Substances Control Act list), REACH (NORMAN Network’s list of REACH chemicals for use in suspect screening), and COSMOS+ (partially curated COSMOS DB, cosmetic ingredients and personal care products list, and the European Food Safety Authority’s (EFSA) OpenFoodTox list).
Figure 11
Comparison of numbers of predicted toxicants within the three non-overlapping Tox21 partner library sections: the full Tox21 EPA library (green, 4078 total), the portion of the Tox21 NTP library not overlapping with EPA (pink, 1861 total), and the portion of the Tox21 NCATS library not overlapping with either EPA or NTP (purple, 3005 total). DevTox (developmental toxicity) and Mutag (mutagenicity) were predicted from EPA T.E.S.T. models accessed from the EPA CompTox Chemicals Dashboard using a 0.08 confidence threshold for DevTox and 0.5 for Mutag. RatCarc (rat carcinogenicity) was predicted by the LHASA Derek Nexus software, v2.2.2, using the “Plausible” threshold (see Section 4.2 for details).
Figure 12
Histogram comparisons of computed properties (total chemical counts versus property range bins) for the combined EPA + NTP structure library versus the non-overlapping portion of the NCATS library (Venn diagram in center): Oral rat LD50 representing lethality to 50% of dosed rats predicted with the EPA T.E.S.T. QSAR model; complexity, based on atom, bond and path features, computed using the CORINA software (Molecular Networks GmbH); vapor pressure predicted using the OPERA QSAR models; and MolWt = molecular weight (see Section 4.2 for further details).
Figure 13
Frequency plot (heat map) of the number of chemicals containing each of 624 unique ToxPrint chemotypes, sorted by frequency in the NCATS inventory from high to low: >100 chemicals per ToxPrint are represented by the darkest shades of purple, red, and green; 3 to 100 chemicals per ToxPrint by correspondingly lighter shades; and <3 chemicals per ToxPrint by light gray for all three inventories. The bottom portion of the figure indicates the numbers of unique ToxPrints present in three or more chemicals in each of the three total partner inventories as well as in the combined EPA + NTP inventory.
Figure 14
Subset of ToxPrint chemotypes showing the highest enrichment within a single partner inventory relative to the remaining inventories, comparing the incidence of each ToxPrint within the NCATS inventory (purple bars, far left) to the EPA (green) and NTP (pink) inventories (right panel), where enrichment is defined by ≥5 ToxPrint chemicals in the enriched library and ≤3 (grayed out) in each of the other partner libraries.
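The enrichment rule stated in this caption (at least 5 ToxPrint chemicals in the enriched library and at most 3 in each of the other two) can be written as a simple count filter. The sketch below is illustrative only: the function name, the chemotype labels, and the counts are hypothetical, and it does not reproduce the statistical enrichment analysis used elsewhere in the paper.

```python
def enriched_library(counts, min_enriched=5, max_other=3):
    """Return the library in which a ToxPrint is enriched, or None.

    counts maps library name -> number of chemicals containing the ToxPrint.
    A library qualifies if it has >= min_enriched chemicals while every
    other library has <= max_other.
    """
    for library, n in counts.items():
        others = [m for other, m in counts.items() if other != library]
        if n >= min_enriched and all(m <= max_other for m in others):
            return library
    return None


# Hypothetical chemotype labels and counts, for illustration only.
toy_counts = {
    "chemotype_a": {"NCATS": 12, "EPA": 2, "NTP": 0},  # enriched in NCATS
    "chemotype_b": {"NCATS": 1, "EPA": 7, "NTP": 3},   # enriched in EPA
    "chemotype_c": {"NCATS": 9, "EPA": 6, "NTP": 1},   # not enriched: EPA exceeds 3
    "chemotype_d": {"NCATS": 4, "EPA": 2, "NTP": 3},   # not enriched: no library reaches 5
}

for name, counts in toy_counts.items():
    print(name, enriched_library(counts))
```

With the default thresholds at most one library can qualify for a given ToxPrint, since qualifying requires every other library to hold 3 or fewer chemicals.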
Figure 15
ToxPrint CTs with significantly greater representation in the NCTC/NCATS library (light purple bars) relative to the EPA + NTP library (light green bars), which were also found to be significantly enriched in the active chemical region of one or more Tox21 qHTS assays. Sample ToxPrint images are shown for starred names.
Figure 16
ToxPrint CTs with significantly greater representation in the EPA + NTP library (light green bars) relative to the NCATS library (light purple bars), which were also found to be significantly enriched in the active chemical region of one or more Tox21 qHTS assays. Sample ToxPrint images are shown for starred names.
Figure 17
Plot of the average number of enriched CTs within each assay group (red line) superimposed on a bar plot of the average number of tested chemicals per assay (blue bars) within each assay group, where the number of assay end points per group is listed in parentheses beside the assay group name.
Figure 18
Plot of the total number of unique CTs enriched in assay actives (purple line) for all assays within the indicated assay platform (e.g., CEETOX) or testing program (All ToxCast), superimposed on a bar plot of the total number of assays contained within each assay platform or testing program (yellow bar), where the number of assay end points per assay platform is also listed in parentheses beside the platform name. The dashed purple line represents the total number of unique enriched CTs adjusted for the average total number of tested chemicals within the assay group (scaled to All ToxCast).
