Bayesian gene set benchmark dose estimation for "omic" responses

Daniel Zilber¹ ², Kyle P Messier¹ ², John House¹, Fred Parham², Scott S Auerbach², Matthew W Wheeler¹

Affiliations

¹ Division of Intramural Research, Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, United States.
² Division of Translational Toxicology, Predictive Toxicology Branch, National Institute of Environmental Health Sciences, Durham, NC 27713, United States.

PMID: 39786864
PMCID: PMC11783320
DOI: 10.1093/bioinformatics/btaf008

Daniel Zilber et al. Bioinformatics. 2024 Dec 26;41(1):btaf008.

Abstract

Motivation: Estimating a toxic reference point using tools such as the benchmark dose (BMD) is a critical step in setting policy to regulate pollution and ensure safe environments. Toxicity can be measured for different endpoints, including changes in gene expression and histopathology for various tissues, and is typically explored one gene or tissue at a time in a univariate setting that ignores correlation. In this work, we develop a multivariate procedure to estimate the BMD for specified gene sets. Our approach extends the foundational univariate approach by accounting for correlation in a statistically principled way.

Results: We illustrate the method using data from a 5-day rat study and the Hallmark gene sets, and compare the estimates with existing BMD results computed by the EPA for both gene sets and apical histopathology endpoints. In contrast to previous ad hoc methods, our principled approach provides the needed extension to bring the foundational univariate method into the multivariate world of transcriptomics. In addition to its use in a regulatory setting, our method can support hypothesis generation when gene sets correspond to mechanistic pathways.

Availability and implementation: BS-BMD is implemented in R and C++ and available at https://github.com/NIEHS/BS-BMD.

Published by Oxford University Press 2025.

Figures

**Figure 1.**
The probabilistic method for determining the BMD, based on Crump (1995). Under the assumption of Gaussian error, the top 1% of the background response distribution is fixed as the cutoff for an adverse response; the cutoff is illustrated by the dotted arrow. The BMR is set to 10%, and the BMD is the dose at which the change in response places an additional BMR (10%) of the distribution above the cutoff. This change in response equals the difference of the corresponding quantiles. **Inset**: The smaller bullseye pattern illustrates samples from an uncorrelated bivariate Gaussian, with one random variable on each axis. The cutoff for a multivariate extension of Crump's method can be understood as the difference between the shells corresponding to the 98th and 78th quantiles.
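The univariate quantile-difference idea in Figure 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the BS-BMD implementation: it assumes Gaussian error with a known constant SD, an additional-risk definition of the BMR, and a hypothetical increasing linear dose-response mean. The names `critical_shift` and `bmd_linear` are illustrative only.

```python
from statistics import NormalDist


def critical_shift(background=0.01, bmr=0.10):
    """Mean shift, in units of the error SD, that raises the probability of
    exceeding the adverse cutoff from `background` to `background + bmr`.

    Assumes Gaussian error with a constant SD and an additional-risk BMR.
    """
    z = NormalDist()  # standard normal
    cutoff_q = z.inv_cdf(1.0 - background)        # e.g. 99th quantile
    target_q = z.inv_cdf(1.0 - background - bmr)  # e.g. 89th quantile
    return cutoff_q - target_q                    # difference of quantiles


def bmd_linear(slope, sigma, background=0.01, bmr=0.10):
    """BMD for a hypothetical increasing linear mean mu(d) = mu0 + slope * d."""
    return critical_shift(background, bmr) * sigma / slope


delta = critical_shift()
print(f"required shift: {delta:.3f} SD")
print(f"BMD (slope=0.5, sigma=1): {bmd_linear(0.5, 1.0):.3f}")
```

Because the required shift scales with sigma, a noisier endpoint needs a larger change in mean response to reach the same BMR.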

**Figure 2.**
Fenofibrate BS-BMD posterior simulations for each Hallmark gene set in kidney (left) and liver (right). The distribution of 5000 posterior samples of the BS-BMD (X-axis) is shown for each Hallmark gene set (Y-axis), along with the median (diamonds) and 5th percentile (triangles). The density is shown only where the probability mass is at least 0.01.
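Summarizing posterior draws as in Figure 2 amounts to taking the median and the 5th percentile of the samples for each gene set; in Bayesian BMD analyses the 5th percentile is commonly read as an analogue of the BMDL. The sketch below uses synthetic lognormal draws for two hypothetical gene sets purely to exercise the code; they are not data from the study, and `summarize_posterior` is an illustrative name.

```python
import random
from statistics import median, quantiles


def summarize_posterior(samples):
    """Median and 5th percentile of posterior BMD draws."""
    # quantiles(n=20) returns 19 cut points; the first is the 5th percentile.
    p05 = quantiles(samples, n=20)[0]
    return {"median": median(samples), "p05": p05}


# Synthetic draws for two hypothetical gene sets (NOT results from the study).
rng = random.Random(1)
posterior = {
    "gene_set_A": [rng.lognormvariate(1.0, 0.3) for _ in range(5000)],
    "gene_set_B": [rng.lognormvariate(2.0, 0.6) for _ in range(5000)],
}

summaries = {name: summarize_posterior(draws) for name, draws in posterior.items()}
for name, s in summaries.items():
    print(f"{name}: median={s['median']:.2f}, 5th pct={s['p05']:.2f}")
```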

**Figure 3.**
Weak signal in the liver with fenofibrate. There are 17 genes in the TGF Beta Signaling set, and the difference between the 98th and 78th quantiles of the noise distribution is roughly 10, represented by the dotted line. The aggregate effect of all the genes, which nearly reaches a value of 20, yields a moderate BMD before correcting for the spline model variance. After correcting, the risk statistic has a partially negative response and does not cross the cutoff, so the BMD is set to the maximum dose. In some iterations the risk statistic still crosses the cutoff, suggesting a true effect but at a higher dose. The top three genes in the stack are shown; the top gene coincides with the aggregate effect, but the top genes vary from iteration to iteration.
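The "roughly 10" cutoff in Figure 3 can be reproduced under one simple assumption: that the noise statistic is the sum of squared independent standard-normal deviations over the 17 genes, i.e. chi-square with 17 degrees of freedom. That assumption is ours, not a statement of the paper's model. The sketch estimates the 98th-minus-78th quantile gap by Monte Carlo, checks it with the Wilson-Hilferty approximation, and applies the max-dose rule from the caption on a hypothetical dose grid (a simplification: the first grid dose at which the statistic reaches the cutoff, with no interpolation). All function names and the example curves are hypothetical.

```python
import random
from statistics import NormalDist, quantiles


def noise_quantile_gap(n_genes, upper=0.98, lower=0.78, n_sim=50_000, seed=0):
    """Monte Carlo gap between two quantiles of summed squared noise, assuming
    independent standard-normal noise per gene (chi-square, n_genes df)."""
    rng = random.Random(seed)
    sums = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_genes))
            for _ in range(n_sim)]
    cuts = quantiles(sums, n=100)  # 99 cut points: cuts[k - 1] is the k-th percentile
    return cuts[round(upper * 100) - 1] - cuts[round(lower * 100) - 1]


def wilson_hilferty(p, df):
    """Closed-form approximation to the chi-square p-quantile."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


def bmd_or_max(doses, risk_stat, cutoff):
    """First grid dose where the risk statistic reaches the cutoff;
    the maximum dose if it never does."""
    for d, r in zip(doses, risk_stat):
        if r >= cutoff:
            return d
    return max(doses)


gap_mc = noise_quantile_gap(17)
gap_wh = wilson_hilferty(0.98, 17) - wilson_hilferty(0.78, 17)
print(f"Monte Carlo gap: {gap_mc:.2f}, Wilson-Hilferty gap: {gap_wh:.2f}")

# Hypothetical risk-statistic curve that stays below the cutoff.
doses = [0, 1, 3, 10, 30, 100]
weak = [0.0, -0.5, 1.0, 3.0, 6.0, 8.0]
print("BMD:", bmd_or_max(doses, weak, gap_wh))
```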

**Figure 4.**
Correlation matrix of genes in the liver exposed to fenofibrate. Genes are correlated through latent factors rather than directly through the response, so this is not an empirical covariance matrix. The covariance is computed by averaging over 1000 posterior samples after a burn-in of 2000 samples. The rows and columns (i.e. genes) are ordered by hierarchical clustering with 1 minus the correlation as the distance; only every fifth gene is shown because the full matrix is too fine to print.
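The construction behind Figure 4 can be sketched assuming the standard factor-model form, in which each posterior draw implies a covariance of Λ Λᵀ + diag(ψ) for a p × k loading matrix Λ and per-gene noise variances ψ. The sketch averages that implied covariance over draws (burn-in assumed already discarded), converts it to a correlation, and forms the 1 − correlation distance. The resulting matrix could then be passed to a hierarchical clustering routine such as SciPy's `scipy.cluster.hierarchy.linkage` (after conversion with `scipy.spatial.distance.squareform`). The draws below are small hypothetical values, and all function names are illustrative.

```python
def factor_covariance(loadings, noise_var):
    """Covariance implied by a factor model: Lambda @ Lambda.T + diag(noise_var).
    `loadings` is a p x k list of lists; `noise_var` is a length-p list."""
    p = len(loadings)
    return [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
             + (noise_var[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def posterior_mean_covariance(draws):
    """Average the implied covariance over posterior draws (burn-in removed)."""
    covs = [factor_covariance(lam, psi) for lam, psi in draws]
    p, n = len(covs[0]), len(covs)
    return [[sum(c[i][j] for c in covs) / n for j in range(p)] for i in range(p)]


def to_correlation(cov):
    """Scale a covariance matrix to a correlation matrix."""
    p = len(cov)
    sd = [cov[i][i] ** 0.5 for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


def correlation_distance(corr):
    """Distance 1 - correlation, used to order genes by clustering."""
    return [[1.0 - r for r in row] for row in corr]


# Two hypothetical posterior draws for 3 genes and 1 latent factor.
draws = [
    ([[1.0], [0.8], [0.0]], [0.5, 0.5, 1.0]),
    ([[0.9], [1.0], [0.1]], [0.4, 0.6, 1.0]),
]
corr = to_correlation(posterior_mean_covariance(draws))
dist = correlation_distance(corr)
for row in corr:
    print(" ".join(f"{r:6.3f}" for r in row))
```

Averaging covariances before converting to a correlation, as the caption describes, is not the same as averaging per-draw correlations; the two generally differ slightly.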


