Bioinformatics. 2024 Dec 26;41(1):btaf008.
doi: 10.1093/bioinformatics/btaf008.

Bayesian gene set benchmark dose estimation for "omic" responses

Daniel Zilber et al. Bioinformatics.

Abstract

Motivation: Estimating a toxic reference point using tools like the benchmark dose (BMD) is a critical step in setting policy to regulate pollution and ensure safe environments. Toxicity can be measured for different endpoints, including changes in gene expression and histopathology for various tissues, and is typically explored one gene or tissue at a time in a univariate setting that ignores correlation. In this work, we develop a multivariate estimation procedure to estimate the BMD for specified gene sets. Our approach extends the foundational univariate approach by accounting for correlation in a statistically principled way.

Results: We illustrate the method using data from a 5-day rat study and Hallmark gene sets and compare it to existing BMD results computed by the EPA for both gene sets and apical histopathology endpoints. In contrast to previous ad hoc methods, our principled approach provides the needed extension to bring the foundational univariate method into the multivariate world of transcriptomics. In addition to use in a regulatory setting, our method can provide hypothesis generation when gene sets correspond to mechanistic pathways.

Availability and implementation: BS-BMD is implemented in R and C++ and available at https://github.com/NIEHS/BS-BMD.

Figures

Figure 1.
The probabilistic method for determining BMD based on Crump (1995). Under the assumption of Gaussian error, the top 1% is fixed as the cutoff for an adverse response, where the cutoff is illustrated with the dotted arrow. The BMR is set to 10% and the BMD is the dose where the change in response yields the BMR change above the cutoff. The change in response is equal to the difference of quantiles. Inset: The smaller bullseye pattern illustrates samples from a bivariate uncorrelated Gaussian with a random variable for each axis. The cutoff for a multivariate extension of Crump can be understood as the difference between the shells corresponding to the 98th and 78th quantiles.
Figure 2.
Fenofibrate BS-BMD simulations by Hallmark gene set for kidney (left) and liver (right). The distribution of 5000 posterior samples of the BS-BMD (X-axis) is shown for each Hallmark gene set (Y-axis), along with the median (diamonds) and 5th percentile (triangles). The density is shown when the probability mass is at least 0.01.
Figure 3.
Weak signal in the liver with fenofibrate. There are 17 genes in the TGF Beta Signaling set, and the difference between the 98th and 78th quantiles of the noise distribution is roughly 10, represented by the dotted line. The aggregate effect of all the genes, which nearly reaches a value of 20, has a moderate BMD before correcting for the spline model variance. After correcting, the risk statistic has a partially negative response and does not cross the cutoff, so the BMD is set to the maximum dose. In some cases the risk statistic still crosses the cutoff, suggesting a true effect but at a higher dose. The top three genes in the stack are shown, with the top gene coinciding with the aggregate effect, but the top genes vary with each iteration.
Figure 4.
Correlation matrix of genes in the liver exposed to fenofibrate. Genes are correlated through latent factors rather than directly through the response, so this does not represent an empirical covariance matrix. The covariance is computed by averaging over 1000 posterior samples after discarding 2000 burn-in samples. The rows and columns (i.e., genes) are ordered using hierarchical clustering with a distance given by 1 minus the correlation, but only every 5th gene is shown because the details are too fine to print.
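
The univariate probabilistic BMD definition described in the Figure 1 caption can be illustrated with a short Python sketch. It is not the BS-BMD implementation. It assumes Gaussian error with a known standard deviation, an added-risk reading of the BMR (tail probability rising from 1% to 11%), and a monotonically increasing mean response. The function names `required_shift` and `bmd_bisect` and the linear dose-response curve are hypothetical and chosen only for illustration. Following the Figure 3 caption, the sketch returns the maximum dose when the response never reaches the required shift.

```python
from statistics import NormalDist


def required_shift(p0=0.01, bmr=0.10, sigma=1.0):
    """Mean shift that raises the tail probability above the cutoff
    from p0 to p0 + bmr under Gaussian error (difference of quantiles)."""
    std = NormalDist()
    return sigma * (std.inv_cdf(1 - p0) - std.inv_cdf(1 - p0 - bmr))


def bmd_bisect(mean_fn, sigma, max_dose, p0=0.01, bmr=0.10, tol=1e-8):
    """Dose at which mean_fn has risen by the required shift.

    Assumes mean_fn is non-decreasing on [0, max_dose] (an assumption of
    this sketch). Returns max_dose if the shift is never reached.
    """
    target = mean_fn(0.0) + required_shift(p0, bmr, sigma)
    if mean_fn(max_dose) < target:
        return max_dose
    lo, hi = 0.0, max_dose
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical linear dose-response: mean 2.0 at dose 0, slope 0.5 per unit dose.
def linear_mean(dose):
    return 2.0 + 0.5 * dose


shift = required_shift()
bmd = bmd_bisect(linear_mean, sigma=1.0, max_dose=10.0)

# Check: at the BMD, the probability of exceeding the cutoff is p0 + bmr.
cutoff = NormalDist(linear_mean(0.0), 1.0).inv_cdf(0.99)
tail_at_bmd = 1 - NormalDist(linear_mean(bmd), 1.0).cdf(cutoff)
print(shift, bmd, tail_at_bmd)
```

With these defaults the required shift is roughly 1.1 standard deviations, which is the "difference of quantiles" the caption refers to. The multivariate extension in the paper replaces these univariate quantiles with shells of a multivariate distribution and is not reproduced here.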

References

    1. Aleksander SA, Balhoff J, Carbon S. et al.; Gene Ontology Consortium. The gene ontology knowledgebase in 2023. Genetics 2023;224:iyad031.
    2. Ashburner M, Ball CA, Blake JA. et al. Gene ontology: tool for the unification of biology. Nat Genet 2000;25:25–9.
    3. Barry WT, Nobel AB, Wright FA. A statistical framework for testing functional categories in microarray data. Ann Appl Stat 2008;2:286–315.
    4. Basili D, Reynolds J, Houghton J. et al. Latent variables capture pathway-level points of departure in high-throughput toxicogenomic data. Chem Res Toxicol 2022;35:670–83.
    5. Bhattacharya A, Dunson DB. Sparse Bayesian infinite factor models. Biometrika 2011;98:291–306.