Optimizing representations for integrative structural modeling using Bayesian model selection

doi:10.1101/2023.12.12.571227

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Dec 13:2023.12.12.571227.

Shreyas Arvindekar¹, Aditi S Pathak¹, Kartik Majila¹, Shruthi Viswanath¹

PMID: 38168172
PMCID: PMC10760022
DOI: 10.1101/2023.12.12.571227

Shreyas Arvindekar et al. bioRxiv. 2023.

Affiliation

¹ National Center for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India 560065.

Update in

Optimizing representations for integrative structural modeling using Bayesian model selection.
Arvindekar S, Pathak AS, Majila K, Viswanath S. Bioinformatics. 2024 Mar 4;40(3):btae106. doi: 10.1093/bioinformatics/btae106. PMID: 38391029.

Abstract

Motivation: Integrative structural modeling combines data from experiments, physical principles, statistics of previous structures, and prior models to obtain structures of macromolecular assemblies that are challenging to characterize experimentally. The choice of model representation is a key decision in integrative modeling, as it dictates the accuracy of scoring, the efficiency of sampling, and the resolution of analysis. However, this choice is currently made manually and ad hoc.

Results: Here, we report NestOR (Nested Sampling for Optimizing Representation), a fully automated, statistically rigorous method based on Bayesian model selection to identify the optimal coarse-grained representation for a given integrative modeling setup. From a set of candidate representations, it identifies the optimal one(s) based on their model evidence and sampling efficiency. NestOR was evaluated on a benchmark of four macromolecular assemblies.
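In Bayesian model selection, candidate representations are compared through their model evidence Z, the probability of the data under each representation. A minimal sketch of how evidence estimates could be turned into Bayes factors and posterior probabilities, assuming equal prior probability for every candidate; the function names, the labels R1 to R3, and the ln Z values are hypothetical illustrations, not NestOR's code or results:

```python
import math
from typing import Dict


def log_bayes_factor(log_z_a: float, log_z_b: float) -> float:
    """ln K_ab = ln Z_a - ln Z_b; positive values favour representation a."""
    return log_z_a - log_z_b


def posterior_probabilities(log_evidence: Dict[str, float]) -> Dict[str, float]:
    """P(R_k | D) = Z_k / sum_j Z_j, assuming every candidate representation
    has the same prior probability. Computed in log space to avoid underflow."""
    m = max(log_evidence.values())
    weights = {name: math.exp(lz - m) for name, lz in log_evidence.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical ln Z values for three candidate representations (made up).
log_z = {"R1": -1250.0, "R2": -1210.0, "R3": -1212.0}
probs = posterior_probabilities(log_z)
best = max(probs, key=probs.get)
```

Everything stays in log space because evidences for large assemblies can easily under- or overflow a float. Sampling efficiency, which NestOR also weighs, is not considered in this sketch.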

Availability: NestOR is implemented in the Integrative Modeling Platform (https://integrativemodeling.org) and is available at https://github.com/isblab/nestor.

Keywords: Bayes factors; Bayesian model selection; coarse-grained representation; integrative modeling; macromolecular assemblies; model evidence; nested sampling.

Conflict of interest statement

None declared.

Figures

**Figure 1. Effects of using sub-optimal representations for integrative modeling**
Integrative models of the nucleosome deacetylase (NuDe) complex were produced using two coarse-grained representations, one with 1 residue per bead and another with 50 residues per bead. The former representation was sub-optimal, as judged by the fit of the resulting models to the EM map and by sampling efficiency. Localization probability densities (LPDs) of the protein domains of the NuDe complex, modeled using coarse-grained representations with A. 1-residue beads (sub-optimal representation) and B. 50-residue beads (optimal representation). The densities are superposed on the input EM map (grey, EMDB: 22904) contoured at the recommended threshold. The LPDs are contoured at 10% of their maximum threshold values. C. Time for production sampling of models based on the two representations.
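A coarse-grained representation of the kind compared here splits each protein segment into beads holding a fixed number of residues. The standalone sketch below uses a hypothetical helper, `partition_into_beads`, which is not part of IMP's representation machinery, to show how one residue range maps onto 1-residue versus 50-residue beads:

```python
from typing import List, Tuple


def partition_into_beads(first_res: int, last_res: int,
                         residues_per_bead: int) -> List[Tuple[int, int]]:
    """Split the inclusive residue range [first_res, last_res] into consecutive
    beads of `residues_per_bead` residues; the last bead may be shorter."""
    if residues_per_bead < 1 or last_res < first_res:
        raise ValueError("invalid residue range or bead size")
    beads = []
    start = first_res
    while start <= last_res:
        end = min(start + residues_per_bead - 1, last_res)
        beads.append((start, end))
        start = end + 1
    return beads
```

For a 120-residue segment, `partition_into_beads(1, 120, 50)` yields three beads, (1, 50), (51, 100), and (101, 120), whereas 1-residue beads yield 120. The larger number of particles is one reason finer representations tend to be costlier to sample.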

**Figure 2. Schematic of nested sampling method for optimizing integrative model representation (NestOR)**
A. A schematic representation showing the application of Nested Sampling (NS) to a two-dimensional problem. The iso-likelihood contours for the points with likelihoods L₁, L₂, L₃, and L₄ are shown in the left panel. Their mapping to the corresponding prior mass values X₁, X₂, X₃, and X₄ is shown in the right panel, which plots L versus X for these points. B. Flowchart describing an individual nested sampling run. Initialized with the modeling protocol, the nested sampling parameters, and the number of cores per run, each NestOR run iteratively accumulates model evidence until nested sampling converges. Once converged, it returns the model evidence and two measures of efficiency: the time taken for a single MCMC step in IMP using the representation (per-step MCMC sampling time) and the time taken by NestOR for the run (NestOR process time). C. Flowchart describing the overall parallelized workflow of NestOR. Given an integrative modeling setup with candidate representations (R), their modeling protocol, the number of runs per representation (n_runs), and the maximum number of usable threads, NestOR computes the mean model evidence and the mean per-step model sampling time for all candidate representations in parallel. The results of the independent runs for each representation, computed in the orange box (described in panel B), are aggregated to produce the mean values reported by the overall workflow in panel C.
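Nested sampling estimates the evidence Z = ∫ L(θ) π(θ) dθ by rewriting it as a one-dimensional integral over prior mass, Z = ∫₀¹ L(X) dX, and shrinking X geometrically (on average X_i ≈ exp(−i/N) for N live points) as the lowest-likelihood live point is repeatedly discarded. The minimal sketch below runs standard nested sampling on a toy two-dimensional problem like the one in panel A: an assumed Gaussian likelihood (σ = 0.2) under a uniform prior on [−1, 1]², for which Z ≈ 0.25. It replaces discarded points by rejection sampling from the prior, a swapped-in technique chosen for simplicity; the captions indicate that NestOR instead generates new points with RE-MCMC steps in IMP. All names here are hypothetical.

```python
import math
import random

SIGMA = 0.2  # assumed width of the toy Gaussian likelihood


def gaussian_log_likelihood(p):
    """Log of a normalized 2D Gaussian density centred at the origin."""
    x, y = p
    return -(x * x + y * y) / (2 * SIGMA ** 2) - math.log(2 * math.pi * SIGMA ** 2)


def uniform_prior(rng):
    """Draw a point uniformly from the square [-1, 1] x [-1, 1]."""
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))


def log_add_exp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(log_likelihood, sample_prior, n_live=200, n_iter=1200, seed=0):
    """Estimate ln Z for a likelihood under a prior that can be sampled directly."""
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(p) for p in live]
    log_z = -math.inf
    log_x_prev = 0.0  # ln X_0 = ln 1: the full prior mass
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        log_x = -i / n_live  # expected shrinkage: X_i = exp(-i / n_live)
        # Weight w_i = X_{i-1} - X_i, computed in log space.
        log_w = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))
        log_z = log_add_exp(log_z, logl_min + log_w)
        log_x_prev = log_x
        # Replace the worst live point with a prior draw satisfying L > L_min.
        # Rejection sampling is exact but slows down as X shrinks.
        while True:
            candidate = sample_prior(rng)
            candidate_logl = log_likelihood(candidate)
            if candidate_logl > logl_min:
                break
        live[worst] = candidate
        live_logl[worst] = candidate_logl
    # The remaining prior mass X_final is shared equally among the live points.
    for logl in live_logl:
        log_z = log_add_exp(log_z, logl + log_x_prev - math.log(n_live))
    return log_z
```

With 200 live points and 1,200 iterations, `nested_sampling(gaussian_log_likelihood, uniform_prior)` should land close to ln 0.25 ≈ −1.39; its statistical uncertainty shrinks roughly as 1/√N with the number of live points N.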

**Figure 3. Performance of NestOR on the benchmark**
The output of NestOR, *i.e.,* the mean log model evidence with its standard error (blue) and the mean time per Replica Exchange MCMC step (green), is plotted for each system (A. gTuSC, B. RNA polymerase II, C. MHM, and D. NuDe). The optimal representation(s) inferred by NestOR from these two criteria are highlighted in orange dashed boxes. The tables accompanying each plot show the results of full-length production sampling for each candidate representation of each system: the time required per independent sampling run, the model precision, the fit to data based on the average crosslink score in the major cluster, and the cross-correlation of the EM map with the localization densities of the major cluster. The optimal representations based on full-length production sampling are highlighted in green, whereas representations for which sampling was not exhaustive in the given time are highlighted in red. All times were measured on an AMD Ryzen Threadripper 3990X 64-core processor with 256 GB RAM and a 2.2 GHz clock speed. Four computing threads were used for each system, except for gTuSC, for which six threads were used.
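The per-system summaries in the plots (mean ln Z with its standard error, and mean time per RE-MCMC step) can be computed from independent runs as sketched below. The selection rule in the hypothetical `pick_representation`, which keeps candidates whose error bars overlap that of the highest-evidence candidate and then returns the fastest, is an illustrative way of combining the two criteria, not NestOR's documented rule:

```python
import math
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunSummary:
    mean_log_z: float
    sem_log_z: float
    mean_step_time: float  # seconds per RE-MCMC step


def summarize(log_z_runs: List[float], step_times: List[float]) -> RunSummary:
    """Mean and standard error of ln Z over independent runs, plus mean step time."""
    if len(log_z_runs) < 2:
        raise ValueError("need at least two independent runs for a standard error")
    sem = statistics.stdev(log_z_runs) / math.sqrt(len(log_z_runs))
    return RunSummary(statistics.fmean(log_z_runs), sem, statistics.fmean(step_times))


def pick_representation(summaries: Dict[str, RunSummary]) -> str:
    """Hypothetical rule: keep candidates whose ln Z error bar overlaps that of the
    highest-evidence candidate, then return the fastest of them."""
    best = max(summaries.values(), key=lambda s: s.mean_log_z)
    floor = best.mean_log_z - best.sem_log_z
    tied = [name for name, s in summaries.items() if s.mean_log_z + s.sem_log_z >= floor]
    return min(tied, key=lambda name: summaries[name].mean_step_time)
```

Preferring the faster of two statistically indistinguishable candidates mirrors the motivation of weighing sampling efficiency alongside evidence, but the overlap criterion itself is an assumption.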

**Figure 4. NestOR efficiency**
The total time required for full-length production sampling of models using all candidate representations for each system (blue) is compared with the total time required by NestOR (orange). Production sampling consisted of 50 independent Replica Exchange MCMC runs for gTuSC, MHM, and NuDe, and 28 for RNA polymerase II. NestOR was run with previously set parameters (5 runs, 50 live points, and 50 RE-MCMC steps per iteration) for each candidate representation until a convergence criterion was met. All times were measured on an AMD Ryzen Threadripper 3990X 64-core processor with 256 GB RAM and a 2.2 GHz clock speed.

**Figure 5. Robustness to the choice of prior**
NestOR outputs, *i.e.*, the evidence estimates and their associated uncertainties, were compared for three different priors (orange, green, blue) on two systems: A. MHM and B. NuDe. Each prior comprised a random 30% subset of the input crosslinks, in addition to stereochemistry restraints.
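Building each prior from a random 30% subset of the crosslinks can be sketched as follows. The tuple layout of a crosslink, the function name `random_crosslink_priors`, and the seed are assumptions for illustration; the stereochemistry restraints that each prior also includes would be added separately in the modeling setup and are not shown:

```python
import random
from typing import List, Sequence, Tuple

# (protein_1, residue_1, protein_2, residue_2): a hypothetical crosslink record.
Crosslink = Tuple[str, int, str, int]


def random_crosslink_priors(crosslinks: Sequence[Crosslink], fraction: float = 0.3,
                            n_priors: int = 3, seed: int = 0) -> List[List[Crosslink]]:
    """Draw `n_priors` independent random subsets, each holding `fraction` of the
    input crosslinks (sampled without replacement within a subset)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = round(fraction * len(crosslinks))
    return [rng.sample(list(crosslinks), k) for _ in range(n_priors)]
```

A fixed seed keeps the subsets reproducible; because each subset is drawn independently, different priors may share some crosslinks.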
