Protein Sci. 2023 Oct;32(10):e4772.
doi: 10.1002/pro.4772.

Structural characterization of an intrinsically disordered protein complex using integrated small-angle neutron scattering and computing

Serena H Chen et al. Protein Sci. 2023 Oct.

Abstract

Characterizing structural ensembles of intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) of proteins is essential for studying structure-function relationships. Because hydrogen and deuterium have different neutron scattering lengths, selective labeling and contrast matching in small-angle neutron scattering (SANS) are effective tools for studying the dynamic structures of disordered systems. However, measurements on experimental timescales are typically averaged over multiple conformations, leaving complex SANS data to be disentangled. Here, we demonstrate an integrated method to elucidate the structural ensemble of a complex formed by two IDRs. We combine data from full-contrast SANS experiments and from contrast-matched SANS experiments with residue-specific deuterium labeling, microsecond all-atom molecular dynamics (MD) simulations with four molecular mechanics force fields, and an autoencoder-based deep learning (DL) algorithm. With this combined approach, we show that selective deuteration provides additional information that helps characterize structural ensembles. Among the four force fields, a99SB-disp and CHARMM36m show the strongest agreement with SANS and NMR experiments. In addition, our DL algorithm not only complements conventional structural analysis methods but also successfully differentiates NMR and MD structures that are indistinguishable on the free energy surface. Lastly, we present an ensemble that describes the experimental SANS and NMR data better than MD ensembles generated with any single force field, and we reveal three clusters of distinct conformations. Our results demonstrate a new integrated approach for characterizing structural ensembles of IDPs.

Keywords: deep learning; force field; intrinsically disordered protein; small-angle neutron scattering; structural ensemble.
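Contrast matching follows from how the solvent's scattering length density (SLD) changes with D2O content. The sketch below is illustrative only: it assumes ideal linear mixing of H2O and D2O, commonly quoted SLDs of about −0.56 and 6.36 (×10⁻⁶ Å⁻²), and a fixed particle SLD, and it ignores H/D exchange of labile hydrogens, which shifts a real protein's SLD with D2O content. The function names are hypothetical, not taken from the paper.

```python
# Approximate coherent scattering length densities (SLDs) in units of 1e-6 Å^-2.
# These are commonly quoted literature values, ASSUMED here for illustration.
SLD_H2O = -0.56
SLD_D2O = 6.36


def solvent_sld(d2o_fraction: float) -> float:
    """SLD of an H2O/D2O mixture, assuming ideal linear mixing by volume fraction."""
    if not 0.0 <= d2o_fraction <= 1.0:
        raise ValueError("d2o_fraction must lie in [0, 1]")
    return (1.0 - d2o_fraction) * SLD_H2O + d2o_fraction * SLD_D2O


def contrast(particle_sld: float, d2o_fraction: float) -> float:
    """Contrast (delta rho) of a particle against the solvent; 0 means contrast matched."""
    return particle_sld - solvent_sld(d2o_fraction)


def match_point(particle_sld: float) -> float:
    """D2O fraction at which the solvent SLD equals the (assumed fixed) particle SLD."""
    return (particle_sld - SLD_H2O) / (SLD_D2O - SLD_H2O)
```

For a hypothetical particle SLD of 2.4 × 10⁻⁶ Å⁻², match_point returns roughly 0.43, close to the 40% D2O condition at which the fully hydrogenated complex is matched. Deuterated residues raise the local SLD, which is why a selectively labeled complex keeps measurable contrast at that solvent composition.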

Conflict of interest statement

The authors declare no competing interests.

Figures

FIGURE 1
Selective deuteration and SANS provide representative Rg values for NCBD/ACTR structural characterization. (a) Model of the NCBD/ACTR complex illustrating the selectively deuterated Ala and Leu residues of NCBD used for the neutron experiments. NCBD (black ribbon) was selectively deuterated at all five Ala and seven Leu positions (yellow spheres); ACTR (gray ribbon) is also shown. (b) Linear–linear plot of SANS curves illustrating the contrast-matched NCBD/ACTR control experiment. SANS curves are normalized by concentration, I(Q)/c. The fully hydrogenated h‐NCBD/ACTR complex in 40% D2O (black, open symbols) is contrast matched and shows no scattering signal above background. The measurable scattering from the selectively labeled dA,L‐NCBD/ACTR in 40% D2O (red, solid symbols) is shown for comparison. (c) Pair‐distribution profiles, P(r), calculated from the SANS data for h‐NCBD/ACTR in 100% D2O (blue) and dA,L‐NCBD/ACTR in 40% D2O buffer (red). Additional profiles calculated with varying Dmax are also shown for each in light blue and red, respectively (see also Figure S1e,f). The Rg values from the representative profiles are listed in the legend (see also Table S1). (d) Dimensionless Kratky plots for full-contrast h‐NCBD/ACTR in 100% D2O (blue) and selectively labeled dA,L‐NCBD/ACTR in 40% D2O (red). The dotted lines mark the reference (x, y) values (√3, 1.104); folded proteins exhibit a local maximum near their intersection. (e) Ab initio shape reconstructions from the two SANS contrast conditions superimposed onto the PDB 1KBH structure. The views are rotated by 90° (indicated by the arrow) from top to bottom. The middle and right views also highlight the selectively deuterated Ala and Leu residues of NCBD (yellow). The right view shows the merged overlay of the left and middle shape envelopes.
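The Rg values in panel (c) and the reference point in panel (d) rest on two standard relations: the real-space Rg from a pair-distance distribution, Rg² = ∫r²P(r)dr / (2∫P(r)dr), and the dimensionless Kratky transform y = (QRg)² I(Q)/I(0). The sketch below evaluates them with a simple trapezoidal rule on sampled data; it is an illustration under those assumptions, not the software used to produce the figure, and the function names are hypothetical.

```python
import math


def trapezoid(y, x):
    """Trapezoidal integral of samples y over the (increasing) grid x."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))


def rg_from_pr(r, p):
    """Real-space Rg from a sampled P(r): Rg^2 = ∫ r^2 P(r) dr / (2 ∫ P(r) dr)."""
    numerator = trapezoid([rk * rk * pk for rk, pk in zip(r, p)], r)
    denominator = trapezoid(p, r)
    return math.sqrt(numerator / (2.0 * denominator))


def dimensionless_kratky(q, intensity, i0, rg):
    """Return x = Q*Rg and y = (Q*Rg)^2 * I(Q) / I(0) for a dimensionless Kratky plot."""
    x = [qk * rg for qk in q]
    y = [xk * xk * ik / i0 for xk, ik in zip(x, intensity)]
    return x, y
```

For a purely Guinier-shaped curve, I(Q) = I(0) exp(−Q²Rg²/3), the transform gives y = x² exp(−x²/3), which peaks at x = √3 with y = 3/e ≈ 1.104; this is the standard origin of the reference values marked in panel (d). A uniform sphere of radius R, with P(r) proportional to r²(1 − 3r/4R + r³/16R³) for r ≤ 2R, is a convenient check for rg_from_pr, which should return √(3/5)·R.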
FIGURE 2
Comparison of NCBD/ACTR force field‐based structural ensembles reveals that a99SB‐disp and C36m show the strongest agreement with SANS and NMR experiments. (a–c) Comparison of average SANS curves calculated with CRYSON (Svergun et al., 1998) with experimental SANS curves. (a) SANS curves normalized by concentration, I(Q)/c, against Q from 0.02 Å⁻¹ to 0.30 Å⁻¹ for h‐NCBD/ACTR in 100% D2O and from 0.02 Å⁻¹ to 0.25 Å⁻¹ for dA,L‐NCBD/ACTR in 40% D2O. Experimental SANS data are shown as black points, with error bars representing the standard deviation. Computed average SANS curves are shown as lines: NMR in navy, a99SB in tan, a99SB‐ILDN in pink, a99SB‐disp in dark orange, and C36m in light blue. Note that the calculated I(Q)/c is normalized by the experimental I(0)/c of P(r) listed in Table S1, that is, 12.2 cm²/g and 0.596 cm²/g for h‐NCBD/ACTR in 100% D2O and dA,L‐NCBD/ACTR in 40% D2O, respectively. The standard deviations of the computed SANS curves are shown but lie within the line plots. (b,c) Reduced chi‐square values (χ²) of the NMR structures and MD ensembles referenced to the experimental SANS curves for (b) h‐NCBD/ACTR in 100% D2O and (c) dA,L‐NCBD/ACTR in 40% D2O. χ² is plotted against an increasing Q range with a lower bound of 0.02 Å⁻¹ and an upper bound (Qupper) from 0.03 Å⁻¹ to 0.30 Å⁻¹ in b and to 0.25 Å⁻¹ in c, in intervals of 0.01 Å⁻¹. Shaded error bars represent the 95% confidence interval of the mean. Error bars of χ² for the MD ensembles are shown, but most lie within the line plots. The χ² values for h‐NCBD/ACTR in 100% D2O are higher than those for dA,L‐NCBD/ACTR in 40% D2O, which is attributed to the higher scattering contrast of h‐NCBD/ACTR in 100% D2O. (d) Radius of gyration (Rg) of the MD structures computed from atomic coordinates and masses in GROMACS (Lindahl et al., 2001) (MD) and with CRYSON with and without selective deuteration (CRYSON, dA,L‐NCBD/ACTR in 40% D2O vs. CRYSON, h‐NCBD/ACTR in 100% D2O). Average Rg values of the 20 NMR structures computed with CRYSON are included for comparison. Rg values derived from the experimental SANS curves of h‐NCBD/ACTR in 100% D2O and dA,L‐NCBD/ACTR in 40% D2O are shown as dashed lines in blue and red, respectively. Error bars represent the standard deviation among the structures of each ensemble. (e) Comparison of MD‐derived and experimental chemical shifts by (left) Pearson's correlation coefficients and (right) RMSEs. Chemical shifts of the MD structures are computed using SPARTA+ (Shen & Bax, 2010).
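The χ² curves in panels (b,c) recompute a reduced chi-square over windows that share a lower bound of 0.02 Å⁻¹ and extend the upper bound in 0.01 Å⁻¹ steps. A minimal sketch, assuming the common definition χ² = Σ[(Iexp − Icalc)/σ]² / (N − n_params) with n_params = 1 by default; the caption does not state the exact normalization. The calculated curve is assumed to be already on the experimental absolute scale, as the caption describes, so no scale factor is fitted. Function names are hypothetical.

```python
def reduced_chi_square(i_exp, sigma, i_calc, n_params=1):
    """Reduced chi-square: sum(((I_exp - I_calc) / sigma)^2) / (N - n_params).

    The degrees-of-freedom convention (n_params) is an ASSUMPTION, not from the paper.
    """
    n = len(i_exp)
    if not (len(sigma) == len(i_calc) == n):
        raise ValueError("inputs must have equal length")
    dof = n - n_params
    if dof <= 0:
        raise ValueError("need more points than parameters")
    total = sum(((e - c) / s) ** 2 for e, s, c in zip(i_exp, sigma, i_calc))
    return total / dof


def chi_square_vs_q_upper(q, i_exp, sigma, i_calc, q_lower=0.02,
                          q_upper_values=None, n_params=1):
    """Reduced chi-square over windows [q_lower, q_upper] with a growing q_upper."""
    if q_upper_values is None:
        # 0.03, 0.04, ..., 0.30 (Å^-1), mirroring the sweep described in the caption.
        q_upper_values = [round(0.03 + 0.01 * k, 2) for k in range(28)]
    eps = 1e-9  # tolerance so grid points lying exactly on a bound are included
    results = []
    for q_up in q_upper_values:
        idx = [k for k, qk in enumerate(q) if q_lower - eps <= qk <= q_up + eps]
        if len(idx) <= n_params:
            continue  # too few points for a defined reduced chi-square
        results.append((q_up, reduced_chi_square(
            [i_exp[k] for k in idx], [sigma[k] for k in idx],
            [i_calc[k] for k in idx], n_params)))
    return results
```

Because each window adds points at higher Q, where the signal is weaker and the errors larger, plotting χ² against Qupper shows which part of the curve drives any disagreement.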
FIGURE 3
Comparison of NCBD/ACTR Rg‐based structural ensembles reveals that 100⁺40⁺ agrees best with the SANS experiments, while all Rg‐based ensembles agree better with the NMR experiments than the force field‐based ensembles do. (a–c) Comparison of average SANS curves calculated with CRYSON with experimental SANS curves for the 100⁻40⁻, 100⁻40⁺, 100⁺40⁻, and 100⁺40⁺ structures. (a) SANS curves normalized by concentration, I(Q)/c, against Q as in Figure 2a. Experimental SANS data are shown as black points, with error bars representing the standard deviation. Computed average SANS curves are shown as lines: 100⁺40⁺ in purple, 100⁺40⁻ in teal, 100⁻40⁺ in yellow, and 100⁻40⁻ in gray. Note that the calculated I(Q)/c is normalized by the experimental I(0)/c of P(r) listed in Table S1, that is, 12.2 cm²/g and 0.596 cm²/g for h‐NCBD/ACTR and dA,L‐NCBD/ACTR, respectively. The standard deviations of the computed SANS curves are shown but lie within the line plots. (b,c) Reduced chi‐square values (χ²) of the SANS curves between experiments and each Rg‐based ensemble for (b) h‐NCBD/ACTR in 100% D2O and (c) dA,L‐NCBD/ACTR in 40% D2O. χ² is plotted against an increasing Q range as in Figure 2b,c. Shaded error bars represent the 95% confidence interval of the mean. Error bars of χ² for the ensembles are shown, but most lie within the line plots. The χ² values for h‐NCBD/ACTR in 100% D2O are higher than those for dA,L‐NCBD/ACTR in 40% D2O, which is attributed to the higher scattering contrast of h‐NCBD/ACTR in 100% D2O. (d) Radius of gyration (Rg) computed from atomic coordinates and masses in GROMACS (MD) and with CRYSON with and without selective deuteration (CRYSON, dA,L‐NCBD/ACTR in 40% D2O vs. CRYSON, h‐NCBD/ACTR in 100% D2O). Rg values derived from the experimental SANS curves of h‐NCBD/ACTR in 100% D2O and dA,L‐NCBD/ACTR in 40% D2O are shown as dashed lines in blue and red, respectively. Error bars represent the standard deviation among the structures of each Rg‐based ensemble. (e) Comparison of computed and experimental chemical shifts by (left) Pearson's correlation coefficients and (right) RMSEs. These analyses are performed on 233 100⁺40⁺ structures, 6772 100⁺40⁻ structures, 5265 100⁻40⁺ structures, and 95,750 100⁻40⁻ structures.
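Two ingredients underlie the Rg‐based ensembles: a mass-weighted Rg computed from atomic coordinates (the "MD" values in panel d) and a per-structure test against the two experimental Rg values. The sketch below is illustrative: the acceptance windows tol_100 and tol_40 are hypothetical parameters (the caption does not give the criterion), and the per-contrast Rg inputs would in practice come from a contrast-aware calculation such as CRYSON rather than from the mass-weighted helper. Labels follow the 100/40 naming, with ⁺ marking agreement with the 100% D2O or 40% D2O constraint.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted Rg: sqrt(sum m_i |r_i - r_com|^2 / sum m_i)."""
    m_tot = sum(masses)
    com = [sum(m * c[d] for m, c in zip(masses, coords)) / m_tot for d in range(3)]
    sq = sum(m * sum((c[d] - com[d]) ** 2 for d in range(3))
             for m, c in zip(masses, coords))
    return math.sqrt(sq / m_tot)


def rg_ensemble_label(rg_100, rg_40, exp_rg_100, exp_rg_40, tol_100, tol_40):
    """Label a structure by agreement with the two experimental Rg constraints.

    rg_100 / rg_40: Rg computed for the h-NCBD/ACTR (100% D2O) and dA,L-NCBD/ACTR
    (40% D2O) contrasts. tol_100 / tol_40 are HYPOTHETICAL acceptance windows.
    """
    ok_100 = abs(rg_100 - exp_rg_100) <= tol_100
    ok_40 = abs(rg_40 - exp_rg_40) <= tol_40
    return "100" + ("⁺" if ok_100 else "⁻") + "40" + ("⁺" if ok_40 else "⁻")
```

Grouping every simulated structure by this label yields the four Rg‐based ensembles compared in the figure.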
FIGURE 4
Representative NMR and MD structures are structurally different yet indistinguishable on the free energy surface. (a) Free energy surface of all NMR and MD structures in terms of two reaction coordinates: the sum of the contact areas, A, between NCBD and ACTR and the crossing angle, θ, between their longest helices. Each panel shows the NMR structures and the MD structures from one force field that satisfy at least one of the Rg constraints from the SANS experiments. Data points are color coded as in Figure 2. The NMR structure and the representative MD structure from each force field closest to the free energy minimum are marked by a yellow “x” and presented in b, c, d, e, and f, respectively. (b–f) Representative NCBD (black)/ACTR (gray) structures selected from the free energy surface. The longest helices of NCBD and ACTR are colored blue and cyan, respectively. The contact area is depicted as a magenta surface. The rest of the complex surface is shown in light gray in the NMR structure but omitted in the other structures for clarity.
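Panel (a) is a free energy surface over two reaction coordinates. One common way to build such a surface, sketched here as an assumption rather than the authors' exact procedure, is to histogram the (A, θ) samples and convert populations with F = −kT ln P, shifting the minimum to zero. The bin widths and the 300 K temperature are hypothetical; the crossing angle is taken as the angle between two helix axis vectors, and how those axes are fitted is left open.

```python
import math
from collections import Counter

K_B = 0.008314462618  # Boltzmann constant in kJ/(mol K)


def free_energy_surface(a_values, theta_values, a_bin, theta_bin, temperature=300.0):
    """F(A, θ) = -kT ln P(A, θ) on a 2-D histogram, shifted so the minimum is 0.

    Returns {(a_index, theta_index): F in kJ/mol}. Empty bins are omitted because
    their free energy is formally infinite. temperature is an ASSUMED value.
    """
    counts = Counter(
        (math.floor(a / a_bin), math.floor(t / theta_bin))
        for a, t in zip(a_values, theta_values)
    )
    total = sum(counts.values())
    kt = K_B * temperature
    fes = {b: -kt * math.log(n / total) for b, n in counts.items()}
    f_min = min(fes.values())
    return {b: f - f_min for b, f in fes.items()}


def crossing_angle(u, v):
    """Angle in degrees between two helix axis vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp against round-off
    return math.degrees(math.acos(cos_angle))
```

Because F depends only on how often a region of (A, θ) is visited, two structures that differ elsewhere can land in the same low-F bin, which is the sense in which the representative structures are indistinguishable on this surface.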
FIGURE 5
A convolutional variational autoencoder (CVAE) model discerns structural differences between NCBD/ACTR structures from NMR spectroscopy and MD simulations. (a) The 3‐D latent space representing a total of 9816 NMR and MD structures in the training set. Clusters are labeled by experimental method and force field and color coded as in Figure 2. Note that some C36m data are dimmed to reveal the NMR data, and that the y‐ and z‐axes are reversed because of the viewing angle of the 3‐D plot. White ‘x’ marks indicate the representative NMR, a99SB, a99SB‐disp, and C36m structures selected from the free energy surface in Figure 4 and show a clear separation between these structures in the CVAE latent space. The representative a99SB‐ILDN structure is in the validation set and is therefore not shown. The same latent space is viewed along VAE 1, with the VAE 2 and VAE 3 axes switched for ‘x’ visibility and easy comparison with panel b. (b) The 3‐D latent space projected onto the VAE 1‐VAE 2 plane. Structures are labeled by conditional agreement with the Rg values from the SANS experiments and by all‐atom RMSD with respect to a reference 100⁺40⁺ structure, highlighted by a cyan star. Structures that satisfy one of the Rg constraints (100⁻40⁺ or 100⁺40⁻) are in yellow and teal, respectively. Structures that satisfy both constraints (100⁺40⁺) are in purple.
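The "variational" part of the CVAE means that each structure is encoded as a Gaussian distribution in the 3‐D latent space rather than as a single point. The sketch below shows only the two pieces that make it variational, the reparameterization trick and the KL penalty toward a standard normal prior. The convolutional encoder and decoder, the input featurization, and the beta weight are not described in the caption and are omitted or hypothetical. Plotted latent coordinates such as VAE 1 to VAE 3 are often the encoder means.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(reconstruction_error, mu, log_var, beta=1.0):
    """Per-sample VAE objective: reconstruction term plus a beta-weighted KL term.

    beta=1.0 gives the standard VAE objective; any other value is HYPOTHETICAL here.
    """
    return reconstruction_error + beta * kl_to_standard_normal(mu, log_var)
```

The KL term pulls every encoding toward the same prior, so structures separate in the latent space only as far as the reconstruction term requires, which is what makes distinct clusters informative.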
FIGURE 6
Characterization of the NCBD/ACTR structural ensemble based on the structures satisfying both experimental Rg constraints reveals three distinct clusters. (a) The 3‐D latent space projected onto the VAE 1‐VAE 2 plane with only the 100⁺40⁺ structures. Structures are labeled by all‐atom RMSD using cutoff distances of 5 Å and 8 Å with respect to the same reference 100⁺40⁺ structure shown in Figure 5b, highlighted by a cyan box and presented in e. The structure closest to the centroid of each cluster is boxed and presented in f, g, and h, respectively. (b) Cα RMSF values of ACTR and NCBD in the three clusters. The flexible region of ACTR is highlighted in magenta and that of NCBD in yellow. Error bars represent the standard deviation. (c) The 100⁺40⁺ structures on the free energy surface shown in Figure 4a. (d) Helical fraction of the 100⁺40⁺ structures. The flexible regions are highlighted again for comparison. (e) Reference 100⁺40⁺ structure for RMSD. (f–h) Representative NCBD (black)/ACTR (gray) structures selected from the three clusters, each with its cluster's percentage of the ensemble. The flexible regions are color coded as in b and d.
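Panels (a), (b), and (d) rely on three per-structure quantities: RMSD to a reference binned at the 5 Å and 8 Å cutoffs, Cα RMSF, and per-residue helical fraction. This is a minimal sketch under stated assumptions: structures are taken to be already superposed (no fitting step is shown), the three-way RMSD binning is a hypothetical reading of the two cutoffs, and helicity is counted from DSSP-style codes with helix_codes="H" as an assumed default.

```python
import math


def rmsd(coords_a, coords_b):
    """All-atom RMSD between two structures that are ASSUMED to be already superposed."""
    n = len(coords_a)
    sq = sum(sum((a[d] - b[d]) ** 2 for d in range(3)) for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / n)


def rmsd_group(value, cutoffs=(5.0, 8.0)):
    """HYPOTHETICAL binning of an RMSD in Å: 0 if < 5, 1 if 5 to < 8, 2 if >= 8."""
    for k, cutoff in enumerate(cutoffs):
        if value < cutoff:
            return k
    return len(cutoffs)


def rmsf(frames):
    """Per-atom RMSF over superposed frames; frames[f][i] is an (x, y, z) tuple."""
    n_frames = len(frames)
    result = []
    for i in range(len(frames[0])):
        mean = [sum(f[i][d] for f in frames) / n_frames for d in range(3)]
        msd = sum(sum((f[i][d] - mean[d]) ** 2 for d in range(3)) for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result


def helical_fraction(ss_strings, helix_codes="H"):
    """Per-residue fraction of structures whose secondary-structure code is helical.

    helix_codes="H" (alpha helix only) is an ASSUMED default; "HGI" would also
    count 3-10 and pi helices.
    """
    n = len(ss_strings)
    return [sum(s[i] in helix_codes for s in ss_strings) / n
            for i in range(len(ss_strings[0]))]
```

Computing RMSF separately for the structures in each RMSD group, as in panel (b), shows which regions differ in flexibility between clusters.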

References

1. Agarwal R, Smith MD, Smith JC. Capturing deuteration effects in a molecular mechanics force field: deuterated THF and the THF–water miscibility gap. J Chem Theory Comput. 2020;16(4):2529–2540.
2. Akere A, Chen SH, Liu X, Chen Y, Dantu SC, Pandini A, et al. Structure‐based enzyme engineering improves donor‐substrate recognition of Arabidopsis thaliana glycosyltransferases. Biochem J. 2020;477(15):2791–2805.
3. Allison JR, Varnai P, Dobson CM, Vendruscolo M. Determination of the free energy landscape of α‐synuclein using spin label nuclear magnetic resonance measurements. J Am Chem Soc. 2009;131(51):18314–18326.
4. Anzick SL, Kononen J, Walker RL, Azorsa DO, Tanner MM, Guan XY, et al. AIB1, a steroid receptor coactivator amplified in breast and ovarian cancer. Science. 1997;277(5328):965–968.
5. Appadurai R, Nagesh J, Srivastava A. High resolution ensemble description of metamorphic and intrinsically disordered proteins using an efficient hybrid parallel tempering scheme. Nat Commun. 2021;12(1):958.
