J Phys Chem A. 2022 Sep 8;126(35):5985-6003.
doi: 10.1021/acs.jpca.2c03726. Epub 2022 Aug 28.

IDPConformerGenerator: A Flexible Software Suite for Sampling the Conformational Space of Disordered Protein States

João M C Teixeira et al. J Phys Chem A.

Abstract

The power of structural information for informing biological mechanisms is clear for stable folded macromolecules, but similar structure-function insight is more difficult to obtain for highly dynamic systems such as intrinsically disordered proteins (IDPs), which must be described as structural ensembles. Here, we present IDPConformerGenerator, a flexible, modular, open-source software platform for generating large and diverse ensembles of disordered protein states; it builds conformers of the input sequence that obey geometric, steric, and other physical restraints. IDPConformerGenerator samples backbone phi (φ), psi (ψ), and omega (ω) torsion angles of relevant sequence fragments from loops and secondary structure elements extracted from folded protein structures in the RCSB Protein Data Bank, and it builds side chains using robust Monte Carlo algorithms with expanded rotamer libraries. IDPConformerGenerator has many user-defined options enabling variable fractional sampling of secondary structures, supports Bayesian models for assessing the consistency of IDP ensembles with experimental data, and introduces a machine learning approach to transform between internal and Cartesian coordinates with reduced error. IDPConformerGenerator will facilitate the characterization of disordered proteins and ultimately provide structural insights into these states, which have key biological functions.
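The abstract refers to converting between internal and Cartesian coordinates. As a point of reference only, the sketch below shows the conventional analytic step for placing one atom from a bond length, bond angle, and torsion angle (the natural extension reference frame, or NeRF, approach). It is not the machine learning method the abstract describes, and the function name `place_atom` is hypothetical.

```python
import numpy as np


def place_atom(a, b, c, bond_length, bond_angle_deg, torsion_deg):
    """Place atom d bonded to c, given the three preceding atoms a, b, c (NeRF).

    bond_length: distance c-d; bond_angle_deg: angle b-c-d;
    torsion_deg: dihedral a-b-c-d (IUPAC sign convention).
    """
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    theta = np.radians(bond_angle_deg)
    chi = np.radians(torsion_deg)
    # Position of d in a local frame whose first axis points along b -> c.
    d_local = bond_length * np.array([
        -np.cos(theta),
        np.sin(theta) * np.cos(chi),
        np.sin(theta) * np.sin(chi),
    ])
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n = n / np.linalg.norm(n)          # normal to the a-b-c plane
    m = np.cross(n, bc)                # completes the right-handed frame
    rotation = np.column_stack([bc, m, n])
    return c + rotation @ d_local


# Usage: place a peptide N atom 1.33 Å from C with a 116.2° angle and ψ-like torsion of -60°.
d = place_atom([-0.5, 1.0, 0.0], [0.0, 0.0, 0.0], [1.5, 0.0, 0.0], 1.33, 116.2, -60.0)
```

Chaining this call residue by residue turns a list of (φ, ψ, ω) angles plus ideal bond geometry into backbone coordinates; small geometric errors accumulate along the chain, which is the kind of error the abstract says its approach reduces.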

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Schematic diagram of the IDPConformerGenerator approach. Generating conformers requires creating a reusable database of backbone torsion angles and providing the primary sequence, with optional user-defined parameters including those for amino acid substitutions, secondary structure sampling, and fragment size probabilities. Shown are an example 2-residue peptide (fragment size 2) used to build inhibitor-2 (I-2), with its backbone torsion angles labeled; a helical secondary structure with all-atom side chains from an I-2 conformer; and an illustrative set of 100 generated I-2 conformations. Generated conformers can then be scored or reweighted based on experimental data.
Figure 2
Histogram of ω dihedral angle distributions for structures in the IDPConformerGenerator database. Structures from the 24 003-structure PDB database were used, together with subsets with resolutions better than or equal to 1.5 Å (∼5000 structures) and better than or equal to 1.8 Å (∼16 000 structures); the full set has resolutions better than or equal to 2.0 Å. Deviations from an ω torsion angle of 180° (trans) are plotted, centered around 0°, to make the shape of the distribution easier to see than it would be with the raw ω angles. Cis peptide bonds are omitted for visualization purposes.
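A minimal sketch of how the quantity plotted in Figure 2 could be computed: each ω angle is expressed as its deviation from 180° and cis bonds are dropped. The 90° cis/trans cutoff, the `(resolution, omega_angles)` record layout, and both function names are assumptions for illustration, not IDPConformerGenerator's own code.

```python
import numpy as np


def omega_deviation(omega_deg, cis_cutoff_deg=90.0):
    """Deviation of ω from 180° (trans), wrapped into [-180, 180).

    Bonds with |ω| < cis_cutoff_deg are treated as cis and dropped
    (the 90° cutoff is an assumption).
    """
    omega = np.asarray(omega_deg, dtype=float)
    wrapped = (omega + 180.0) % 360.0 - 180.0      # ω expressed in [-180, 180)
    trans = np.abs(wrapped) >= cis_cutoff_deg      # keep only trans-like bonds
    # ω - 180 wrapped: 180 -> 0, 179 -> -1, -179 -> +1
    return (omega[trans] % 360.0) - 180.0


def deviations_for_subset(records, max_resolution):
    """records: iterable of (resolution_in_angstrom, omega_angles) pairs (hypothetical layout)."""
    devs = [omega_deviation(om) for res, om in records if res <= max_resolution]
    return np.concatenate(devs) if devs else np.empty(0)
```

Calling `deviations_for_subset` with 1.5, 1.8, and 2.0 Å reproduces the three nested subsets named in the caption; `np.histogram` on each result then gives curves of the kind shown in Figure 2.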
Figure 3
Timings for IDPConformerGenerator with MC-SCE and their dependence on the protein test system (size), secondary structure sampling, and fragment length. Speed is defined as the number of conformers generated per hour. (A) Speeds for different secondary structure sampling methods on disordered proteins of different lengths, shown for sampling with custom secondary structure propensities (CSSS, yellow), “ANY” secondary structure (gray), combinations of loops, helices, and strands (orange), and only loops (blue). Speeds are shown for selected ensembles of the drkN SH3 domain unfolded state (59 amino acids, aa), the Sic1 N-terminal targeting region (92 aa), α-synuclein (aSyn, 140 aa), and inhibitor 2 (I-2, 159 aa). Speeds for all generated conformer ensembles, including those for the Tau fragment (441 aa), are given in Supporting Information Table S2A. (B) Speeds for different fragment sizes and secondary structure sampling methods for Sic1, shown for sampling with only loops with substitutions (gray), only loops without substitutions (orange), and “ANY” secondary structure (blue). Default fragment length probabilities are 0.1, 0.1, 0.3, 0.3, and 0.2 for fragment lengths of 1, 2, 3, 4, and 5, respectively.
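The default fragment-length probabilities quoted above can be illustrated with a small sampler that partitions a sequence into consecutive fragments, together with the caption's speed definition. This is a hypothetical sketch, not IDPConformerGenerator's implementation: how the real tool handles fragments near the chain terminus is not stated here, so the truncation rule below is an assumption.

```python
import random

# Default fragment-length probabilities quoted in the Figure 3 caption.
DEFAULT_FRAGMENT_PROBS = {1: 0.1, 2: 0.1, 3: 0.3, 4: 0.3, 5: 0.2}


def split_into_fragments(sequence, probs=DEFAULT_FRAGMENT_PROBS, rng=None):
    """Partition `sequence` into consecutive fragments with lengths drawn from `probs`.

    Assumed rule: near the C-terminus, lengths longer than the remaining
    residues are excluded and the remaining weights are renormalized.
    """
    rng = rng or random.Random()
    fragments, i = [], 0
    while i < len(sequence):
        remaining = len(sequence) - i
        sizes = [s for s in probs if s <= remaining]
        weights = [probs[s] for s in sizes]
        # random.choices normalizes the weights internally.
        size = rng.choices(sizes, weights=weights, k=1)[0]
        fragments.append(sequence[i:i + size])
        i += size
    return fragments


def conformers_per_hour(n_conformers, wall_seconds):
    """Speed as defined in the caption: conformers generated per hour."""
    return n_conformers / (wall_seconds / 3600.0)


# Usage with an arbitrary example sequence and a seeded generator for reproducibility.
frags = split_into_fragments("ACDEFGHIKLMNPQRSTVWY", rng=random.Random(0))
```

Seeding a dedicated `random.Random` instance keeps the partition reproducible without touching the global random state.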
Figure 4
Phi (φ) torsion angles in Sic1 ensembles sampled using different fragment sizes, with and without substitutions. Calculations were for 1000-conformer ensembles generated by sampling loops only, with fragment sizes of 1, 2, 3, 4, and 5 and default fragment size probabilities. The left and right columns are for the Sic1 sequence without and with substitutions, respectively. Substitutions are derived from columns 5, 3, and 2 of the EDSSMat50 amino acid substitution matrix. The plot was generated with the "--plots" flag in the "idpconfgen torsions" CLI.
Figure 5
Diversity analysis of conformational ensembles of the drkN SH3 domain unfolded state and I-2. The radius of gyration (Rg), end-to-end distance (Ree), asphericity (A), and pairwise root-mean-square deviations of atomic positions (pwRMSDs) are shown for 1000-conformer ensembles generated with different secondary structure sampling parameters: loops (L+); loops and helices (L+H+); loops, helices, and extended strands (L+H+E+); all torsion angles regardless of secondary structure (ANY); and sampling biased by δ2D chemical shifts (CSSS). Ensembles generated with FastFloppyTail (FFT) are also shown. Row 1 shows the drkN SH3 unfolded state and row 2 shows I-2. Standard deviations for Rg, Ree, A, and pwRMSD are shown as bars. Supporting Information Figure S6 shows similar data for other protein systems. An asterisk (*) marks the standard FFT protocol, which in this case treats the protein as a mixture of ordered and disordered regions; the other FFT ensemble uses a modified protocol in which the protein is treated as fully disordered.
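The shape descriptors in Figure 5 can be sketched with NumPy. Assumptions: coordinates are an (N, 3) array with equal weights for all atoms (the caption does not say whether Rg is mass-weighted), and asphericity uses one common gyration-tensor definition that ranges from 0 (sphere) to 1 (rod); the paper may use a different variant. The function names are hypothetical.

```python
import numpy as np


def radius_of_gyration(coords):
    """Rg of an (N, 3) coordinate array, with equal weights assumed for all atoms."""
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))


def end_to_end_distance(coords):
    """Distance between the first and last atoms (e.g., terminal Cα atoms)."""
    return float(np.linalg.norm(coords[-1] - coords[0]))


def asphericity(coords):
    """Asphericity from gyration-tensor eigenvalues: 0 for a sphere, 1 for a rod."""
    centered = coords - coords.mean(axis=0)
    gyration = centered.T @ centered / len(coords)
    l1, l2, l3 = np.linalg.eigvalsh(gyration)
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l1 - l3) ** 2
    return float(num / (2.0 * (l1 + l2 + l3) ** 2))


def ensemble_summary(ensemble, metric):
    """Mean and standard deviation of `metric` over a list of conformers."""
    values = np.array([metric(c) for c in ensemble])
    return values.mean(), values.std()
```

`ensemble_summary(ensemble, radius_of_gyration)` returns the kind of mean and spread that the bars in Figure 5 summarize.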
Figure 6
Pairwise RMSD distributions for ensembles of the (A) drkN SH3 domain unfolded state and (B) I-2. Calculations were for different ensembles of 1000 conformers each, plotted with bin sizes of 5 Å. “ANY” indicates sampling the database without biasing secondary structures, “nosub” indicates no substitutions, “sub532” indicates amino acid substitutions from columns 5, 3, and 2 of the EDSSMat50 amino acid substitution matrix, and “CheSPI” or “δ2D” indicates custom secondary structure sampling (CSSS) pools biased by CheSPI or δ2D estimations of secondary structure propensities.
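Pairwise RMSD distributions like those in Figure 6 require superposing every pair of conformers before computing the deviation. The sketch below uses the Kabsch algorithm for the superposition and histograms the values in 5 Å bins, matching the caption's bin width; the choice of superposition method and the helper names are assumptions.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) conformers after optimal rigid superposition (Kabsch)."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)          # SVD of the covariance matrix
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = (rot @ p.T).T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def pairwise_rmsds(ensemble):
    """All unique pairwise RMSDs for a list of (N, 3) conformers."""
    n = len(ensemble)
    return np.array([kabsch_rmsd(ensemble[i], ensemble[j])
                     for i in range(n) for j in range(i + 1, n)])


def rmsd_histogram(values, bin_width=5.0):
    """Histogram with fixed-width bins starting at 0 Å and covering the maximum value."""
    top = (np.floor(values.max() / bin_width) + 1) * bin_width
    edges = np.arange(0.0, top + bin_width / 2, bin_width)
    return np.histogram(values, bins=edges)
```

For 1000 conformers this is about 500 000 superpositions, so in practice the loop would be vectorized or parallelized; the sketch favors clarity.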
Figure 7
Local structural variations between the Tau K18 16-mer WT and different mutants. (A) Distribution of the distance between the V300 O and G303 N atoms in 10 000-conformer ensembles generated with no substitutions. (B) Torsion angle distributions for position 301 across the conformers of the WT and mutant ensembles. Following our convention, ω denotes the torsion angle N-terminal to φ (typically denoted as ω of the preceding residue).
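The quantities in Figure 7 reduce to interatomic distances and dihedral angles. This sketch computes signed dihedrals with the IUPAC sign convention and assigns the C(i−1)–N(i) peptide torsion to residue i, as the caption describes. The dictionary-of-atoms input format and the function names are hypothetical.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points, IUPAC sign convention."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    v = b0 - np.dot(b0, b1) * b1   # projection of b0 perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1   # projection of b2 perpendicular to the central bond
    return float(np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))))


def backbone_torsions(prev, cur, nxt):
    """ω, φ, ψ for residue `cur`; each argument maps 'N', 'CA', 'C' to coordinates.

    ω is the C(i-1)-N(i) peptide torsion, assigned to residue i as in the caption.
    """
    omega = dihedral(prev["CA"], prev["C"], cur["N"], cur["CA"])
    phi = dihedral(prev["C"], cur["N"], cur["CA"], cur["C"])
    psi = dihedral(cur["N"], cur["CA"], cur["C"], nxt["N"])
    return omega, phi, psi


def atom_distance(a, b):
    """Euclidean distance, e.g., between the V300 O and G303 N atoms."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))
```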
Figure 8
Fractional secondary structure in Sic1 ensembles. Analyses were performed on 1200-conformer pools of Sic1 generated with different combinations of secondary structure sampling consisting of loops, helix, and extended strands. Orange indicates α-helix detected by DSSP (solid) and the α-region on the Ramachandran (Rama.) diagram (dashed). Blue indicates extended strand for DSSP (solid) and the β-region on the Ramachandran diagram (dashed). Black indicates coil/loop for DSSP (solid) and other regions on the Ramachandran diagram (dashed).
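The dashed Ramachandran curves in Figure 8 amount to labeling each (φ, ψ) pair by region and averaging over conformers. The region boundaries below are illustrative assumptions, not the definitions used in the paper, and DSSP assignments (the solid curves) would require the DSSP program itself. Function names are hypothetical.

```python
import numpy as np


def rama_region(phi, psi):
    """Coarse Ramachandran label: 'H' (α-region), 'E' (β-region), or 'L' (other).

    The boundaries are illustrative assumptions only.
    """
    if -160.0 <= phi <= -20.0 and -120.0 <= psi <= 50.0:
        return "H"
    if -180.0 <= phi <= -45.0 and (psi >= 90.0 or psi <= -150.0):
        return "E"
    return "L"


def fractional_rama(phi, psi):
    """phi, psi: (n_conformers, n_residues) arrays in degrees.

    Returns a dict mapping 'H', 'E', 'L' to per-residue fractions across the ensemble.
    """
    labels = np.vectorize(rama_region)(np.asarray(phi), np.asarray(psi))
    return {ss: (labels == ss).mean(axis=0) for ss in "HEL"}
```

Plotting the three returned arrays against residue number gives curves of the same kind as the dashed lines in Figures 8 and 9.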
Figure 9
Comparison of torsion angle sampling for L+/H+/E+ and ANY. Ensembles of 1000 conformers each of the drkN SH3 domain unfolded state were generated by sampling a combination of loops (L), helices (H), and extended strands (E), or by sampling without a secondary structure bias using the ANY flag. Phi (φ) and psi (ψ) torsion angle distributions for each conformer pool are shown as scatter plots in the first two rows. The third row depicts fractional secondary structure based on DSSP (dark solid lines) or the Ramachandran (Rama.) diagram (dashed lines), with orange indicating α-helix for DSSP and the α-region of the Ramachandran diagram, blue indicating extended strand for DSSP and the β-region of the Ramachandran diagram, and black indicating coil/loop for DSSP and other regions of the Ramachandran diagram.
Figure 10
Custom secondary structure sampling. (Left) For the drkN SH3 domain unfolded state, two sets of 3000 conformers each were generated, and (right) for inhibitor-2 (I-2), two sets of 1500 conformers each were generated, with (A, B) the “ANY” flag or with (C, D) the CSSS flag to do custom secondary structure sampling based on δ2D calculations from NMR chemical shift data. (A, C) Plots of fractional secondary structure based on DSSP (dark solid lines), the Ramachandran (Rama.) diagram (dashed lines), or δ2D (light solid lines). Orange indicates α-helix for δ2D and DSSP and the α-region on the Ramachandran diagram. Blue indicates extended strand for δ2D and DSSP and the β-region on the Ramachandran diagram. Black indicates coil/loop for δ2D and DSSP and other regions on the Ramachandran diagram. (B, D) Aligned conformers of the ensembles using PyMOL.
Figure 11
Root-mean-square deviations (RMSDs) between values back-calculated from conformational ensembles and experimental data for the drkN SH3 domain unfolded state. Analyses were performed on 1000-conformer ensembles generated using various secondary structure sampling options and using FastFloppyTail (FFT). RMSDs are given for SAXS, chemical shifts (carbonyl, Cα, Cβ, Hα), PRE, ³J(HN,HA) couplings, and NOE data where available. Sources of experimental data are provided in Methods. An asterisk (*) marks the standard FFT protocol, which in this case treats the protein as a mixture of ordered and disordered regions; the other FFT ensemble uses a modified protocol in which the protein is treated as fully disordered.
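The per-data-type RMSDs in Figure 11 compare ensemble-averaged back-calculated observables with experiment. A minimal sketch, assuming the back-calculation (SAXS, chemical shifts, PRE, and so on) has already been done by other tools; the optional `weights` stand in for a reweighted ensemble, missing experimental values are encoded as NaN (an assumed convention), and the function name is hypothetical.

```python
import numpy as np


def rmsd_to_experiment(back_calculated, experimental, weights=None):
    """RMSD between ensemble-averaged back-calculated values and experiment.

    back_calculated: (n_conformers, n_observables); experimental: (n_observables,).
    weights: optional per-conformer weights; None means uniform weights.
    """
    bc = np.asarray(back_calculated, dtype=float)
    predicted = np.average(bc, axis=0, weights=weights)
    exp = np.asarray(experimental, dtype=float)
    mask = ~np.isnan(exp)  # skip observables without an experimental value
    return float(np.sqrt(np.mean((predicted[mask] - exp[mask]) ** 2)))
```

Averaging before comparing reflects that experimental observables report on the whole ensemble rather than on any single conformer.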
Figure 12
Analysis of tertiary contacts for Sic1 ensembles. (Top row) Cα–Cα distance matrices (lower) with deviations (upper) for 1000-conformer ensembles of Sic1 generated with the loops-only flag for secondary structure, with substitutions from columns 5, 3, and 2 of the EDSSMat50 amino acid substitution matrix and with variable fragment lengths. (Bottom row) Significant differences between Cα–Cα distance matrices (lower) and deviations (upper) (P < 0.05 from a Mann–Whitney U test).
