Bioinformatics. 2022 Apr 12;38(8):2353-2355.
doi: 10.1093/bioinformatics/btac072.

pyFoldX: enabling biomolecular analysis and engineering along structural ensembles

Leandro G Radusky et al. Bioinformatics. 2022.

Abstract

Summary: Recent years have seen an increase in the number of structures available, not only for new proteins but also for the same protein crystallized with different molecules and proteins. While protein design software has proven successful in designing and modifying proteins, it can also be overly sensitive to small conformational differences between structures of the same protein. To address this, we introduce pyFoldX, a Python library that allows the integrative analysis of structures of the same protein using FoldX, an established forcefield and modelling software. The library offers new functionality for handling different structures of the same protein, an improved molecular parametrization module and easy integration with the data analysis ecosystem of the Python programming language.

Availability and implementation: pyFoldX relies on the FoldX software for energy calculations and modelling; FoldX can be downloaded upon registration at http://foldxsuite.crg.eu/, and its licence is free of charge for academics. The pyFoldX library is open-source. Full details on installation, tutorials covering the library functionality and the scripts used to generate the data and figures presented in this paper are available at https://github.com/leandroradusky/pyFoldX.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
(A) pyFoldX structure-handling capabilities. Single structures can be instantiated from different formats, while ensembles of structures of the same protein can be instantiated from the protein's UniProt accession. FoldX commands can be executed on structures and ensembles, returning pandas dataframes with energies and, if applicable, objects with the transformed structures. (B) Example of parametrization of a glucose molecule with the pyFoldX paramx package. (C) Description of the analysed mutation dataset. A random forest classifier was trained on 80% of the Missense3D-DB mutations to estimate the probability of belonging to the ‘pathogenic’ category. The remaining 20% were used for testing and analysed using both the structure indicated in the database and the ensemble of good-resolution structures for these proteins. (D) Histogram of the probability of belonging to the ‘pathogenic’ category given by the classifier for mutations mapped onto their best structure by Missense3D-DB (left) and the mean of the probabilities over all good-resolution crystals in the ensemble (right). (E) ROC curve of mutation class prediction by the classifier using the best crystal (orange lines) or the mean prediction over the crystals in the ensemble (blue lines). Thin lines: all mutations classified as pathogenic (Ppathogenic > 0.5) or benign (Ppathogenic ≤ 0.5). Thick lines: mutations with no clear prediction (0.4 < Ppathogenic < 0.7) are discarded. Overall, predictions are better when ensembles are considered, and high accuracy (AUC = 0.9) is achieved when unclear predictions are discarded from the analysis.
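
The ensemble scoring described in panels D and E can be illustrated with a minimal sketch. It is not the authors' code and does not use pyFoldX or the trained classifier: the mutation names and per-structure probabilities are made up, the helper `classify` is hypothetical, and the discard band is assumed to mean probabilities strictly between 0.4 and 0.7. It shows only how per-crystal probabilities might be averaged across an ensemble and then thresholded.

```python
from statistics import mean

# Hypothetical classifier outputs: P(pathogenic) for each mutation, one value
# per crystal structure in the protein's ensemble. Illustrative numbers only.
ensemble_probs = {
    "mutA": [0.91, 0.85, 0.88],
    "mutB": [0.35, 0.62, 0.59],
    "mutC": [0.10, 0.22, 0.16],
}


def classify(p, discard_low=None, discard_high=None):
    """Label a mutation from its P(pathogenic).

    Without a discard band: 'pathogenic' if p > 0.5, otherwise 'benign'.
    With a band: return None (no clear prediction) when
    discard_low < p < discard_high. The band limits are an assumption.
    """
    if discard_low is not None and discard_high is not None:
        if discard_low < p < discard_high:
            return None
    return "pathogenic" if p > 0.5 else "benign"


# Mean probability over all structures in each mutation's ensemble.
mean_probs = {mut: mean(ps) for mut, ps in ensemble_probs.items()}

# Thin-line setting: every mutation receives a label.
labels_all = {mut: classify(p) for mut, p in mean_probs.items()}

# Thick-line setting: mutations with no clear prediction are dropped.
labels_clear = {
    mut: label
    for mut, p in mean_probs.items()
    if (label := classify(p, 0.4, 0.7)) is not None
}

for mut, p in mean_probs.items():
    print(f"{mut}: mean P(pathogenic) = {p:.2f}, "
          f"label = {labels_all[mut]}, kept = {mut in labels_clear}")
```

In this toy input, the middle mutation has a mean probability inside the assumed band, so it is labelled in the first setting but excluded in the second.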

