PLoS One. 2015 Apr 22;10(4):e0123998.
doi: 10.1371/journal.pone.0123998. eCollection 2015.

Building a better fragment library for de novo protein structure prediction


Saulo H P de Oliveira et al. PLoS One.

Abstract

Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. We demonstrate the importance of correct testing procedures in assessing the quality of fragment libraries. In particular, homologs of the target must be excluded from the libraries to correctly simulate a de novo protein structure prediction scenario, something which, surprisingly, is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both higher precision and higher coverage than two state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources.


Conflict of interest statement

Competing Interests: One of the authors [JS] is currently employed at UCB Pharma. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.

Figures

Fig 1. Schematic of Flib.
Starting from a target sequence, we predict secondary structure (SS) and torsion angles for the target (green). For each target position, we extract fragments from a template database using a combination of random and exhaustive approaches. A library containing the top-3000 fragments per position (LIB3000) is compiled using the SS score and the Ramachandran-specific sequence score. LIB3000 is then sorted by the torsion angle score, and the top-20 fragments per position are selected to form the final library. The final library (FLIB) is complemented by fragments that originate from an enrichment routine (yellow) and fragments that originate from protein threading hits (orange).
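The two-stage, per-position selection in the schematic can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Flib's code: the `Fragment` fields are hypothetical, higher scores are assumed to be better, and stage 1 ranks by the plain sum of the SS and sequence scores because the caption does not say how the two are combined. The enrichment and threading-hit steps are omitted.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    # Hypothetical record; field names are illustrative, not Flib's.
    position: int         # target position where the fragment starts
    ss_score: float       # SS agreement (assumed: higher is better)
    seq_score: float      # Ramachandran-specific sequence score (assumed: higher is better)
    torsion_score: float  # torsion-angle agreement (assumed: higher is better)


def build_library(fragments, n_pre=3000, n_final=20):
    """Two-stage per-position selection: pre-filter, then re-rank."""
    by_pos = {}
    for frag in fragments:
        by_pos.setdefault(frag.position, []).append(frag)

    library = {}
    for pos, frags in by_pos.items():
        # Stage 1 (LIB3000 analogue): keep the n_pre best by SS + sequence score.
        # Assumption: the two scores are simply summed.
        pre = sorted(frags, key=lambda f: f.ss_score + f.seq_score, reverse=True)[:n_pre]
        # Stage 2: re-sort the survivors by torsion-angle score and keep n_final.
        library[pos] = sorted(pre, key=lambda f: f.torsion_score, reverse=True)[:n_final]
    return library


# Toy usage: the fragment with the best torsion score fails stage 1.
frags = [
    Fragment(0, 1.0, 1.0, 0.1),
    Fragment(0, 0.9, 0.9, 0.9),
    Fragment(0, 0.5, 0.5, 0.5),
    Fragment(0, 0.1, 0.1, 5.0),
]
lib = build_library(frags, n_pre=3, n_final=1)
```

Because stage 2 only re-ranks what survived stage 1, a strong torsion-angle score cannot rescue a fragment that scored poorly on SS and sequence, as the toy data shows.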
Fig 2. Comparison between Flib’s random extraction and exhaustive extraction methods.
Analysis of the precision and coverage of fragment libraries generated by Flib using two different approaches for fragment extraction: random extraction (red) and exhaustive extraction (blue). We varied the RMSD-to-native cutoff used to define a good fragment from 0.1 to 2.0 Angstroms (x-axis). The average precision (left) and average coverage (right) over the 43 proteins in the test data set are shown for fragment libraries containing the top-1000 scoring fragments extracted exhaustively or at random. Precision (y-axis) is the proportion of good fragments in the generated libraries.
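To make the two extraction strategies concrete, the sketch below enumerates candidate fragment windows from a template database. It is an illustration under stated assumptions, not Flib's extraction routine: templates are represented only by their lengths, a fragment is a contiguous window of a fixed length `frag_len`, and random extraction is modeled as picking a template uniformly and then a start position uniformly (so it is not uniform over all windows). Scoring and the top-1000 cut are left out.

```python
import random


def exhaustive_windows(template_lengths, frag_len):
    """Every (template index, start) pair where a full-length fragment fits."""
    return [
        (t, s)
        for t, n in enumerate(template_lengths)
        for s in range(n - frag_len + 1)
    ]


def random_windows(template_lengths, frag_len, n_samples, seed=0):
    """Sample windows by picking a template, then a start, uniformly at random."""
    rng = random.Random(seed)
    # Only templates long enough to hold one fragment can be sampled.
    eligible = [t for t, n in enumerate(template_lengths) if n >= frag_len]
    windows = []
    for _ in range(n_samples):
        t = rng.choice(eligible)
        s = rng.randrange(template_lengths[t] - frag_len + 1)
        windows.append((t, s))
    return windows


lengths = [10, 5, 3]
all_windows = exhaustive_windows(lengths, 4)  # 7 + 2 + 0 = 9 windows
sampled = random_windows(lengths, 4, n_samples=50)
```

Exhaustive enumeration guarantees every window is considered, at a cost that grows with database size; random sampling caps the cost at `n_samples` but may miss good windows, which is one reason to combine both.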
Fig 3. Effect of protein threading hits on fragment library quality.
Analysis of the impact of fragments extracted from protein threading hits. Precision and coverage are shown for the fragment libraries generated by LIB20, Protein Threading Hits and Flib (a combination of the other two approaches). We varied the RMSD-to-native cutoff used to define a good fragment from 0.1 to 2.0 Angstroms (x-axis). The average precision and coverage over the 43 proteins in the test data set are shown for each approach. Precision (y-axis) is the proportion of good fragments in the generated libraries; coverage is the proportion of target residues represented by at least one good fragment.
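The precision and coverage definitions used in these captions can be written down directly. The sketch below assumes, as illustrations only, that a fragment is "good" when its RMSD to native is at or below the cutoff and that a fragment starting at position `pos` represents residues `pos` to `pos + frag_len - 1`; the function name and input layout are hypothetical.

```python
def precision_and_coverage(library, target_len, frag_len, cutoff):
    """Precision and coverage of one fragment library at one RMSD cutoff.

    library: list of (start_position, rmsd_to_native) pairs, one per fragment.
    """
    # Assumption: a fragment is "good" if its RMSD is at or below the cutoff.
    good = [pos for pos, rmsd in library if rmsd <= cutoff]
    precision = len(good) / len(library) if library else 0.0

    # A good fragment starting at pos represents residues pos .. pos + frag_len - 1.
    covered = set()
    for pos in good:
        covered.update(range(pos, min(pos + frag_len, target_len)))
    coverage = len(covered) / target_len
    return precision, coverage


p, c = precision_and_coverage(
    [(0, 0.5), (0, 1.5), (3, 0.8), (5, 2.5)], target_len=10, frag_len=3, cutoff=1.0
)
```

Here two of the four fragments are good (precision 0.5), and together they represent residues 0 to 5 (coverage 0.6). The two metrics pull in different directions: adding many mediocre fragments can raise coverage while lowering precision.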
Fig 4. Relationship between secondary structure class (SS-Class) and fragment quality.
Boxplot of the RMSD to the native structure (y-axis) of 200 fragments per target position (x-axis) for the protein 1E6K. The top-200 scoring fragments from its LIB3000 were selected and are displayed; this subset of LIB3000 was chosen to keep the visualization manageable. Four different SS classes are defined: majority α-helical (green), majority β-strand (red), majority loop (blue) and other (black). Positions whose fragments are majority α-helical or majority β-strand show significantly lower RMSDs to the native structure and a smaller spread than majority loop and other positions.
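One plausible way to assign the SS class of a target position is sketched below. The caption names the four classes but not the exact rule, so this sketch assumes that each fragment has already been reduced to a single predominant SS label (`H`, `E` or `L`) and that a class requires a strict majority (more than 50%) of the position's fragments; anything else falls into "other". The labels and threshold are assumptions, not Flib's definitions.

```python
from collections import Counter

# Hypothetical labels: each fragment is reduced to its predominant SS state.
CLASS_NAMES = {
    "H": "majority alpha-helical",
    "E": "majority beta-strand",
    "L": "majority loop",
}


def ss_class(fragment_labels):
    """SS class of one target position from its fragments' SS labels ('H', 'E', 'L')."""
    if not fragment_labels:
        return "other"
    label, count = Counter(fragment_labels).most_common(1)[0]
    # Assumption: a class requires a strict majority (> 50%) of the fragments.
    if count * 2 > len(fragment_labels):
        return CLASS_NAMES[label]
    return "other"


classes = [ss_class(["H", "H", "E"]), ss_class(["H", "E", "L", "L"])]
```

With these rules, the first position is majority α-helical, while the second (a 2-of-4 tie for loop) falls into "other".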
Fig 5. Comparison between HHFrag, NNMake and Flib.
Precision (left) and coverage (right) of fragment libraries generated using NNMake (red), HHFrag (green) and Flib (blue), averaged over a set of 41 structurally diverse proteins. We varied the RMSD cutoff used to define a good fragment (x-axis) and evaluated the precision (proportion of good fragments in the libraries) and coverage (proportion of protein residues represented by a good fragment) for each method.
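The curves in these figures are averages over targets evaluated at a sweep of RMSD cutoffs. A minimal sketch of that averaging follows; it assumes each target contributes its own precision with equal weight (a per-target mean rather than pooling all fragments), which is an assumption since the caption only says the values are averaged over the 41 proteins. The function name is hypothetical.

```python
def precision_curve(rmsds_per_target, cutoffs):
    """Mean per-target precision at each RMSD cutoff.

    rmsds_per_target: one list per target, holding the RMSD to native of
    every fragment in that target's library.
    """
    curve = []
    for cutoff in cutoffs:
        # Precision of each target's library at this cutoff.
        per_target = [sum(r <= cutoff for r in rmsds) / len(rmsds) for rmsds in rmsds_per_target]
        # Equal weight per target (assumption).
        curve.append(sum(per_target) / len(per_target))
    return curve


cutoffs = [round(0.1 * i, 1) for i in range(1, 21)]  # 0.1, 0.2, ..., 2.0 Angstroms
curve = precision_curve([[0.5, 1.5], [1.0, 3.0, 0.2, 0.3]], cutoffs)
```

At the 1.0 Angstrom cutoff the toy targets have precisions of 0.5 and 0.75, so the averaged point is 0.625. Weighting per target keeps a long protein with a large library from dominating the average.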
Fig 6. Comparison between HHFrag, NNMake and Flib by SS class.
Precision of fragment libraries generated using NNMake (red), HHFrag (green) and Flib (blue), separated by SS class and averaged over a set of 41 structurally diverse proteins. We varied the cutoff used to define a good fragment (x-axis) and evaluated the precision (proportion of good fragments in the libraries) for each method within four SS classes: majority α-helical (top left), majority β-strand (top right), majority loop (bottom right) and other (bottom left).
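Splitting precision by SS class, as in this figure, amounts to grouping fragments by the class of the position they cover and computing precision within each group. The sketch below is illustrative only: it assumes the class of every position is already known (for example from a majority rule over fragment SS) and that "good" means an RMSD at or below the cutoff. The function name and inputs are hypothetical.

```python
from collections import defaultdict


def precision_by_class(fragments, position_class, cutoff):
    """Precision of a library broken down by the SS class of each fragment's position.

    fragments: (position, rmsd_to_native) pairs.
    position_class: dict mapping each target position to its SS class label.
    """
    totals = defaultdict(int)
    good = defaultdict(int)
    for pos, rmsd in fragments:
        cls = position_class[pos]
        totals[cls] += 1
        good[cls] += rmsd <= cutoff  # bool counts as 0 or 1
    return {cls: good[cls] / totals[cls] for cls in totals}


result = precision_by_class(
    [(0, 0.5), (0, 1.5), (1, 0.4), (2, 3.0)],
    {0: "majority alpha-helical", 1: "majority beta-strand", 2: "majority alpha-helical"},
    cutoff=1.0,
)
```

In this toy case the single strand fragment is good (precision 1.0), while only one of the three helical fragments is (precision 1/3), so a library-wide precision of 0.5 would hide the difference between classes.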
Fig 7. Effect of homologs on fragment library quality.
Precision (left) and coverage (right) of fragment libraries generated using three methods: Rosetta’s NNMake (crosses), our method Flib (circles) and HHFrag (triangles). We varied the cutoff used to define a good fragment (x-axis) and evaluated the precision (proportion of good fragments in the libraries) and coverage (proportion of protein residues represented by a good fragment) for each method, both when homologs are included (red and orange) and when homologs are excluded (light and dark green). Homologs are always excluded from Flib (blue).
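Simulating a genuine de novo scenario means removing the target's homologs from the template database before any fragments are extracted. The caption does not say how homologs are identified, so the sketch below assumes that an upstream search has already produced a set of homologous template IDs and only shows the filtering step; the function name and ID format are hypothetical.

```python
def exclude_homologs(template_db, homolog_ids):
    """Return a copy of template_db without the target's homologs.

    template_db: dict mapping template ID (e.g. a PDB chain ID) to template data.
    homolog_ids: IDs judged homologous to the target by some upstream search.
    """
    banned = {h.upper() for h in homolog_ids}  # compare IDs case-insensitively
    return {tid: data for tid, data in template_db.items() if tid.upper() not in banned}


db = {"1ABC": "template 1", "2xyz": "template 2", "3DEF": "template 3"}
filtered = exclude_homologs(db, ["1abc", "2XYZ"])
```

Applying the same filtered database to every method being compared is what makes a comparison such as this figure's fair: a library built from an unfiltered database can score well simply by copying fragments from a close relative of the target.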
Fig 8. TM-Score of the best decoy generated by Flib + SAINT2 and by NNMake + SAINT2.
For each approach, 1,000 decoys were generated and the best decoy (highest TM-Score when superimposed onto the native structure) was chosen. Results are shown for the 41 proteins in our data set. We compared the TM-Score of the best decoy generated by Flib + SAINT2 (x-axis) against that generated by NNMake + SAINT2 (y-axis). Each point represents a target; point color represents the target's SCOP class and point size is proportional to protein length. The dotted lines indicate the cutoff for defining an accurate model (TM-Score > 0.5). Flib libraries generated accurate models for 12 of the 41 cases in our PDB-representative set; NNMake libraries generated accurate models for 8 of the 41 cases. Of the 13 cases for which an accurate model was generated, Flib libraries performed better in 10. Overall, Flib outperforms NNMake in 31 of the 41 cases.
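The counts in this caption follow from a small amount of bookkeeping once each decoy's TM-score is known. The sketch below assumes the TM-scores have already been computed by a structure-comparison tool and only reproduces the summary logic: best decoy per target, accuracy at TM-Score > 0.5, and head-to-head wins. Function names and the toy data are hypothetical, and ties are counted as not better.

```python
def best_tm(decoy_scores):
    """Best (highest) TM-score among each target's decoys."""
    return {target: max(scores) for target, scores in decoy_scores.items()}


def compare_methods(best_a, best_b, threshold=0.5):
    """Summary counts for a head-to-head comparison of two methods A and B."""
    targets = sorted(best_a.keys() & best_b.keys())
    # Targets where at least one method produced an accurate model.
    accurate = [t for t in targets if best_a[t] > threshold or best_b[t] > threshold]
    return {
        "accurate_a": sum(best_a[t] > threshold for t in targets),
        "accurate_b": sum(best_b[t] > threshold for t in targets),
        "accurate_either": len(accurate),
        "a_better_when_accurate": sum(best_a[t] > best_b[t] for t in accurate),
        "a_better_overall": sum(best_a[t] > best_b[t] for t in targets),
    }


decoys_a = {"t1": [0.3, 0.6], "t2": [0.4, 0.45], "t3": [0.52, 0.2]}
decoys_b = {"t1": [0.55, 0.1], "t2": [0.40], "t3": [0.7]}
summary = compare_methods(best_tm(decoys_a), best_tm(decoys_b))
```

Reporting wins separately over all targets and over the subset where either method is accurate mirrors the caption's distinction between "31 of the 41 cases" and "10 of the 13 cases".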
