Bioinformatics. 2015 Apr 1;31(7):999-1006.
doi: 10.1093/bioinformatics/btu791. Epub 2014 Nov 26.

MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins


David T Jones et al. Bioinformatics. 2015.

Abstract

Motivation: Recent developments of statistical techniques to infer direct evolutionary couplings between residue pairs have rendered covariation-based contact prediction a viable means for accurate 3D modelling of proteins, with no information other than the sequence required. To extend the usefulness of contact prediction, we have designed a new meta-predictor (MetaPSICOV) which combines three distinct approaches for inferring covariation signals from multiple sequence alignments, considers a broad range of other sequence-derived features and, uniquely, a range of metrics which describe both the local and global quality of the input multiple sequence alignment. Finally, we use a two-stage predictor, where the second stage filters the output of the first stage. This two-stage predictor is additionally evaluated on its ability to accurately predict the long range network of hydrogen bonds, including correctly assigning the donor and acceptor residues.

Results: Using the original PSICOV benchmark set of 150 protein families, MetaPSICOV achieves a mean precision of 0.54 for top-L predicted long range contacts, around 60% higher than PSICOV and around 40% better than CCMpred. In de novo protein structure prediction using FRAGFOLD, MetaPSICOV is able to improve the TM-scores of models by a median of 0.05 compared with PSICOV. Lastly, for predicting long range hydrogen bonding, MetaPSICOV-HB achieves a precision of 0.69 for the top-L/10 hydrogen bonds compared with just 0.26 for the baseline MetaPSICOV.
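The top-L and top-L/10 precision figures quoted above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `top_l_precision`, its parameters, and the input layout (scored residue pairs plus a set of observed contacts) are assumptions made for illustration. The default minimum sequence separation of 23 follows the long range threshold used in the figure captions, and dividing by the number of predictions actually selected is one common convention among several.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def top_l_precision(
    scored_pairs: Iterable[Tuple[int, int, float]],
    true_contacts: Set[Pair],
    seq_len: int,
    min_sep: int = 23,
    fraction: float = 1.0,
) -> float:
    """Precision of the top (fraction * L) predicted pairs.

    scored_pairs  : (i, j, score) triples, higher score = more confident.
    true_contacts : observed contacts as (i, j) residue index pairs.
    seq_len       : sequence length L.
    min_sep       : minimum sequence separation |i - j| to consider
                    (hypothetical default, taken from the >=23 threshold).
    fraction      : 1.0 for top-L, 0.5 for top-L/2, 0.1 for top-L/10.
    """
    # Store observed contacts with i < j so that lookups ignore pair order.
    truth = {(min(i, j), max(i, j)) for i, j in true_contacts}

    # Keep only predictions that satisfy the sequence-separation filter.
    eligible = [
        (min(i, j), max(i, j), score)
        for i, j, score in scored_pairs
        if abs(i - j) >= min_sep
    ]

    # Rank by confidence and take the top fraction * L predictions.
    eligible.sort(key=lambda item: item[2], reverse=True)
    n_top = max(1, int(seq_len * fraction))
    top = eligible[:n_top]
    if not top:
        return 0.0

    hits = sum(1 for i, j, _ in top if (i, j) in truth)
    return hits / len(top)
```

Calling `top_l_precision(pairs, contacts, seq_len=L, fraction=0.1)` would give a top-L/10 figure under these assumptions; averaging the per-protein values over a benchmark set gives a mean precision of the kind reported in the abstract.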

Availability and implementation: MetaPSICOV is freely available as a web server at http://bioinf.cs.ucl.ac.uk/MetaPSICOV. Raw data (predicted contact lists and 3D models) and source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/MetaPSICOV.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1. Distribution of predictions for 9678 observed true contacts, as predicted by PSICOV, DCA and CCMpred. Data are the top L/2 correct contact predictions from each method (for sequence separation ≥5 residues), for a set of 150 Pfam families (Jones et al., 2012).

Fig. 2. Mean precision of contact prediction against increasing number of predicted pairs for (a) sequence separation ≥5, (b) sequence separation ≥23 and (c) sequence separation ≥23 with redundant contact predictions excluded (see text for details).

Fig. 3. MetaPSICOV top-L/2 precision plotted against (a) PSICOV precision, (b) DCA precision and (c) CCMpred precision for sequence separation ≥5 (line x = y shown for reference).

Fig. 4. All contacts (a) and long-range contacts (b) top-L/10 mean precision for alignments with varying numbers of effective sequences (Neff).

Fig. 5. Differences in mean TM-scores for the 150 benchmark proteins obtained by FRAGFOLD using (a) MetaPSICOV stage 1 contacts and (b) MetaPSICOV stage 2 contacts compared with PSICOV contacts.
