FreeContact: fast and free software for protein contact prediction from residue co-evolution

László Kaján, Thomas A Hopf, Matúš Kalaš, Debora S Marks, Burkhard Rost¹

PMID: 24669753
PMCID: PMC3987048
DOI: 10.1186/1471-2105-15-85

László Kaján et al. BMC Bioinformatics. 2014 Mar 26;15:85.

Affiliation

¹ Department for Bioinformatics and Computational Biology, TU Munich, Boltzmannstraße 3, Garching 85748, Germany. assistant@rostlab.org.

Abstract

Background: Twenty years of improved technology and growing sequence databases now render residue-residue contact constraints, inferred from correlated mutations in large protein families, accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software.
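
For orientation, the two concepts named above can be written in the notation commonly used in the direct-coupling and sparse-covariance literature. The symbols below are not taken from this abstract and are assumptions for illustration: f_i(a) and f_ij(a,b) are (reweighted) single-column and column-pair residue frequencies, C is the resulting covariance matrix, S is its shrunk empirical estimate, and ρ is a sparsity penalty.

```latex
% Covariance between residue a at column i and residue b at column j
C_{ij}(a,b) = f_{ij}(a,b) - f_i(a)\, f_j(b)

% Mean-field DCA: couplings approximated by the negative inverse covariance
e_{ij}(a,b) \approx -\left(C^{-1}\right)_{ij}(a,b)

% Sparse inverse covariance estimation (graphical-lasso form)
\hat{\Theta} = \arg\min_{\Theta \succ 0}
  \left[ \operatorname{tr}(S\,\Theta) - \log\det\Theta + \rho\,\lVert\Theta\rVert_1 \right]
```

In both cases the costly step is working with the inverse of a large covariance matrix, which is consistent with the runtime breakdown reported in Figure 1.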

Results: Here, we present FreeContact, a fast, open source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with negligible performance decrease. EVfold-mfDCA was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library "libfreecontact", complete with command line tool "freecontact", as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability.

Conclusions: FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud).

Figures

**Figure 1**
**Runtimes for FreeContact.** We measured the runtime (logarithmic y-axis) for different program components (x-axis) on a single thread. The program components were: “seqw” – sequence weighting; “pairfreq” – pairwise residue frequencies; “shrink” – shrinking of covariance matrix; “inv” – sparse inverse covariance estimation/covariance matrix inversion. The different colors distinguish: the original PSICOV implementation (blue), our acceleration of PSICOV (FC.psicov, yellow), our acceleration of the faster PSICOV version “sensible default” (FC.psicov-fast, green), and our implementation of EVfold-mfDCA (FC.evfold, red). The whiskers on the box plots show the most extreme data point that is less than 1.5-times the interquartile range from the box. Outliers are not shown. Total runtime of all methods tested is dominated by the sparse inverse covariance estimation/covariance matrix inversion component.
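
The first two components named in the caption ("seqw" and "pairfreq") can be illustrated with a minimal pure-Python sketch. This is not FreeContact's code or API: the function names, the toy alignment, and the 0.7 identity threshold are assumptions chosen for illustration, and the reweighting rule shown (one over the number of sufficiently similar sequences) is the scheme commonly described for co-evolution methods.

```python
from collections import defaultdict


def sequence_weights(alignment, identity_threshold=0.7):
    """Weight each sequence by 1 / (number of sequences at least this similar to it).

    The threshold value is an assumption for this sketch.
    """
    length = len(alignment[0])
    weights = []
    for seq_a in alignment:
        neighbours = 0
        for seq_b in alignment:
            matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
            if matches / length >= identity_threshold:
                neighbours += 1  # the count includes seq_a itself
        weights.append(1.0 / neighbours)
    return weights


def pair_frequencies(alignment, weights, i, j):
    """Weighted frequency of each residue pair (a, b) observed at columns i and j."""
    total = sum(weights)
    freqs = defaultdict(float)
    for seq, w in zip(alignment, weights):
        freqs[(seq[i], seq[j])] += w / total
    return dict(freqs)


# Toy alignment (hypothetical): three near-identical sequences and one outlier.
alignment = ["ACDE", "ACDE", "ACDF", "GHIK"]
weights = sequence_weights(alignment)
freqs = pair_frequencies(alignment, weights, 0, 1)
```

The three similar sequences share their weight, so the outlier contributes as much to the pair frequencies as the whole cluster does. The nested loop over sequence pairs also shows why the weighting step grows with the square of the number of aligned sequences.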

**Figure 2**
**Speedup using multiple threads.** **A**: Sequence weighting. Speed is calculated as: (proteins in alignment)² × length of target protein / runtime. **B**: Pairwise residue frequency calculation. Speed is calculated as: proteins in alignment × (length of target protein)² / runtime. Dashed lines indicate linear correlation, extrapolated from one thread. The whiskers extend to the most extreme data point that is less than 1.5 times the interquartile range from the box. The surprisingly clear correlation between the number of threads and speed demonstrates how well our implementation scales for multi-threading.
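
The speed measures in this caption, and the linear extrapolation drawn as dashed lines, can be sketched as follows. The function names and the example numbers are hypothetical and serve only to show the arithmetic; they are not measurements from the paper.

```python
def weighting_speed(num_sequences, target_length, runtime_seconds):
    """Panel A: (proteins in alignment)^2 * length of target protein / runtime."""
    return num_sequences ** 2 * target_length / runtime_seconds


def pairfreq_speed(num_sequences, target_length, runtime_seconds):
    """Panel B: proteins in alignment * (length of target protein)^2 / runtime."""
    return num_sequences * target_length ** 2 / runtime_seconds


def ideal_speed(single_thread_speed, threads):
    """Linear extrapolation from one thread (the dashed line)."""
    return single_thread_speed * threads


def parallel_efficiency(measured_speed, single_thread_speed, threads):
    """Fraction of the ideal linear speed actually reached."""
    return measured_speed / ideal_speed(single_thread_speed, threads)


# Hypothetical example: 1000 aligned sequences, target length 100,
# 2.0 s on one thread and 0.5 s on four threads.
one_thread = weighting_speed(1000, 100, 2.0)
four_threads = weighting_speed(1000, 100, 0.5)
efficiency = parallel_efficiency(four_threads, one_thread, 4)
```

An efficiency close to 1.0 corresponds to box plots that sit on the dashed line; values well below 1.0 would indicate threading overhead.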

Grants and funding

R01 GM106303/GM/NIGMS NIH HHS/United States
