PLoS One. 2011;6(10):e26767.
doi: 10.1371/journal.pone.0026767. Epub 2011 Oct 28.

Predicting residue-residue contacts and helix-helix interactions in transmembrane proteins using an integrative feature-based random forest approach

Xiao-Feng Wang et al. PLoS One. 2011.

Abstract

Integral membrane proteins constitute 25-30% of genomes and play crucial roles in many biological processes. However, less than 1% of membrane protein structures are in the Protein Data Bank. In this context, it is important to develop reliable computational methods for predicting the structures of membrane proteins. Here, we present the first application of random forest (RF) to residue-residue contact prediction in transmembrane proteins, which we term TMhhcp. Rigorous cross-validation tests indicate that the RF models perform better than two state-of-the-art methods, TMHcon and MEMPACK. Under a strict leave-one-protein-out jackknifing procedure, the models reached top L/5 prediction accuracies of 49.5% and 48.8% for two different residue contact definitions, respectively. The predicted residue contacts were further used to predict interacting helical pairs, achieving Matthews correlation coefficients of 0.430 and 0.424 for the two residue contact definitions, respectively. For the benefit of the academic community, the TMhhcp server has been made freely accessible at http://protein.cau.edu.cn/tmhhcp.
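
As an illustration of the helix-helix evaluation metric named in the abstract, the following is a minimal Python sketch of the Matthews correlation coefficient computed over helical pairs. It is not the authors' code. The function names, the representation of helix pairs as index tuples, and the choice to count every unordered pair of helices in a chain as a candidate are assumptions made for this example.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here rather than taken from the paper).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def helix_pair_mcc(observed, predicted, n_helices):
    """MCC for interacting helical pairs in one protein chain.

    observed, predicted: sets of (i, j) helix index pairs with i < j.
    Every unordered pair of the n_helices helices is treated as a
    candidate (an assumption made for this sketch).
    """
    n_candidates = n_helices * (n_helices - 1) // 2
    tp = len(observed & predicted)   # predicted and observed
    fp = len(predicted - observed)   # predicted but not observed
    fn = len(observed - predicted)   # observed but missed
    tn = n_candidates - tp - fp - fn
    return matthews_corrcoef(tp, fp, tn, fn)


# Hypothetical chain with 4 helices (6 candidate pairs).
example_mcc = helix_pair_mcc({(0, 1), (1, 2)}, {(0, 1), (2, 3)}, n_helices=4)
```

The MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which makes it a reasonable choice when interacting and non-interacting pairs are imbalanced.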

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic overview of RF-based model building process of the TMhhcp approach.
Figure 2. The precision-recall curves based on the jackknife cross-validation tests.
Panels A and B were generated based on DEF1 and DEF2, respectively. The precision-recall curve analysis was conducted at the whole protein chain level, and the curves in panels A and B are the average precision-recall curves for the 62 tested protein chains. The average ratios of contact residue pairs to total residue pairs were 0.028 and 0.027, according to DEF1 and DEF2, respectively. Therefore, the corresponding random prediction precision-recall curves in panels A and B are horizontal lines with precision values of 0.028 and 0.027, respectively.
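
The link between the contact ratio and the random baseline can be illustrated with a short Python sketch. It computes precision-recall points for one chain from ranked predictions and the expected precision of a random predictor, which equals the fraction of residue pairs that are contacts. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
def precision_recall_points(scores, labels):
    """(recall, precision) after each rank, sorted by descending score.

    scores: predicted contact scores, one per residue pair.
    labels: 1 for an observed contact, 0 otherwise.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one observed contact is required")
    ranked = sorted(zip(scores, labels), key=lambda item: item[0], reverse=True)
    points = []
    tp = 0
    for k, (_, label) in enumerate(ranked, start=1):
        tp += label
        points.append((tp / total_pos, tp / k))
    return points


def random_baseline_precision(labels):
    """Expected precision of a random predictor: the contact fraction."""
    return sum(labels) / len(labels)


# Toy data: 4 residue pairs, 2 of which are observed contacts.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
toy_points = precision_recall_points(toy_scores, toy_labels)
toy_baseline = random_baseline_precision(toy_labels)
```

Because a random ranking selects contacts at the same rate at every cutoff, its expected precision is constant in recall, which is why the baseline appears as a horizontal line at the contact ratio.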
Figure 3. The precision-recall curves based on the independent test set.
Panels A and B were generated based on DEF1 and DEF2, respectively. The precision-recall curve analysis was conducted at the whole protein chain level, and the precision-recall curves in panels A and B reflected the average precision-recall curves for the 21 tested protein chains. According to DEF1 or DEF2, the average ratio of contact residue pairs to the total residue pairs on the independent test set was 0.025. Therefore, the corresponding random prediction precision-recall curve in panel A or B was a horizontal line with the precision value of 0.025.
Figure 4. The ratio of contacts to non-contacts according to sequence distance.
This figure describes the ratio of contacts to non-contacts according to the grouping of their sequence distance based on DEF1.
Figure 5. The average prediction accuracy of five covariance algorithms.
This figure gives the average prediction accuracy of five different covariance algorithms to predict residue contacts on the training set using DEF1. L is the sum of lengths of all TM segments of a protein chain.
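
A minimal Python sketch of a top L/5 accuracy calculation follows, using L as defined in this caption (the sum of the lengths of all TM segments of a chain). The function name, the data layout, and the use of integer division to obtain the number of top-ranked pairs are assumptions for illustration; the paper may handle rounding differently.

```python
def top_l_over_k_accuracy(scored_pairs, true_contacts, tm_segment_lengths, k=5):
    """Fraction of the top L/k scored residue pairs that are true contacts.

    scored_pairs: list of ((i, j), score) tuples for candidate residue pairs.
    true_contacts: set of (i, j) pairs observed to be in contact.
    tm_segment_lengths: lengths of the TM segments; L is their sum.
    """
    L = sum(tm_segment_lengths)
    n_top = max(1, L // k)  # assumption: round down, keep at least one pair
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    top = [pair for pair, _ in ranked[:n_top]]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top)


# Toy chain with two TM segments of length 10, so L = 20 and L/5 = 4.
toy_pairs = [((1, 15), 0.9), ((2, 14), 0.8), ((3, 13), 0.7),
             ((4, 12), 0.6), ((5, 11), 0.5)]
toy_true = {(1, 15), (3, 13), (5, 11)}
toy_accuracy = top_l_over_k_accuracy(toy_pairs, toy_true, [10, 10])
```

In this toy case, two of the four top-ranked pairs are observed contacts, so the top L/5 accuracy is 0.5.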
Figure 6. Two Venn diagrams for the predicted residue contacts and helix-helix interactions by three predictors.
The two Venn diagrams show the complementarity of the three predictors, TMHcon, MEMPACK and TMhhcp, in predicting residue contacts and helix-helix interactions. The residue contact definition used is DEF1. ‘Contact’ in panel A represents the observed residue contacts of protein chains in the test set, while ‘Interaction’ in panel B denotes the observed helix-helix interactions in the test set.
Figure 7. Case studies.
This figure displays the performance of TMhhcp on two TM proteins whose structures were recently solved: the spinach minor light-harvesting complex CP29 (PDB ID: 3PL9, chain: A) and the human adenosine A2A receptor bound to the agonist UK-432097 (PDB ID: 3QAK, chain: A). Panels A and B plot the observed and predicted residue contacts of 3PL9_A and 3QAK_A, respectively. Each grid contains the residue contacts between the corresponding two TM segments, and the edges of a grid represent the lengths of those two segments. Panels C and D give the observed and predicted interacting helical pairs of 3PL9_A and 3QAK_A, respectively, where two boxes connected by a line represent an interacting helical pair.
