Entropy (Basel). 2020 Jan 23;21(11):1127.
doi: 10.3390/e21111127. Epub 2019 Nov 16.

Coevolutionary Analysis of Protein Subfamilies by Sequence Reweighting


Duccio Malinverni et al. Entropy (Basel).

Abstract

Extracting structural information from sequence co-variation has become common practice in computational biology in recent years, mainly owing to the availability of large sequence alignments of protein families. However, identifying features that are specific to subclasses, and not shared by all members of the family, with sequence-based approaches has remained an elusive problem. Here we present a coevolution-based method for differentially analyzing subfamily-specific structural features through a continuous sequence reweighting (SR) approach. We introduce its underlying principles and test its predictive capabilities on the Response Regulator family, whose subfamilies have previously been shown to display distinct, specific homo-dimerization patterns. Our results show that this reweighting scheme is effective in assigning structural features known a priori to subfamilies, even when sequence data are relatively scarce. Furthermore, sequence reweighting makes it possible to assess whether individual structural contacts pertain to specific subfamilies, and it thus paves the way for identifying specificity-determining contacts from sequence variation data.
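The abstract does not spell out the reweighting rule, so the following is only a minimal sketch of how a continuous per-sequence weight can enter a coevolutionary model: through weighted single-site and pair frequencies, which DCA-type models are then fit to. The weighting scheme (members of subfamily k get weight `lam`, all other sequences `1 - lam`), the function name `reweighted_frequencies`, and the integer encoding of the alignment are assumptions for illustration, not the authors' definitions (those are given in the paper's Methods). Pseudocounts and similarity-based redundancy weights are omitted.

```python
import numpy as np


def reweighted_frequencies(msa, in_subfamily, lam, q=21):
    """Weighted single-site and pair frequencies of an integer-encoded MSA.

    msa          -- (M, L) integer array, one row per sequence, states 0..q-1
    in_subfamily -- (M,) boolean array marking members of subfamily k
    lam          -- hypothetical reweighting parameter in [0, 1]:
                    members of k get weight lam, all others 1 - lam
    """
    msa = np.asarray(msa)
    M, L = msa.shape
    w = np.where(in_subfamily, lam, 1.0 - lam).astype(float)
    w /= w.sum()  # normalize so each site's frequencies sum to 1

    # One-hot encoding: onehot[m, i, a] = 1 if sequence m has state a at site i
    onehot = np.zeros((M, L, q))
    onehot[np.arange(M)[:, None], np.arange(L)[None, :], msa] = 1.0

    # f_i(a) = sum_m w_m [x_i^m = a]
    fi = np.einsum("m,mia->ia", w, onehot)
    # f_ij(a, b) = sum_m w_m [x_i^m = a][x_j^m = b]
    fij = np.einsum("m,mia,mjb->iajb", w, onehot, onehot)
    return fi, fij
```

Under this assumed scheme, `lam = 0.5` gives every sequence the same weight (the ordinary unweighted statistics), while `lam` close to 1 concentrates the statistics on subfamily k. Setting `lam = 1` when no sequence belongs to k makes all weights zero, so the normalization fails; callers should guard against that case.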

Keywords: coevolutionary analysis; direct-coupling analysis; maximum entropy models; protein contact predictions; sequence reweighting; specificity determining contacts.


Conflict of interest statement

Conflicts of Interest: The authors declare no conflict of interest.

Figures

Figure 1
Sequence and structural variability in the Response Regulator (RR) family. In all panels the color scheme follows the one defined in panel (A). (A) The three most abundant two-domain RR architectures, which have different dimerization modes, and their numbers of sequences in the complete RR alignment (fraction of the total number of sequences in parentheses). (B) Sequence variability of the RR family, shown by projecting the RR sequences of the OmpR, LytTR, and GerE subfamilies onto the first two principal components. Black lines depict iso-density levels. (C) Contact maps of three representative structures of the different subfamilies. Contacts are defined by a 5 Å distance threshold between heavy atoms. Gray dots depict intra-molecular contacts; colored dots depict homo-dimeric inter-molecular contacts (see Methods). (D–F) Heterogeneous homo-dimerization assemblies in the RR family. The three structural models used to define the contact maps in panel (C) are depicted. The gray monomers in each model are structurally aligned.
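The contact definition in panel (C), a 5 Å threshold on heavy-atom distances, can be sketched as follows. The input format (one NumPy array of heavy-atom coordinates in Å per residue), the strict `<` comparison, and the function name `contact_map` are assumptions; reading the structure file and filtering out hydrogens are not shown. Passing the same chain twice gives intra-molecular contacts (the diagonal is trivially true and is usually discarded together with near-diagonal pairs). Passing the two chains of a homodimer gives inter-molecular contacts.

```python
import numpy as np


def contact_map(coords_a, coords_b, cutoff=5.0):
    """Residue-level contact map between two lists of residues.

    coords_a, coords_b -- lists of (n_atoms, 3) arrays, heavy-atom
                          coordinates (in angstrom) for each residue
    Residues i and j are in contact if any heavy atom of i lies closer
    than `cutoff` to any heavy atom of j (strict inequality assumed).
    """
    cmap = np.zeros((len(coords_a), len(coords_b)), dtype=bool)
    for i, xa in enumerate(coords_a):
        for j, xb in enumerate(coords_b):
            # All pairwise atom-atom distances between residue i and residue j
            d = np.linalg.norm(xa[:, None, :] - xb[None, :, :], axis=-1)
            cmap[i, j] = d.min() < cutoff
    return cmap
```

The double loop keeps the rule easy to read; for full-size structures a spatial index (for example a k-d tree over all atoms) would be faster while producing the same map.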
Figure 2
Prediction quality at varying alignment sizes. All reported quantities are shown as a function of the fraction of sequences randomly sampled from the full alignment (B_f). Error bars denote standard deviations over 200 random samplings. (A) Overall precision (i.e., true positive rate: the fraction of predicted contacts that are true contacts) computed over the complete contact map (union of intra-molecular contacts and all interface contacts). "Full" denotes the union of all three alignments. (B) Fraction of the α-interface predicted in the N (112) highest-ranked contacts. (C) Fraction of the β-interface predicted in the N (112) highest-ranked contacts. (D) Fraction of the γ-interface predicted in the N (112) highest-ranked contacts.
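The two quantities plotted in this figure, precision of the top-ranked predictions and the fraction of an interface recovered among them, can be computed from any symmetric L×L score matrix as in the sketch below. The function names, the `min_sep` sequence-separation filter (the paper's choice is not stated here), and the representation of an interface as a list of residue-index pairs are assumptions for illustration.

```python
import numpy as np


def ranked_pairs(scores, min_sep=1):
    """Residue pairs (i, j), j - i >= min_sep, sorted by decreasing score."""
    L = scores.shape[0]
    i, j = np.triu_indices(L, k=max(1, min_sep))
    order = np.argsort(-scores[i, j], kind="stable")
    return list(zip(i[order].tolist(), j[order].tolist()))


def precision_at_n(scores, true_contacts, n, min_sep=1):
    """Fraction of the n highest-ranked pairs that are true contacts."""
    top = ranked_pairs(scores, min_sep)[:n]
    return sum(bool(true_contacts[i, j]) for i, j in top) / len(top)


def interface_recovery(scores, interface_pairs, n, min_sep=1):
    """Fraction of an interface's contacts found among the n top-ranked pairs."""
    top = set(ranked_pairs(scores, min_sep)[:n])
    # Map pairs to (min, max) order; self-pairs (i == i across a homodimer)
    # cannot appear in a single-sequence ranking, so they are skipped here.
    iface = {(min(i, j), max(i, j)) for i, j in interface_pairs if i != j}
    return len(top & iface) / len(iface)
```

To mimic the subsampling in the figure, one would draw a random subset of sequences at each fraction, recompute the scores, evaluate these functions, and report the mean and standard deviation over repeated draws.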
Figure 3
Results of sequence reweighting (SR). (A) Overall precision of the N highest-ranked predictions, computed over the full contact map, comprising all intra- and inter-molecular contacts observed in the three reference structures. (B) Average coupling score of the α-interface. (C) Average coupling score of the β-interface. (D) Average coupling score of the γ-interface.
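Panels (B)–(D) average a pairwise coupling score over the contacts of one interface. The caption does not define the score, so this sketch assumes a widely used choice in DCA pipelines: the Frobenius norm of each coupling block in the zero-sum gauge, followed by the average product correction (APC). The coupling tensor layout `J[i, a, j, b]` with shape (L, q, L, q), the function names, and the pair-list representation of an interface are assumptions, not the paper's definitions.

```python
import numpy as np


def zero_sum_gauge(J):
    """Shift couplings J[i, a, j, b] so each block has zero row and column means."""
    return (J - J.mean(axis=1, keepdims=True) - J.mean(axis=3, keepdims=True)
            + J.mean(axis=(1, 3), keepdims=True))


def coupling_scores(J):
    """(L, L) APC-corrected Frobenius-norm scores from couplings of shape (L, q, L, q)."""
    Jz = zero_sum_gauge(J)
    F = np.sqrt((Jz ** 2).sum(axis=(1, 3)))  # Frobenius norm of each (i, j) block
    np.fill_diagonal(F, 0.0)                 # ignore self-couplings
    row = F.mean(axis=1)                     # F is symmetric, so rows = columns
    S = F - np.outer(row, row) / F.mean()    # average product correction
    np.fill_diagonal(S, 0.0)
    return S


def mean_interface_score(S, interface_pairs):
    """Average score over an interface's residue pairs (self-pairs skipped)."""
    return float(np.mean([S[i, j] for i, j in interface_pairs if i != j]))
```

Averaging over a fixed set of interface pairs lets the score be compared across reweighting settings: an interface that belongs to the up-weighted subfamily would be expected to gain average coupling strength.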
Figure 4
Identification of subfamily-specific residue contacts by SR. Gray dots depict intra-molecular contacts. Colored dots depict interface contacts pertaining to the α- (blue), β- (orange), and γ- (brown-red) interfaces, respectively. Green dots are the top-ranked contacts according to the F_ij^k scores (see Methods). (A) Top 10 highest-ranked SR contacts for k = OmpR. (B) Top 10 highest-ranked SR contacts for k = LytTR. (C) Top 10 highest-ranked SR contacts for k = GerE.
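The figure ranks residue pairs by the subfamily score F_ij^k and checks which known contact class each top pair falls into. The sketch below covers only that ranking and annotation step for an already computed symmetric score matrix; how F_ij^k itself is obtained is described in the paper's Methods and is not reproduced. The function name `annotate_top_contacts`, the `categories` dictionary (for example intra-molecular, α, β, γ), and the `min_sep` filter are hypothetical.

```python
import numpy as np


def annotate_top_contacts(F, categories, n_top=10, min_sep=1):
    """Rank pairs by a symmetric (L, L) score matrix and label each top pair.

    categories -- dict mapping a label (e.g. "intra", "alpha") to a set of
                  (i, j) pairs with i < j; a pair may carry several labels
    Returns a list of ((i, j), labels) for the n_top highest-scoring pairs.
    """
    L = F.shape[0]
    i, j = np.triu_indices(L, k=max(1, min_sep))
    order = np.argsort(-F[i, j], kind="stable")[:n_top]
    out = []
    for o in order:
        pair = (int(i[o]), int(j[o]))
        labels = [name for name, pairs in categories.items() if pair in pairs]
        out.append((pair, labels or ["none"]))
    return out
```

A top pair labeled only with one subfamily's interface would be a candidate subfamily-specific contact, whereas pairs labeled "intra" are likely shared by the whole family.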

