Nat Commun. 2021 Mar 2;12(1):1396.
doi: 10.1038/s41467-021-21636-z.

Large-scale discovery of protein interactions at residue resolution using co-evolution calculated from genomic sequences

Anna G Green et al. Nat Commun.

Abstract

Increasing numbers of protein interactions have been identified in high-throughput experiments, but only a small proportion have solved structures. Recently, sequence coevolution-based approaches have led to a breakthrough in predicting monomer protein structures and protein interaction interfaces. Here, we address the challenges of large-scale interaction prediction at residue resolution with a fast alignment concatenation method and a probabilistic score for the interaction of residues. Importantly, this method (EVcomplex2) is able to assess the likelihood of a protein interaction, as we show here applied to large-scale experimental datasets where the pairwise interactions are unknown. We predict 504 interactions de novo in the E. coli membrane proteome, including 243 that are newly discovered. While EVcomplex2 does not require available structures, coevolving residue pairs can be used to produce structural models of protein interactions, as done here for membrane complexes including the Flagellar Hook-Filament Junction and the Tol/Pal complex.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Scaling EVcouplings methods to full bacterial genomes.
A The search problem for binary protein-protein interactions in Escherichia coli involves finding all of the estimated 10^4 true interactions out of 10^7 possible pairs. Only approximately 9% of these true direct interactions have a crystal structure solved in E. coli or a homologous structure in another organism. B Evolutionary couplings learned directly from protein sequences can resolve interfaces. Sequence alignments of both monomeric proteins are created and concatenated by the reciprocal highest identity procedure before inference of evolutionary couplings. Raw evolutionary coupling scores can be combined with features of their distribution, biochemical properties, and sequence entropy to improve inference. C A benchmark dataset of all non-redundant protein interactions with known interface structure.
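The scale quoted in panel A can be checked with a short calculation. The sketch below assumes a proteome of roughly 4,300 proteins; that figure is not given in the caption and is used here only for illustration, and `possible_pairs` is a hypothetical helper name.

```python
from math import comb

def possible_pairs(n_proteins: int) -> int:
    """Number of unordered pairs of distinct proteins: n * (n - 1) / 2."""
    return comb(n_proteins, 2)

# ASSUMPTION: about 4,300 proteins; this number is illustrative, not from the caption.
n_proteins = 4300
pairs = possible_pairs(n_proteins)
print(pairs)  # 9242850, i.e. on the order of 10^7

# With an estimated 10^4 true interactions (from the caption), the fraction of
# candidate pairs that truly interact is on the order of one in a thousand.
estimated_true = 10**4
print(f"{estimated_true / pairs:.4%}")
```

Under this assumption only about 0.1% of candidate pairs are true interactions, which is why a low false positive rate matters for a proteome-wide search.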
Fig. 2. Model predicts interacting residues and interacting proteins with high precision.
A Recall on the held-out fraction of the positive benchmark set (x-axis) and false positive rate on the held-out dataset of non-interacting complexes (y-axis) at a score threshold that gives the corresponding recall. Our logistic regression model (purple) reduces the false positive rate compared to the previous EVcomplex score (blue). B Prediction of protein interaction is based on the prediction of interacting residues. Number of predicted interacting residue pairs for complexes inferred to interact (purple) or not inferred to interact (gray) based on our stringent protein complex prediction threshold. C Example performance on known interaction between ABC transporter permease and ATP binding subunit (UniProt IDs: Y1470_HAEIN and Y1471_HAEIN, PDB ID: 2NQ2 chains C and A [https://www.rcsb.org/structure/2NQ2]). D Example performance on known interaction between DNA primase PriS and PriL (UniProt IDs: PRIS_SULSO and PRIL_SULSO, PDB ID: 1ZT2 chain A and B [https://www.rcsb.org/structure/1ZT2]).
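Panel A pairs each recall level with the false positive rate at the score threshold that achieves that recall. A minimal sketch of that calculation follows, assuming each complex carries a single scalar score where higher means more likely to interact. The function name and the toy scores are hypothetical and are not taken from the paper.

```python
import math

def fpr_at_recall(pos_scores, neg_scores, recall):
    """Return (threshold, false positive rate) at the loosest score threshold
    that recovers at least `recall` of the positive set."""
    if not 0 < recall <= 1:
        raise ValueError("recall must be in (0, 1]")
    ranked = sorted(pos_scores, reverse=True)
    # Number of positives that must score at or above the threshold.
    # The small epsilon guards against floating-point overshoot in recall * n.
    k = max(1, math.ceil(recall * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    false_positives = sum(s >= threshold for s in neg_scores)
    return threshold, false_positives / len(neg_scores)

# Toy scores for illustration only.
positives = [0.9, 0.8, 0.4, 0.2]     # held-out interacting complexes
negatives = [0.85, 0.3, 0.1, 0.05]   # held-out non-interacting complexes
for r in (0.5, 1.0):
    thr, fpr = fpr_at_recall(positives, negatives, r)
    print(f"recall={r:.2f} threshold={thr:.2f} FPR={fpr:.2f}")
```

Sweeping `recall` from low to high traces out a curve of the kind plotted in the panel; a better scoring model gives a lower false positive rate at the same recall.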
Fig. 3. Discovery of hundreds of new interactions in the E. coli membrane proteome.
A We searched a high-value subset of the 10^7 possible interactions in the E. coli proteome by searching membrane compartments with themselves and with adjacent membrane compartments. B We found 504 high-scoring protein interactions in the cell envelope, including 75 with structural characterization and 186 with previous experimental evidence (and no structural characterization). C–E 3D configurations of previously structurally characterized interactions are accurately predicted by molecular docking with inferred restraints. RMSDs calculated by comparison to known structures with PDB IDs 3RKO [https://www.rcsb.org/structure/3RKO], 2WU2 [https://www.rcsb.org/structure/2WU2], and 2HQS [https://www.rcsb.org/structure/2HQS], respectively. F–H Example of three docked models of newly resolved protein complexes: BamE/MltB, YajC/FtsI, and Lnt/MurJ. Evolutionarily coupled residues used as restraints in docking are shown in magenta and connected with solid lines.
Fig. 4. Model of the flagellar hook-filament junction.
A Schematic of the orientation of the bacterial flagellum. The proteins FlgL (green) and FlgK (blue) form two rings which create the junction between the hook and filament of the flagellum. B Docked model of FlgL and FlgK using evolutionary couplings. PDB structures of homologous proteins from Salmonella typhimurium were used in docking (PDB IDs 2D4Y [https://www.rcsb.org/structure/2D4Y] and 2D4X [https://www.rcsb.org/structure/2D4X], respectively). Predicted interface residues are highlighted in purple. C A previously inferred model of the FlgK ring from Campylobacter jejuni was used to infer the structure of the entire hook-filament junction. Evolutionarily coupled residues (purple) show the interface for FlgL ring insertion into the FlgK ring. By aligning our docked model to the C. jejuni ring, we show that an 11-mer ring of FlgL fits inside the FlgK structure.
Fig. 5. Atomic resolution model of the Tol-Pal complex.
A Schematic of the proteins involved in the Tol-Pal complex (TolABR, CpoB, and Pal). Interactions with previously solved interfaces are shown in orange and interactions inferred by our method are shown in purple. B Complete model of the Tol-Pal complex inferred by aligning results of docked pairwise models. Note that CpoB is inferred to be a trimer in vivo but was docked as a monomer for modeling purposes. C–F Residue resolution of TolB-Pal, TolB-TolA, TolR-TolB, and CpoB-TolB interfaces. The top 5 inferred interface contacts are shown in purple. Dashed lines indicate inferred contacts where one or more residues are missing from the solved structure. Structures used are 1TOL_A (TolA) [https://www.rcsb.org/structure/1TOL], 2HQS_A (TolB) [https://www.rcsb.org/structure/2HQS], 5BY4_A (TolR) [https://www.rcsb.org/structure/5BY4], 2HQS_H (Pal) [https://www.rcsb.org/structure/2HQS], and 2XDJ_A (CpoB) [https://www.rcsb.org/structure/2XDJ].
Fig. 6. Predicted protein interactions in eukaryotic proteomes.
A Distribution of eukaryotic sequences in concatenated sequence alignments. Shown in orange are alignments that passed our sequence diversity threshold, and in gray are those that did not. B Number of correctly predicted inter-protein ECs for eukaryotic-exclusive complexes above the sequence diversity threshold. Eukaryotic-exclusive complexes are defined as complexes whose concatenated sequence alignment is at least 90% eukaryotic sequences. Inter-protein ECs are defined as correct if their minimum atom distance is <8 Å. C The human spliceosome proteins Prp38 and MFAP1 have a known interface correctly predicted by our method, and have a protein interaction detected. The inter-protein ECs above the 80% precision threshold used throughout the paper are shown in purple. Known protein structure (PDB ID: 5F5U [https://www.rcsb.org/structure/5F5U]) is used to visualize the subunits. D Schematic of predicted interaction between Lsm5 and Prp38 with top 5 inter-protein ECs.
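The correctness criterion in panel B (minimum atom distance below 8 Å) can be made concrete with a small example. The sketch below assumes atom coordinates are already available as (x, y, z) tuples in Å, keyed by residue index for each protein. The function names, data layout, and toy coordinates are hypothetical; parsing real PDB files is outside its scope.

```python
import math
from itertools import product

def min_atom_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom of one residue and any
    atom of the other; each argument is a list of (x, y, z) tuples."""
    return min(math.dist(a, b) for a, b in product(atoms_a, atoms_b))

def ec_precision(ec_pairs, coords_a, coords_b, cutoff=8.0):
    """Fraction of inter-protein ECs whose residues lie within `cutoff` Å.
    `ec_pairs` holds (residue index in protein A, residue index in protein B)."""
    correct = sum(
        min_atom_distance(coords_a[i], coords_b[j]) < cutoff
        for i, j in ec_pairs
    )
    return correct / len(ec_pairs)

# Toy coordinates for illustration only (not from any real structure).
coords_a = {10: [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], 11: [(0.0, 0.0, 0.0)]}
coords_b = {5: [(4.0, 0.0, 0.0)], 6: [(20.0, 0.0, 0.0)]}
print(ec_precision([(10, 5), (11, 6)], coords_a, coords_b))  # 0.5
```

Here the pair (10, 5) counts as correct because its closest atoms are 3 Å apart, while (11, 6) at 20 Å does not, giving a precision of 0.5.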
