Nat Genet. 2019 Jul;51(7):1170-1176.
doi: 10.1038/s41588-019-0432-9. Epub 2019 Jun 17.

Inferring protein 3D structure from deep mutation scans

Nathan J Rollins et al. Nat Genet. 2019 Jul.

Abstract

We describe an experimental method of three-dimensional (3D) structure determination that exploits the increasing ease of high-throughput mutational scans. Inspired by the success of using natural, evolutionary sequence covariation to compute protein and RNA folds, we explored whether 'laboratory', synthetic sequence variation might also yield 3D structures. We analyzed five large-scale mutational scans and discovered that the pairs of residues with the largest positive epistasis in the experiments are sufficient to determine the 3D fold. We show that the strongest epistatic pairings from genetic screens of three proteins, a ribozyme and a protein interaction reveal 3D contacts within and between macromolecules. Using these experimental epistatic pairs, we compute ab initio folds for a GB1 domain (within 1.8 Å of the crystal structure) and a WW domain (2.1 Å). We propose strategies that reduce the number of mutants needed for contact prediction, suggesting that genomics-based techniques can efficiently predict 3D structure.

Figures

Fig. 1. Genetic experiments can be used to discover epistatic interactions and solve 3D fold.
Mutant genes can be assayed (top) to reveal functional and structural interactions (middle). It is possible to create and test libraries large enough to determine 3D structure (bottom).
Fig. 2. Experimental epistasis pairs reveal structural contacts in GB1 protein.
a. Maximum value of positive epistasis for each possible pair of residues in the GB1 domain, analyzed from Olson et al. (42) (Supplementary Tables 1 and 2). The most positive epistatic pairs (dark blue) suggest tertiary contacts. b. The 38 top positive epistatic pairs (black) include 28 (L/2) long-range (|i-j| > 5) pairs and 10 local pairs (L: length of protein, Supplementary Table 2). These pairs are used to fold the protein and to determine the topological arrangement of secondary structure elements (orange arrows). Epistatic pairs (black) are overlaid on the true contacting pairs in the NMR structure 2gb1 (48) (minimum heavy atom distance between two residues; dark blue, 5 Å cutoff; light blue, 8 Å cutoff). c. Secondary structure, from top to bottom: observed secondary structure from 2gb1 (48); β strand scores from epistasis values; α helical scores (Methods and Supplementary Table 4); and average per position and full single experimental mutation effect matrices showing concordance with local epistasis scores. d. Epistasis pairs (black) plotted on the 3D structure 2gb1.
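The pair selection described in panel b can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that pairs are taken in descending order of epistasis score until L/2 long-range pairs (|i-j| > 5) have been collected, with any local pairs met along the way kept as well. The function name `select_top_pairs` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def select_top_pairs(
    scores: Dict[Pair, float],
    length: int,
    min_separation: int = 5,
) -> Tuple[List[Pair], List[Pair]]:
    """Walk down the ranked pair list until length // 2 long-range pairs are found.

    scores maps a residue pair (i, j) to its positive epistasis score.
    A pair counts as long-range when |i - j| > min_separation (assumed
    to match the |i-j| > 5 rule in the caption). Returns the long-range
    pairs and the local pairs encountered before the target was reached.
    """
    target = length // 2
    long_range: List[Pair] = []
    local: List[Pair] = []
    for pair in sorted(scores, key=scores.get, reverse=True):
        if len(long_range) == target:
            break
        i, j = pair
        if abs(i - j) > min_separation:
            long_range.append(pair)
        else:
            local.append(pair)
    return long_range, local


if __name__ == "__main__":
    # Toy scores for a hypothetical 4-residue example (target = 2 long-range pairs).
    toy = {(1, 10): 0.9, (2, 3): 0.8, (4, 20): 0.7, (5, 6): 0.6, (7, 30): 0.5}
    print(select_top_pairs(toy, length=4))
```

Under this reading, the GB1 numbers in the caption (L = 56) would correspond to stopping once 28 long-range pairs are found, by which point 10 local pairs have also been collected.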
Fig. 3. Experimental epistasis pairs reveal contacts in WW domain, RRM domain, and Twister Ribozyme.
a. WW. The top 22 positive epistatic pairs (black) include 18 (L/2) long-range pairs and are close in 3D (dark blue, 5 Å cutoff; light blue, 8 Å cutoff) in the WW domain of human Yap1, analyzed from Araya et al. (18) (Supplementary Tables 1 and 2). b. RRM. Residue pairs that display strong positive epistasis in the second RRM domain in yeast Pab1, analyzed from Melamed et al. (13) (Supplementary Tables 1 and 2). The experiment measured effects of all pairs only within blocks of 25 residues in linear sequence (Fragments 1, 2, and 3) and not between them. Therefore, experimental epistasis data exist only for the shaded square regions on the contact map. The top 38 (L/2) positive epistatic pairs (black) are close in the observed 3D structure 1cvj (65) (dark blue, 5 Å cutoff; light blue, 8 Å cutoff). c. Twister. Left: Contact map showing the 35 nucleotide pairs with the strongest positive epistasis (black), including 24 (L/2) long-range pairs (|i-j| > 5), compared to true contacts from the crystal structure 4oji (70) (dark blue, 5 Å cutoff; light blue, 8 Å cutoff). Strongly epistatic pairs are measured at the pseudoknot contacts (gray circles), and multiple nucleotides – both proximal and non-proximal in 4oji – share strong epistasis with the cleaved nucleotide A7 (dashed gray circle). Right: Two nonstandard pairs (in green: 46A and 28A, left insert; 25C and 7A, right insert) are high-scoring epistatic pairs when compared to 4oji (RNA structure, blue; magnesium ions, yellow).
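The contact definition used in these maps (minimum heavy atom distance between two residues, with 5 Å and 8 Å cutoffs) can be illustrated with a short sketch. It assumes each residue is given as a list of heavy-atom (x, y, z) coordinates in Å and that the cutoffs are inclusive; the function names and return labels are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def min_heavy_atom_distance(res_a: Sequence[Coord], res_b: Sequence[Coord]) -> float:
    """Smallest Euclidean distance between any atom of res_a and any atom of res_b."""
    return min(math.dist(a, b) for a in res_a for b in res_b)


def contact_class(distance: float, tight: float = 5.0, loose: float = 8.0) -> str:
    """Label a residue pair by the cutoff it satisfies (inclusive bounds assumed)."""
    if distance <= tight:
        return "contact_5A"
    if distance <= loose:
        return "contact_8A"
    return "no_contact"


if __name__ == "__main__":
    # Two toy residues with two heavy atoms each (coordinates are made up).
    residue_i = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    residue_j = [(4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    d = min_heavy_atom_distance(residue_i, residue_j)
    print(d, contact_class(d))
```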
Fig. 4. Predicted 3D structures from experimental epistasis scores alone.
a. GB1 (gray) generated from positive epistatic pairs, compared to the NMR structure 2gb1 (48) (blue). The predicted structure is within 1.8 Å C-α rmsd of the known structure over 49/56 residues. b. WW domain generated from positive epistatic pairs, compared to the NMR structure 1jmq (61) (blue). Models and structures are represented with secondary structure cartoons (left) and backbone ribbons (right).
Fig. 5. Only a small fraction of all double mutants is needed to determine 3D fold.
a. The precision of L/2 long-range epistatic pairs in contact (minimum heavy atom distance within 5 Å) is plotted for various fractional samples (n = 1,000 each) of the full double mutant library, sampled according to three strategies: completely unguided mutations (gray), pairs of one or more deleterious single mutations (pink), and pairs of two deleterious mutations (red). Precision comparable to that of the full GB1 double mutant dataset (dashed line) is consistently achieved using just 50%, 25%, and 5% as many mutants, for each respective strategy. Central lines in all box-and-whiskers plots correspond to the median, box boundaries represent the first and third quartiles, and whiskers show the range excluding suspected outliers (> Quartile 3 + 1.5 × interquartile range or < Quartile 1 – 1.5 × interquartile range). b. For each of these experimental strategies and library sizes, we folded from the epistatic pairs computed from 10 different random samples and here plot the C-α rmsd of the final predictions. Notably, the third strategy consistently achieved folds more accurate than that of the full dataset (dashed line). Box-and-whisker plots are defined as above. c. 3D ensembles of the final folding results for each 5% subsample versus 2gb1 (48) (blue) illustrate how guided mutations can improve both the accuracy and consistency of models predicted from epistasis measured in small datasets.
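Two quantities from panel a can be written out as a small sketch: the precision of a set of predicted pairs (the fraction whose minimum heavy atom distance is within 5 Å) and the whisker range that excludes suspected outliers by the 1.5 × interquartile-range rule. The names `precision` and `whisker_range` are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since the caption does not say which convention was used.

```python
import statistics
from typing import Dict, Sequence, Tuple

Pair = Tuple[int, int]


def precision(
    predicted: Sequence[Pair],
    min_distance: Dict[Pair, float],
    cutoff: float = 5.0,
) -> float:
    """Fraction of predicted pairs whose minimum heavy atom distance is <= cutoff."""
    if not predicted:
        raise ValueError("precision is undefined for an empty prediction set")
    hits = sum(1 for pair in predicted if min_distance[pair] <= cutoff)
    return hits / len(predicted)


def whisker_range(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Range of the data after excluding points beyond Q1 - k*IQR or Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if low_fence <= v <= high_fence]
    return min(inliers), max(inliers)


if __name__ == "__main__":
    # Made-up distances (in Å) for four predicted pairs.
    distances = {(1, 10): 4.2, (4, 20): 4.9, (7, 30): 7.5, (2, 40): 12.0}
    print(precision(list(distances), distances))
    print(whisker_range([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Repeating the precision calculation over many random subsamples of the double-mutant library, then summarizing each set of values with the whisker rule, would give the kind of box-and-whisker summary the caption describes.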

References

    1. Hopf TA et al. Three-Dimensional Structures of Membrane Proteins from Genomic Sequencing. Cell 149, 1607–1621, doi:10.1016/j.cell.2012.04.012 (2012).
    1. Marks DS et al. Protein 3D structure computed from evolutionary sequence variation. PLOS ONE 6, e28766, doi:10.1371/journal.pone.0028766 (2011).
    1. Hopf TA et al. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3, doi:10.7554/eLife.03430 (2014).
    1. Weinreb C et al. 3D RNA and Functional Interactions from Evolutionary Couplings. Cell 165, 963–975, doi:10.1016/j.cell.2016.03.030 (2016).
    1. Toth-Petroczy A et al. Structured States of Disordered Proteins from Genomic Sequences. Cell 167, 158–170.e112, doi:10.1016/j.cell.2016.09.010 (2016).

Methods-only References

    1. Fowler DM et al. High-resolution mapping of protein sequence-function relationships. Nature Methods 7, 741–746, doi:10.1038/nmeth.1492 (2010).
    1. Buchan DW, Minneci F, Nugent TC, Bryson K & Jones DT. Scalable web services for the PSIPRED Protein Analysis Workbench. Nucleic Acids Res 41, W349–357, doi:10.1093/nar/gkt381 (2013).
    1. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology 292, 195–202, doi:10.1006/jmbi.1999.3091 (1999).
    1. van Zundert GCP et al. The HADDOCK2.2 Web Server: User-Friendly Integrative Modeling of Biomolecular Complexes. Journal of Molecular Biology 428, 720–725, doi:10.1016/j.jmb.2015.09.014 (2016).
    1. Bonneau RT,J; Chivian D; Rohl C; Strauss C; Baker D. Rosetta in CASP4: progress in ab initio protein structure prediction. PROTEINS: Structure, Function, and Genetics 5, 119–126 (2011).