Review

Biochem Soc Trans. 2022 Dec 16;50(6):1847-1858.

doi: 10.1042/BST20220849.

General strategies for using amino acid sequence data to guide biochemical investigation of protein function

Emily N Kennedy¹, Clay A Foster², Sarah A Barr¹, Robert B Bourret¹

Affiliations

¹ Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, U.S.A.
² Department of Pediatrics, Section Hematology/Oncology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, U.S.A.

PMID: 36416676
PMCID: PMC10257402
DOI: 10.1042/BST20220849

Emily N Kennedy et al. Biochem Soc Trans. 2022.


Abstract

The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.

Keywords: SimpLogo; amino acid sequences; coevolution; covariation; protein domains.
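As a rough illustration of the first two strategies in the abstract, the sketch below scores covariation between alignment columns with plain mutual information and tabulates which amino acids occupy a given position. It is a minimal sketch under stated assumptions, not the authors' method: the toy alignment and the function names `residue_frequencies`, `mutual_information`, and `rank_pairs` are hypothetical, and mutual information is only one simple covariation measure. Practical coevolution analyses typically also correct for phylogenetic bias and uneven sampling, which this example ignores.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column(alignment, i):
    """Return the residues found at column i of an aligned set of sequences."""
    return [seq[i] for seq in alignment]


def residue_frequencies(alignment, i):
    """Fraction of sequences carrying each amino acid at column i."""
    counts = Counter(column(alignment, i))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.most_common()}


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j."""
    n = len(alignment)
    count_i = Counter(column(alignment, i))
    count_j = Counter(column(alignment, j))
    count_ij = Counter(zip(column(alignment, i), column(alignment, j)))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def rank_pairs(alignment):
    """Rank all column pairs from highest to lowest mutual information."""
    length = len(alignment[0])
    scores = {
        (i, j): mutual_information(alignment, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy alignment (not real data): columns 0 and 2 covary
# perfectly, column 1 is invariant, and column 3 varies independently.
toy_alignment = ["ADKL", "ADKV", "GDEL", "GDEI", "ADKL", "GDEV"]

if __name__ == "__main__":
    for pair, score in rank_pairs(toy_alignment):
        print(pair, round(score, 3))
    print(residue_frequencies(toy_alignment, 3))
```

In this toy case the perfectly covarying pair (columns 0 and 2) ranks first, while any pair involving the invariant column scores zero. This reflects the point that conserved positions carry no covariation signal and have to be identified by other means.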

Figures

**Figure 1. Receiver domain structure**
The five conserved residues that catalyze receiver domain autophosphorylation and autodephosphorylation reactions are shown in green. D is the site of phosphorylation. DD coordinate (orange dashed lines) the divalent metal ion, shown in yellow. K, T, and the metal ion each bind (red dashed lines) one of the phosphoryl group oxygen atoms. A stable, non-covalently bound BeF₃⁻ mimic of the PO₃²⁻ phosphoryl group is shown in cyan and yellow. Five variable residues known to affect reaction kinetics are shown in blue, with positions named in relation to the conserved residues. The black arrow indicates the required path of attack by the phosphodonor or water molecule, in line with the P–O bond to be formed or broken, respectively [100]. Based on the 1FQW structure of *E. coli* CheY [101].

**Figure 2. CheY autophosphorylation and autodephosphorylation rate constants as a function of substitution position**
Rate constants of *E. coli* CheY substitution mutants are plotted for autophosphorylation with phosphoramidate (k_phos/K_S) versus autodephosphorylation with water (k_dephos). Note the logarithmic scales on both axes. Red square is wild-type CheY, with NAEPF (single letter amino acid codes, N- to C-terminal) composition for the five variable residues in Figure 1. Intersection of dashed lines indicates rate constants supported by the most abundant (~11%) combinations of five variable residues (MAKPF, MARPF, shown in Panel B) in prokaryotic receiver domains spliced onto the CheY backbone. **(A)** Substitutions at T+1 (aqua triangles). **(B)** Substitutions at D+2 (black diamonds), T+2 (brown squares), or both (blue triangles). **(C)** Substitutions at K+1 (black circles), K+2 (blue diamonds), or both (green triangles). Data from [, –76].

**Figure 3. Major architectures of proteins containing CheW-like domains**
The *Classes* (designated by numbers from 1 to 6) of CheW-like domains are shown as a function of *Architecture* and *Context* as defined in the text. Approximately 95% of CheW-like domains occur in 16 *Architectures* from three protein lineages [93]. Major *Architectures* are shown schematically from N- to C-terminal. CheW proteins contain only CheW-like domains. Based on the *Cluster* analysis described in the text, the single *Context* of CheW proteins consists of three *Types*, each of which ultimately belongs to a different *Class*, as indicated by the asterisk. CheV proteins contain CheW-like and Receiver domains. CheA proteins are the most architecturally diverse. The basic CheA architecture of Hpt, Dimer, CA, and CheW-like domains is most commonly supplemented with up to 10 N-terminal Hpt domains or an additional CheW-like domain. Relationships between *Classes* 1 to 6 of CheW-like domains and architectural *Contexts* are shown [93]. The three CheA architectures shown also often feature a C-terminal Receiver domain, indicated in brackets. Domains and Pfam designations are CheW-like (PF01584), Receiver (Response_reg, PF00072), Hpt (histidine phosphotransfer, PF01627), Dimer (H-kinase_dim, PF02895), and CA (catalytic & ATP binding, HATPase_c, PF02518).

References

1. Andreeva A, Kulesha E, Gough J and Murzin AG (2020) The SCOP database in 2020: expanded classification of representative family and superfamily domains of known protein structures. Nucleic Acids Res 48, D376–D382 10.1093/nar/gkz1064
1. Sillitoe I, Bordin N, Dawson N, Waman VP, Ashford P, Scholes HM, Pang CSM, Woodridge L, Rauer C, Sen N, Abbasian M, Le Cornu S, Lam SD, Berka K, Varekova IH, Svobodova R, Lees J and Orengo CA (2021) CATH: increased structural coverage of functional space. Nucleic Acids Res 49, D266–D273 10.1093/nar/gkaa1079
1. Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL, Tosatto SCE, Paladin L, Raj S, Richardson LJ, Finn RD and Bateman A (2020) Pfam: The protein families database in 2021. Nucleic Acids Res 49, D412–D419 10.1093/nar/gkaa913
1. Letunic I, Khedkar S and Bork P (2021) SMART: recent updates, new developments and status in 2020. Nucleic Acids Res 49, D458–D460 10.1093/nar/gkaa937
1. Clifton BE, Kozome D and Laurino P (2022) Efficient exploration of sequence space by sequence-guided protein engineering and design. Biochemistry 10.1021/acs.biochem.1c00757

Grants and funding

R01 GM050860/GM/NIGMS NIH HHS/United States
