Front Bioinform. 2021 Mar 25:1:652286.
doi: 10.3389/fbinf.2021.652286. eCollection 2021.

Computational Identification of Functional Centers in Complex Proteins: A Step-by-Step Guide With Examples

Wei Zhou et al. Front Bioinform.

Abstract

In proteins, functional centers consist of the key amino acids required to perform molecular functions such as catalysis, ligand binding, and hormone and gas sensing. These centers are often embedded within complex multi-domain proteins and can perform important cellular signaling functions that enable fine-tuning of the temporal and spatial regulation of signaling molecules and networks. To discover hidden functional centers, we have developed a protocol that consists of two sequential steps. The first is the assembly of a search motif based on the key amino acids in the functional center, followed by querying proteomes of interest with the assembled motif. The second is a structural assessment of the proteins that harbor the motif. This approach, which relies on the application of computational tools to data in public repositories and on the biological interpretation of the search results, has to date uncovered several novel functional centers in complex proteins. Here, we use recent examples to describe a step-by-step guide that details the workflow of this approach, and we supplement it with notes, recommendations and cautions to make this protocol robust and widely applicable for the discovery of hidden functional centers.
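The first step, querying sequences with an assembled motif, can be illustrated with a minimal Python sketch. It assumes the motif is written in a PROSITE-like syntax and translated to a regular expression; the motif string, the function names `prosite_to_regex` and `scan`, and the toy sequences are all hypothetical and are not taken from the article.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-like pattern, e.g. '[RK]-x(2,3)-D', into a regex.

    Supported elements (an assumption of this sketch): a single residue,
    a bracketed residue set, or the wildcard 'x', each optionally followed
    by a repeat count '(n)' or range '(n,m)'.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(\[[A-Z]+\]|[A-Z]|x)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        core = "." if residue == "x" else residue
        if low is None:
            parts.append(core)
        elif high is None:
            parts.append(f"{core}{{{low}}}")
        else:
            parts.append(f"{core}{{{low},{high}}}")
    return "".join(parts)


def scan(sequences: dict, pattern: str) -> list:
    """Return (name, 1-based start, matched segment) for every start position."""
    # A lookahead lets matches that overlap each other all be reported.
    regex = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    hits = []
    for name, seq in sequences.items():
        for m in regex.finditer(seq):
            hits.append((name, m.start() + 1, m.group(1)))
    return hits


# Hypothetical motif and toy sequences, for illustration only.
toy_motif = "[RK]-x(2,3)-D-x-[DE]"
toy_proteome = {"toy1": "MAAKLLDAEGG", "toy2": "MGSSPLNVTT"}
hits = scan(toy_proteome, toy_motif)
print(hits)
```

Hits from such a scan are only candidates; as the abstract notes, they still require structural assessment before any functional claim is made.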

Keywords: H-NOX; abscisic acid receptor; adenylyl cyclase; functional centers; guanylate cyclase; hidden domains; moonlighting proteins; nitric oxide sensors.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
A general workflow for the computational identification of functional centers in complex proteins. The approach begins with an alignment of functional centers from proteins across species, followed by the construction of a consensus sequence that includes only the conserved key amino acids, separated by gaps as determined from the alignment. The consensus sequence then serves as a search motif to query various databases for candidate functional centers, which are then screened with web tools, if available, to assign confidence levels to the retrieved candidates. In the final step, top candidates are subjected to structural assessments that include model generation and docking simulations.
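The consensus-building step in this workflow can be sketched as follows. This is a minimal illustration, assuming the aligned functional centers are given as equal-length strings with `-` for alignment gaps and that the key columns have already been chosen by the user. The function name `build_motif`, the PROSITE-like output syntax, and the toy alignment are hypothetical, not the authors' implementation.

```python
def build_motif(aligned: list, key_columns: list) -> str:
    """Derive a PROSITE-like motif from selected columns of an alignment.

    aligned     : equal-length aligned sequences ('-' marks an alignment gap)
    key_columns : sorted 0-based indices of the conserved key positions
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")

    elements = []
    prev = None
    for col in key_columns:
        if prev is not None:
            # Count real residues (not gap characters) between key columns
            # in each sequence to get the allowed spacer range.
            spans = [sum(ch != "-" for ch in s[prev + 1:col]) for s in aligned]
            lo, hi = min(spans), max(spans)
            if hi > 0:
                elements.append(f"x({lo})" if lo == hi else f"x({lo},{hi})")
        residues = sorted({s[col] for s in aligned})
        if "-" in residues:
            raise ValueError(f"key column {col} contains an alignment gap")
        # One residue stays as a letter; several become a bracketed set.
        elements.append(residues[0] if len(residues) == 1 else "[" + "".join(residues) + "]")
        prev = col
    return "-".join(elements)


# Toy alignment, for illustration only.
toy_alignment = [
    "RAD-LLK",
    "KSDGMVK",
    "RTE-AIR",
]
motif = build_motif(toy_alignment, key_columns=[0, 2, 6])
print(motif)
```

In practice the key columns would come from residues with experimental support for their role in the functional center, so the choice of columns remains a curated input rather than something this sketch infers.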
Figure 2
Domain architecture of proteins containing nucleotide cyclase functional centers. The domain organizations of experimentally validated GCs and ACs identified using a motif-based approach from Arabidopsis thaliana (At), Brachypodium distachyon (Bd), Solanum lycopersicum (Sl), Hippeastrum (Hp), Pharbitis nil (Pn), and Homo sapiens (Hs), are illustrated as 2-dimensional bars and aligned at their corresponding GC/AC domains. Protein UniProt IDs are as follows: AtLRR (Q9LRR5), AtNCED (Q9LRR7), AtDGK4 (Q1PDI2), AtKUP5 (Q8LPL8), AtKUP7 (Q9FY75), AtClAP (Q9C9X5), AtPPR (Q9SXD8), HpAC1 (E1AQY1), BdTTM3 (I1I2P2), AtBRI1 (O22476) (see Supplementary Figure 2 for an illustration of AtBRI1 protein topology), AtPSKR1 (Q9ZVR7), AtPepR1 (Q9SSL9), AtWAKL10 (Q8VYA3), SlGC17 (A0A3Q7FS62), SlGC18 (A0A3Q7FY08), HsIRAK3 (Q9Y616), HpPepR1 (A0A1U9X9S6), HpGC1 (D9MWM6), PnGC1 (Q0PY32), AtGC1 (Q8L870), AtPNPR1 (F4HR92), and AtNOGC1 (Q9SXD9). *AC center undetermined.
Figure 3
Representative structures of nucleotide cyclase and phosphodiesterase functional centers. (A) The typical nucleotide cyclase center identified through a 14-amino acid long search motif as exemplified by an adenylyl cyclase (AC) in an Arabidopsis potassium channel AtKUP5 (Al-Younis et al., 2018), assumes an alpha-helical secondary fold that is followed by a loop. At the tertiary level, the AC center typically forms a clear cavity that could dock with the substrate ATP in a binding pose where the adenine points into the cavity toward the amino acid at the first position of the motif, and the phosphate points outwards toward the positively charged [KR] amino acid at the solvent exposed region of the cavity. Negatively and positively charged amino acids that are crucial for the interactions with the substrate are colored red and blue, respectively. (B) The putative phosphodiesterase center in an Arabidopsis potassium channel AtKUP5 identified through a 27-47 amino acid long search motif, assumes an alpha-helical secondary fold that is followed by a loop which forms the latch region enclosing the docked cAMP substrate within a distinct cavity (Kwiatkowski et al., 2021). Negatively and positively charged amino acids crucial for the interactions with the substrate are colored red and blue, respectively.
Figure 4
Sequence alignment and domain architecture of proteins containing H-NOX centers, and a representative structure of the H-NOX center. (A) Alignment of the heme-binding centers of H-NOX proteins from organisms across species. Tt, Thermoanaerobacter tengcongensis (UniProt ID: Q8RBX6); Sw, Shewanella woodyi (UniProt ID: B1KIH6); Pa, Pseudoalteromonas atlantica (UniProt ID: Q15VN4); Np, Nostoc punctiforme (UniProt ID: B2IZ76); Lp, Legionella pneumophila (UniProt ID: Q5WTZ5); Vf, Vibrio fischeri (UniProt ID: Q5E1F5); Ce, Caenorhabditis elegans (UniProt ID: Q86C56); Dm, Drosophila melanogaster (UniProt ID: Q24086); Rn, Rattus norvegicus (UniProt ID: P20595); Hs, Homo sapiens (UniProt ID: Q02153) and At, Arabidopsis thaliana. Bolded letters are conserved amino acids that are also experimentally shown to be crucial for heme-binding and stabilization. (B) The domain organizations of experimentally validated H-NOX centers identified using a motif-based approach from Arabidopsis thaliana (At), are illustrated as 2-dimensional bars, and aligned at their corresponding H-NOX centers (see Supplementary Figure 1 for full alignment of H-NOX domains). Protein UniProt IDs are as follows: AtLRB3 (O04615), AtDGK4 (Q1PDI2), and AtNOGC1 (Q9SXD9). (C) A representative structure of a protein containing the H-NOX center. The H-NOX center in an Arabidopsis BTB/POZ domain-containing protein AtLRB (Zarban et al., 2019), identified through a 33–35 amino acid long search motif, assumes a long loop that wraps the docked heme-Fe moiety within a clearly defined pocket. The “H” residue in the motif is the distal ligand that binds to the iron and the YxSxR signature stabilizes the heme through hydrogen bonding. Negatively and positively charged amino acids crucial for the interactions with the substrate are colored red and blue, respectively.
Figure 5
Domain architecture of proteins containing ABA-interacting centers and a representative structure of the ABA center. (A) The domain organizations of ABA-interacting centers identified using a motif-based approach from Arabidopsis thaliana (At) are illustrated as 2-dimensional bars and aligned at their corresponding ABA-interacting centers. Protein UniProt IDs are as follows: AtGORK (Q94A76), AtPYL8 (Q9FGM1), AtPYL10 (Q8H1R0), AtSKOR (Q9M8S6), AtRSH2 (Q9LVJ3) and AtRSH3 (Q9SYH1). AtGORK, AtPYL8 and AtPYL10 have been confirmed experimentally to be ABA receptors, while AtSKOR, AtRSH2 and AtRSH3 harbor the ABA center motif and are known to respond to ABA. (B) The ABA-interacting center in an Arabidopsis potassium transporter AtGORK (Ooi et al., 2017), identified through a 26–28 amino acid long search motif, occupies a clear cavity that could dock with ABA, with the “Y” and “K” residues being crucial for maintaining ABA affinity. Negatively and positively charged amino acids crucial for the interactions with the substrate are colored red and blue, respectively.

References

    1. Al-Younis I., Wong A., Lemtiri-Chlieh F., Schmöckel S., Tester M., Gehring C., et al. (2018). The Arabidopsis thaliana K+-Uptake Permease 5 (AtKUP5) contains a functional cytosolic adenylate cyclase essential for K+ transport. Front. Plant Sci. 9:1645. doi: 10.3389/fpls.2018.01645
    2. Angkawijaya A. E., Nguyen V. C., Gunawan F., Nakamura Y. (2020). A pair of Arabidopsis diacylglycerol kinases essential for gametogenesis and endoplasmic reticulum phospholipid metabolism in leaves and flowers. Plant Cell 32, 2602–2620. doi: 10.1105/tpc.20.00251
    3. Bianchet C., Wong A., Quaglia M., Alqurashi M., Gehring C., Ntoukakis V., et al. (2019). An Arabidopsis thaliana leucine-rich repeat protein harbors an adenylyl cyclase catalytic center and affects responses to pathogens. J. Plant Physiol. 232, 12–22. doi: 10.1016/j.jplph.2018.10.025
    4. Bonetta R., Valentino G. (2020). Machine learning techniques for protein function prediction. Proteins 88, 397–413. doi: 10.1002/prot.25832
    5. Bowler C., Neuhaus G., Yamagata H., Chua N. H. (1994). Cyclic GMP and calcium mediate phytochrome phototransduction. Cell 77, 73–81. doi: 10.1016/0092-8674(94)90236-4