Proc Natl Acad Sci U S A. 2009 Aug 18;106(33):13785-90.
doi: 10.1073/pnas.0906801106. Epub 2009 Aug 3.

Proteome-wide prediction of acetylation substrates

Amrita Basu et al. Proc Natl Acad Sci U S A. 2009.

Abstract

Acetylation is a well-studied posttranslational modification that has been associated with a broad spectrum of biological processes, notably gene regulation. Many studies have contributed to our knowledge of the enzymology underlying acetylation, including efforts to understand the molecular mechanism of substrate recognition by several acetyltransferases, but traditional experiments to determine intrinsic features of substrate site specificity have proven challenging. Here, we combine experimental methods with clustering analysis of protein sequences to predict protein acetylation, using our prediction tool PredMod, based on the sequence characteristics of acetylated lysines within histones. We define a local amino acid sequence composition that represents potential acetylation sites by clustering histone and nonhistone sequences. We show that this sequence composition has predictive power on two independent experimental datasets of acetylation marks. Finally, we detect acetylation of selected putative substrates using mass spectrometry and report several acetylated nonhistone substrates in budding yeast. Our approach, combined with more traditional experimental methods, may be useful for identifying acetylated substrates proteome-wide.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Schematic of the overall computational and experimental approach. (A) Human core histone proteins (H2A: orange; H2B: red; H3: blue; H4: green) containing 56 lysines (black) were taken as input data for computational training. (B) A sliding window of amino acids (black bars) flanking the input lysine (at position 0) is used to train the model. Not all window lengths are shown. Weights inversely proportional to the distance (d) from the input lysine are applied to the amino acids at positions −12 to +12. (C) BLAST sequence alignments are performed between all 56 lysines and their surrounding sequences, and the highest-scoring alignment is selected to begin the clustering analysis. Shown are sequences H4K5 and H3K36 (boxed in red) spanning positions −6 to +6 and their highest-scoring matches (denoted by a checkmark). Note that H4K5 and H2AK5 do not have 6 residues flanking the lysine on the N-terminal side; scores are normalized by length in these cases. (D) Lysines are clustered according to their sequence alignment scores, creating a fully predictive hierarchical tree (4 sequences are shown here; all 56 sequences are shown in Fig. 2). (E) Sequences are color coded according to published data on their modification state. Red: validated evidence that the lysine is acetylated; green: the lysine has not been observed as acetylated in the literature. (F) After PredMod was established, predictions were made on lysines in human core histones. The algorithm was then validated using a set of human acetylated proteins reported in the literature, substrates detected using a pan-acetyl IP approach, and a yeast proteome-wide dataset. Finally, predictions were made on yeast nonhistone sites and validated in vivo.
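Panels B and C describe comparing windows of residues around each lysine, with weights that fall off with distance and a length normalization when a lysine sits close to the N terminus. The following is a minimal Python sketch of that idea under stated assumptions, not PredMod itself: the weight is taken as 1/|d| (the caption says only that weights are inversely proportional to distance), residue similarity is plain identity rather than a BLAST/BLOSUM score, and the helper names `extract_window`, `weight`, and `window_similarity` are hypothetical.

```python
# Hypothetical sketch of a distance-weighted window comparison around a lysine.
# Assumptions (not from the paper): weight = 1/|d| for d != 0 and 1 for the
# central lysine; identity scoring instead of BLAST/BLOSUM; normalization by
# the total weight of the offsets present in both windows.


def extract_window(seq: str, k_index: int, n_flank: int, c_flank: int) -> dict[int, str]:
    """Return {offset: residue} for offsets -n_flank..+c_flank that exist in seq."""
    if seq[k_index] != "K":
        raise ValueError(f"position {k_index} is {seq[k_index]!r}, not a lysine")
    window = {}
    for d in range(-n_flank, c_flank + 1):
        i = k_index + d
        if 0 <= i < len(seq):  # window is truncated near the protein termini
            window[d] = seq[i]
    return window


def weight(d: int) -> float:
    """Weight inversely proportional to distance from the lysine (assumed 1/|d|)."""
    return 1.0 if d == 0 else 1.0 / abs(d)


def window_similarity(w1: dict[int, str], w2: dict[int, str]) -> float:
    """Weighted identity over shared offsets, normalized by their total weight."""
    shared = set(w1) & set(w2)
    total = sum(weight(d) for d in shared)
    if total == 0:
        return 0.0
    matched = sum(weight(d) for d in shared if w1[d] == w2[d])
    return matched / total
```

Because only offsets present in both windows contribute to the numerator and to the normalizing weight, a lysine with fewer than 6 N-terminal neighbors is still scored on a 0 to 1 scale, which loosely mirrors the length normalization mentioned for H4K5 and H2AK5.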
Fig. 2.
Computational prediction of human histone acetylation sites. Predictive tree of all 56 lysines from human core histone sequences, built by hierarchical clustering (see SI Text for details). Histone lysines (in red or green) are color coded according to published data on their modification state, as described in Fig. 1E. For each pair of sequences under a single node, amino acids are colored light purple (identical residues) or dark blue (substitutable residues according to the BLOSUM matrix) (25). Underlined red lysines represent the residues used to train the algorithm. The dashed red vertical line represents the threshold selected for making predictions. Gray boxes show zoomed-in views of lysines that cluster together. An R next to a lysine indicates that a C- to N-terminal arrangement was used in the alignment.
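The predictive tree is built by hierarchical clustering and cut at a chosen threshold (the dashed line). The clustering details are in the SI Text and are not reproduced here; the sketch below assumes average linkage over a precomputed pairwise similarity matrix, and a hypothetical prediction rule in which every lysine that ends up in the same cluster as a known acetylated lysine is predicted as acetylated. The function names are illustrative only.

```python
from itertools import combinations

# Hypothetical sketch: average-linkage clustering cut at a similarity threshold,
# followed by an assumed "guilt by association" prediction rule.


def cluster_at_threshold(sim: list[list[float]], threshold: float) -> list[set[int]]:
    """Merge the two most similar clusters until the best average similarity
    falls below `threshold` (i.e. the tree is cut at that level)."""
    clusters = [{i} for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = -float("inf"), None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean similarity over all cross-cluster pairs.
            s = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            s /= len(clusters[a]) * len(clusters[b])
            if s > best:
                best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair  # a < b, so deleting index b leaves index a valid
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters


def predict(clusters: list[set[int]], known_acetylated: set[int]) -> set[int]:
    """Assumed rule: predict every member of a cluster containing a known site."""
    predicted = set()
    for c in clusters:
        if c & known_acetylated:
            predicted |= c
    return predicted
```

Raising the threshold plays the role of moving the vertical cut toward the leaves: clusters become smaller and tighter, so fewer lysines are predicted.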
Fig. 3.
Prediction performance on human nonhistone substrates. ROC curves for the human pan-acetyl IP substrate test set (A) and for literature-validated human acetylated proteins (B). The y axis represents the true positive rate, and the x axis the false positive rate. Win = (x,y) denotes the window of residues flanking the lysine, where x is the number of residues N-terminal to the lysine and y the number of residues C-terminal to it. The diagonal line represents random prediction.
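Each ROC curve is traced by sweeping a decision threshold over per-lysine prediction scores and plotting the true positive rate against the false positive rate. A minimal sketch follows, assuming each lysine already has a numeric score (for instance, its best similarity to a known acetylated histone window; the paper's exact scoring is not given here) and a boolean label for observed acetylation. The function names are hypothetical. Tied scores are processed together, and the area under the curve is computed with the trapezoidal rule; a random predictor, the diagonal line, has an area of 0.5.

```python
# Hypothetical sketch of ROC points and trapezoidal AUC from per-lysine scores.


def roc_points(scores: list[float], labels: list[bool]) -> list[tuple[float, float]]:
    """Return (FPR, TPR) pairs, sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda x: x[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume all lysines with the same score so ties give one diagonal step.
        s = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, `auc(roc_points([0.9, 0.8, 0.7, 0.6], [True, True, False, False]))` returns 1.0, since every acetylated lysine is ranked above every unacetylated one.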
Fig. 4.
Frequency distribution of amino acids surrounding lysines in human histone and nonhistone proteins. Frequency of amino acids (y axis) at positions −6 to +6 (x axis) for validated acetylated lysines in histone proteins (23 lysines) (A), validated acetylated lysines in proteins reported in the literature (73 lysines) (B), validated acetylated lysines in the pan-acetyl IP substrates (51 lysines) (C), histone lysines not observed as acetylated (33 lysines) (D), and lysines not observed as acetylated in either the literature-reported proteins or the pan-acetyl IP substrates (3,493 lysines) (E). Residue colors: green, basic; red, hydrophobic; pink, small; blue, S/T; black, all other residues. Underlined red K: lysine experimentally validated as acetylated; underlined green K: lysine not experimentally observed as acetylated. X denotes that no amino acid was present at that position. Tick marks indicate residues described in the text.
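Each panel tabulates, for every offset from −6 to +6, how often each amino acid occurs across a set of lysine-centered windows. The sketch below computes such a position-specific frequency table, assuming the windows are given as equal-length strings with the lysine at a fixed index and 'X' padding where no residue exists, as in the figure. The function name is hypothetical, and the residue-class coloring (basic, hydrophobic, small, S/T) is not modeled.

```python
from collections import Counter

# Hypothetical sketch of a position-specific amino acid frequency table.


def position_frequencies(windows: list[str], center: int) -> dict[int, dict[str, float]]:
    """Per-offset residue frequencies for aligned, equal-length windows.

    Each window has the lysine at index `center`; 'X' marks a missing residue
    (e.g. near a protein terminus) and is counted as its own symbol.
    """
    if not windows:
        raise ValueError("need at least one window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    freqs = {}
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        # Key by offset relative to the lysine, e.g. -6..+6 for 13-residue windows.
        freqs[i - center] = {aa: n / len(windows) for aa, n in counts.items()}
    return freqs
```

Comparing the tables built from acetylated and non-acetylated lysine sets, offset by offset, is one way to look for enriched residues such as those marked with tick marks.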
Fig. 5.
Novel predictions and in vivo validation of S. cerevisiae nonhistone proteins. (A) Coomassie-stained gel of TAP-tag pull-down purification of the yeast proteins Eaf7, Sir3, and Spt6. Asterisk denotes bands that were isolated and examined for acetylation by MS. (B) Sequence alignment of candidate proteins with identical or similar histone regions. Light purple amino acid pairs represent identical residues, and dark blue pairs represent residues that are evolutionarily substitutable according to the BLOSUM matrix. Correctly predicted lysines are indicated in light blue. Red lysines are acetylated histone residues.

References

    1. Allfrey VG, Faulkner R, Mirsky AE. Acetylation and methylation of histones and their possible role in the regulation of RNA synthesis. Proc Natl Acad Sci USA. 1964;51:786–794.
    2. Verreault A, Kaufman PD, Kobayashi R, Stillman B. Nucleosomal DNA regulates the core-histone-binding subunit of the human Hat1 acetyltransferase. Curr Biol. 1998;8(2):96–108.
    3. Taverna SD, Li H, Ruthenburg AJ, Allis CD, Patel DJ. How chromatin-binding modules interpret histone modifications: Lessons from professional pocket pickers. Nat Struct Mol Biol. 2007;14(11):1025–1040.
    4. Grant PA. A tale of histone modifications. Genome Biol. 2001;2(4):REVIEWS0003.
    5. Strahl BD, Allis CD. The language of covalent histone modifications. Nature. 2000;403(6765):41–45.