Proc Natl Acad Sci U S A. 2009 Aug 18;106(33):13785-90.

doi: 10.1073/pnas.0906801106. Epub 2009 Aug 3.

Proteome-wide prediction of acetylation substrates

Amrita Basu¹, Kristie L Rose, Junmei Zhang, Ronald C Beavis, Beatrix Ueberheide, Benjamin A Garcia, Brian Chait, Yingming Zhao, Donald F Hunt, Eran Segal, C David Allis, Sandra B Hake

PMID: 19666589
PMCID: PMC2728972
DOI: 10.1073/pnas.0906801106

Affiliation

¹ Laboratory of Chromatin Biology, Rockefeller University, New York, NY 10065, USA.

Abstract

Acetylation is a well-studied posttranslational modification that has been associated with a broad spectrum of biological processes, notably gene regulation. Many studies have contributed to our knowledge of the enzymology underlying acetylation, including efforts to understand the molecular mechanism of substrate recognition by several acetyltransferases, but traditional experiments to determine intrinsic features of substrate site specificity have proven challenging. Here, we combine experimental methods with clustering analysis of protein sequences in our prediction tool, PredMod, which predicts protein acetylation from the sequence characteristics of acetylated lysines within histones. We define a local amino acid sequence composition that represents potential acetylation sites by implementing a clustering analysis of histone and nonhistone sequences. We show that this sequence composition has predictive power on 2 independent experimental datasets of acetylation marks. Finally, we detect acetylation for selected putative substrates using mass spectrometry, and report several nonhistone acetylated substrates in budding yeast. Our approach, combined with more traditional experimental methods, may be useful for identifying acetylated substrates proteome-wide.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Fig. 1.**
Schematic of the overall computational and experimental approach. (A) Human core histone proteins (H2A: orange; H2B: red; H3: blue; H4: green) containing 56 lysines (black) were taken as input data for computational training. (B) A sliding window of amino acids (black bars) flanking the input lysine (at position 0) is used to train the model. Not all window lengths are shown. Weights (calculated as inversely proportional to distance [d]) are applied to amino acids based on the distance from the input lysine to the amino acid in positions −12 to +12. (C) BLAST sequence alignments are performed between all 56 lysines and surrounding sequences, and the highest scoring alignment is selected to begin the clustering analysis. Shown are sequences H4K5 and H3K36 (boxed in red) spanning positions −6 to +6 and their highest scoring match (denoted by a checkmark). Note that H4K5 and H2AK5 do not have 6 residues flanking the lysine N-terminally; scores are normalized based on length in these cases. (D) Lysines clustered together based on sequence alignment scores creating a fully predictive hierarchical tree (4 sequences are shown here; all 56 sequences are shown in Fig. 2). (E) Sequences are color coded according to published data on their modification state. Red: validated evidence of the lysine being acetylated; green: this lysine was not observed as being acetylated in literature. (F) After establishing PredMod, predictions were made on lysines in human core histones. The algorithm was then validated using a set of human acetylated proteins reported in literature, substrates detected using a pan-acetyl IP approach, and a yeast proteome-wide dataset. Finally, predictions were made on yeast nonhistone sites and validated in vivo.
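The window extraction and distance weighting described in panels (B) and (C) can be illustrated with a minimal sketch. This is not the PredMod implementation: the function names are hypothetical, the weight of 1/|d| is one reading of "inversely proportional to distance", the weight of 1.0 for the central lysine is an assumption, and a plain identity score stands in for the BLAST alignment score used in the paper. Missing flanking residues are padded with `X` and excluded from the normalization, mirroring the length normalization mentioned for H4K5 and H2AK5.

```python
def flanking_window(seq, k_index, n_left=6, n_right=6, pad="X"):
    """Return the residues from -n_left to +n_right around seq[k_index].

    Positions that fall outside the sequence are filled with `pad`.
    """
    residues = []
    for offset in range(-n_left, n_right + 1):
        i = k_index + offset
        residues.append(seq[i] if 0 <= i < len(seq) else pad)
    return "".join(residues)


def distance_weights(n_left=6, n_right=6):
    """Weight each position by 1/|d|; the central lysine (d = 0) gets 1.0.

    Both choices are assumptions made for illustration.
    """
    return [1.0 if d == 0 else 1.0 / abs(d) for d in range(-n_left, n_right + 1)]


def weighted_identity(win_a, win_b, weights, pad="X"):
    """Weighted fraction of identical residues between two windows.

    Padded positions are skipped, so the score is normalized by the
    weight of the positions actually present in both windows.
    """
    matched = 0.0
    total = 0.0
    for a, b, w in zip(win_a, win_b, weights):
        if a == pad or b == pad:
            continue
        total += w
        if a == b:
            matched += w
    return matched / total if total else 0.0


# Toy peptide; the lysine at 0-based index 4 has only 4 N-terminal residues.
window = flanking_window("SGRGKGGKGLG", 4)
```

Here `window` is `"XXSGRGKGGKGLG"`: the two leading `X` characters mark the positions for which no N-terminal residue exists.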

**Fig. 2.**
Computational prediction of human histone acetylation sites. Predictive tree of all 56 lysines from human core histone sequences using hierarchical clustering (see *SI Text* for details). Histone lysines (in red or green) are color coded according to published data on their modification state as described in Fig. 1E. For each pair of sequences under a single node, amino acids are colored in light purple (identical residues) or dark blue (in accordance with the BLOSUM matrix) (25). Underlined red lysines represent the residue that was used for training the algorithm. Dashed red vertical line represents the selected threshold used to make predictions. Gray boxes represent a zoomed-in view of lysines that cluster together. An R next to the lysine indicates that a C- to N-terminal arrangement was used in the alignment.
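The idea of cutting a hierarchical tree at a threshold and propagating known labels within each resulting cluster can be sketched as follows. The caption does not state the linkage method, so this sketch assumes single linkage, for which a threshold cut is equivalent to taking connected components of the "similarity at or above threshold" graph. The function names, the toy similarity values, and the rule that a lysine is predicted acetylated when its cluster contains at least one validated acetylated lysine are all assumptions made for illustration.

```python
def threshold_clusters(names, similarity, threshold):
    """Group names whose pairwise similarity chains stay >= threshold.

    Equivalent to cutting a single-linkage tree at `threshold`
    (the linkage choice is an assumption, not taken from the paper).
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the cluster representative, compressing paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


def predict_acetylated(clusters, known_acetylated):
    """Predict True for every site that shares a cluster with a known site."""
    return {
        site: any(member in known_acetylated for member in group)
        for group in clusters
        for site in group
    }


# Toy similarity table with hypothetical site names and scores.
toy_scores = {frozenset(("a", "b")): 0.9, frozenset(("b", "c")): 0.8}
toy_similarity = lambda x, y: toy_scores.get(frozenset((x, y)), 0.0)

clusters = threshold_clusters(["a", "b", "c", "d"], toy_similarity, 0.5)
predictions = predict_acetylated(clusters, known_acetylated={"a"})
```

With these toy values, sites `a`, `b`, and `c` fall into one cluster and inherit the positive label from `a`, while `d` stays in its own cluster and is predicted negative. Raising the threshold splits clusters and makes predictions more conservative.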

**Fig. 3.**
Prediction performance on human nonhistone substrates. ROC curve for human pan-acetyl IP substrate test set (A) and literature-validated human acetylated proteins (B). The y axis represents the true positive rate, and the x axis the false positive rate. Win = (x,y) denotes the length of residues spanning the lysine; x: number of residues N-terminal to the lysine; y: number of residues C-terminal to the lysine. Diagonal line represents a random prediction.
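For readers unfamiliar with ROC curves, the following sketch shows how the plotted points can be computed from a list of prediction scores and true labels. It is a generic illustration, not the evaluation code used in the paper; the function names and the toy scores are hypothetical. Each distinct score is treated as a decision threshold, and the area under the curve is obtained with the trapezoidal rule. A random predictor follows the diagonal and has an area of 0.5.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    `labels` holds 1 for acetylated and 0 for not observed as acetylated.
    Tied scores are processed together so they yield a single point.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy data: a perfect ranking of two positives above two negatives.
perfect = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```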

**Fig. 4.**
Frequency distribution of amino acids surrounding lysines in human histone and nonhistone proteins. Frequency of amino acids (y axis) spanning positions −6 to +6 (x axis) for validated acetylated lysines in histone proteins (23 lysines) (A), validated acetylated lysines in proteins reported in literature (73 lysines) (B), validated acetylated lysines in the pan-acetyl IP substrates (51 lysines) (C), histone lysines not observed as acetylated (33 lysines) (D), and lysines not observed as acetylated in the literature-reported proteins or in the pan-acetyl IP substrates (3,493 lysines) (E). Residues in green: basic; red: hydrophobic; pink: small; blue: S/T; black: all other residues. Underlined red K: lysine that has been validated experimentally as acetylated; underlined green K: lysine that has not been experimentally observed as acetylated. X denotes that no amino acid was present in that position. Tick marks represent residues described in text.
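A per-position frequency table of this kind can be computed from equal-length windows centered on each lysine. The sketch below is illustrative only: the function name and the three-residue toy windows are hypothetical, and the grouping of residues into the color classes used in the figure is not reproduced because the caption does not list class membership.

```python
from collections import Counter


def position_frequencies(windows):
    """Return one {residue: frequency} dict per window position.

    All windows must have the same length; a pad character such as X
    is counted like any other symbol.
    """
    if not windows:
        return []
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")

    table = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        total = sum(counts.values())
        table.append({aa: n / total for aa, n in counts.items()})
    return table


# Toy windows covering positions -1 to +1 around a central lysine.
freqs = position_frequencies(["GKA", "GKS", "AKA", "GKA"])
```

In this toy example the central position is always `K`, while position −1 is `G` in three of four windows.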

**Fig. 5.**
Novel predictions and in vivo validation of S. cerevisiae nonhistone proteins. (A) Coomassie-stained gel of TAP-tag pull-down purification of yeast proteins Eaf7, Sir3, and Spt6. Asterisk denotes bands that were isolated and inspected for acetylation by MS. (B) Sequence alignment of candidate proteins with identical or similar histone regions. Light purple amino acid pairs represent identical residues, and dark blue pairs represent residues that can be evolutionarily substitutable in accordance with the BLOSUM matrix. Correctly predicted lysines are indicated in light blue. Red lysines are acetylated histone residues.

Grants and funding

R01 GM037537/GM/NIGMS NIH HHS/United States
