Protein Sci. 2008 Apr;17(4):606-13.
doi: 10.1110/ps.073347208.

The epitope space of the human proteome

Lisa Berglund et al. Protein Sci. 2008 Apr.

Abstract

In the post-genome era, there is a great need for protein-specific affinity reagents to explore the human proteome. Antibodies are suitable as reagents, but generation of antibodies with low cross-reactivity to other human proteins requires careful selection of antigens. Here we show the results from a proteome-wide effort to map linear epitopes based on uniqueness relative to the entire human proteome. The analysis was based on a sliding window sequence similarity search using short windows (8, 10, and 12 amino acid residues). A comparison of exact string matching (Hamming distance) and a heuristic method (BLAST) was performed, showing that the heuristic method combined with a grid strategy allows for whole proteome analysis with high accuracy and feasible run times. The analysis shows that it is possible to find unique antigens for a majority of the human proteins, with relatively strict rules involving low sequence identity of the possible linear epitopes. The implications for human antibody-based proteomics efforts are discussed.
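The exact-matching variant of the sliding window search described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `max_identity` and `identity_profile` are hypothetical, the comparison is an ungapped position-by-position count of identical residues (window length minus Hamming distance), and the toy sequences stand in for the proteome.

```python
def max_identity(window, proteins):
    """Highest number of identical positions between `window` and any
    same-length, ungapped substring of the sequences in `proteins`."""
    k = len(window)
    best = 0
    for seq in proteins:
        for j in range(len(seq) - k + 1):
            # Identical positions = k minus the Hamming distance.
            matches = sum(a == b for a, b in zip(window, seq[j:j + k]))
            best = max(best, matches)
    return best


def identity_profile(query, others, k=10):
    """Slide a window of k residues along `query`, one residue at a time,
    recording the best match found in `others` for each window."""
    return [max_identity(query[i:i + k], others)
            for i in range(len(query) - k + 1)]


# Toy data (assumption): `others` holds sequences from other genes only.
query = "ACDEFGHIKLMN"
others = ["ACDEFGHWWWWW"]
profile = identity_profile(query, others, k=8)
```

Each profile value is the count of identical residues for one window position, so lower values mark regions that are more distinct from the rest of the sequence set. A brute-force scan like this grows with the product of query length and database size, which is consistent with the abstract's motivation for using a heuristic search on a grid.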

Figures

Figure 1.
Principle of the sliding window method. Here, a window size of 12 amino acids is used. The first 12 amino acids of the protein sequence (1) are compared to all proteins in the human Ensembl database (2), here using the blastp program. The best hit (the one with the highest number of identical amino acids between the hit and the query sequence) is retrieved from the results of the sequence similarity search. In this example, the best hit has eight amino acids identical to the query sequence, or 67% sequence identity (3). The window is moved one amino acid toward the C terminus of the protein (4). Steps 1–4 are repeated until the full length of the protein has been covered.
Figure 2.
Protein profile for the estrogen receptor protein (ESR) using the Hamming distance method with a 10-amino-acid window (A) and blastp with a 10-amino-acid window (B), a 12-amino-acid window (C), and an eight-amino-acid window (D). The bars marked in red in A and B indicate positions where the results from the exact method (A) and the heuristic method (B) deviate. The middle position of each window on the protein is given on the X-axis.
Figure 3.
The linear epitopes of the human proteome space. For each window size, the longest consecutive amino acid stretch with all windows under a threshold value (e.g., no more than six out of eight amino acid residues identical to a protein from another gene) was determined for each of the 22,983 human genes in Ensembl. The maximum consecutive length found for the proteins encoded by each gene was selected as representative for that gene. The number of human genes (Y-axis) for each category of maximum consecutive length (X-axis) is presented for window sizes of (A) eight amino acids (threshold values 5, 6, and 7 identical amino acids), (B) ten amino acids (threshold values 7, 8, and 9 identical amino acids), and (C) twelve amino acids (threshold values 7, 8, 9, 10, and 11 identical amino acids).
Figure 4.
Examples of sequence profiles (12-amino-acid window) for proteins with well-characterized, protein-specific antibodies used in clinical diagnostics. The window position given is the middle position of the window on the protein. The gray lines indicate where the protein-specific antibody binds to the protein. (A) Receptor tyrosine-protein kinase erbB-2 protein (ERBB2). The binding site of the antibody has been structurally determined (Cho et al. 2003). (B) Prostate-specific antigen protein (KLK3). The antibodies recognizing this epitope are known to bind specifically to the free protein (Piironen et al. 1998).
Figure 5.
Comparison of a global sequence profile (window size 50 amino acids) (A) and a local sequence profile (window size 12 amino acids) (B) for the leukocyte common antigen protein (PTPRC).
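The stretch measure described in the Figure 3 caption can be sketched as below, under stated assumptions. The function name `longest_unique_stretch` is hypothetical, a window counts as acceptable when its best-match value is at or below the threshold (following the caption's "no more than" wording), and the conversion from a run of windows to a residue length (run plus window size minus one) is an assumption about how consecutive overlapping windows cover the sequence, not a detail confirmed by the source.

```python
def longest_unique_stretch(profile, k, threshold):
    """Length in residues of the longest run of consecutive windows whose
    best-match identity count is <= threshold; 0 if no window qualifies.

    profile   -- per-window maximum identical-residue counts
    k         -- window size in residues
    threshold -- largest identity count still considered acceptable
    """
    best = run = 0
    for value in profile:
        # Extend the current run, or reset it when a window exceeds the threshold.
        run = run + 1 if value <= threshold else 0
        best = max(best, run)
    # n consecutive overlapping windows of size k span n + k - 1 residues.
    return best + k - 1 if best else 0


# Toy profile (assumption) for an 8-residue window and a threshold of 6.
stretch = longest_unique_stretch([7, 6, 5, 4, 3], k=8, threshold=6)
```

Taking the maximum of this value over all proteins encoded by a gene would give the per-gene figure the caption describes.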

References

    1. Alix, A.J. Predictive estimation of protein linear epitopes by using the program PEOPLE. Vaccine. 1999;18:311–314.
    2. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410.
    3. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
    4. Anderson, N.L., Anderson, N.G. The human plasma proteome: History, character, and diagnostic prospects. Mol. Cell. Proteomics. 2002;1:845–867.
    5. Andrade, J., Berglund, L., Uhlén, M., Odeberg, J. Using Grid technology for computationally intensive applied bioinformatics analyses. In Silico Biol. 2006;6:495–504.
