Review
PLoS Comput Biol. 2007 Jul;3(7):e114.
doi: 10.1371/journal.pcbi.0030114.

Introduction to computational proteomics

Jacques Colinge et al. PLoS Comput Biol. 2007 Jul.
No abstract available

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Steps in Sample Analysis by Proteomics
(A) Sample complexity reduction via an LC column. This is applicable to both proteins and peptides. Fractions can be collected at fixed or variable time intervals to obtain a series of less complex samples; direct MS analysis is also an option. The figure illustrates how peptides/proteins 1–11 are fractionated. (B) Major steps in “bottom-up” proteomics and combinations thereof. Optional and essential steps are shown in rounded and bold rectangles, respectively. Green represents shotgun peptide sequencing: digestion of the entire sample followed by multidimensional LC separation of the peptides. Blue represents the classical gel approach, with or without (dashed arrows) peptide LC. Red combines protein and peptide LC. (C) Data-dependent MS/MS analysis, illustrating ESI of a liquid sample and the instrument alternating between MS and MS/MS modes. The data generated are a sequence of experimental peptide m/z values, each associated with the m/z values of its fragments. The complete analysis is called an LC-MS run.
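The data-dependent acquisition in panel (C) can be sketched as a loop that, after each MS survey scan, schedules MS/MS events on selected precursors. The caption does not say how precursors are chosen, so this sketch assumes a simple "N most intense peaks above a threshold" rule; the function names and the `top_n` and `min_intensity` parameters are hypothetical, and fragment acquisition itself is not modelled.

```python
from typing import List, Tuple

# Hypothetical data model: a survey (MS) scan is a list of (m/z, intensity) peaks.
Peak = Tuple[float, float]


def select_precursors(survey: List[Peak], top_n: int = 3,
                      min_intensity: float = 0.0) -> List[float]:
    """Return the m/z values of the top_n most intense peaks above min_intensity."""
    eligible = [p for p in survey if p[1] > min_intensity]
    eligible.sort(key=lambda p: p[1], reverse=True)
    return [mz for mz, _ in eligible[:top_n]]


def data_dependent_run(surveys: List[List[Peak]], top_n: int = 3,
                       min_intensity: float = 0.0) -> List[Tuple[int, float]]:
    """Alternate MS and MS/MS modes: after each survey scan, schedule one
    MS/MS event per selected precursor. Returns (survey index, precursor m/z)."""
    events: List[Tuple[int, float]] = []
    for i, survey in enumerate(surveys):
        for mz in select_precursors(survey, top_n, min_intensity):
            events.append((i, mz))
    return events


run = [[(500.2, 1e5), (650.3, 4e5), (720.4, 2e4)],
       [(512.7, 3e5), (800.1, 9e4)]]
print(data_dependent_run(run, top_n=2, min_intensity=5e4))
# prints [(0, 650.3), (0, 500.2), (1, 512.7), (1, 800.1)]
```

Each scheduled event would, on a real instrument, yield the fragment m/z list that is paired with the precursor m/z in the LC-MS run output.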
Figure 2. Peptide Mass Fingerprinting Database Search Algorithm
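Figure 2 gives only the title of the peptide mass fingerprinting search. A minimal sketch of the general idea, under stated assumptions: each database protein is digested in silico with a simplified trypsin rule (cleave after K or R, not before P), the singly protonated peptide masses are computed, and proteins are ranked by how many experimental masses they explain within a fixed tolerance. The match-count score, the 0.1 Da default tolerance and all function names are illustrative assumptions, not the scoring described in the review.

```python
import re
from typing import Dict, List, Tuple

# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(protein: str, missed: int = 1) -> List[str]:
    """Simplified trypsin rule: cleave after K or R unless followed by P,
    allowing up to `missed` missed cleavages."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        for j in range(i, min(i + missed + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def mh_mass(peptide: str) -> float:
    """Singly protonated [M+H]+ monoisotopic mass."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def pmf_search(masses: List[float], database: Dict[str, str],
               tol: float = 0.1) -> List[Tuple[str, int]]:
    """Rank proteins by the number of experimental masses they explain."""
    scores = []
    for name, sequence in database.items():
        theoretical = [mh_mass(p) for p in tryptic_peptides(sequence)]
        hits = sum(1 for m in masses
                   if any(abs(m - t) <= tol for t in theoretical))
        scores.append((name, hits))
    # Stable sort: ties keep database order.
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores
```

A raw match count favours long proteins that produce many peptides by chance; practical search engines use probability-based scores instead, which this sketch does not attempt.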
Figure 3. Peak Detection
(A) This magnified region of a MALDI–PMF spectrum shows the signals generated by peptides. The spectrum is acquired from a mixture of several peptides, and multiple copies of each peptide are present simultaneously. Since each copy is detected with a small mass error, the peaks have an essentially Gaussian shape. The copies also differ in isotopic composition, so one peptide yields several peaks whose relative intensities match the relative probabilities of the observed isotopes. The monoisotopic peak, i.e., the first peak, is the one used for mass computation. Note that the signal is noisy and the sampling is limited. Shown in red is a model of a complete peptide signal fit to the experimental data. The mass can be deduced directly from the model location m, which also prevents isotope peaks from being detected as additional peptide masses. The green line is an estimation of the local noise level. (B) Principle of the model.
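The signal model of panel (A) can be approximated as a sum of Gaussians placed at the monoisotopic location m and at successive isotope spacings, with fixed relative heights. The sketch below assumes singly charged ions (typical of MALDI), an isotope spacing of 1.00336 Da, a known peak width `sigma` and caller-supplied isotope weights; it fits m by grid search with a least-squares amplitude and uses the median intensity as a crude noise level. All of these are assumptions for illustration, and the model actually shown in panel (B) may differ.

```python
import math
from typing import Optional, Sequence

ISOTOPE_SPACING = 1.00336  # Da; 13C minus 12C, assuming charge 1+


def model(x: float, m: float, weights: Sequence[float], sigma: float) -> float:
    """Unit-amplitude peptide signal: Gaussians at m, m + spacing, m + 2*spacing, ..."""
    return sum(w * math.exp(-((x - (m + k * ISOTOPE_SPACING)) ** 2) / (2 * sigma ** 2))
               for k, w in enumerate(weights))


def fit_location(xs: Sequence[float], ys: Sequence[float],
                 weights: Sequence[float], sigma: float,
                 candidates: Sequence[float]) -> Optional[float]:
    """Grid search over the monoisotopic location m. For each candidate the
    amplitude a is the least-squares optimum a = <f, y> / <f, f>."""
    best_m, best_err = None, math.inf
    for m in candidates:
        f = [model(x, m, weights, sigma) for x in xs]
        ff = sum(v * v for v in f)
        if ff == 0.0:
            continue  # model has no support inside this window
        a = sum(v * y for v, y in zip(f, ys)) / ff
        err = sum((y - a * v) ** 2 for v, y in zip(f, ys))
        if err < best_err:
            best_m, best_err = m, err
    return best_m


def noise_level(ys: Sequence[float]) -> float:
    """Crude noise estimate: the median intensity of the window."""
    s = sorted(ys)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
```

Fitting the whole isotope envelope at once is what lets the fitted m be read directly as the monoisotopic m/z, rather than treating the second and third isotope peaks as separate peptides.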
Figure 4. Peptide Theoretical Mass Computation and Fragmentation
(A) As illustrated, the peptide atomic composition depends on the residues $R_i$ and on fixed atoms (H2O). Therefore, once the peptide sequence is known, its theoretical mass is obtained by summing the masses of its amino acid residues and adding the mass of one water molecule. If some residues are modified, the corresponding mass shifts are added to the unmodified peptide mass. (B) Peptides fragment at specific locations named a, b, c, x, y, z. N-terminal fragments are termed $a_i$, $b_i$, $c_i$, where i denotes the number of amino acids in the fragment. Similarly, the complementary C-terminal fragments are termed $x_{n-i}$, $y_{n-i}$, $z_{n-i}$, where n is the peptide length. (C) Example of fragment mass computation. (D) The same example as in (C) with phosphorylated threonine residues (+79.9663 Da). Note that all fragment ions containing one or two threonine residues are shifted in mass once or twice, respectively.
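The computations in panels (A) to (D) are plain arithmetic: the neutral peptide mass is the sum of residue masses plus one water; a singly charged b ion with i residues is the sum of the first i residues plus a proton; a singly charged y ion with j residues is the sum of the last j residues plus water plus a proton. Modifications enter as per-residue mass shifts. The sketch uses standard monoisotopic masses; the function names and the `mods` convention (0-based position mapped to a mass shift) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
PHOSPHO = 79.96633  # HPO3 shift; rounded to +79.9663 Da in the caption


def residue_masses(seq: str, mods: Optional[Dict[int, float]] = None) -> List[float]:
    """Residue masses with optional shifts keyed by 0-based position."""
    mods = mods or {}
    return [RESIDUE[aa] + mods.get(i, 0.0) for i, aa in enumerate(seq)]


def peptide_mass(seq: str, mods: Optional[Dict[int, float]] = None) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(residue_masses(seq, mods)) + WATER


def b_y_ions(seq: str, mods: Optional[Dict[int, float]] = None) -> Tuple[List[float], List[float]]:
    """Singly charged b_1..b_(n-1) and y_1..y_(n-1) ion masses."""
    r = residue_masses(seq, mods)
    n = len(r)
    b = [sum(r[:i]) + PROTON for i in range(1, n)]
    y = [sum(r[n - j:]) + WATER + PROTON for j in range(1, n)]
    return b, y
```

For example, `b_y_ions("MTDSK", {1: PHOSPHO})` shifts every b ion from b2 onward by the phosphate mass, while b1 (M alone) is unchanged, mirroring panel (D). A complementary pair always satisfies $b_i + y_{n-i} = M + 2m_{\mathrm{H^+}}$, where M is the neutral peptide mass.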
Figure 5. Peptide Match
Match of an experimental spectrum with a peptide sequence. Every theoretical fragment mass observed in the experimental data (within a given mass tolerance) is represented by a coloured disk. As is often the case, not all fragment types are detected. Some neutral losses are impossible for certain fragment amino acid compositions (shown by a dot). Structural properties of the match are apparent: consecutive fragment ion matches (albeit with “holes”) and more intense b and y fragments (indicated by colour; the scale on the right gives the relative order of peak intensities together with the relative count of matched peaks).
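The match in Figure 5 amounts to checking, for each theoretical fragment mass, whether some experimental peak lies within the mass tolerance, and then examining structural properties such as runs of consecutive matches. A minimal sketch, assuming theoretical masses are supplied per fragment type and a hypothetical 0.5 Da tolerance; the function names are illustrative only.

```python
from typing import Dict, List, Sequence


def match_fragments(theoretical: Dict[str, Sequence[float]],
                    observed: Sequence[float],
                    tol: float = 0.5) -> Dict[str, List[bool]]:
    """For each fragment type, flag the theoretical masses that have an
    observed peak within tol (Da)."""
    return {ion: [any(abs(t - o) <= tol for o in observed) for t in masses]
            for ion, masses in theoretical.items()}


def longest_run(flags: Sequence[bool]) -> int:
    """Length of the longest stretch of consecutive matches."""
    best = current = 0
    for hit in flags:
        current = current + 1 if hit else 0
        best = max(best, current)
    return best
```

The per-type boolean lists correspond to one row of coloured disks in the figure; a scoring function can then reward both the number of matches and their consecutiveness.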
Figure 6. MS/MS Database Search Algorithm
In this simplified MS/MS search algorithm, we assume that the peptide charge states are unknown and that all possible values (typically 1–4) need to be assessed. In practice, whether the charge state can be determined depends on the instrument's mass resolution, and it is common for the charge to be known for some, but not all, peptides.
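The charge-state handling of Figure 6 can be sketched as follows: when the charge is unknown, every candidate charge z (1 to 4 here) is tried, the neutral mass is computed as z × (m/z − proton mass), and database peptides within a precursor tolerance are retrieved from a mass-sorted list. The database layout, the 0.02 Da tolerance and the function names are assumptions; scoring each candidate against the fragment spectrum is not shown.

```python
import bisect
from typing import List, Optional, Sequence, Tuple

PROTON = 1.00728


def neutral_mass(mz: float, z: int) -> float:
    """Neutral peptide mass from the m/z of an [M+zH]z+ precursor."""
    return z * (mz - PROTON)


def candidates(mz: float, charge: Optional[int],
               db: List[Tuple[float, str]], tol: float = 0.02,
               charges: Sequence[int] = (1, 2, 3, 4)) -> List[Tuple[int, str]]:
    """db is a list of (neutral mass, peptide) sorted by mass. If charge is
    None, every value in `charges` is tried; otherwise only the known one."""
    masses = [m for m, _ in db]
    tried = charges if charge is None else (charge,)
    out: List[Tuple[int, str]] = []
    for z in tried:
        target = neutral_mass(mz, z)
        lo = bisect.bisect_left(masses, target - tol)
        hi = bisect.bisect_right(masses, target + tol)
        out.extend((z, pep) for _, pep in db[lo:hi])
    return out
```

Sorting the database by mass once lets each precursor lookup cost only a binary search, which matters because an unknown charge multiplies the number of lookups.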
Figure 7. Consecutive Fragment Matches
A hidden Markov model (HMM) can be used to detect sequences of consecutive fragment matches for a given fragment type. The observed sequence has the length of the peptide and is written over the alphabet {m, f}, where m denotes a match and f a failed match. The model topology is designed to tolerate some missing matches: S1 represents a first, uninformed match, whereas S2 and S3 represent matches that follow earlier matches.
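The likelihood of an {m, f} sequence under such an HMM is computed with the forward algorithm. The sketch below uses a hypothetical four-state model ("bg" outside a run, "first" for the first match of a run, "run" for a match following a match, "gap" for a tolerated missing match); its topology and all probabilities are invented for illustration and do not reproduce the S1/S2/S3 model of Figure 7. The log-odds against an i.i.d. null model with an assumed chance-match probability `p_match` serves as a run score.

```python
import math

# Hypothetical model; every number below is an illustrative assumption.
START = {"bg": 0.7, "first": 0.3}
TRANS = {
    "bg":    {"bg": 0.8, "first": 0.2},
    "first": {"run": 0.7, "gap": 0.2, "bg": 0.1},
    "run":   {"run": 0.7, "gap": 0.2, "bg": 0.1},
    "gap":   {"run": 0.5, "bg": 0.5},
}
EMIT = {
    "bg":    {"m": 0.2, "f": 0.8},
    "first": {"m": 1.0, "f": 0.0},
    "run":   {"m": 1.0, "f": 0.0},
    "gap":   {"m": 0.0, "f": 1.0},
}


def forward(obs: str) -> float:
    """P(obs | model) by the forward algorithm; obs is a non-empty string over {m, f}."""
    if not obs or set(obs) - {"m", "f"}:
        raise ValueError("obs must be a non-empty string over {m, f}")
    alpha = {s: START.get(s, 0.0) * EMIT[s][obs[0]] for s in EMIT}
    for x in obs[1:]:
        alpha = {s: EMIT[s][x] * sum(alpha[r] * TRANS[r].get(s, 0.0) for r in alpha)
                 for s in EMIT}
    return sum(alpha.values())


def run_score(obs: str, p_match: float = 0.2) -> float:
    """Log-odds of the HMM against an i.i.d. null model with P(m) = p_match."""
    null = math.prod(p_match if x == "m" else 1 - p_match for x in obs)
    return math.log(forward(obs)) - math.log(null)
```

Because the "bg" state emits both symbols, every valid sequence has non-zero probability, so the logarithm is always defined; long runs of matches score positively, while sequences of failed matches score negatively.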
Figure 8. Issues in Protein Identification
Complications in identifying proteins. Four proteins (A, B, C, D) are identified by four distinct peptides (black squares). Although A and B are different proteins, it is impossible to ascertain which of them is present, because both have been identified by the same (shared) peptides. A variation of this situation is shown in C. Protein D shares three peptides with A and B, and two with C, but also has a specific fourth peptide. Because this peptide is found in no other protein, it can be concluded that D is in the sample.
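The reasoning of Figure 8 can be sketched as two simple rules: proteins supported by exactly the same peptide set cannot be distinguished, and a protein with at least one peptide found in no other candidate protein can be concluded to be present. The protein and peptide names below are hypothetical, and this is a deliberately simple rule, not a full parsimony or probabilistic protein-inference method.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def infer_proteins(protein_peptides: Dict[str, Set[str]]) -> Tuple[List[List[str]], List[str]]:
    """Return (groups of indistinguishable proteins, proteins with a specific peptide)."""
    # Proteins identified by exactly the same peptide set cannot be told apart.
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for prot, peps in protein_peptides.items():
        groups[frozenset(peps)].append(prot)
    indistinguishable = sorted(sorted(g) for g in groups.values() if len(g) > 1)

    # A peptide is specific if it maps to exactly one protein in the input.
    owners: Dict[str, int] = defaultdict(int)
    for peps in protein_peptides.values():
        for pep in peps:
            owners[pep] += 1
    with_specific = sorted(p for p, peps in protein_peptides.items()
                           if any(owners[pep] == 1 for pep in peps))
    return indistinguishable, with_specific


evidence = {"A": {"p1", "p2", "p3"}, "B": {"p1", "p2", "p3"},
            "C": {"p1", "p2"}, "D": {"p1", "p2", "p3", "p4"}}
print(infer_proteins(evidence))
# prints ([['A', 'B']], ['D'])
```

In this hypothetical evidence, only D owns a specific peptide (p4), so it is the only protein that can be reported as present; A and B form an ambiguous group.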
Figure 9. Spectrum Graph
Spectrum graph of peptide MTDSK. The spectrum contains the b and y fragment ion masses plus two neutral losses and two peaks generated by noise. Only mass differences corresponding to a single amino acid are accepted. Masses are complemented, and all masses are interpreted as b fragments. Even in this oversimplified case, many edges are created in addition to the necessary ones; in particular, part of the reverse sequence appears in the graph. The graph complexity increases rapidly with real spectra and when mass differences of two amino acids are also accepted; see also the two examples given in Figures S1 and S2.
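The construction in Figure 9 can be sketched for singly charged fragments: each peak yields one node for its b-ion reading and one for its complemented y-ion reading, nodes for 0 and for the total residue mass are added, and an edge labelled with an amino acid joins two nodes whose mass difference equals that residue mass within a tolerance. As in the figure, only single amino acid differences are used. The tolerance, the charge assumption and the function names are hypothetical; a depth-first walk then lists the candidate sequences.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Monoisotopic residue masses (Da); I is omitted because it has the same mass as L.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
Edge = Tuple[float, float, str]


def spectrum_graph(peaks: Sequence[float], precursor_mass: float,
                   tol: float = 0.02) -> Tuple[List[float], List[Edge]]:
    """Nodes are prefix (b-type) residue masses; edges join nodes whose mass
    difference equals one amino acid residue mass within tol (Da)."""
    total = precursor_mass - WATER              # sum of all residue masses
    nodes = {0.0, total}
    for x in peaks:
        nodes.add(x - PROTON)                    # peak read as a b ion
        nodes.add(total - (x - PROTON - WATER))  # peak read as a y ion, complemented
    ordered = sorted(n for n in nodes if 0.0 <= n <= total + tol)
    edges: List[Edge] = []
    for u, v in combinations(ordered, 2):        # u <= v because ordered is sorted
        for aa, mass in RESIDUE.items():
            if abs((v - u) - mass) <= tol:
                edges.append((u, v, aa))
    return ordered, edges


def sequences(edges: Sequence[Edge], precursor_mass: float,
              tol: float = 0.02) -> List[str]:
    """Residue strings along all paths from mass 0 to the full residue mass.
    The number of paths can grow exponentially on real spectra."""
    total = precursor_mass - WATER
    adj: Dict[float, List[Tuple[float, str]]] = defaultdict(list)
    for u, v, aa in edges:
        adj[u].append((v, aa))
    found: List[str] = []

    def walk(node: float, prefix: str) -> None:
        if abs(node - total) <= tol:
            found.append(prefix)
            return
        for nxt, aa in adj[node]:
            walk(nxt, prefix + aa)

    walk(0.0, "")
    return found
```

Given the ideal b and y ions of MTDSK as input, the walk recovers MTDSK among its paths; noise peaks, neutral losses and near-duplicate nodes add further edges and paths, which is the growth in complexity the caption describes.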
