Anal Chem. 2014 May 20;86(10):4758-66.
doi: 10.1021/ac4037679. Epub 2014 May 1.

Proteomic identification of monoclonal antibodies from serum

Daniel R Boutz et al. Anal Chem. 2014.

Abstract

Characterizing the in vivo dynamics of the polyclonal antibody repertoire in serum, such as that which might arise in response to stimulation with an antigen, is difficult due to the presence of many highly similar immunoglobulin proteins, each specified by distinct B lymphocytes. These challenges have precluded the use of conventional mass spectrometry for antibody identification based on peptide mass spectral matches to a genomic reference database. Recently, progress has been made using bottom-up analysis of serum antibodies by nanoflow liquid chromatography/high-resolution tandem mass spectrometry combined with a sample-specific antibody sequence database generated by high-throughput sequencing of individual B cell immunoglobulin variable domains (V genes). Here, we describe how intrinsic features of antibody primary structure, most notably the interspersed segments of variable and conserved amino acid sequences, generate recurring patterns in the corresponding peptide mass spectra of V gene peptides, greatly complicating the assignment of correct sequences to mass spectral data. We show that the standard method of decoy-based error modeling fails to account for the error introduced by these highly similar sequences, leading to a significant underestimation of the false discovery rate. Because of these effects, antibody-derived peptide mass spectra require increased stringency in their interpretation. The use of filters based on the mean precursor ion mass accuracy of peptide-spectrum matches is shown to be particularly effective in distinguishing between "true" and "false" identifications. These findings highlight important caveats associated with the use of standard database search and error-modeling methods with nonstandard data sets and custom sequence databases.


Figures

Figure 1
A schematic of the structure and representative sequences of the immunoglobulin (Ig) heavy chain variable domain (VH). The VH sequence is created by recombination of V, D, and J subgenes and encodes epitope binding sites for antigen recognition. Complementarity determining regions (CDRs) represent uniquely nondegenerate fingerprints, interspersed between constant framework sequences (FRs), and manifest as hypervariable and conserved sequences, respectively, in the multiple sequence alignment. Antigen binding specificity is primarily dictated by the CDR-H3 region. Hence, the challenge of antibody repertoire proteomics can be largely reduced to the problem of successfully identifying CDR-H3-containing peptides.
Figure 2
In contrast to the proteome in general, antibody peptide sequences resemble each other in both mass and expected fragmentation patterns. The peptide sequence search space is thus strongly dependent on mass accuracy, as seen by plotting the extent of theoretical peptide-spectral match ambiguity, for (A) human proteome peptide sequences, (B) rabbit CCH antibody VH peptides, and (C) human tetanus toxoid antibody VH peptides. Reducing precursor mass tolerance thus more strongly affects the potential for false identifications in VH peptides than for a typical proteome. Here, an in silico digest of the rabbit CCH VH antibody sequences generated 505 790 unique peptide sequences (constrained to fully tryptic peptides of ≥8 amino acids, ≤6000 Da theoretical mass, and ≤2 missed cleavages). Each peptide sequence contributes to a y-axis bin defined by the self-inclusive count of all theoretical peptides within a specified mass tolerance (x-axis) and sharing at least 60% predicted fragmentation ion similarity. For comparison, the human proteome (A) and human TT VH (C) sequence databases were processed likewise and subsampled to include the same number of peptide sequences as (B). The intersequence similarity evident in the antibody sets is negligible in this size-matched human proteome control.
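
The digest constraints and the mass-tolerance binning described in this caption can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the residue masses are standard monoisotopic values not given in the caption, the "no cleavage before proline" rule and the use of a ppm tolerance are assumptions, and the additional requirement of at least 60% shared predicted fragment ions is omitted for brevity.

```python
from bisect import bisect_left, bisect_right

# Standard monoisotopic residue masses in Da (assumption: not stated in the caption).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_pieces(seq):
    """Split after K or R, except when the next residue is P (assumed rule)."""
    pieces, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        pieces.append(seq[start:])
    return pieces


def digest(seq, max_missed=2, min_len=8, max_mass=6000.0):
    """Fully tryptic peptides with >=8 residues, <=6000 Da, <=2 missed cleavages."""
    pieces = tryptic_pieces(seq)
    peptides = set()
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            pep = "".join(pieces[i:j + 1])
            if len(pep) >= min_len and peptide_mass(pep) <= max_mass:
                peptides.add(pep)
    return peptides


def ambiguity_counts(peptides, tol_ppm):
    """Self-inclusive count of peptides within tol_ppm of each peptide's mass."""
    items = sorted((peptide_mass(p), p) for p in peptides)
    masses = [m for m, _ in items]
    counts = {}
    for m, p in items:
        delta = m * tol_ppm * 1e-6
        counts[p] = bisect_right(masses, m + delta) - bisect_left(masses, m - delta)
    return counts
```

Running `ambiguity_counts` over a range of tolerances and histogramming the counts gives the kind of curve the figure plots; a database of near-identical antibody peptides would be expected to show far higher counts than a size-matched proteome sample.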
Figure 3
Confidently identified spectra from most proteomics samples generally score well against only one database sequence. In contrast, the interspersal of conserved (framework) and variable regions in antibody F(ab′)2 sequences often leads to multiple high-scoring PSMs for a single IgG-VH peptide spectrum. Plotting the primary PSM score (XCorr) vs the normalized difference in XCorr scores between the two top-scoring matches (ΔCN) from proteomic analysis of (A) human HeLa cell lysate compared to (B) rabbit and (C) human IgG-VH peptide spectra reveals a substantial proportion of high XCorr/low ΔCN PSMs (denoted by black boxes) in the IgG-VH data sets. Standard false discovery rate (FDR) calculations fail for these PSMs, as illustrated by high (blue), medium (green), and low (red) Percolator confidence scores: many high XCorr/low ΔCN PSMs are erroneously assigned high confidence in spite of high-scoring second hits implicit in the low ΔCN values. Filtering out low ΔCN PSMs inadvertently removes many true hits. Comparison of PSM XCorr distributions between target (blue) and decoy (red) databases reveals that standard decoys do not adequately model the nonrandom structure of IgG-VH peptides [(D) human proteome, (E) rabbit IgG-VH, (F) human IgG-VH]. This is attributable to high-scoring, incorrect matches to IgG framework region-derived sequences. By constructing an alternate decoy database for which variable residues were shuffled but J-region framework sequences were preserved (“Conserved-J Decoy”), ambiguity of CDR-H3,J peptide assignment can be modeled (green). These peptides account for the majority of high-XCorr PSMs in rabbit (E), while additional framework-derived peptides add to the complexity of the human IgG-VH sample (F, inset).
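
The "Conserved-J Decoy" idea can be illustrated with a short Python sketch. It is only one possible reading of the caption: the function name, the way the conserved suffix is supplied, and the retry logic are all hypothetical, and the caption does not say how the authors located the boundary between variable and J-region residues. Here the conserved suffix is simply passed in by the caller, and every residue N-terminal to it is treated as variable.

```python
import random


def conserved_j_decoy(peptide, conserved_suffix, rng, max_tries=10):
    """Shuffle residues N-terminal to a conserved C-terminal suffix.

    conserved_suffix is supplied by the caller (hypothetical interface);
    the suffix itself is left untouched so the decoy keeps the framework
    residues that drive high-scoring but ambiguous matches.
    """
    if not peptide.endswith(conserved_suffix):
        raise ValueError("peptide does not end with the conserved suffix")
    variable = list(peptide[:len(peptide) - len(conserved_suffix)])
    decoy = peptide
    for _ in range(max_tries):
        rng.shuffle(variable)
        decoy = "".join(variable) + conserved_suffix
        if decoy != peptide:  # avoid returning the target sequence itself
            break
    return decoy


# Illustrative use with a fixed seed for reproducibility.
rng = random.Random(0)
example_decoy = conserved_j_decoy("DAGTSGYHFNLWGPGTLVTVSSGQPK",
                                  "NLWGPGTLVTVSSGQPK", rng)
```

Because the decoy has the same composition and the same conserved C-terminus as the target, it retains the target's precursor mass and its framework-derived fragment ions, which is what a fully reversed or fully shuffled decoy would lose.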
Figure 4
High-scoring PSMs for antibody CDR-H3 peptide mass spectra are dominated by matches to peptides sharing identical C-terminal J region FR4 framework sequences. This is illustrated by two top-scoring peptide sequences mapped to a single observed rabbit spectrum, with shared (orange) and unique in silico predicted MS2 fragmentation peaks associated with APYGDGDPYNLWGPGTLVTVSSGQPK (blue) and DAGTSGYHFNLWGPGTLVTVSSGQPK (green). Both sequences exhibit PSMs with XCorr >4.7 with a normalized difference in XCorr scores (ΔCN) of 0.006. A similar trend accounts for a large proportion of the high-scoring matches in Figure 3B,C,E,F.
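
The overlap between the two candidate sequences in this caption can be checked directly. The sketch below is illustrative only: the helper names are hypothetical, the residue masses are standard monoisotopic values not given in the caption, and only unmodified, singly charged y ions are considered. It finds the shared C-terminal suffix, lists the y ions that are therefore identical for both candidates, and computes the gap between the two neutral peptide masses, which with these residue masses comes out at roughly 0.011 Da (about 4 ppm).

```python
# Standard monoisotopic residue masses in Da (assumption: not stated in the caption).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def common_suffix(a, b):
    """Longest shared C-terminal substring of two peptide sequences."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


def y_ion_mz(seq):
    """Singly charged y-ion m/z values, ordered y1, y2, ..., yN."""
    out, total = [], WATER + PROTON
    for aa in reversed(seq):
        total += RESIDUE_MASS[aa]
        out.append(total)
    return out


pep_a = "APYGDGDPYNLWGPGTLVTVSSGQPK"
pep_b = "DAGTSGYHFNLWGPGTLVTVSSGQPK"

shared = common_suffix(pep_a, pep_b)
shared_y_ions = y_ion_mz(pep_a)[:len(shared)]  # identical for pep_b
ppm_gap = (peptide_mass(pep_b) - peptide_mass(pep_a)) / peptide_mass(pep_a) * 1e6
```

Every y ion arising from the shared suffix is predicted for both sequences, so fragment evidence alone separates the candidates poorly, and a precursor tolerance wider than the few-ppm gap between them cannot separate them either.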
Figure 5
A limited set of higher-confidence identifications can be created using differential covalent modification to flag cysteine-containing peptides. (A) Comparison of rabbit CCH spectra from samples treated with iodoacetamide (Cys +57 Da) vs iodoethanol (Cys +44 Da) results in a 13 Da mass difference per cysteine. PSMs for paired spectra exhibiting a mass shift but no cysteine residues in the corresponding matched sequences can be flagged as false identifications. (B) Comparison of precursor mass offsets between differentially labeled rabbit CCH samples confirms alkylation and oxidation account for the most abundant modifications.
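
The cysteine-counting check in panel (A) can be sketched as below. This is a hedged illustration rather than the published procedure: the function names, the 0.02 Da tolerance, and the cap on cysteine count are hypothetical, and the exact monoisotopic shifts (57.02146 Da for carbamidomethylation, 44.02621 Da for hydroxyethylation) are standard values substituted for the nominal +57 and +44 Da quoted in the caption. The sketch also flags any mismatch between the implied and observed cysteine count, which is slightly broader than the "mass shift but no cysteine" case the caption describes.

```python
# Monoisotopic shift per cysteine between the two labels (assumed exact values).
IODOACETAMIDE_SHIFT = 57.02146
IODOETHANOL_SHIFT = 44.02621
CYS_SHIFT = IODOACETAMIDE_SHIFT - IODOETHANOL_SHIFT  # about 13 Da


def implied_cysteines(mass_iaa, mass_ietoh, tol=0.02, max_cys=6):
    """Number of cysteines implied by the precursor mass difference of a
    spectrum pair, or None if the difference is not a multiple of CYS_SHIFT."""
    shift = mass_iaa - mass_ietoh
    n = round(shift / CYS_SHIFT)
    if 0 <= n <= max_cys and abs(shift - n * CYS_SHIFT) <= tol:
        return n
    return None


def classify_psm(sequence, mass_iaa, mass_ietoh):
    """Compare the implied cysteine count with the matched sequence."""
    n = implied_cysteines(mass_iaa, mass_ietoh)
    if n is None:
        return "unpaired"        # mass difference not explained by labeling
    if n == sequence.count("C"):
        return "consistent"      # sequence agrees with the observed shift
    return "flagged_false"       # e.g. a shift is seen but the sequence has no Cys
```

For example, a spectrum pair separated by one label difference that is matched to a cysteine-free sequence would return `"flagged_false"`, while the same pair matched to a sequence with exactly one cysteine would return `"consistent"`.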
Figure 6
Correctly matched PSMs exhibit a systematically smaller average mass deviation (AMD) compared to incorrect identifications. (A) Plotting the difference in precursor ion mass from expected peptide mass (Precursor Mass Accuracy) vs XCorr scores of individual rabbit CCH PSMs reveals overlapping mass accuracy distributions for PSMs matched to the same peptide sequence for correct (blue) and incorrect (red) identifications. While individual incorrect PSMs may achieve higher XCorr scores than correct matches, the average precursor mass accuracy across all PSMs for a given peptide (AMD) discriminates well between correct and incorrect identifications. (B) For the set of high-confidence rabbit CCH PSMs derived from cysteine-labeling, true identifications exhibit low AMD scores while false identifications are more uniformly distributed. Thus, filtering by AMD strongly controls misidentifications. Here, controlling AMD to within 1.5 ppm provides 100% recall of true identifications and increases precision from near 50% (background rate) to 79%. Requiring AMD < 1 ppm further increases precision to 87% with no loss of recall.
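
A minimal sketch of an AMD filter follows. The caption does not give the exact formula, so this assumes AMD is the absolute value of the mean signed precursor mass error (in ppm) over all PSMs assigned to a peptide; the function names and the input layout of `(peptide, observed_mass, theoretical_mass)` tuples are hypothetical. The 1.5 ppm default comes from the caption.

```python
from collections import defaultdict
from statistics import mean


def ppm_error(observed_mass, theoretical_mass):
    """Signed precursor mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def amd_by_peptide(psms):
    """Average mass deviation per peptide.

    psms: iterable of (peptide, observed_mass, theoretical_mass) tuples.
    Assumption: AMD = |mean of signed ppm errors| across a peptide's PSMs.
    """
    errors = defaultdict(list)
    for peptide, observed, theoretical in psms:
        errors[peptide].append(ppm_error(observed, theoretical))
    return {pep: abs(mean(errs)) for pep, errs in errors.items()}


def filter_by_amd(psms, max_amd_ppm=1.5):
    """Peptides whose AMD is within the chosen threshold."""
    return {pep for pep, amd in amd_by_peptide(psms).items() if amd <= max_amd_ppm}
```

Under this assumption, random measurement error on a correct match averages toward zero as PSMs accumulate, whereas an incorrect sequence whose theoretical mass is offset by a few ppm keeps that offset in its average, which is why averaging per peptide can separate cases that overlap at the level of individual PSMs.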

