Anal Chem. 2014 May 20;86(10):4758-66.

doi: 10.1021/ac4037679. Epub 2014 May 1.

Proteomic identification of monoclonal antibodies from serum

Daniel R Boutz¹, Andrew P Horton, Yariv Wine, Jason J Lavinder, George Georgiou, Edward M Marcotte

Affiliation

¹ Center for Systems & Synthetic Biology, Institute for Cellular and Molecular Biology, Department of Biomedical Engineering, Department of Chemical Engineering, and Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, United States.

PMID: 24684310
PMCID: PMC4033631
DOI: 10.1021/ac4037679


Daniel R Boutz et al. Anal Chem. 2014.


Abstract

Characterizing the in vivo dynamics of the polyclonal antibody repertoire in serum, such as that which might arise in response to stimulation with an antigen, is difficult due to the presence of many highly similar immunoglobulin proteins, each specified by distinct B lymphocytes. These challenges have precluded the use of conventional mass spectrometry for antibody identification based on peptide mass spectral matches to a genomic reference database. Recently, progress has been made using bottom-up analysis of serum antibodies by nanoflow liquid chromatography/high-resolution tandem mass spectrometry combined with a sample-specific antibody sequence database generated by high-throughput sequencing of individual B cell immunoglobulin variable domains (V genes). Here, we describe how intrinsic features of antibody primary structure, most notably the interspersed segments of variable and conserved amino acid sequences, generate recurring patterns in the corresponding peptide mass spectra of V gene peptides, greatly complicating the assignment of correct sequences to mass spectral data. We show that the standard method of decoy-based error modeling fails to account for the error introduced by these highly similar sequences, leading to a significant underestimation of the false discovery rate. Because of these effects, antibody-derived peptide mass spectra require increased stringency in their interpretation. The use of filters based on the mean precursor ion mass accuracy of peptide-spectrum matches is shown to be particularly effective in distinguishing between "true" and "false" identifications. These findings highlight important caveats associated with the use of standard database search and error-modeling methods with nonstandard data sets and custom sequence databases.
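As a point of reference for the decoy-based error modeling mentioned above, the following is a minimal sketch of a target-decoy FDR estimate. It assumes a concatenated target/decoy search of equal-sized databases and the simple estimator decoys/targets above a score threshold; the function name, the tuple layout, and the example scores are illustrative and are not taken from the paper.

```python
def estimate_fdr(psms, threshold):
    """Estimate FDR as (# decoy PSMs) / (# target PSMs) at or above a score threshold.

    psms: iterable of (score, is_decoy) tuples (hypothetical layout).
    Assumes decoy hits model incorrect target hits one-to-one, which is
    exactly the assumption the abstract reports as failing for antibody peptides.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # no accepted target PSMs, so nothing to estimate
    return decoys / targets


# Illustrative scores only.
example_psms = [(5.0, False), (4.0, False), (3.5, True), (3.0, False), (2.0, True)]
print(estimate_fdr(example_psms, threshold=3.0))
```

If incorrect target matches score higher than shuffled or reversed decoys do, as happens when many target sequences share conserved segments, this ratio would understate the true error rate.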


Figures

**Figure 1**
A schematic of the structure and representative sequences of the immunoglobulin (Ig) heavy chain variable domain (V_H). The V_H sequence is created by recombination of V, D, and J subgenes and encodes epitope binding sites for antigen recognition. Complementarity determining regions (CDRs) represent uniquely nondegenerate fingerprints, interspersed between constant framework sequences (FRs), and manifest as hypervariable and conserved sequences, respectively, in the multiple sequence alignment. Antigen binding specificity is primarily dictated by the CDR-H3 region. Hence, the challenge of antibody repertoire proteomics can be largely reduced to the problem of successfully identifying CDR-H3-containing peptides.

**Figure 2**
In contrast to the proteome in general, antibody peptide sequences resemble each other in both mass and expected fragmentation patterns. The peptide sequence search space is thus strongly dependent on mass accuracy, as seen by plotting the extent of theoretical peptide-spectral match ambiguity, for (A) human proteome peptide sequences, (B) rabbit CCH antibody V_H peptides, and (C) human tetanus toxoid antibody V_H peptides. Reducing precursor mass tolerance thus more strongly affects the potential for false identifications in V_H peptides than for a typical proteome. Here, an *in silico* digest of the rabbit CCH V_H antibody sequences generated 505 790 unique peptide sequences (constrained to fully tryptic peptides of ≥8 amino acids, ≤6000 Da theoretical mass, and ≤2 missed cleavages). Each peptide sequence contributes to a y-axis bin defined by the self-inclusive count of all theoretical peptides within a specified mass tolerance (x-axis) and sharing at least 60% predicted fragmentation ion similarity. For comparison, the human proteome (A) and human TT V_H (C) sequence databases were processed likewise and subsampled to include the same number of peptide sequences as (B). The intersequence similarity evident in the antibody sets is negligible in this size-matched human proteome control.
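The digest constraints stated in this caption (fully tryptic, ≥8 amino acids, ≤6000 Da, ≤2 missed cleavages) can be sketched roughly as below. This is an illustrative sketch, not the authors' pipeline: the cleavage rule (after K or R, but not before P) is a commonly used convention assumed here, the function names are hypothetical, and the masses are standard monoisotopic residue masses.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def peptide_mass(seq):
    """Theoretical monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def cleavage_fragments(protein):
    """Split after K or R unless the next residue is P (assumed trypsin rule)."""
    frags, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            frags.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        frags.append(protein[start:])
    return frags


def tryptic_digest(protein, min_len=8, max_mass=6000.0, max_missed=2):
    """Return the set of fully tryptic peptides that satisfy the caption's limits."""
    frags = cleavage_fragments(protein)
    peptides = set()
    for i in range(len(frags)):
        # Joining k+1 consecutive fragments corresponds to k missed cleavages.
        for j in range(i, min(i + max_missed + 1, len(frags))):
            pep = "".join(frags[i:j + 1])
            if len(pep) >= min_len and peptide_mass(pep) <= max_mass:
                peptides.add(pep)
    return peptides
```

Collecting the results in a set mirrors the caption's count of unique peptide sequences, since many V_H sequences would yield identical framework peptides.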

**Figure 3**
Confidently identified spectra from most proteomics samples generally score well against only one database sequence. In contrast, the interspersal of conserved (framework) and variable regions in antibody F(ab′)2 sequences often leads to multiple high-scoring PSMs for a single IgG-V_H peptide spectrum. Plotting the primary PSM score (XCorr) vs the normalized difference in XCorr scores between the two top-scoring matches (ΔCN) from proteomic analysis of (A) human HeLa cell lysate compared to (B) rabbit and (C) human IgG-V_H peptide spectra reveals a substantial proportion of high XCorr/low ΔCN PSMs (denoted by black boxes) in the IgG-V_H data sets. Standard false discovery rate (FDR) calculations fail for these PSMs, as illustrated by high (blue), medium (green), and low (red) Percolator confidence scores: many high XCorr/low ΔCN PSMs are erroneously assigned high confidence in spite of high-scoring second hits implicit in the low ΔCN values. Filtering out low ΔCN PSMs inadvertently removes many true hits. Comparison of PSM XCorr distributions between target (blue) and decoy (red) databases reveals that standard decoys do not adequately model the nonrandom structure of IgG-V_H peptides [(D) human proteome, (E) rabbit IgG-V_H, (F) human IgG-V_H]. This is attributable to high-scoring, incorrect matches to IgG framework region-derived sequences. By constructing an alternate decoy database for which variable residues were shuffled but J-region framework regions were preserved (“Conserved-J Decoy”), ambiguity of CDR-H3,J peptide assignment can be modeled (green). These peptides account for the majority of high-XCorr PSMs in rabbit (E), while additional framework-derived peptides add to the complexity of the human IgG-V_H sample (F, inset).
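The "Conserved-J Decoy" idea described in this caption, shuffling variable residues while leaving the J-region framework intact, might be sketched as follows. How the authors delimit the J region is not stated here, so this sketch assumes it is supplied as a known C-terminal suffix; `conserved_j_decoy` and its parameters are hypothetical names.

```python
import random


def conserved_j_decoy(peptide, j_region, rng=None):
    """Shuffle the variable residues of a peptide, keeping a C-terminal J-region suffix.

    Assumption: the conserved J region is passed in as a literal suffix string.
    If the peptide does not end with it, the whole peptide is shuffled.
    """
    rng = rng or random.Random(0)  # fixed seed by default for reproducibility
    if j_region and peptide.endswith(j_region):
        variable, conserved = peptide[:-len(j_region)], j_region
    else:
        variable, conserved = peptide, ""
    residues = list(variable)
    rng.shuffle(residues)
    return "".join(residues) + conserved
```

A shuffle can occasionally reproduce the original order or another target sequence, so a real decoy builder would presumably need to check decoys against the target database and discard or reshuffle collisions.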

**Figure 4**
High-scoring PSMs for antibody CDR-H3 peptide mass spectra are dominated by matches to peptides sharing identical C-terminal J region FR4 framework sequences. This is illustrated by two top-scoring peptide sequences mapped to a single observed rabbit spectrum, with shared (orange) and unique *in silico* predicted MS2 fragmentation peaks associated with *APYGDGDPY*NLWGPGTLVTVSSGQPK (blue) and *DAGTSGYHF*NLWGPGTLVTVSSGQPK (green). Both sequences exhibit PSMs with XCorr >4.7 with a normalized difference in XCorr scores (ΔCN) of 0.006. A similar trend accounts for a large proportion of the high-scoring matches in Figure 3B,C,E,F.
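To make the ambiguity in this example concrete, the sketch below finds the C-terminal sequence shared by the two peptides in the caption and computes ΔCN. Because y-type fragment ions are read from the C-terminus, a shared suffix of n residues implies n identical y ions. The ΔCN formula, the score gap divided by the top XCorr, follows the usual SEQUEST convention and is an assumption about how the values here were computed.

```python
def common_suffix(a, b):
    """Longest shared C-terminal substring of two peptide sequences."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


def delta_cn(xcorr_top, xcorr_second):
    """Normalized XCorr difference between the two best matches (assumed convention)."""
    return (xcorr_top - xcorr_second) / xcorr_top


peptide_a = "APYGDGDPYNLWGPGTLVTVSSGQPK"
peptide_b = "DAGTSGYHFNLWGPGTLVTVSSGQPK"
shared = common_suffix(peptide_a, peptide_b)
print(shared, len(shared))
```

Under this convention, a ΔCN near zero means the runner-up sequence explains the spectrum almost as well as the top hit, which is what a long shared suffix would produce.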

**Figure 5**
A limited set of higher-confidence identifications can be created using differential covalent modification to flag cysteine-containing peptides. (A) Comparison of rabbit CCH spectra from samples treated with iodoacetamide (Cys +57 Da) vs iodoethanol (Cys +44 Da) results in a 13 Da mass difference per cysteine. PSMs for paired spectra exhibiting a mass shift but no cysteine residues in the corresponding matched sequences can be flagged as false identifications. (B) Comparison of precursor mass offsets between differentially labeled rabbit CCH samples confirms alkylation and oxidation account for the most abundant modifications.
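The cysteine check in panel (A) can be sketched as a consistency test between the observed mass shift and the number of cysteines in the matched sequence. The caption gives nominal shifts (+57 and +44 Da); the monoisotopic values below (carbamidomethyl +57.02146, hydroxyethyl +44.02621) and the 0.02 Da tolerance are assumptions for illustration, as are the function names.

```python
# Per-cysteine mass difference between iodoacetamide and iodoethanol labeling.
# Monoisotopic values are assumed; the caption states nominal +57 and +44 Da.
CYS_SHIFT = 57.02146 - 44.02621  # about 12.995 Da


def implied_cysteines(mass_iam, mass_ie, tol=0.02):
    """Number of cysteines implied by the shift between paired precursors, or None.

    mass_iam: precursor mass from the iodoacetamide-treated sample.
    mass_ie:  precursor mass from the iodoethanol-treated sample.
    """
    shift = mass_iam - mass_ie
    n = round(shift / CYS_SHIFT)
    if n >= 0 and abs(shift - n * CYS_SHIFT) <= tol:
        return n
    return None  # shift is not a whole multiple of the per-cysteine difference


def is_consistent(sequence, mass_iam, mass_ie, tol=0.02):
    """True when the matched sequence has exactly the implied number of cysteines."""
    n = implied_cysteines(mass_iam, mass_ie, tol)
    return n is not None and n == sequence.count("C")
```

A PSM whose paired spectra show a shift of about 13 Da while the matched sequence contains no cysteine would return `False` here, corresponding to the flagged false identifications described in the caption.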

**Figure 6**
Correctly matched PSMs exhibit a systematically smaller average mass deviation (AMD) compared to incorrect identifications. (A) Plotting the difference in precursor ion mass from expected peptide mass (Precursor Mass Accuracy) vs XCorr scores of individual rabbit CCH PSMs reveals overlapping mass accuracy distributions for PSMs matched to the same peptide sequence for correct (blue) and incorrect (red) identifications. While individual incorrect PSMs may achieve higher XCorr scores than correct matches, the average precursor mass accuracy across all PSMs for a given peptide (AMD) discriminates well between correct and incorrect identifications. (B) For the set of high-confidence rabbit CCH PSMs derived from cysteine-labeling, true identifications exhibit low AMD scores while false identifications are more uniformly distributed. Thus, filtering by AMD strongly controls misidentifications. Here, controlling AMD to within 1.5 ppm provides 100% recall of true identifications and increases precision from near 50% (background rate) to 79%. Requiring AMD < 1 ppm further increases precision to 87% with no loss of recall.
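A minimal sketch of AMD filtering follows. It assumes AMD is the mean of signed precursor mass errors (in ppm) over all PSMs assigned to one peptide, with the absolute value compared against the cutoff; the caption does not spell out whether signed or absolute errors are averaged, so treat that choice, along with the function names and tuple layout, as assumptions.

```python
from collections import defaultdict
from statistics import mean


def ppm_error(observed_mass, theoretical_mass):
    """Precursor mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def average_mass_deviation(psms):
    """Map each peptide to the mean ppm error across its PSMs.

    psms: iterable of (peptide, observed_mass, theoretical_mass) tuples
    (hypothetical layout).
    """
    errors = defaultdict(list)
    for peptide, observed, theoretical in psms:
        errors[peptide].append(ppm_error(observed, theoretical))
    return {peptide: mean(errs) for peptide, errs in errors.items()}


def filter_by_amd(psms, max_amd_ppm=1.5):
    """Keep peptides whose |AMD| is within the cutoff (1.5 ppm in the caption)."""
    amd = average_mass_deviation(psms)
    return {peptide for peptide, value in amd.items() if abs(value) <= max_amd_ppm}


# Illustrative masses only, not data from the paper.
example = [
    ("PEPA", 1000.0010, 1000.0), ("PEPA", 999.9995, 1000.0),
    ("PEPB", 1000.0040, 1000.0), ("PEPB", 1000.0030, 1000.0),
]
print(filter_by_amd(example))
```

Averaging over all PSMs for a peptide is what gives the filter its power: a single incorrect PSM can fall near 0 ppm by chance, but a consistently offset set of PSMs is less likely to average out.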


