Proc Natl Acad Sci U S A. 2022 Aug 23;119(34):e2203505119.

doi: 10.1073/pnas.2203505119. Epub 2022 Aug 15.

Repertoire-scale measures of antigen binding

Rohit Arora¹, Ramy Arnaout¹,²

Affiliations

¹ Division of Clinical Pathology, Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA 02215.
² Division of Clinical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215.

PMID: 35969768
PMCID: PMC9407674
DOI: 10.1073/pnas.2203505119

Rohit Arora et al. Proc Natl Acad Sci U S A. 2022.

Abstract

Antibodies and T cell receptors (TCRs) are the fundamental building blocks of adaptive immunity. Repertoire-scale functionality derives from their epitope-binding properties, just as macroscopic properties like temperature derive from microscopic molecular properties. However, most approaches to repertoire-scale measurement, including sequence diversity and entropy, are not based on antibody or TCR function in this way. Thus, they potentially overlook key features of immunological function. Here we present a framework that describes repertoires in terms of the epitope-binding properties of their constituent antibodies and TCRs, based on analysis of thousands of antibody-antigen and TCR-peptide-major-histocompatibility-complex binding interactions and over 400 high-throughput repertoires. We show that repertoires consist of loose overlapping classes of antibodies and TCRs with similar binding properties. We demonstrate the potential of this framework to distinguish specific responses vs. bystander activation in influenza vaccinees, stratify cytomegalovirus (CMV)-infected cohorts, and identify potential immunological "super-agers." Classes add a valuable dimension to the assessment of immune function.

Keywords: B cell repertoires; Gibbs free energy; T cell repertoires; antigen binding; immunological diversity.

Conflict of interest statement

The authors declare no competing interest.

Figures

**Fig. 1.**
Sequence diversity vs. class diversity. Each circle represents a B or T cell; each color represents a unique antibody or TCR sequence. Similar colors encode antibodies or TCRs with similar epitope binding properties. Two repertoires, for example, repertoires 1 and 2 (A), that have the same total number of cells (A) and identical sequence frequency distributions (B), have identical sequence diversity (for all *^qD*); *Insets* give the effective number versions (3, 50, 58) of entropy and BPI, ¹D = e^{Shannon entropy} and ^∞D = 1/BPI. Lower pairwise binding similarities in repertoire 2 (C) give repertoire 2 higher class diversity than repertoire 1; repertoire 2 can recognize more different epitopes (D). Color coding reflects optimal binding (e.g., red sequence, red epitope). The colors of the bars in E indicate the contributions of the antibody or TCR encoded by the sequence of that color. Similar colors bind better than different colors. Higher frequencies (B) can partially compensate for weaker binding.
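
The effective-number relations quoted in the caption (¹D = e^{Shannon entropy} and ^∞D = 1/BPI) can be illustrated with a minimal sketch. It assumes BPI denotes the Berger-Parker index (the frequency of the most common sequence); the function name `hill_number` and the toy counts are hypothetical and not taken from the paper.

```python
import math

def hill_number(counts, q):
    """Effective number of sequences of order q, computed from raw cell counts."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    if math.isinf(q):
        # Limit q -> infinity: reciprocal of the largest frequency (assumed BPI)
        return 1.0 / max(p)
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Hypothetical toy repertoire: 4 unique sequences, 10 cells in total
counts = [4, 3, 2, 1]
d0 = hill_number(counts, 0)            # number of unique sequences
d1 = hill_number(counts, 1)            # exp(Shannon entropy)
dinf = hill_number(counts, math.inf)   # 1 / largest frequency
```

Because these measures depend only on the frequency distribution, two repertoires with identical sequence frequencies get identical values for every order q, whatever the binding similarities among their sequences.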

**Fig. 2.**
Large-scale experimental *ΔK_d* for single-amino acid substitutions on binding. (A) Overview. In this study, the experimental *K_d* binding data are from SKEMPI. (B–D) Class diversity ≠ edit distance: the nonuniqueness of edit distance–based diversity. We use a particular form of class diversity based on binding similarity; the similarity function we fit to the binding data in SKEMPI (*K_d*) is what yields this form. However, every form of class similarity differs from edit distance in that class diversity is uniquely determined by its similarity function, whereas diversity measures based on edit distance alone—that is, ones that are not based on a fit to any external data but are solely based on the number of clusters that result from a particular edit distance cutoff—are not unique in this way. B shows the simplest “repertoire” that illustrates this point. Each node represents a sequence. Edges connect sequences that differ at just a single amino acid position. If we cluster by edit distance with a clustering threshold of one amino acid difference, there are three different possible clusterings (C). In contrast, Eq. 1, which defines class diversity, gives a unique solution. In edit distance–only measures, the clustering threshold need not be one amino acid; it can be two, or three, or, indeed, any arbitrary number. In contrast, the 0.3 in *Z_ij* = 0.3^m in the specific form of class diversity that we explore in this study is not chosen arbitrarily: It is the value determined by a fit to SKEMPI binding data. (D) Example of multiple different possible pure edit distance–based diversity measures for a 50-sequence connected cluster from the day 7 post-influenza-vaccination sample in Fig. 4C. Each node is a unique sequence. Each pair of nodes is connected by an edge if they differ at a single amino acid position. 
Here, “diversity” means the number of clusters at the indicated edit distance threshold, beginning with the highest-degree node (the sequence with the most connections; same approach as in C, diversity = 1). Clusters with more than one sequence are identified by a gray background. None of the shown thresholds conveys that there are three related clusters. While some other edit distance–based threshold or strategy could be used based on network topology, class diversity is not ad hoc or post hoc in this way, as it is based on independent data: binding data. (E) Examples of reference–variant pairs with the view centered on the substituted amino acid. PDB ID is given in upper left of each row; substitution is given in upper right. (F) Distributions for core (*Top*) and noncore (*Bottom*) mutations for immunoglobulin (IG) and TCR (TR) pairs. (G) Combining the distributions in F proportional to the relative frequencies of core and noncore residues results in an overall distribution (black), plotted as one minus the cumulative distribution function (CDF) and an exponential fit (blue, e^{1/(−RT ln s)}). Gray line indicates the mean −RT ln(s).
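
To make the role of the similarity function *Z_ij* = 0.3^m concrete, here is a minimal sketch of a similarity-weighted diversity of order q = 0. Eq. 1 is not reproduced on this page, so the sketch assumes the Leinster-Cobbold similarity-sensitive form, and it assumes m is the number of amino acid differences between two equal-length sequences. The function names and toy sequences are hypothetical.

```python
def hamming(a, b):
    """Number of differing positions; assumes equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def class_diversity_q0(seqs, counts, s=0.3):
    """Similarity-weighted diversity of order 0 with Z_ij = s**m (assumed form)."""
    total = sum(counts)
    p = [c / total for c in counts]
    n = len(seqs)
    diversity = 0.0
    for i in range(n):
        # (Zp)_i: frequency-weighted similarity of sequence i to the whole repertoire
        zp_i = sum((s ** hamming(seqs[i], seqs[j])) * p[j] for j in range(n))
        diversity += p[i] / zp_i
    return diversity

# Hypothetical toy repertoires with identical frequency distributions
counts = [5, 5, 5]
similar = ["CARDY", "CARDF", "CARDW"]      # every pair differs at one position
dissimilar = ["CARDY", "CGGGY", "CTTTW"]   # pairs differ at three or four positions

d_similar = class_diversity_q0(similar, counts)
d_dissimilar = class_diversity_q0(dissimilar, counts)
d_sequence = class_diversity_q0(similar, counts, s=0.0)  # no cross-similarity
```

Under these assumptions both toy repertoires have the same sequence diversity (three), but the one whose sequences are more alike yields fewer effective classes. Setting `s=0.0` removes all cross-similarity and recovers the plain sequence count.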

**Fig. 3.**
Robustness, validity, and comparison to edit distance–only measures. (A) The ⁰D and ⁰*D_S* diversity, (B) discovery rate, and (C) maximum error for sequences (open symbols) and classes (filled circles) for repertoires from DNA (small circles) or mRNA (small triangles) and for metarepertoires (large circles) vs. sample size. Maximum undercount in C is the maximum fraction by which sample diversity will underestimate overall diversity (49). Red arrowhead, underestimate for a 300,000-sequence TRB repertoire is ≤33%; yellow arrowhead, sample class diversity of a 1-million-sequence IGH repertoire will underestimate overall class diversity by ≤30×; open arrowhead, for a million-sequence IGH repertoire from DNA, there is a ∼50–50 chance that the next sequence will be new. (D–G) Validity: sequence vs. class diversity for four in silico repertoires, each with 34 unique/752 total sequences with identical sequence frequency distributions (compare Fig. 1B). In the networks, each node represents a unique sequence; node size reflects that sequence’s frequency in the repertoire. Edges connect sequences that differ at a single amino acid position. (D) CDR3s from a somatically hypermutated IGG clonotype. The extent to which class diversity exceeds one reflects intraclone diversity. (E) CDR3s from two different IGG clonotypes. (F) CDR3s drawn randomly from repertoires in this study. (G) Non-CDR3 amino acid sequences generated uniformly at random. Note the contrast between class diversity and edit distance thresholds. In the final two columns of D–G, edit distance–based clustering requires a threshold to be chosen: for example, one, two, or three amino acids. Sequences that differ by this threshold amount or less are clustered together. The resulting number of clusters gives one measure of diversity. Different thresholds often give different clusters, and thereby different measures of diversity. 
In the rightmost column of *D–G*, note the fairly wide ranges for the repertoires in D and E, a consequence of the nonuniqueness illustrated in Fig. 2 B–D. In the extremely diverse repertoires in F (all very different CDR3s) and G (random amino acids), edit distance approximates class diversity, but this happens only in the most extreme cases, not in typical repertoires (e.g., Fig. 4 C–E).
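
The nonuniqueness of threshold-based clustering can be seen in a three-sequence toy case. The sketch below assumes a simple seed-centered greedy procedure, in which each seed claims every unassigned sequence within the threshold; the captions do not spell out the exact algorithm, so this procedure, the function names, and the sequences are illustrative assumptions only.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def greedy_clusters(seqs, threshold, seed_order):
    """Each seed, in order, claims all unassigned sequences within the threshold."""
    unassigned = list(seqs)
    clusters = []
    for seed in seed_order:
        if seed not in unassigned:
            continue  # already absorbed into an earlier cluster
        members = [s for s in unassigned if hamming(seed, s) <= threshold]
        clusters.append(sorted(members))
        unassigned = [s for s in unassigned if s not in members]
    return clusters

# Hypothetical chain: CAR and CGK each differ from CAK by one residue,
# and from each other by two
seqs = ["CAR", "CAK", "CGK"]

from_middle = greedy_clusters(seqs, 1, ["CAK", "CAR", "CGK"])  # highest-degree seed first
from_left = greedy_clusters(seqs, 1, ["CAR", "CAK", "CGK"])
from_right = greedy_clusters(seqs, 1, ["CGK", "CAR", "CAK"])
n_by_threshold = {t: len(greedy_clusters(seqs, t, seqs)) for t in (0, 1, 2)}
```

With the threshold fixed at one amino acid, the three seed orders give three different clusterings, and the cluster count changes again when the threshold changes. Neither choice is fixed by binding data, which is the contrast the captions draw with class diversity.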

**Fig. 4.**
Class diversity for stratification and discovery. (A) Sequence and class diversity for IGG repertoires from mRNA in influenza vaccination. (B) Fold change in sequence diversity vs. class diversity. (C–E) Binding-based class diversity does not correspond to a simple edit distance threshold. C shows network representations of the largest connected component of day 0 and day 7 IGG CDR3 repertoires from subject T10-Y1 from A and B (24): Each dot represents a sequence; each edge connects sequences that differ by a single amino acid position. (D) Using purely edit distance, the diversity, measured as the number of clusters, ranges over orders of magnitude, depending on what cluster threshold is chosen. Absent more information, the choice of cluster threshold is arbitrary. Note that, at every cluster threshold, diversity is higher in the day 7 repertoire (thick blue line) than the day 0 repertoire (thin black line). (E) In contrast, there is no arbitrariness to the measures of class diversity presented in this study: They are determined by the fit to the *K_d* binding data described. The ⁰D_S based on binding similarity happens to correspond roughly to a clustering threshold of six amino acids in the day 0 repertoire and eight amino acids in the day 7 repertoire; there is no set correspondence because class diversity is different from simple edit distance–based diversity (compare to thresholds of one to three amino acids in Fig. 3). Also in contrast to edit distance, class diversity in the day 7 repertoire (filled symbol) is lower than in the day 0 repertoire (open symbol), reflecting the capture of repertoire-wide structure that simple edit distance–based measures cannot capture, at any clustering threshold. (F) Class diversity vs. sequence diversity for different repertoire types. Each symbol is a repertoire. 
(G) Score from a random forest classifier based on combining sequence and class diversity for TRB repertoires from DNA for q = 1 and ∞ for CMV-seronegative (empty circles) vs. seropositive (filled circles) individuals, with (H) network diagram schematics of the corresponding repertoires; filled circles represent CMV-specific CDR3s. Note that the highest-probability CMV-negative repertoires in G include 51 repertoires (overlapping symbols). (I) Sequence diversity and (J) class diversity for TRB repertoires from DNA by age. Arrowheads indicate exceptional individuals. + and – designate 90% confidence–imputed CMV serostatus; q = 0 except in G.

References

1. Six A., et al., The past, present, and future of immune repertoire biology – The rise of next-generation repertoire analysis. Front. Immunol. 4, 413 (2013).
2. Chaudhary N., Wesemann D. R., Analyzing immunoglobulin repertoires. Front. Immunol. 9, 462 (2018).
3. Hill M. O., Diversity and evenness: A unifying notation and its consequences. Ecology 54, 427–432 (1973).
4. Hosoi A., et al., Increased diversity with reduced “diversity evenness” of tumor infiltrating T-cells for the successful cancer immunotherapy. Sci. Rep. 8, 1058 (2018).
5. Britanova O. V., et al., Age-related decrease in TCR repertoire diversity measured with deep and normalized sequence profiling. J. Immunol. 192, 2689–2698 (2014).

Grants and funding

R01 AI148747/AI/NIAID NIH HHS/United States
