Review

Immunol Rev. 2018 Jul;284(1):167-179.

doi: 10.1111/imr.12665.

Predicting the spectrum of TCR repertoire sharing with a data-driven model of recombination

Yuval Elhanati¹, Zachary Sethna¹, Curtis G Callan Jr¹, Thierry Mora², Aleksandra M Walczak³

Affiliations

¹ Joseph Henry Laboratories, Princeton University, Princeton, NJ, USA.
² Laboratoire de physique statistique, CNRS, Sorbonne Université, Université Paris-Diderot, and École Normale Supérieure (PSL University), Paris, France.
³ Laboratoire de physique théorique, CNRS, Sorbonne Université, and École Normale Supérieure (PSL University), Paris, France.

PMID: 29944757
PMCID: PMC6033145
DOI: 10.1111/imr.12665

Abstract

Despite the extreme diversity of T-cell repertoires, many identical T-cell receptor (TCR) sequences are found in a large number of individual mice and humans. These widely shared sequences, often referred to as "public," have been suggested to be over-represented due to their potential immune functionality or their ease of generation by V(D)J recombination. Here, we show that even for large cohorts, the observed degree of sharing of TCR sequences between individuals is well predicted by a model accounting for the known quantitative statistical biases in the generation process, together with a simple model of thymic selection. Whether a sequence is shared by many individuals is predicted to depend on the number of queried individuals and the sampling depth, as well as on the sequence itself, in agreement with the data. We introduce the degree of publicness conditional on the queried cohort size and the size of the sampled repertoires. Based on these observations, we propose a public/private sequence classifier, "PUBLIC" (Public Universal Binary Likelihood Inference Classifier), based on the generation probability, which performs very well even for small cohort sizes.

Keywords: TCR repertoires; TCR sharing; inference; probability of generation; public sequences.

Figures

**Figure 1**
Cartoon representation of the pipeline for computing the distribution of shared sequences between samples. (A) Sharing between samples is analyzed by marking repeated CDR3s across K samples. (B) The overlapping sequences are counted and binned, and the number of CDR3s that were shared m times is computed. (C) Distribution of the number of sequences that are shared m times in the sample of K individuals.
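
The counting step in panels A to C can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. It assumes each sample is given as a list of CDR3 amino acid strings. The function name `sharing_spectrum` and the toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def sharing_spectrum(samples: List[Iterable[str]]) -> Dict[int, int]:
    """Return {m: number of distinct CDR3s found in exactly m of the K samples}."""
    # Count in how many samples each CDR3 occurs; set() makes a sample
    # contribute at most once per sequence, however many reads it has.
    occurrences = Counter()
    for sample in samples:
        occurrences.update(set(sample))
    # Bin sequences by their sharing number m (panel B -> panel C).
    return dict(sorted(Counter(occurrences.values()).items()))


# Toy example with K = 3 hypothetical samples of CDR3 amino acid sequences.
samples = [
    ["CASSLGQAYEQYF", "CASSPDRGNTEAFF", "CASSLGQAYEQYF"],
    ["CASSLGQAYEQYF", "CASRGTGELFF"],
    ["CASSLGQAYEQYF", "CASRGTGELFF", "CSARDRGNQPQHF"],
]
print(sharing_spectrum(samples))  # {1: 2, 2: 1, 3: 1}
```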

**Figure 2**
Distribution of sharing numbers. (A) Distribution of the number of sequences that are shared between m individuals (m = sharing number) for 14 mice. Data points (blue crosses) are compared to analytical model predictions (see Section 7.3.1) with selection (red curves) and without selection (green curve), and with simulations (see Section 7.2) based on the generation model with selection (red crosses) and without selection (green crosses). While the model without selection underestimates sharing, the prediction is improved by adding selection. The model predictions derived from analytical calculations and stochastic simulations agree well. The selection factor q, defined as the probability that a CDR3 passes thymic selection, is inferred by least-squares regression from the relation between the number of unique CDR3 amino acid sequences and the number of unique nucleotide sequence reads (inset, see Section 7). (B) Distribution of sharing numbers in a cohort of 658 humans. The model prediction with selection (simulation: black crosses, analytics: red line) agrees well with the data (blue crosses). The selection factor is obtained as for mice (inset).
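
The analytic model is described in Section 7.3.1 of the paper and is not reproduced in this caption. The sketch below approximates the prediction under three stated assumptions:

- Each of K individuals contributes M independent draws.
- A sequence with probability p appears in a sample with probability 1 - exp(-Mp), which is a Poisson approximation.
- Selection keeps each sequence with probability q and rescales its probability to p/q.

These assumptions, the function `expected_sharing` and the toy probabilities are illustrative. They are not the paper's implementation.

```python
import math
from typing import Dict, Sequence


def expected_sharing(p_gen: Sequence[float], K: int, M: int, q: float = 1.0) -> Dict[int, float]:
    """Expected number of distinct sequences seen in exactly m of K samples (m >= 1).

    Assumed model (illustrative): each individual contributes M independent draws;
    a sequence survives selection with probability q, and a surviving sequence
    has probability p / q. q = 1 means no selection.
    """
    expected = {m: 0.0 for m in range(1, K + 1)}
    for p in p_gen:
        p_sel = min(p / q, 1.0)  # rescaled probability of a selected sequence
        # Poisson approximation for "appears at least once among M draws".
        pi = 1.0 - math.exp(-M * p_sel)
        for m in range(1, K + 1):
            # Binomial probability of appearing in exactly m of K individuals,
            # weighted by the chance q that the sequence survives selection.
            expected[m] += q * math.comb(K, m) * pi**m * (1.0 - pi) ** (K - m)
    return expected


# Hypothetical toy generation probabilities (not inferred from data).
p_toy = [10 ** (-k / 10) for k in range(40, 120)]  # 1e-4 down to about 1.3e-12
no_sel = expected_sharing(p_toy, K=14, M=100_000)
with_sel = expected_sharing(p_toy, K=14, M=100_000, q=0.1)
print(no_sel[14], with_sel[14])
```

In this toy setting, concentrating the probability mass on a fraction q of sequences raises the high-m tail. That is qualitatively the effect the caption attributes to adding selection.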

**Figure 3**
The sharing number depends on the sampling depth and cohort size. Downsampling the number of sequences in all individuals affects sharing and decreases the observed probability of being public. (A) The number of sequences for each sharing number decreases as the repertoires of all individuals are downsampled by a factor of 0.5 (blue points) compared to the original sample (red points), as predicted by the model (red and blue lines). The normalized distribution of sharing numbers (inset) shows that downsampling affects larger sharing numbers more. (B) Model prediction of the fraction of sequences that are entirely private (ie, appearing in just one individual), as a function of the downsampling fraction and cohort size. Larger samples and cohorts result in fewer private sequences.
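
Downsampling can be illustrated by keeping a random fraction f of the reads in every sample and recounting the private sequences. The sketch assumes uniform subsampling without replacement, which may differ from how the paper downsampled. `fraction_private` and the Zipf-like toy cohort are hypothetical.

```python
import random
from collections import Counter
from typing import Sequence


def fraction_private(samples: Sequence[Sequence[str]], f: float, seed: int = 0) -> float:
    """Fraction of distinct sequences that occur in exactly one sample after
    keeping a random fraction f of the reads in every sample."""
    rng = random.Random(seed)
    occurrences = Counter()
    for reads in samples:
        # Uniform subsampling without replacement (an assumed downsampling scheme).
        kept = rng.sample(list(reads), k=round(f * len(reads)))
        occurrences.update(set(kept))  # count each sequence once per sample
    if not occurrences:
        return 0.0
    private = sum(1 for m in occurrences.values() if m == 1)
    return private / len(occurrences)


# Toy cohort: 10 hypothetical individuals, 2000 reads each, drawn from a skewed
# (Zipf-like) distribution over 20 000 synthetic sequence labels.
rng = random.Random(1)
labels = [f"seq{i}" for i in range(20_000)]
weights = [1.0 / (i + 1) for i in range(20_000)]
cohort = [rng.choices(labels, weights=weights, k=2000) for _ in range(10)]
for f in (1.0, 0.5, 0.25):
    print(f, round(fraction_private(cohort, f), 3))
```

In this toy cohort, the private fraction is expected to grow as f shrinks, in line with panel B. The exact values depend on the arbitrary weights and seeds.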

**Figure 4**
(A) Number of unique CDR3 amino acid sequences in the pooled repertoire of n individuals, as a function of n. This number does not depend strongly on the order in which individuals are added to the group (black error bars, obtained by measuring variations across 30 random orderings). The theoretical prediction (red line, see Section 7.3.4) agrees very well with the data. The model prediction was obtained using the mean sample size of the pooled repertoire across 30 random orderings. Each new individual adds ∼200 000 new CDR3 sequences. (B) Theoretical extrapolation to very large cohorts (red line). This model prediction is based on an average sample size. The same prediction can be made for the full repertoires contained in the human body (with 10¹¹ unique recombination events), which yields much larger numbers of unique CDR3s (black line). (C) Model prediction for the fraction of sequences in each individual that are truly "public," ie, have a generation probability larger than 1/N, where N is the number of unique TCRs in each individual (repertoire size). The red and blue stripes mark the possible range of repertoire sizes in mice and humans, according to current knowledge.
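
The paper's derivation is in Section 7.3.4. The version below rests on a Poisson approximation, which is an assumption here. Under it, a sequence with probability p is present in a pooled sample of n individuals, each with M sequences, with probability 1 - exp(-nMp). The expected number of unique sequences is then the sum of that quantity over all sequences.

The panel C quantity is the fraction of sequences in an individual's repertoire whose P_gen exceeds 1/N. Both helpers below are hypothetical, and the toy probabilities are not data.

```python
import math
from typing import List, Sequence


def expected_unique(p: Sequence[float], M: int, n_max: int) -> List[float]:
    """Expected number of distinct sequences in the pooled sample of
    n = 1..n_max individuals, assuming n*M independent draws (Poisson approx.)."""
    # -expm1(-x) computes 1 - exp(-x) accurately for small x.
    return [sum(-math.expm1(-n * M * pi) for pi in p) for n in range(1, n_max + 1)]


def fraction_truly_public(p_gen: Sequence[float], N: float) -> float:
    """Fraction of sequences (drawn from one individual's repertoire) whose
    generation probability exceeds 1/N, with N the repertoire size."""
    return sum(1 for pg in p_gen if pg > 1.0 / N) / len(p_gen)


p_toy = [1e-5] * 1000  # hypothetical probabilities
print([round(x, 1) for x in expected_unique(p_toy, M=10_000, n_max=3)])
print(fraction_truly_public([1e-12, 1e-9, 1e-6], N=1e10))
```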

**Figure 5**
Distributions of the logarithm of the generation probability for different minimal sharing numbers, for (A) mice and (B) humans. For larger sharing numbers, the distribution shifts toward higher probabilities and becomes narrower. This shift enables the characterization of the sharing number, or the degree of publicness, using the generation probability. The model captures the correct trend with sharing number, although it predicts much narrower distributions.
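
Panels A and B group sequences by minimal sharing number and compare the spread of log P_gen. The sketch below summarizes each group by the mean and standard deviation of log10 P_gen, whereas the figure shows full histograms. The function name and toy values are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def log_pgen_by_min_sharing(
    p_gen: Sequence[float], sharing: Sequence[int], m_values: Sequence[int]
) -> Dict[int, Tuple[float, float]]:
    """Mean and standard deviation of log10 P_gen over the sequences whose
    sharing number is at least m, for each m in m_values."""
    summary = {}
    for m in m_values:
        logs = [math.log10(p) for p, s in zip(p_gen, sharing) if s >= m and p > 0]
        if len(logs) >= 2:  # the sample standard deviation needs two values
            summary[m] = (statistics.mean(logs), statistics.stdev(logs))
    return summary


# Hypothetical toy values: one P_gen and one sharing number per CDR3.
p_gen = [1e-12, 1e-10, 1e-8, 1e-6]
sharing = [1, 1, 3, 5]
print(log_pgen_by_min_sharing(p_gen, sharing, [1, 3]))
```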

**Figure 6**
Cartoon representation of the pipeline for the PUBLIC classifier. (A) To each CDR3 sequence in the dataset we associate its generation probability (P_gen), which PUBLIC uses to predict the empirical sharing number. (B) The P_gen distributions of shared sequences depend on the sharing number m. We pick a classifier threshold θ on P_gen that separates public from private sequences for this value of m. The areas of the histograms that fall on the wrong side of the threshold define the false positive and false negative rates. (C) For a given choice of the minimal sharing number m, we plot the true and false positive rates as a function of the classifier threshold θ to obtain a receiver operating characteristic (ROC) curve.
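
The thresholding in panels B and C can be sketched under two assumptions. A sequence is called public when P_gen ≥ θ. It counts as truly public when its empirical sharing number is at least m. `roc_points` and the toy data are hypothetical, and the quadratic loop favors clarity over speed.

```python
from typing import List, Sequence, Tuple


def roc_points(p_gen: Sequence[float], sharing: Sequence[int], m: int) -> List[Tuple[float, float, float]]:
    """(theta, FPR, TPR) for the rule 'call public if P_gen >= theta', where the
    ground truth is 'public' when the empirical sharing number is >= m."""
    labels = [s >= m for s in sharing]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both public and private sequences")
    points = [(float("inf"), 0.0, 0.0)]  # threshold above every P_gen
    # Sweep thresholds from high to low: each step calls more sequences public.
    for theta in sorted(set(p_gen), reverse=True):
        calls = [p >= theta for p in p_gen]
        tp = sum(c and lab for c, lab in zip(calls, labels))
        fp = sum(c and not lab for c, lab in zip(calls, labels))
        points.append((theta, fp / n_neg, tp / n_pos))
    return points


# Hypothetical toy data: higher P_gen for the more widely shared sequences.
for theta, fpr, tpr in roc_points([1e-6, 1e-7, 1e-9, 1e-10], [4, 3, 1, 1], m=2):
    print(theta, fpr, tpr)
```

Plotting TPR against FPR for each m gives one ROC curve per minimal sharing number.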

**Figure 7**
Performance of the PUBLIC classifier. Receiver operating characteristic (ROC) curves for (A) mice and (B) humans for different minimal sharing numbers m. Inset: the area under the ROC curve (AUROC), which equals the probability that a randomly chosen public sequence is ranked above a randomly chosen private one. Higher AUROC values correspond to a better classifier. The AUROC increases with the minimal sharing number m (inset), meaning that a more restrictive definition of publicness gives better classifiers.
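
The AUROC can be computed without drawing the curve, through its equivalence with the Mann-Whitney statistic. It is the fraction of (public, private) pairs in which the public sequence has the higher P_gen, with ties counted as one half. `auroc` is a hypothetical helper that labels a sequence public when its sharing number is at least m.

```python
from typing import Sequence


def auroc(p_gen: Sequence[float], sharing: Sequence[int], m: int) -> float:
    """Area under the ROC curve of the score P_gen for 'sharing number >= m'."""
    pos = [p for p, s in zip(p_gen, sharing) if s >= m]  # public sequences
    neg = [p for p, s in zip(p_gen, sharing) if s < m]   # private sequences
    if not pos or not neg:
        raise ValueError("need both public and private sequences")
    wins = 0.0
    for a in pos:
        for b in neg:
            # A correctly ordered pair counts 1, a tie counts 1/2.
            wins += 1.0 if a > b else (0.5 if a == b else 0.0)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one misordered (public, private) pair out of four.
print(auroc([1e-6, 1e-9, 1e-7, 1e-10], [4, 3, 1, 1], m=2))
```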

**Figure 8**
Distribution of sharing numbers in a cohort of 30 bladder cancer patients. The distribution is compared to that of a sub-cohort of 30 healthy individuals, downsampled to match the sample sizes of the cancer samples. The distributions are the same in healthy individuals and bladder cancer patients, indicating that there are no common, significantly over-represented TCRs in the blood repertoire of cancer patients.

Cited by

Treg Enhancing Therapies to Treat Autoimmune Diseases.
Eggenhuizen PJ, Ng BH, Ooi JD. Int J Mol Sci. 2020 Sep 23;21(19):7015. doi: 10.3390/ijms21197015. PMID: 32977677. Review.
Dynamics of B cell repertoires and emergence of cross-reactive responses in patients with different severities of COVID-19.
Montague Z, Lv H, Otwinowski J, DeWitt WS, Isacchini G, Yip GK, Ng WW, Tsang OT, Yuan M, Liu H, Wilson IA, Peiris JSM, Wu NC, Nourmohammad A, Mok CKP. Cell Rep. 2021 May 25;35(8):109173. doi: 10.1016/j.celrep.2021.109173. Epub 2021 May 9. PMID: 33991510.
Human T cell receptor occurrence patterns encode immune history, genetic background, and receptor specificity.
DeWitt WS 3rd, Smith A, Schoch G, Hansen JA, Matsen FA 4th, Bradley P. Elife. 2018 Aug 28;7:e38358. doi: 10.7554/eLife.38358. PMID: 30152754.
Autoencoder based local T cell repertoire density can be used to classify samples and T cell receptors.
Dvorkin S, Levi R, Louzoun Y. PLoS Comput Biol. 2021 Jul 26;17(7):e1009225. doi: 10.1371/journal.pcbi.1009225. PMID: 34310600.
CDR3 and V genes show distinct reconstitution patterns in T cell repertoire post-allogeneic bone marrow transplantation.
Tickotsky-Moskovitz N, Louzoun Y, Dvorkin S, Rotkopf A, Kuperman AA, Efroni S. Immunogenetics. 2021 Apr;73(2):163-173. doi: 10.1007/s00251-020-01200-7. Epub 2021 Jan 21. PMID: 33475766.
