Front Immunol. 2017 Nov 15:8:1500.

doi: 10.3389/fimmu.2017.01500. eCollection 2017.

Quantification of Inter-Sample Differences in T-Cell Receptor Repertoires Using Sequence-Based Information

Ryo Yokota¹, Yuki Kaminaga², Tetsuya J Kobayashi¹,²,³

Affiliations

¹ Institute of Industrial Science, The University of Tokyo, Tokyo, Japan.
² Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo, Japan.
³ PRESTO, Japan Science and Technology Agency (JST), Saitama, Japan.

PMID: 29187849
PMCID: PMC5694755
DOI: 10.3389/fimmu.2017.01500

Ryo Yokota et al. Front Immunol. 2017.

Abstract

Inter-sample comparisons of T-cell receptor (TCR) repertoires are crucial for gaining a better understanding of the immunological states determined by different collections of T cells from different donor sites, cell types, and genetic and pathological backgrounds. For quantitative comparison, most previous studies utilized conventional methods in ecology, which focus on TCR sequences that overlap between pairwise samples. Some recent studies attempted another approach that is categorized into Poisson abundance models using the abundance distribution of observed TCR sequences. However, these methods ignore the details of the measured sequences and are consequently unable to identify sub-repertoires that might have important contributions to the observed inter-sample differences. Moreover, the sparsity of sequence data due to the huge diversity of repertoires hampers the performance of these methods, especially when few overlapping sequences exist. In this paper, we propose a new approach for REpertoire COmparison in Low Dimensions (RECOLD) based on TCR sequence information, which can estimate the low-dimensional structure by embedding the pairwise sequence dissimilarities in high-dimensional sequence space. The inter-sample differences between repertoires are then quantified by information-theoretic measures among the distributions of data estimated in the embedded space. Using datasets of mouse and human TCR repertoires, we demonstrate that RECOLD can accurately identify the inter-sample hierarchical structures, which have a good correspondence with our intuitive understanding about sample conditions. Moreover, for the dataset of transgenic mice that have strong restrictions on the diversity of their repertoires, our estimated inter-sample structure was consistent with the structure estimated by previous methods based on abundance or overlapping sequence information. 
For the dataset of human healthy donors and Sézary syndrome patients, our method also showed robust estimation performance even under the condition of high sparsity in TCR sequences, while previous studies failed to estimate the structure. In addition, we identified the sequences that contribute to the pairwise-sample differences between the repertoires with the different genetic backgrounds of mice. Such identification of the sequences contributing to variation in immune cell repertoires may provide substantial insight for the development of new immunotherapies and vaccines.

Keywords: Jensen–Shannon divergence; T cell; TCR repertoire; inter-repertoire comparison; manifold learning; pairwise sequence alignment; sequence dissimilarity.
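
The pipeline outlined in the abstract (pairwise sequence dissimilarities, a low-dimensional embedding, density estimation in the embedded space, then Jensen–Shannon divergence between samples) can be illustrated with a minimal sketch. This is not the authors' implementation. As simplifying assumptions, it substitutes plain edit distance for alignment scores based on substitution matrices, classical MDS for the embedding method, and a fixed-bandwidth Gaussian kernel evaluated on a grid for the density estimate. All function names, the bandwidth, and the grid size are hypothetical choices made for illustration.

```python
import numpy as np


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed stand-in for an alignment-based dissimilarity)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def dissimilarity_matrix(seqs):
    """Symmetric matrix of pairwise sequence dissimilarities."""
    n = len(seqs)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = edit_distance(seqs[i], seqs[j])
    return d


def classical_mds(d, dim=2):
    """Embed a dissimilarity matrix in `dim` dimensions by classical MDS."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:dim]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def kde_on_grid(points, grid, bandwidth):
    """Gaussian kernel density on grid nodes, normalized to sum to 1."""
    sq = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    dens = np.exp(-0.5 * sq / bandwidth ** 2).sum(axis=1)
    return dens / dens.sum()


def jsd(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def sample_jsd_matrix(samples, bandwidth=1.0, grid_size=40):
    """samples maps a sample name to a list of sequences (repeats act as weights)."""
    names = list(samples)
    all_seqs = sorted({s for seqs in samples.values() for s in seqs})
    # All unique sequences share one embedding so that samples are comparable.
    coords = classical_mds(dissimilarity_matrix(all_seqs))
    index = {s: k for k, s in enumerate(all_seqs)}
    lo = coords.min(axis=0) - 3 * bandwidth
    hi = coords.max(axis=0) + 3 * bandwidth
    gx, gy = np.meshgrid(np.linspace(lo[0], hi[0], grid_size),
                         np.linspace(lo[1], hi[1], grid_size))
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    pdfs = [kde_on_grid(coords[[index[s] for s in samples[n]]], grid, bandwidth)
            for n in names]
    out = np.zeros((len(names), len(names)))
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            out[a, b] = out[b, a] = jsd(pdfs[a], pdfs[b])
    return names, out


if __name__ == "__main__":
    # Toy sequences invented for illustration; not taken from the paper's datasets.
    toy = {
        "sample_A": ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSPGQAYEQYF"],
        "sample_B": ["CAWSVDRNTEAFF", "CAWSVGRNTEAFF", "CASRDWGNTEAFF"],
    }
    names, matrix = sample_jsd_matrix(toy)
    print(names)
    print(np.round(matrix, 3))
```

Because every sample is embedded in the same space, two samples can be compared even when they share no identical sequence, which is the situation the abstract describes as problematic for overlap-based measures. The resulting matrix could then be passed to a hierarchical clustering routine to build a dendrogram.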

Figures

**Figure 1**
Dissimilarity matrices and their embedded distributions with 10 different score matrices: **(A)** PAM and **(B)** BLOSUM. The upper and lower panels show the dissimilarity matrices and projection maps in two-dimensional space, respectively. All of the rows and columns in each dissimilarity matrix were sorted according to the sum of their elements. The colors of points in the lower panels of **(A,B)** correspond to the clusters in PAM250 and BLOSUM45 that were discriminated by the k-means algorithm (k = 7).

**Figure 2**
Dimensional reduction with four different dimensionality-reduction methods: **(A)** t-SNE, **(B)** MDS, **(C)** ISOMAP, and **(D)** SE. Panel (i) includes the points of the total unique sequences observed in all samples. Panel (ii) includes only the portions of sequences that were observed in each sample. “Ep” and “Wt” denote two different genetic backgrounds of mice. “TN” and “TR” denote naive and regulatory T cells. “Thy” and “Per” denote the thymus and peripheral lymph nodes, respectively. For example, EpTN-Thy denotes the naive T cells that were collected from the thymus in the “Ep” mice.

**Figure 3**
JSD matrices and their clustering results with four different methods: (i) t-SNE, (ii) MDS, (iii) ISOMAP, and (iv) SE. **(A)** Matrices of pairwise-sample differences and **(B)** the dendrogram constructed from the matrices.

**Figure 4**
Sample-distance matrices constructed with two methods: (i) BPLN, (ii) Bray–Curtis. **(A)** Matrices of pairwise-sample distances and **(B)** the dendrogram constructed from the matrices.

**Figure 5**
Significance tests of JSD values using bootstraps. Each colored arrow indicates the naive JSD values of Figure 3A(i). The light red region indicates the one-sided confidence interval with 99% coverage.

**Figure 6**
Spatial distribution of the local JSD values between EpTN-Thy and WtTN-Thy. The white curves show the contours of the regions with significantly high local JSDs.

**Figure 7**
Relative frequencies of observed amino acids at each position in the contributing sequences of EpTN-Thy.

**Figure 8**
Results of applying our methods to the dataset of the human TCR α-chain CDR3 sequences derived from the peripheral blood T cells adopted from the two healthy donors (HD2 and HD3) and the ten Sézary syndrome patients (P1–P10). **(A)** Dissimilarity matrix of resampled sequences in all samples. **(B)** Embedding results of the dissimilarity matrix in **(A)** by t-SNE, and estimated PDFs by the KDE algorithm. **(C)** Sample-distance matrix with the PDFs in **(B)**. **(D)** The dendrogram constructed from the matrix in **(C)**.

Cited by

Computational Strategies for Dissecting the High-Dimensional Complexity of Adaptive Immune Repertoires.
Miho E, Yermanos A, Weber CR, Berger CT, Reddy ST, Greiff V. Front Immunol. 2018 Feb 21;9:224. doi: 10.3389/fimmu.2018.00224. eCollection 2018. PMID: 29515569. Review.
Deep learning-based prediction of autoimmune diseases.
Yang D, Peng X, Zheng S, Peng S. Sci Rep. 2025 Feb 7;15(1):4576. doi: 10.1038/s41598-025-88477-4. PMID: 39920178.
Methods for sequence and structural analysis of B and T cell receptor repertoires.
Teraguchi S, Saputri DS, Llamas-Covarrubias MA, Davila A, Diez D, Nazlica SA, Rozewicki J, Ismanto HS, Wilamowski J, Xie J, Xu Z, Loza-Lopez MJ, van Eerden FJ, Li S, Standley DM. Comput Struct Biotechnol J. 2020 Jul 17;18:2000-2011. doi: 10.1016/j.csbj.2020.07.008. eCollection 2020. PMID: 32802272. Review.
The Bayesian optimist's guide to adaptive immune receptor repertoire analysis.
Olson BJ, Matsen FA 4th. Immunol Rev. 2018 Jul;284(1):148-166. doi: 10.1111/imr.12664. PMID: 29944760. Review.
Spatiotemporal Single-Cell Analysis Reveals T Cell Clonal Dynamics and Phenotypic Plasticity in Human Graft-versus-Host Disease.
Shi L, Uzuni A, Wang XK, Pressler M, Harle DW, Chakrabarti S, Macedo R, Belay K, Gordillo CA, Raps E, Zhang JYA, Nazaret A, Fan JL, Jin Y, Shen X, Fuller JS, Azad T, Huang J, Chainani P, Abrams JA, Del Portillo A, Mapara MY, Alhamar M, Sykes M, McFaline-Figueroa JL, Azizi E, Reshef R. bioRxiv [Preprint]. 2025 May 28:2025.05.24.655962. doi: 10.1101/2025.05.24.655962. PMID: 40501545. Preprint.
