Front Immunol. 2017 Nov 15;8:1500.
doi: 10.3389/fimmu.2017.01500. eCollection 2017.

Quantification of Inter-Sample Differences in T-Cell Receptor Repertoires Using Sequence-Based Information

Ryo Yokota et al. Front Immunol. 2017.

Abstract

Inter-sample comparisons of T-cell receptor (TCR) repertoires are crucial for understanding the immunological states determined by collections of T cells from different donor sites, cell types, and genetic and pathological backgrounds. For quantitative comparison, most previous studies used conventional methods from ecology, which focus on the TCR sequences that overlap between a pair of samples. Some recent studies took another approach, based on Poisson abundance models, that uses the abundance distribution of the observed TCR sequences. However, these methods ignore the details of the measured sequences and are consequently unable to identify sub-repertoires that might contribute substantially to the observed inter-sample differences. Moreover, the sparsity of sequence data caused by the huge diversity of repertoires hampers the performance of these methods, especially when few overlapping sequences exist. In this paper, we propose a new approach for REpertoire COmparison in Low Dimensions (RECOLD) based on TCR sequence information, which estimates the low-dimensional structure of the repertoires by embedding the pairwise sequence dissimilarities measured in the high-dimensional sequence space. The inter-sample differences between repertoires are then quantified by information-theoretic measures between the data distributions estimated in the embedded space. Using datasets of mouse and human TCR repertoires, we demonstrate that RECOLD can accurately identify the inter-sample hierarchical structures, which correspond well with our intuitive understanding of the sample conditions. For the dataset of transgenic mice, whose repertoire diversity is strongly restricted, our estimated inter-sample structure was consistent with the structure estimated by previous methods based on abundance or overlapping sequence information. For the dataset of human healthy donors and Sézary syndrome patients, our method also showed robust estimation performance even when the TCR sequences were highly sparse, whereas previous methods failed to estimate the structure. In addition, we identified the sequences that contribute to the pairwise-sample differences between the repertoires of mice with different genetic backgrounds. Such identification of the sequences contributing to variation in immune cell repertoires may provide substantial insight for the development of new immunotherapies and vaccines.
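
The final quantification step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the sequences have already been embedded as two-dimensional points, and it swaps in a simple grid histogram for the density estimate. The function names `grid_histogram` and `jsd`, the grid size, and the example coordinates are all hypothetical.

```python
import math
from collections import Counter


def grid_histogram(points, bins, lo, hi):
    """Estimate a discrete distribution over a bins x bins grid on [lo, hi]^2.

    A histogram is used here only as a simple stand-in for a smoother
    density estimate of the embedded points.
    """
    width = (hi - lo) / bins
    counts = Counter()
    for x, y in points:
        # Clamp indices so that points on the boundary stay inside the grid.
        i = max(0, min(int((x - lo) / width), bins - 1))
        j = max(0, min(int((y - lo) / width), bins - 1))
        counts[(i, j)] += 1
    total = len(points)
    return {cell: c / total for cell, c in counts.items()}


def jsd(p, q):
    """Jensen-Shannon divergence (base 2, so between 0 and 1) of two
    distributions given as {cell: probability} dictionaries."""
    div = 0.0
    for cell in set(p) | set(q):
        pi, qi = p.get(cell, 0.0), q.get(cell, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            div += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * math.log2(qi / mi)
    return div


# Hypothetical embedded coordinates for two samples.
sample_a = [(0.10, 0.10), (0.20, 0.15), (0.80, 0.90)]
sample_b = [(0.85, 0.80), (0.90, 0.95), (0.15, 0.20)]
p = grid_histogram(sample_a, bins=4, lo=0.0, hi=1.0)
q = grid_histogram(sample_b, bins=4, lo=0.0, hi=1.0)
print(jsd(p, q))
```

Because both samples are binned on the same grid in the shared embedded space, the divergence stays informative even when the two samples have no identical sequences in common.
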

Keywords: Jensen–Shannon divergence; T cell; TCR repertoire; inter-repertoire comparison; manifold learning; pairwise sequence alignment; sequence dissimilarity.

Figures

Figure 1
Dissimilarity matrices and their embedded distributions with 10 different score matrices: (A) PAM and (B) BLOSUM. The upper and lower panels show the dissimilarity matrices and projection maps in two-dimensional space, respectively. All of the rows and columns in each dissimilarity matrix were sorted according to the sum of their elements. The colors of points in the lower panels of (A,B) correspond to the clusters in PAM250 and BLOSUM45 that were discriminated by k-means algorithms (k = 7).
Figure 2
Dimensional reduction with four different dimensionality-reduction methods: (A) t-SNE, (B) MDS, (C) ISOMAP, and (D) SE. Panel (i) includes the points of the total unique sequences observed in all samples. Panel (ii) includes only the portions of sequences that were observed in each sample. “Ep” and “Wt” denote two different genetic backgrounds of mice. “TN” and “TR” denote naive and regulatory T cells. “Thy” and “Per” denote the thymus and peripheral lymph nodes, respectively. For example, EpTN-Thy denotes the naive T cells that were collected from the thymus in the “Ep” mice.
Figure 3
JSD matrices and their clustering results with four different methods: (i) t-SNE, (ii) MDS, (iii) ISOMAP, and (iv) SE. (A) Matrices of pairwise-sample differences and (B) the dendrogram constructed from the matrices.
Figure 4
Sample-distance matrices constructed with two methods: (i) BPLN, (ii) Bray–Curtis. (A) Matrices of pairwise-sample distances and (B) the dendrogram constructed from the matrices.
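
For orientation, the Bray–Curtis dissimilarity used as a baseline in Figure 4 can be computed from per-sample clonotype counts as follows. This is a sketch of the standard formula only; whether the original analysis applied it to raw counts or to normalized frequencies is not stated here, and the function name and the example CDR3 strings are hypothetical.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples.

    Each argument maps a sequence (e.g. a CDR3 string) to its count.
    The result is sum(|a_i - b_i|) / sum(a_i + b_i), between 0 and 1.
    """
    keys = set(counts_a) | set(counts_b)
    numerator = sum(abs(counts_a.get(k, 0) - counts_b.get(k, 0)) for k in keys)
    denominator = sum(counts_a.values()) + sum(counts_b.values())
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Hypothetical clonotype counts for two samples.
sample_a = {"CASSL": 3, "CASSP": 1}
sample_b = {"CASSL": 1, "CASRG": 3}
print(bray_curtis(sample_a, sample_b))  # 0.75
```

When two samples share no sequences, every term contributes its full count and the dissimilarity is exactly 1, which is why overlap-based measures lose resolution on sparse repertoire data.
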
Figure 5
Significance tests of JSD values using bootstraps. Each colored arrow indicates the naive JSD values of Figure 3A(i). The light red region indicates the one-sided confidence interval with 99% coverage.
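
One way such a bootstrap test could be set up is sketched below. The caption does not specify the resampling scheme, so everything here is an assumption: each sequence is represented by a categorical label (for example a grid cell or cluster in the embedded space), the null distribution is built by comparing two bootstrap resamples of the same sample, and the observed JSD is called significant if it exceeds the one-sided 99% bound. All function names and the example labels are hypothetical.

```python
import math
import random
from collections import Counter


def freqs(labels):
    """Relative frequency of each label in a sample."""
    n = len(labels)
    return {k: c / n for k, c in Counter(labels).items()}


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) of two {label: probability} dicts."""
    div = 0.0
    for k in set(p) | set(q):
        pk, qk = p.get(k, 0.0), q.get(k, 0.0)
        mk = 0.5 * (pk + qk)
        if pk > 0:
            div += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            div += 0.5 * qk * math.log2(qk / mk)
    return div


def null_upper_bound(labels, n_boot=1000, coverage=0.99, seed=0):
    """Upper end of a one-sided interval for the JSD expected from
    resampling noise alone (assumed scheme: two resamples of one sample)."""
    rng = random.Random(seed)
    n = len(labels)
    null = []
    for _ in range(n_boot):
        a = rng.choices(labels, k=n)  # resample with replacement
        b = rng.choices(labels, k=n)
        null.append(jsd(freqs(a), freqs(b)))
    null.sort()
    idx = min(math.ceil(coverage * n_boot) - 1, n_boot - 1)
    return null[idx]


# Hypothetical cluster labels for two samples.
labels_a = ["c1"] * 60 + ["c2"] * 40
labels_b = ["c1"] * 10 + ["c3"] * 90
observed = jsd(freqs(labels_a), freqs(labels_b))
bound = null_upper_bound(labels_a, n_boot=200)
print(observed > bound)
```
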
Figure 6
Spatial distribution of the local JSD values between EpTN-Thy and WtTN-Thy. The white curves show the contours of the regions with significantly high local JSDs.
Figure 7
Relative frequencies of observed amino acids at each position in the contributing sequences of EpTN-Thy.
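
A position-wise frequency table of the kind summarized in Figure 7 can be sketched as follows. The sketch assumes that all contributing sequences have the same length; how sequences of different lengths were aligned or grouped in the original analysis is not stated here. The function name and the example sequences are hypothetical.

```python
from collections import Counter


def position_frequencies(sequences):
    """Relative frequency of each amino acid at each position.

    Returns a list with one {amino_acid: frequency} dict per position.
    Assumes all sequences have equal length.
    """
    if not sequences:
        raise ValueError("no sequences given")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must have equal length")
    table = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sequences)
        table.append({aa: c / len(sequences) for aa, c in counts.items()})
    return table


# Hypothetical equal-length CDR3-like sequences.
seqs = ["CASSL", "CASSP", "CATSL", "CASSL"]
for pos, column in enumerate(position_frequencies(seqs), start=1):
    print(pos, column)
```
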
Figure 8
Results of applying our method to the dataset of human TCR α-chain CDR3 sequences derived from the peripheral blood T cells of two healthy donors (HD2 and HD3) and ten Sézary syndrome patients (P1–P10). (A) Dissimilarity matrix of resampled sequences in all samples. (B) Embedding of the dissimilarity matrix in (A) by t-SNE, and the PDFs estimated by the KDE algorithm. (C) Sample-distance matrix computed from the PDFs in (B). (D) The dendrogram constructed from the matrix in (C).
