Genome Med. 2018 Aug 25;10(1):68.
doi: 10.1186/s13073-018-0577-7.

Exploring the pre-immune landscape of antigen-specific T cells

Mikhail V Pogorelyy et al. Genome Med. 2018.

Abstract

Background: Adaptive immune responses to newly encountered pathogens depend on the mobilization of antigen-specific clonotypes from a vastly diverse pool of naive T cells. Using recent advances in immune repertoire sequencing technologies, models of the immune receptor rearrangement process, and a database of annotated T cell receptor (TCR) sequences with known specificities, we explored the baseline frequencies of T cells specific for defined human leukocyte antigen (HLA) class I-restricted epitopes in healthy individuals.

Methods: We used a database of TCR sequences with known antigen specificities and a probabilistic TCR rearrangement model to estimate the baseline frequencies of epitope-specific T cells. We verified our estimates using a publicly available collection of TCR repertoires from healthy individuals. We also interrogated a database of immunogenic and non-immunogenic peptides to link baseline T cell frequencies with epitope immunogenicity.

Results: Our findings revealed a high degree of variability in the prevalence of T cells specific for different antigens that could be explained by the physicochemical properties of the corresponding HLA class I-bound peptides. The occurrence of certain rearrangements was influenced by ancestry and HLA class I restriction, and umbilical cord blood samples contained higher frequencies of common pathogen-specific TCRs. We also identified a quantitative link between specific T cell frequencies and the immunogenicity of cognate epitopes presented by defined HLA class I molecules.

Conclusions: Our results suggest that the population frequencies of specific T cells are strikingly non-uniform across epitopes that are known to elicit immune responses. This inference leads to a new definition of epitope immunogenicity based on specific TCR frequencies, which can be estimated with a high degree of accuracy in silico, thereby providing a novel framework to integrate computational and experimental genomics with basic and translational research efforts in the field of T cell immunology.

Keywords: Antigen; Immune repertoire; Immunogenicity; T cell receptor.

Conflict of interest statement

Ethics approval and consent to participate

N/A

Consent for publication

N/A

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Estimating baseline T cell frequencies using a VDJ rearrangement model. a Schematic description of the TCRβ baseline frequency estimator. CDR3 sequences were sampled from the pre-trained probabilistic model of Murugan et al. for each VJ segment combination, translated, and matched to a given CDR3 sequence (allowing at most one amino acid substitution, see the “Methods” section) to estimate its theoretical rearrangement probability. Resulting probabilities were corrected for the sample-specific VJ segment frequency profile. b The observed (Y-axis) versus estimated (X-axis) rearrangement frequencies for 6853 human TCR sequences with known antigen specificities selected from VDJdb in 786 immune repertoire samples from Emerson et al. containing 151,020,646 unique rearrangements (identical TCRβ nucleotide sequences observed in different donors were counted as distinct). Observed frequencies were computed as the total number of unique rearrangements encoding a given CDR3 amino acid sequence in the pooled dataset (with at most one substitution) divided by the total number of unique rearrangements. The red line displays the linear model fit for log-transformed frequencies. c Density plot showing the probability of rearranging the same nucleotide sequence in different individuals versus the theoretical rearrangement probability for VDJdb TCR variants (amino acid sequences). The red curve displays the smoothing fit
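The matching rule and the observed-frequency definition in this caption can be illustrated with a minimal Python sketch. It assumes that "at most one amino acid substitution" means equal-length CDR3 sequences that differ at no more than one position; the function names and the toy CDR3 pool are hypothetical and are not taken from the authors' code.

```python
def within_one_substitution(a: str, b: str) -> bool:
    """True if two CDR3 amino acid sequences have the same length
    and differ at no more than one position (assumed matching rule)."""
    if len(a) != len(b):
        return False
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches <= 1


def observed_frequency(query: str, rearrangements: list[str]) -> float:
    """Number of unique rearrangements matching `query` (with at most one
    substitution) divided by the total number of unique rearrangements."""
    if not rearrangements:
        return 0.0
    hits = sum(within_one_substitution(query, cdr3) for cdr3 in rearrangements)
    return hits / len(rearrangements)


# Hypothetical toy pool of CDR3 amino acid sequences, one per unique rearrangement.
pool = ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSIRSSYEQYF", "CASSLAPGATNEKLFF"]
freq = observed_frequency("CASSLGQAYEQYF", pool)
print(freq)
```

In this toy pool, the query matches itself and the single-substitution variant, so two of the four rearrangements are counted.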
Fig. 2
Rearrangement probabilities and population frequencies of TCRs specific for different antigens. a Estimated rearrangement probabilities for TCRs specific for 33 different HLA class I-restricted epitopes. Only epitopes associated with at least 30 different TCR amino acid sequences were selected from VDJdb (n = 5623 TCRs). The distribution of theoretical rearrangement probabilities is shown using violin plots; red dots indicate the median rearrangement probabilities. The variance of specific TCR frequencies across different epitopes is highly significant (P < 10^−27, ANOVA for log probabilities). b As in a, but the TCR sequences are grouped by epitope origin. The difference in rearrangement probabilities among epitopes grouped by origin is also highly significant (P < 10^−11, ANOVA for log probabilities). c Fractions of clonotypes specific for different epitopes showing population frequencies of 5–9%, 10–14%, 15–19%, or 20%+ in 786 immune repertoire samples from Emerson et al. d As in c, but grouped by epitope origin
Fig. 3
Epitope features that affect the rearrangement probabilities of specific TCRs. a Population frequencies of TCRs specific for epitopes of different length, net partial specific volume (sixth Kidera factor), and net surrounding hydrophobicity (tenth Kidera factor). Fractions of public clonotypes (found in 5%+ of samples) are shown with population frequencies as in Fig. 2b. The association and correlation between these features and the theoretical rearrangement probabilities are highly significant: P_ANOVA = 10^−8, P_corr = 4 × 10^−6 for length; P_ANOVA = 8 × 10^−9, P_corr = 10^−6 for partial specific volume; P_ANOVA = 4 × 10^−10, P_corr = 4 × 10^−8 for surrounding hydrophobicity (P values were corrected for multiple testing using the Benjamini–Hochberg method). Only epitope lengths of 8 to 11 amino acids were considered in the first subplot, as other lengths were represented by fewer than 30 TCRs. Partial specific volume and surrounding hydrophobicity were categorized into four quantiles (Q1 to Q4, from smallest to largest standardized value) according to their levels among VDJdb epitopes. See main text for details of feature selection. b CDR3 length distributions for epitope lengths of 8 to 11 amino acids. c Density plot of rearrangement probabilities for VDJdb TCRs with different CDR3 lengths. d, e Projection of epitope and CDR3 structures on a plane passing through the line connecting their C- and N-terminal residues and the center of mass of all Cα atoms. Longer epitope and CDR3 sequences result in more bulged structures. Data were obtained from a manually curated list of 125 PDB structures [https://github.com/antigenomics/tcr-pmhc-study]. f Schematic representation of the association between CDR3 and epitope lengths and the potential consequences for TCR cross-reactivity and specificity
Fig. 4
HLA-mediated selection of TCRs and epitope-specific clonal expansions. a Box and swarm plots show the distributions of ratios of the observed and expected numbers of rearrangements for different combinations of donor HLAs (according to genotypes from Emerson et al.) and HLAs associated with specific TCRs (according to epitope restrictions from VDJdb). Each dot represents the ratio of the total number of TCR rearrangements specific for epitopes restricted by a given HLA and the expected number of TCR rearrangements, computed with the assumption of independence between TCR restriction and donor HLA (see inset with formula). Red dots indicate matches between donor HLAs and rearranged TCRs. The inset box plot shows observed to expected ratios for matched and mismatched HLAs (**, P = 0.004, Mann–Whitney U test). Only HLA alleles present in at least 30 immune repertoire samples with at least 100 associated TCR sequences in VDJdb were selected. b Log10-transformed P values for VDJdb TCR enrichments in groups of samples with different HLAs (computed using a hypergeometric test comparing the number of times a given TCR was found in samples with and without a certain HLA). Left panel: enrichment P values plotted against rearrangement probabilities for sample groups that either do (red dots) or do not (black dots) have an HLA matching a given TCR (P > 10^−4 shown with density plot). Right panel: the same data with epitopes grouped by source. P values were adjusted for multiple testing using the Benjamini–Hochberg method (TCRs with P_adjusted > 0.05 were filtered out). c Distribution of the log2 read frequency ratios of CMV-specific clonotypes in HLA-matched and HLA-mismatched samples from CMV-seronegative (CMV−, red), CMV-seropositive (CMV+, blue), and CMV-indeterminate donors (Unknown, green). As in previous panels, HLA matching indicates the presence of at least one HLA corresponding to the restriction element for a given TCR. All three distributions are significantly different: P = 6 × 10^−11 for CMV-seropositive versus CMV-seronegative donors; P = 4 × 10^−4 for CMV-seropositive versus CMV-indeterminate donors; P = 8 × 10^−13 for CMV-seronegative versus CMV-indeterminate donors; Kolmogorov–Smirnov test. d Numbers of EBV-specific clonotypes constituting higher or lower fractions of reads in HLA-matched versus HLA-mismatched samples. Only HLA alleles associated with EBV-specific clonotypes according to VDJdb are shown (HLA-B*44 was discarded, as it was represented by just three sequences). Error bars show 95% confidence intervals (binomial distribution)
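The observed-to-expected ratio in panel a can be sketched as follows. The formula shown in the figure inset is not reproduced on this page, so the sketch assumes the standard contingency-table expectation under independence (row total × column total / grand total); the function name and the toy counts are hypothetical.

```python
def observed_to_expected(
    counts: dict[tuple[str, str], int],
) -> dict[tuple[str, str], float]:
    """counts[(donor_hla, tcr_hla)] is the number of rearrangements found in
    donors carrying `donor_hla` that match TCRs restricted by `tcr_hla`.
    Returns observed / expected, where expected assumes independence
    between donor HLA and TCR restriction (assumed formula)."""
    total = sum(counts.values())
    donor_totals: dict[str, int] = {}
    tcr_totals: dict[str, int] = {}
    for (donor, tcr), n in counts.items():
        donor_totals[donor] = donor_totals.get(donor, 0) + n
        tcr_totals[tcr] = tcr_totals.get(tcr, 0) + n

    ratios = {}
    for (donor, tcr), n in counts.items():
        expected = donor_totals[donor] * tcr_totals[tcr] / total
        if expected > 0:  # skip combinations with no data in the row or column
            ratios[(donor, tcr)] = n / expected
    return ratios


# Hypothetical toy counts: matched HLA pairs are slightly enriched.
toy = {
    ("A*02", "A*02"): 60,
    ("A*02", "B*07"): 40,
    ("B*07", "A*02"): 40,
    ("B*07", "B*07"): 60,
}
ratios = observed_to_expected(toy)
print(ratios)
```

A ratio above 1 for a matched donor HLA and TCR restriction would be consistent with the HLA-mediated selection the caption describes; ratios near 1 indicate no departure from independence.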
Fig. 5
Specific T cell frequencies at baseline correlate with epitope immunogenicity profiles. a Principal component analysis of epitope space for immunogenic and non-immunogenic epitopes from Chowell et al. Dimensionality reduction was performed on 10-dimensional vectors of Kidera factor sums for each epitope, and the first two principal components were used to plot each epitope into a 2D plane using the Euclidean distance between Kidera factor vectors. The density map shows the overall epitope repertoire space. Red and blue contour maps show densities for immunogenic and non-immunogenic epitopes, respectively. b Correlation of median theoretical rearrangement probabilities of TCRs specific for certain epitopes and T-scores for the Euclidean distance of each VDJdb epitope to the immunogenic and non-immunogenic epitopes computed in Kidera factor space (R = 0.35, P = 0.039). T-scores were computed by comparing distances from a given epitope to immunogenic versus non-immunogenic epitopes. Only epitopes with more than 30 associated TCRs were selected from VDJdb. c A schematic representation of the algorithm used to transform categorical representation of immunogenicity (yes/no for data from Chowell et al., and yes/unknown for VDJdb epitopes) into a continuous set of probability values using an immunogenicity classifier to enable a correlation analysis between immunogenicity and TCR repertoire structure. d Correlation of median theoretical rearrangement probabilities of TCRs specific for certain epitopes and the probability of a given epitope being immunogenic as estimated using an expectation maximization classifier (R = 0.37, P = 0.031). e Cumulative distribution function plot for median rearrangement probabilities predicted for immunogenic and non-immunogenic epitopes using a simple linear model based on Kidera factor sums. The difference in predicted values for all data from Chowell et al. is highly significant (P < 2 × 10^−16, Kolmogorov–Smirnov test)
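The distance-based T-score in panel b might be computed along the following lines. This is a sketch under stated assumptions: the caption does not specify the test, so a Welch two-sample t statistic is used here as a stand-in, the sign convention is arbitrary, and the two-dimensional toy vectors are hypothetical placeholders for the 10-dimensional Kidera factor sums.

```python
import math
from statistics import mean, variance


def welch_t(xs: list[float], ys: list[float]) -> float:
    """Welch two-sample t statistic (assumed test; needs >= 2 values per group)."""
    se = math.sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / se


def distance_t_score(query, immunogenic, non_immunogenic) -> float:
    """Compare Euclidean distances from `query` to immunogenic epitopes
    with distances from `query` to non-immunogenic epitopes.
    Negative values mean the query lies closer to the immunogenic set."""
    d_imm = [math.dist(query, v) for v in immunogenic]
    d_non = [math.dist(query, v) for v in non_immunogenic]
    return welch_t(d_imm, d_non)


# Hypothetical 2D feature vectors standing in for Kidera factor sums.
query = (0.0, 0.0)
immunogenic = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.0)]
non_immunogenic = [(4.0, 0.0), (0.0, 5.0), (6.0, 0.0)]
t_score = distance_t_score(query, immunogenic, non_immunogenic)
print(t_score)
```

With real data, each epitope vector would hold the sum of each of the ten Kidera factors over the residues of the peptide.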
Fig. 6
Epitope-specific TCRα-TCRβ heterodimer frequencies can be estimated using TCRβ clonotype frequencies. a, b Matching paired TCRα-TCRβ sequencing data (PairSEQ assay, Howie et al.) against VDJdb. a Scatter plot of TCRα and TCRβ chain rearrangement frequencies matching a given epitope. b Product of marginal frequencies of TCRα and TCRβ chain rearrangements (i.e., TCR heterodimer frequencies assuming independent pairing) plotted against the frequencies of paired-chain records matching the same epitopes. Mean frequencies were computed as follows: (number of matching rearrangements)/(number of records in VDJdb for a given epitope)/(total number of rearrangements in the PairSEQ dataset). c As in a, but using TCRα and TCRβ frequencies estimated using the TCR rearrangement model. d As in b, but using TCRα and TCRβ frequencies estimated using the TCR rearrangement model. e Conditions required to estimate baseline T cell frequencies using TCRβ rearrangement frequencies alone. f Scatter plot of the mean theoretical rearrangement probabilities for TCRβ chain and paired TCRα-TCRβ chain rearrangements matching a given epitope. Epitopes lacking paired TCRα-TCRβ sequences, as well as epitopes represented by fewer than 30 TCRα or TCRβ sequences according to VDJdb, were excluded from the analysis. This figure uses 3-letter epitope abbreviations (see Additional file 1: Table S3 for full epitope names)
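The mean-frequency formula and the independent-pairing product from this caption can be written out directly. The sketch below follows the formula as stated; the function names and all input numbers are hypothetical and serve only to show the arithmetic.

```python
import math


def mean_frequency(n_matching: int, n_vdjdb_records: int, n_total: int) -> float:
    """(number of matching rearrangements) / (number of VDJdb records for the
    epitope) / (total number of rearrangements in the dataset)."""
    return n_matching / n_vdjdb_records / n_total


def independent_pair_frequency(freq_alpha: float, freq_beta: float) -> float:
    """Expected heterodimer frequency if TCRα and TCRβ chains pair independently:
    the product of the two marginal chain frequencies."""
    return freq_alpha * freq_beta


# Hypothetical counts for one epitope.
f_alpha = mean_frequency(n_matching=200, n_vdjdb_records=100, n_total=1_000_000)
f_beta = mean_frequency(n_matching=50, n_vdjdb_records=100, n_total=1_000_000)
f_pair = independent_pair_frequency(f_alpha, f_beta)

print(f_alpha, f_beta, f_pair)
assert math.isclose(f_pair, 1e-12)
```

Comparing this product with the frequency of paired-chain records for the same epitope, as in panels b and d, indicates how well the independence assumption holds.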

References

    1. Benichou J, Ben-Hamo R, Louzoun Y, Efroni S. Rep-Seq: uncovering the immunological repertoire through next-generation sequencing. Immunology. 2012;135:183–191. doi: 10.1111/j.1365-2567.2011.03527.x.
    2. Shugay M, Bagaev DV, Zvyagin IV, Vroomans RM, Crawford JC, Dolton G, Komech EA, Sycheva AL, Koneva AE, Egorov ES, et al. VDJdb: a curated database of T-cell receptor sequences with known antigen specificity. Nucleic Acids Res. 2017;46:D419–D427. doi: 10.1093/nar/gkx760.
    3. Britanova OV, Shugay M, Merzlyak EM, Staroverov DB, Putintseva EV, Turchaninova MA, Mamedov IZ, Pogorelyy MV, Bolotin DA, Izraelson M, et al. Dynamics of individual T cell repertoires: from cord blood to centenarians. J Immunol. 2016;196:5005–5013. doi: 10.4049/jimmunol.1600005.
    4. Greiff V, Miho E, Menzel U, Reddy ST. Bioinformatic and statistical analysis of adaptive immune repertoires. Trends Immunol. 2015;36:738–749. doi: 10.1016/j.it.2015.09.006.
    5. Heather JM, Ismail M, Oakes T, Chain B. High-throughput sequencing of the T-cell receptor repertoire: pitfalls and opportunities. Brief Bioinform. 2018;19(4):554–565.
