PLoS Comput Biol. 2020 Mar 12;16(3):e1007714.
doi: 10.1371/journal.pcbi.1007714. eCollection 2020 Mar.

Comprehensive analysis of structural and sequencing data reveals almost unconstrained chain pairing in TCRαβ complex

Dmitrii S Shcherbinin et al. PLoS Comput Biol.

Abstract

Antigen recognition by T-cells is guided by the T-cell receptor (TCR) heterodimer formed by α and β chains. A huge diversity of TCR sequences should be maintained by the immune system in order to be able to mount an effective response towards foreign pathogens, so, due to cooperative binding of α and β chains to the pathogen, any constraints on chain pairing can have a profound effect on immune repertoire structure, diversity and antigen specificity. By integrating available structural data and paired chain sequencing results we were able to show that there are almost no constraints on pairing in TCRαβ complexes, allowing the naive T-cell repertoire to reach the highest possible diversity. Additional analysis reveals that the specific choice of contacting amino acids can still have a profound effect on complex conformation. Moreover, antigen-driven selection can distort the uniform landscape of chain pairing, while small, yet significant, differences in the pairing can be attributed to various specialized T-cell subsets such as MAIT and iNKT T-cells, as well as other TCR sets specific to certain antigens.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Contact map of TCRαβ complexes.
A heatmap of inter-chain residue contact frequencies observed in n = 131 human and n = 39 mouse TCR:peptide:MHC complexes. Residue pairs having a distance between closest atoms of less than 5 Å in at least one complex were considered in the analysis. Contact frequency was estimated by counting the number of times a given residue pair has a Cα distance of less than 15 Å in PDB structures. CDR regions are shown with dashed lines; the excluded middle portion of CDR3 is shown with a dotted line.
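The two-threshold procedure in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the input layout (one dictionary per structure, mapping a residue pair to its closest-atom and Cα distances) and the function name `contact_frequencies` are assumptions made for the example.

```python
from collections import Counter

def contact_frequencies(structures, select_cutoff=5.0, ca_cutoff=15.0):
    """Estimate inter-chain contact frequencies.

    structures: list of dicts {(alpha_pos, beta_pos): (closest_atom_dist, ca_dist)}
    (hypothetical input layout, distances in angstroms).
    """
    # Step 1: keep residue pairs whose closest atoms are within 5 A
    # in at least one structure.
    selected = {
        pair
        for s in structures
        for pair, (closest, _) in s.items()
        if closest < select_cutoff
    }
    # Step 2: for the selected pairs, count structures where the
    # C-alpha distance is below 15 A.
    counts = Counter()
    for s in structures:
        for pair in selected:
            if pair in s and s[pair][1] < ca_cutoff:
                counts[pair] += 1
    n = len(structures)
    return {pair: counts[pair] / n for pair in selected}
```

Dividing by the total number of structures is one possible normalization; the caption does not state which denominator was used.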
Fig 2. Variability and contact frequency of TCR chain residues.
a. Contact frequency (red bars) and information content (blue line) across α and β chains. For human, the transparent blue line shows information content computed from residue frequencies weighted by corresponding V and J gene usage in donor repertoires. Dashed lines show FR-CDR boundaries. b. The scatter plot shows the probability of being involved in inter-chain contact (x axis) and the information content of the amino acid frequency distribution (y axis) for individual α and β chain residues. The region with a selection of frequently contacted residues (x > 0.75) that have variable amino acid content (y < 0.75) is highlighted with a red background. Residues with more information content should be considered less variable; residues having no inter-chain contacts are not shown.
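The caption does not define information content. One common choice, assumed here purely for illustration, is Shannon entropy of the amino acid distribution at a position, rescaled so that 1 means fully conserved and 0 means uniform over 20 amino acids. The function name `information_content` is hypothetical.

```python
import math
from collections import Counter

def information_content(residues, alphabet_size=20):
    """Normalized information content of one alignment column.

    Assumption: IC = 1 - H / log2(alphabet_size), where H is the
    Shannon entropy (in bits) of the observed amino acid frequencies.
    """
    counts = Counter(residues)
    total = sum(counts.values())
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
    return 1.0 - entropy / math.log2(alphabet_size)
```

Under this definition a fully conserved position scores 1.0, consistent with the caption's remark that higher information content means lower variability.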
Fig 3. Amino acid statistics at contacting residues.
A two-dimensional density plot comparing the number of times a given amino acid combination is observed at an inter-chain contact, nO, versus the number of times it was expected to be observed, nE. The expected count nE is calculated using amino acid frequency distributions at separate chains and assuming random amino acid pairing; a higher nO / nE ratio suggests enrichment of a given amino acid pair at corresponding contacting residues. The number of contacting residue pairs observed with certain nO and nE values (density of points at a given bin) is highlighted by color. Dotted lines show the 95% confidence interval for the nO / nE ratio assuming a Normal distribution with standard deviation computed from a Binomial distribution approximated by nE and the total number of observations nT ≈ 3 × 10⁵.
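A minimal sketch of the expected count and the confidence band described in this caption is shown below. It assumes that the Binomial success probability is nE / nT and that the band is 1 ± 1.96 standard deviations of nO / nE; the function names are hypothetical and the exact formula used by the authors may differ.

```python
import math

def expected_count(p_alpha, p_beta, n_total):
    """nE under random pairing: product of the per-chain amino acid
    frequencies times the total number of observations nT."""
    return n_total * p_alpha * p_beta

def ratio_interval(n_expected, n_total, z=1.96):
    """Approximate 95% interval for nO / nE under the null.

    Assumption: nO ~ Binomial(nT, p) with p = nE / nT, approximated
    by a Normal distribution with the same standard deviation.
    """
    p = n_expected / n_total
    sd = math.sqrt(n_total * p * (1.0 - p))
    half_width = z * sd / n_expected
    return 1.0 - half_width, 1.0 + half_width
```

For example, with nT = 300000 and per-chain frequencies of 0.1 and 0.2, nE is 6000 and the band is roughly 1 ± 0.025; the band widens as nE shrinks, which is why the dotted lines diverge at low counts.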
Fig 4. Bayesian network (BN) of TCRαβ complex residues.
a. The graph of the BN built with separately learned inter-chain contacts (shown in S4 Fig) whitelisted and residues that are not contacting according to contact frequency thresholding blacklisted. b. A density plot showing the correlation between the log-likelihood (LL) of the BN for paired chains (y axis, computed using the network in a.) and the sum of LLs of individual α and β chains for the i-th clonotype from the PairSEQ dataset. In order to compute individual chain LLs, two independent networks were built by removing inter-chain edges and separating α and β residue components of the BN.
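The comparison in panel b can be illustrated with a deliberately tiny toy model: one α residue, one β residue, and a single inter-chain edge. This is not the authors' network; the maximum-likelihood tables and the names `fit_tables` and `log_likelihoods` are assumptions for the sketch.

```python
import math
from collections import Counter

def fit_tables(pairs):
    """Fit probability tables from (alpha_residue, beta_residue) tuples."""
    n = len(pairs)
    a_counts = Counter(a for a, _ in pairs)
    b_counts = Counter(b for _, b in pairs)
    ab_counts = Counter(pairs)
    p_a = {a: c / n for a, c in a_counts.items()}
    p_b = {b: c / n for b, c in b_counts.items()}
    # Conditional table for the inter-chain edge alpha -> beta.
    p_b_given_a = {(a, b): c / a_counts[a] for (a, b), c in ab_counts.items()}
    return p_a, p_b, p_b_given_a

def log_likelihoods(pair, p_a, p_b, p_b_given_a):
    """Return (LL of paired model, sum of independent chain LLs)."""
    a, b = pair
    ll_paired = math.log(p_a[a]) + math.log(p_b_given_a[(a, b)])
    ll_sum = math.log(p_a[a]) + math.log(p_b[b])
    return ll_paired, ll_sum
```

When the residues pair at random, the two values coincide; a paired LL well above the sum would indicate that the inter-chain edge carries information.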
Fig 5. Contacting residues define mutual orientation of chains in TCRαβ complex.
a. A schematic definition of angles between α and β chains. Principal axes (xα, yα, zα) and (xβ, yβ, zβ) of both TCR chains are computed using the inertia tensor of all atoms of a given chain with the exception of constant domain atoms (top panel); a representative orientation of principal axes in a real TCR:pMHC complex is shown. Euler angles φ1,2,3 are then computed by superimposing chain centers of mass and computing angles between α and β principal axes (bottom panel). Illustrations were adapted from Wikimedia Commons (https://commons.wikimedia.org/wiki/File:63-T-CellReceptor-MHC.tif by David Goodsell and https://commons.wikimedia.org/wiki/File:Eulerangles.svg by Lionel Brits). b. Testing association between amino acid type (see Methods section and panel d. insert for amino acid cluster definition) and inter-chain angles. Point size shows the ANOVA F-score for association between amino acid type and each of the three Euler angles across TCR alpha and beta chain positions. The testing is performed for a non-redundant set of TCR chain orientations: all PDB structures with the same VαJαVβJβ are collapsed into a single observation with mean φ1, φ2 and φ3 angles to prevent biases from several complexes with the same TCR. Red circles and labels show contact positions where a significant association between amino acid content and inter-chain angle is present, determined as P < 0.05 (adjusted for multiple testing). c. Representative distribution of φ3 angle values for each amino acid type at the α57 position. d. Visualization of all PDB structures aligned to a single representative TCR beta chain. TCR alpha chains are colored according to amino acid type at the α57 position.
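The association test in panel b can be sketched in two steps: collapse structures that share a VαJαVβJβ combination into one mean angle, then compute a one-way ANOVA F-score across amino acid types. The record layout and function names below are hypothetical, the arithmetic mean ignores angle wrap-around, and each gene combination is assumed to carry a single amino acid type at the tested position.

```python
from collections import defaultdict
from statistics import mean

def collapse_by_genes(records):
    """records: list of (gene_key, aa_type, angle) tuples.
    Returns {aa_type: [mean angle per unique gene_key]}."""
    angles = defaultdict(list)
    aa_of = {}
    for gene_key, aa_type, angle in records:
        angles[gene_key].append(angle)
        aa_of.setdefault(gene_key, aa_type)  # assumption: one type per key
    groups = defaultdict(list)
    for gene_key, values in angles.items():
        groups[aa_of[gene_key]].append(mean(values))
    return dict(groups)

def anova_f(groups):
    """One-way ANOVA F statistic for {label: [values]}."""
    all_values = [v for vals in groups.values() for v in vals]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(
        len(vals) * (mean(vals) - grand) ** 2 for vals in groups.values()
    )
    ss_within = sum(
        (v - mean(vals)) ** 2 for vals in groups.values() for v in vals
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Converting the F-score to a P-value and adjusting for multiple testing would follow as separate steps, for instance with a statistics library.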
Fig 6. Log-likelihood (LL) distributions for TCRαβ pairs with known antigen specificity.
Cumulative distribution functions of αβ pair LLs computed according to the model shown in Fig 4A. The plot shows n = 1388 real TCRαβ pairs from the VDJdb database (red curves), as well as pairs shuffled within groups specific to the same epitope (orange curves) and pairs shuffled across the entire dataset (dashed black curve). For the latter case, n = 10,000 pairs were selected at random for each epitope pair with re-sampling allowed in order to balance the dataset. Labels above panels are the cognate epitope sequences. Significant differences (Kolmogorov-Smirnov test P-value less than 0.05) between real and shuffled distributions are observed for CINGVCWTV (Kolmogorov-Smirnov test D-value = 0.25, P = 6 × 10⁻⁴), ELAGIGILTV (D = 0.23, P = 2 × 10⁻²), GILGFVFTL (D = 0.45, P < 10⁻¹⁵) and GLCTLVAML epitopes (D = 0.26, P = 3 × 10⁻⁸).
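The shuffling control and the D-value reported here can be sketched with the standard library alone. The record layout `(epitope, alpha, beta)` and both function names are assumptions for illustration; P-values are omitted and would normally come from a statistics package.

```python
import random
from bisect import bisect_right
from collections import defaultdict

def shuffle_within_epitope(pairs, seed=0):
    """Permute beta chains among alpha chains sharing an epitope.
    pairs: list of (epitope, alpha, beta) tuples."""
    rng = random.Random(seed)
    by_epitope = defaultdict(list)
    for epitope, alpha, beta in pairs:
        by_epitope[epitope].append((alpha, beta))
    shuffled = []
    for epitope, chains in by_epitope.items():
        betas = [b for _, b in chains]
        rng.shuffle(betas)
        shuffled.extend(
            (epitope, a, b) for (a, _), b in zip(chains, betas)
        )
    return shuffled

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between
    the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )
```

In use, the LLs of real pairs and of shuffled pairs would be passed to `ks_statistic` as the two samples.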
Fig 7. Characteristic residue contacts of MAIT TCRs.
a. Scatter-plot of amino acid pair enrichment at contacting residues for the Jα gene choice of MAIT T-cells versus overall enrichment observed for given contact residues in the PairSEQ dataset. Y axis shows the log ratio of amino acid pair probabilities for VαJαVβ combinations corresponding to MAIT T-cells and those with a free choice for the Jα gene. X axis shows observed to expected amino acid pair count ratio at contacting residues in the PairSEQ dataset. Residue pairs with enriched amino acid pairs coming from MAIT Jα gene choice (y > 0.25, corresponding to ~19% increase in frequency) are colored in red and labeled. Note that overall enrichment for corresponding amino acid pairs in the PairSEQ dataset is relatively moderate (x < 0.125, corresponding to ~9% increase in frequency). b.-d. Structural data showing Glutamine (GLN, Q) at α108 and Tyrosine (TYR, Y) at β55 interacting with an Arginine (ARG) of MHC alpha-1 helix domain in the MAIT:MR1 complex. MR1 complex structures are shown in b (4pj7) and c (5u1r), d shows a non-MAIT TCR (having the same amino acids at α108 and β55) in complex with MHCI (4jry). Polar contacts between GLN:ARG and TYR:ARG are shown with dotted lines in b and c, but are absent in d. PDB structure chain coloring: green for MHC, yellow for TCRα and pink for TCRβ; antigen peptide in d is shown with purple.
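The caption does not state the base of the logarithm. The quoted percentages are consistent with base 2, which the short check below assumes; the helper name is hypothetical.

```python
def log2_ratio_to_percent_increase(log2_ratio):
    """Convert a base-2 log ratio to a percent increase in frequency.
    Assumption: the axes in panel a use log base 2."""
    return (2.0 ** log2_ratio - 1.0) * 100.0

# Thresholds quoted in the caption.
y_threshold = log2_ratio_to_percent_increase(0.25)   # about 19%
x_threshold = log2_ratio_to_percent_increase(0.125)  # about 9%
```

With a natural logarithm the same thresholds would correspond to about 28% and 13%, which does not match the caption.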
Fig 8. Exploring invariant TCR using enrichment analysis of VαJαVβJβ gene combinations.
a. Scatterplot showing enrichment of certain TCR gene trios (unique combinations of three of four TCR germline genes, either JαVβJβ, VαVβJβ, VαJαJβ or VαJαVβ) in the PairSEQ dataset. Logarithm of the ratio of the observed and expected counts for all possible gene trios is plotted against their observed count. Expected count is calculated under the assumption of random αβ pairing as (count of α part) × (count of β part) / (total number of reads). Points are colored by the P-value of the hypergeometric enrichment test for the co-occurrence of α and β parts of the gene trio (adjusted for multiple comparisons using the Holm method). Canonical MAIT (TRAV1-2, TRAJ12/20/33, TRBV6-4) and iNKT (TRAV10, TRAJ18, TRBV25-1) variants are highlighted with corresponding labels. Only gene trios supported by at least 10 reads are shown. Pink circle highlights the Va13 Ja56 Vb10-3 population. b. Grouping of selected TCR gene trios (having adjusted P < 0.05 for enrichment test and represented by at least 10 reads) according to overlap between their VαJαVβJβ gene sets. The plot shows the layout of the resulting graph of gene trios (nodes), having edges connecting pairs of nodes with exactly matching gene sets (missing genes, e.g. Vα in JαVβJβ, are considered as wildcards). Nodes of the graph are represented by points and are colored according to the connected component (cluster) of the network they were assigned to. Cluster ID is a combination of most frequent gene names in co-clustered trios. c. CDR3 spectratyping and motifs for the Va13 Ja56 Vb10-3 population. Top plots show the CDR3 length distribution of alpha (left) and beta (right) chains of the population compared to all PairSEQ TCRs rearranged with corresponding alpha or beta segments; note that only a single dominant length is present for both alpha and beta. Bottom plots show sequence logos of corresponding CDR3 lengths in the population.
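The enrichment statistics in panel a can be sketched as below: the expected count under random pairing, an upper-tail hypergeometric P-value for co-occurrence, and Holm adjustment. This is an illustrative standard-library version with hypothetical function names, not the authors' pipeline; the one-sided upper tail is an assumption.

```python
from math import comb

def expected_pair_count(count_alpha, count_beta, total):
    """Expected co-occurrence under random alpha-beta pairing."""
    return count_alpha * count_beta / total

def hypergeom_upper_tail(observed, total, count_alpha, count_beta):
    """P(X >= observed) when count_beta reads are drawn from `total`
    reads, of which count_alpha carry the alpha part."""
    upper = min(count_alpha, count_beta)
    numerator = sum(
        comb(count_alpha, i) * comb(total - count_alpha, count_beta - i)
        for i in range(observed, upper + 1)
    )
    return numerator / comb(total, count_beta)

def holm_adjust(p_values):
    """Holm step-down adjustment, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

Exact binomial coefficients are convenient for small toy counts; for repertoire-scale read counts a library implementation of the hypergeometric distribution would be the more practical choice.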
