Nucleic Acids Res. 2022 Jul 8;50(12):e68.
doi: 10.1093/nar/gkac190.

Stitchr: stitching coding TCR nucleotide sequences from V/J/CDR3 information

James M Heather et al. Nucleic Acids Res.

Abstract

The study and manipulation of T cell receptors (TCRs) is central to multiple fields across basic and translational immunology research. Produced by V(D)J recombination, TCRs are often only recorded in the literature and data repositories as a combination of their V and J gene symbols, plus their hypervariable CDR3 amino acid sequence. However, numerous applications require full-length coding nucleotide sequences. Here we present Stitchr, a software tool developed to specifically address this limitation. Given minimal V/J/CDR3 information, Stitchr produces complete coding sequences representing a fully spliced TCR cDNA. Due to its modular design, Stitchr can be used for TCR engineering using either published germline or novel/modified variable and constant region sequences. Sequences produced by Stitchr were validated by synthesizing and transducing TCR sequences into Jurkat cells, recapitulating the expected antigen specificity of the parental TCR. Using a companion script, Thimble, we demonstrate that Stitchr can process a million TCRs in under ten minutes using a standard desktop personal computer. By systematizing the production and modification of TCR sequences, we propose that Stitchr will increase the speed, repeatability, and reproducibility of TCR research. Stitchr is available on GitHub.


Figures

Figure 1.
Schematic of the Stitchr algorithm. (A) Overview of Stitchr modules. Stitchr first obtains germline V gene, J gene, constant region (C), and leader (L) sequences from IMGT/GENE-DB. Next, the junction-spanning sequence is determined, depending on input mode (see B–D), and the complete TCR sequence is assembled. Complete single chain rearrangements can subsequently have arbitrary user-provided sequences appended to the 5′ or 3′ end of the TCR, and finally paired chains can be combined (e.g. via a 2A self-cleaving peptide sequence) into a bicistronic single expression sequence. (B) When an amino acid (AA) CDR3 junction sequence is provided, the V and J genes are translated (I), aligned, and ‘deleted’ back from the CDR3-proximal edge until the longest possible overlap with the appropriate side of the junction is found (II), i.e. the longest suffix of the V that matches the prefix of the CDR3, or vice versa for the J. The remaining residues that cannot be encoded by the germline genes are then ‘reverse translated’ using a codon frequency table (III), and the trimmed germline genes and non-templated residues are concatenated. Vertical dotted lines show codons. (C) If a nucleotide (NT) CDR3 junction sequence is provided (depicted by bold/capitalized font), the germline genes are again translated (I), as well as the CDR3 sequence (IIa). The amino acid sequences are aligned and the germline contributions to the CDR3 are determined (IIb). The AA sequence is then converted to NT; however, instead of assigning codons for the non-templated residues based on a codon usage table, the nucleotides in the provided CDR3 are used (III, bold text indicates retained original NT sequence). (D) If the provided junction sequence includes additional nucleotide sequence context that extends beyond the CDR3 (depicted by lowercase text), the ‘seamless’ (SL) option can be used.
In this mode, V and J germline genes are again deleted to the edge of the overlapping NT sequence (vertical dotted lines), allowing Stitchr to seamlessly combine germline V and J with the provided CDR3-spanning sequence (II). (E) Examples of actual Stitchr commands used to run the examples shown in B (AA), C (NT), and D (NT-SL). Note that Stitchr defaults to human TCRs, thus the species flag doesn’t need to be set here. All three options produce a full-length TCR sequence (F) that encodes the same amino acid sequence, with the seamless option reproducing the identical nucleotide sequence (assuming the correct V and J alleles were provided).
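The amino acid mode described in panel B can be illustrated with a minimal Python sketch. This is not Stitchr's own code: the function names, the toy V and J amino acid strings, and the one-codon-per-residue table are all hypothetical stand-ins (a real run would use IMGT germline sequences and a codon frequency table). The sketch also works only at the amino acid level, whereas the caption describes retaining germline nucleotides for the templated portions.

```python
# Illustrative sketch only; not Stitchr's implementation.
# TOY_CODONS is an assumed one-codon-per-residue table standing in
# for a real codon frequency table.
TOY_CODONS = {"A": "GCC", "L": "CTG", "Q": "CAG", "Y": "TAC"}


def v_overlap(v_aa, cdr3):
    """Find the longest CDR3 prefix present in the translated V gene.

    Returns (cut, n): the V is 'deleted back' to v_aa[:cut], and the
    first n CDR3 residues are attributed to the germline V.
    """
    for n in range(len(cdr3), 0, -1):
        pos = v_aa.rfind(cdr3[:n])
        if pos != -1:
            return pos, n
    return len(v_aa), 0


def j_overlap(j_aa, cdr3_rest):
    """Find the longest suffix of the remaining CDR3 present in the J gene.

    Returns (cut, n): the J is trimmed to j_aa[cut:], and the last n
    residues of cdr3_rest are attributed to the germline J.
    """
    for n in range(len(cdr3_rest), 0, -1):
        pos = j_aa.find(cdr3_rest[-n:])
        if pos != -1:
            return pos + n, n
    return 0, 0


def reverse_translate(aa):
    """Pick a codon for each non-templated residue from the toy table."""
    return "".join(TOY_CODONS[res] for res in aa)


# Toy inputs (hypothetical, not real germline genes).
v_aa = "QDTAVYLCASSL"
j_aa = "NTEAFFGQGTRLTVV"
cdr3 = "CASSYLQAQYTEAFF"

v_cut, n_v = v_overlap(v_aa, cdr3)
rest = cdr3[n_v:]                      # CDR3 residues not claimed by the V
j_cut, n_j = j_overlap(j_aa, rest)
non_templated = rest[:len(rest) - n_j]

stitched_aa = v_aa[:v_cut] + cdr3 + j_aa[j_cut:]
print(non_templated, reverse_translate(non_templated))
print(stitched_aa)
```

Searching the J only against the CDR3 residues left over after the V overlap is a simplification chosen here so that no residue is attributed to both genes; the caption does not state how that case is handled.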
Figure 2.
Validation of Stitchr-generated TCR sequences. Amino acid CDR3 sequences of four TCR heterodimers were used as Stitchr input (A) and stitched output sequences were aligned (B) to the rearranged sequences extracted from the corresponding PDB structures (using ‘ATG’ in place of leaders omitted from the crystallized structures), showing the correct incorporation of junction sequence and constant region. MAG-IC3 ‘α’ sequence indicates Stitchr output using a modified TRAV21*02 gene to replicate the engineered amino acid sequence used in the PDB structure. (C) Functional validation of Stitchr-produced TCR sequences using a Jurkat activation assay. CD8-positive, TCRb-negative Jurkat cells were transduced with one of five different TCRs and co-cultured with peptide pulsed (10, 1, or 0 μg/ml) HLA-matched or mis-matched target cell lines. Data shown are triplicate technical replicates from one experiment and are representative of at least two independent biological repeats.
Figure 3.
Application of Stitchr to high-throughput TCR datasets using the companion script Thimble. (A) To benchmark the speed of Thimble, large TCR datasets with amino acid CDR3s provided were downloaded either from bulk beta chain TCR-seq datasets (14), or from the curated antigen-associated TCR database VDJdb (15) (processed both all together and by each chain individually). Thimble, the high-throughput interface to Stitchr, was run on these original files (triangle markers), and from files containing 100–1,000,000 TCRs generated by randomly re-sampling these files (dot markers), with each repertoire size randomly produced 3 times. Connecting lines indicate bootstrapped locally weighted linear regressions. (B) Overview of sequence-level Stitchr validation. TCRs with known V/J/CDR3 information and nucleotide sequence were produced by in silico recombination of IMGT-stored germline genes using immuneSIM (I). V/J genes and CDR3 information (taken as exact junctions in nucleotide or amino acid forms, or as nucleotides with additional padding sequences for seamless mode) were input to Stitchr (via Thimble) (II). TCR variable domain sequences produced by Stitchr were then compared against the corresponding parental simulated TCR sequences (III). (C) Run time duration of Thimble applied to 50,000 α and β TCRs generated by immuneSIM, comparing different formats of junction region input: amino acid (AA), nucleotide (NT), nucleotide with padding nucleotides 5′ and 3′ for seamless (SL) integration, either 10, 20, 30, or 200 nt (200 5′, 30 3′). (D) Percentage of TCRs produced by Stitchr for which the variable region (start of V gene to end of J gene) perfectly matched the input sequence generated by immuneSIM, at both the nucleotide (NT, purple) and translated (AA, grey) levels. (E) Histogram of positional mismatches between simulated and stitched sequences for NT and AA junction input modes. 
Histograms were generated with 111 bins, so each bar corresponds approximately to one codon (given the variable domain length distribution of ∼333 nucleotides, Supplementary Figure S7A).
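The binning in panel E can be sketched as follows. This is an assumption-laden illustration, not the authors' analysis code: the function name is hypothetical, and it assumes the simulated and stitched sequences have equal length, which the caption does not specify. Each mismatch position is scaled by sequence length and assigned to one of 111 bins, so that for a ~333 nt variable domain each bin spans about three nucleotides.

```python
from collections import Counter

N_BINS = 111  # bin count stated in the caption


def mismatch_bins(reference, stitched, n_bins=N_BINS):
    """Count mismatches per relative-position bin.

    Assumes equal-length sequences (a simplification made for this sketch).
    """
    if len(reference) != len(stitched):
        raise ValueError("sequences must have equal length in this sketch")
    length = len(reference)
    counts = Counter()
    for i, (ref_nt, new_nt) in enumerate(zip(reference, stitched)):
        if ref_nt != new_nt:
            # Integer arithmetic maps position i in [0, length) to a bin
            # in [0, n_bins).
            counts[i * n_bins // length] += 1
    return counts


# Toy 333 nt example with two introduced mismatches.
reference = "ACG" * 111
mutated = list(reference)
mutated[4] = "T"     # second codon
mutated[330] = "T"   # last codon
bins = mismatch_bins(reference, "".join(mutated))
print(sorted(bins.items()))
```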
Figure 4.
Assessment of Stitchr/Thimble accuracy on high-throughput TCR-seq data. (A) Relative positional mismatches between Thimble-generated sequences and original input sequenced TCRs for different junction inputs: (columns left-to-right): AA, NT, SL (20), SL (200). Top row shows errors when using IMGT-provided TCR germline genes only. Middle row shows the same analysis with expanded Y-axis to highlight the bottom hundredth of the mismatch range. Bottom row shows mismatches upon rerunning Stitchr/Thimble when providing additional novel TCR alleles inferred from the individual donor repertoires. (B) Percentage of all TCRs produced by Stitchr/Thimble that match perfectly to the original input sequences, using the IMGT reference. (C) Percentage of only those TCRs that use a potentially novel inferred V gene allele and agree perfectly between TCR-seq and Stitchr/Thimble output, before (blue) and after (red) including those alleles in the reference dataset, at the nucleotide (left) or amino acid (right) level.

References

    1. Buckley R.H. Molecular defects in human severe combined immunodeficiency and approaches to immune reconstitution. Annu. Rev. Immunol. 2004; 22:625–655.
    2. Markert M.L., Hummell D.S., Rosenblatt H.M., Schiff S.E., Harville T.O., Williams L.W., Schiff R.I., Buckley R.H. Complete DiGeorge syndrome: persistence of profound immunodeficiency. J. Pediatr. 1998; 132:7.
    3. Yin L., Dai S., Clayton G., Gao W., Wang Y., Kappler J., Marrack P. Recognition of self and altered self by T cells in autoimmunity and allergy. Protein Cell. 2013; 4:8–16.
    4. Yi M., Qin S., Zhao W., Yu S., Chu Q., Wu K. The role of neoantigen in immune checkpoint blockade therapy. Exp. Hematol. Oncol. 2018; 7:28.
    5. Zappasodi R., Merghoub T., Wolchok J.D. Emerging concepts for immune checkpoint blockade-based combination therapies. Cancer Cell. 2018; 33:581–598.
