Front Immunol. 2012 Jun 28:3:176.
doi: 10.3389/fimmu.2012.00176. eCollection 2012.

Immunoglobulin analysis tool: a novel tool for the analysis of human and mouse heavy and light chain transcripts

Tobias Rogosch et al. Front Immunol. 2012.

Abstract

Sequence analysis of immunoglobulin (Ig) heavy and light chain transcripts can refine the categorization of B cell subpopulations and can shed light on the selective forces that act during immune responses or immune dysregulation, such as autoimmunity, allergy, and B cell malignancy. High-throughput sequencing yields Ig transcript collections of unprecedented size. The authoritative web-based IMGT/HighV-QUEST program can analyze large collections of transcripts and provides annotated output files that describe many key properties of Ig transcripts. However, these flat files require additional processing to create figures, to analyze additional features, or to compare sequence sets. We present an easy-to-use Microsoft® Excel®-based software tool, named Immunoglobulin Analysis Tool (IgAT), for the summary, interrogation, and further processing of IMGT/HighV-QUEST output files. IgAT generates descriptive statistics and high-quality figures for collections of murine or human Ig heavy or light chain transcripts ranging from 1 to 150,000 sequences. In addition to traditionally studied properties of Ig transcripts, such as the usage of germline gene segments or the length and composition of the CDR-3 region, IgAT uses published algorithms to calculate the probability of antigen selection based on somatic mutational patterns, the average hydrophobicity of the antigen-binding sites, and predictable structural properties of the CDR-H3 loop according to Shirai's H3-rules. These refined analyses provide in-depth information about the selective forces acting upon Ig repertoires and allow the statistical and graphical comparison of two or more sequence sets. IgAT is easy to use on any computer running Excel® 2003 or later. Thus, IgAT is a useful tool for gaining insight into the selective forces and functional properties of small to extremely large collections of Ig transcripts, helping researchers mine a data set to its fullest.

Keywords: antibody repertoire; deep sequencing; high-throughput analysis; immunoglobulin heavy chain gene; immunoglobulin light chain gene; rearrangement; sequence analysis software; somatic mutation.

Figures

Figure 1
Screenshot of the “input” worksheet.
Figure 2
Screenshot of the “summary” worksheet of IgAT.
Figure 3
The “VDJ” worksheet contains graphs displaying the relative utilization of VH families (A), DH families (B), and JH gene segments (C) as well as individual V gene segments (D) and DH gene segments (E).
Figure 4
The graphs in the “CDR-3_length” (positions 105–117) worksheet display the length distribution of CDR-H3 (A), N1 (B), N2 (C), and deconstruction graphs for CDR-H3 with (D) or without (E) identifiable DH gene segment. Lengths are given in nucleotides.
Figure 5
(A) Somatic mutation frequency of Ig transcripts (mutations per 1000 nt). Each data point represents the somatic mutation frequency of one sequence. (B) Inference of Ag selection in Ig transcripts. Shown is the ratio of replacement mutations in CDR-H1 and CDR-H2 (R_CDR) to the total number of mutations in the V region (M_V), plotted against M_V. The dark shaded area represents the 90% confidence limits and the light gray shaded area the 95% confidence limits for the probability of random mutations. A data point falling outside these confidence limits represents a sequence that has a high proportion of replacement mutations in the CDR. The probability that such a sequence has accumulated as many replacement mutations in the CDR by mere random mutation is p = 0.1 and p = 0.05, respectively. An allocation above the upper confidence limit was considered indicative of Ag selection. Data points are accompanied by their observed frequency. At α = 0.05, 6.5% of the sequences were classified as Ag selected (9.6% at α = 0.1).
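The two per-sequence quantities plotted in this figure can be illustrated with a minimal Python sketch. The function names and the example counts are hypothetical and are not taken from IgAT; the confidence limits used to infer Ag selection come from a published algorithm that is not reproduced here.

```python
from typing import Optional


def mutation_frequency_per_1000_nt(n_mutations: int, region_length_nt: int) -> float:
    """Somatic mutation frequency, expressed as mutations per 1000 nucleotides."""
    if region_length_nt <= 0:
        raise ValueError("region_length_nt must be positive")
    return 1000.0 * n_mutations / region_length_nt


def cdr_replacement_ratio(r_cdr: int, m_v: int) -> Optional[float]:
    """Ratio of replacement mutations in CDR-H1/CDR-H2 to all V-region mutations.

    Returns None for an unmutated sequence, where the ratio is undefined.
    """
    if m_v == 0:
        return None
    return r_cdr / m_v


# Hypothetical example: 6 mutations in a 300 nt V region, 3 of them
# replacement mutations located in CDR-H1 or CDR-H2.
freq = mutation_frequency_per_1000_nt(6, 300)
ratio = cdr_replacement_ratio(3, 6)
```

Each sequence would then be plotted as the point (total V-region mutations, ratio) and compared against the shaded confidence limits.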
Figure 6
Graphic output of the analysis of amino acid frequencies and variability, using as an example the CDR-H3 sequences with a length of 12 amino acids (positions 105–117, n = 2572). (A) The Shannon entropy for each position in the CDR-H3 region (the higher the score, the more variable the position in terms of amino acids). (B) Relative amino acid frequencies at positions 105–117 of the CDR-H3 region. Each bar represents 100% of the amino acid residues found at that position. The amino acid residues are stacked in the order of their hydrophobicity according to a normalized Kyte–Doolittle index (Eisenberg, 1984). Charged amino acid residues are at the bottom and hydrophobic amino acid residues at the top of each bar, as presented previously (Zemlin et al., 2003). (C) The Kabat–Wu variability for each position in the CDR-3 region (the higher the score, the more variable the position). (D) Overall amino acid frequencies within the CDR-3 loop (positions 107–114).
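As an illustration of the two per-position variability measures named in this caption, the following Python sketch computes the Shannon entropy and the Kabat–Wu variability for one alignment column. It assumes equal-length sequences and a base-2 logarithm for the entropy (the caption does not state the base), and it uses the standard Kabat–Wu definition: the number of different amino acids at a position divided by the frequency of the most common one. The function names and toy sequences are hypothetical, not IgAT's.

```python
import math
from collections import Counter
from typing import List, Sequence


def column(sequences: Sequence[str], position: int) -> List[str]:
    """Collect the residue found at one position across equal-length sequences."""
    return [seq[position] for seq in sequences]


def shannon_entropy(residues: Sequence[str]) -> float:
    """Shannon entropy in bits (base 2 is an assumption) of one alignment column."""
    total = len(residues)
    counts = Counter(residues)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def kabat_wu_variability(residues: Sequence[str]) -> float:
    """Number of distinct residues divided by the frequency of the most common one."""
    total = len(residues)
    counts = Counter(residues)
    most_common_frequency = counts.most_common(1)[0][1] / total
    return len(counts) / most_common_frequency


# Hypothetical toy set of four equal-length sequences.
toy = ["CARW", "CAKW", "CGRW", "CTRW"]
entropy_by_position = [shannon_entropy(column(toy, i)) for i in range(4)]
variability_by_position = [kabat_wu_variability(column(toy, i)) for i in range(4)]
```

In this toy set, the invariant first position has entropy 0 and variability 1, while the second position (A, A, G, T) has entropy 1.5 bits and variability 6.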
Figure 7
Amino acid frequencies of the CDR-H3 loop for all unique sequences (positions 107–114).
Figure 8
Distribution of average CDR-H3 loop hydrophobicities according to a normalized Kyte–Doolittle scale (positions 107–114; Eisenberg, 1984).
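The averaging step behind this distribution can be sketched in Python as follows. This sketch uses the raw Kyte–Doolittle hydropathy values as a stand-in; the normalized scale of Eisenberg (1984) that the caption refers to is not reproduced here, so the absolute numbers will differ from IgAT's output. The function name and example loops are hypothetical.

```python
# Raw Kyte-Doolittle hydropathy values. These are NOT the normalized
# values the caption refers to; they serve only as a stand-in scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(loop: str, scale=KYTE_DOOLITTLE) -> float:
    """Average hydropathy over the residues of one loop sequence.

    Raises KeyError for residues that are missing from the scale.
    """
    if not loop:
        raise ValueError("loop must contain at least one residue")
    return sum(scale[aa] for aa in loop.upper()) / len(loop)


# Hypothetical loop sequences; the per-sequence averages would then be
# binned to draw a distribution like the one in this figure.
averages = [mean_hydrophobicity(loop) for loop in ["RDYYGSSY", "GILVAAGT"]]
```

Passing a different dictionary as `scale` swaps in another hydrophobicity scale without changing the averaging logic.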
Figure 9
Reading frame utilization given as percent of all unique sequences with identifiable DH gene segment. The DH reading frames are defined according to the nomenclature of Ichihara et al. (1989).
Figure 10
Predicted structural features of the CDR-3 according to the “H3-rules” by Shirai et al. (1999). (K−, kinked base; K+, extra kinked base; K−/+, kinked or extra kinked base; E, extended base; hp def K−, deformed hairpin in sequences with kinked base; hp def K+, deformed hairpin in sequences with extra kinked base; hp def K−/+, deformed hairpin in sequences with kinked and extra kinked base; H lad K−, intact hydrogen bond ladder in sequences with kinked base; H lad K+, intact hydrogen bond ladder in sequences with extra kinked base; H lad K−/+, intact hydrogen bond ladder in sequences with kinked and extra kinked base).
