Genomics Proteomics Bioinformatics. 2022 Jan 1;23(2):qzaf041.
doi: 10.1093/gpbjnl/qzaf041.

Characterization of Chronic Lymphocytic Leukemia Immunoglobulin Rearrangements from Partial Read Sequencing

Azahara Fuentes-Trillo et al. Genomics Proteomics Bioinformatics.

Abstract

The mutational status of the immunoglobulin variable region is an established prognostic biomarker for chronic lymphocytic leukemia (CLL). The length and internal variability of the rearranged variable, diversity, and joining (VDJ) sequences complicate B-cell clone characterization by next-generation sequencing (NGS), and standardization is needed to adapt the procedure to current clinical guidelines. Here, we develop a complete strategy for sequencing the variable domain of the immunoglobulin heavy chain (IGH) locus with a simple, low-cost, and efficient method that works with shorter reads (MiSeq 2 × 150 bp), allowing for faster results. Clonality and mutational status are determined within the same analysis pipeline. We tested and validated the method on 319 previously diagnosed CLL patients whose IGH locus had been characterized by Sanger sequencing, along with 47 healthy donor samples. The analysis method follows a clone-centered consensus sequence strategy to identify B-cell clones and to establish a clonal threshold specific to each patient's clonality profile, thereby overcoming the limitations of Sanger sequencing, the gold standard for determining immunoglobulin heavy variable (IGHV) mutational status.

Keywords: IGH locus; B cell; Chronic lymphocytic leukemia; Immune repertoire; NGS.

Conflict of interest statement

The authors have declared no competing interests.

Figures

Graphical Abstract
Figure 1
Workflow for IGH locus characterization in CLL patients. A. Sequencing with multiplexed FR primer sets and Illumina MiSeq 2 × 150 bp kit. B. Reconstruction of the region of interest by overlapping FR reads using the in-house pipeline B-MyRepCLL (https://github.com/afuentri/B-MyRepCLL). The pipeline generates a consensus sequence for each B-cell rearrangement followed by filtering steps to minimize artifacts. C. Repertoire structure determination. Automatic KNN classification distinguishes between CLL and healthy repertoires, followed by prioritization of the predominant rearrangements to identify clonal rearrangements (clonal/subclonal cut-off). VDJ, variable, diversity, and joining; KNN, K-nearest neighbors; FR, framework region; CLL, chronic lymphocytic leukemia; IGHV, immunoglobulin heavy chain variable; IGHD, immunoglobulin heavy chain diversity; IGHJ, immunoglobulin heavy chain joining.
Figure 2
Bioinformatics pipeline basis. A. Reads from theoretical clonal IG rearrangements in a single sample (different colors represent different B-cell clones) are initially assigned independently to different IGHV@ and IGHJ@ alleles. Dotted lines indicate reads aligned to both V and J alleles. B. In the next step, these reads are used to infer the V–J allele combinations. Reads corresponding to each IGHV–IGHJ pairing are isolated and mapped against a joined reference of the specific IGHV–IGHJ pair. A consensus sequence is generated for each combination, representing an individual IG rearrangement. These consensus sequences are used to calculate the percentage identity against germline IGHV@ alleles. Asterisks represent the gap for the IGHD sequence. Red boxes indicate somatic hypermutation events; gray boxes indicate the junction sequence, which appears as a gap in the combined reference alleles. C. CDR3 and IGHD sequence extraction is performed. The CDR3 amino acid sequence is retrieved by searching for the conserved amino acid motifs (Cys104, Trp118, and WGXG) in different open reading frames. IGHD@ is detected as an insertion relative to the combined sequences of the IGHV–IGHJ alleles used as reference. IG, immunoglobulin; CDR3, complementarity-determining region 3.
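The motif search described in panel C can be sketched as follows. This is a minimal illustration under stated assumptions, not the B-MyRepCLL implementation: the function names are hypothetical, only the three forward reading frames are tried, the Cys closest upstream of the W-G-X-G motif is assumed to be Cys104, and the returned stretch excludes the anchoring Cys and Trp.

```python
import re

# Standard genetic code, built from the canonical TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Conserved J-region motif: Trp118 followed by Gly, any residue, Gly.
WGXG = re.compile(r"WG.G")


def translate(dna, frame):
    """Translate dna from a 0-based frame; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_cdr3(consensus_dna):
    """Return (frame, cdr3_aa) for the first frame with a usable match, else None."""
    for frame in range(3):
        protein = translate(consensus_dna, frame)
        motif = WGXG.search(protein)
        if motif is None:
            continue
        # Assumption: the nearest upstream Cys is the conserved Cys104.
        cys = protein.rfind("C", 0, motif.start())
        if cys == -1:
            continue
        cdr3 = protein[cys + 1:motif.start()]
        if "*" not in cdr3:  # reject frames with a stop codon inside the CDR3
            return frame, cdr3
    return None


# Toy consensus encoding the peptide AVYYCARDYGMDVWGQGTT (made-up example).
toy = "".join(
    "GCT GTT TAT TAT TGT GCT CGT GAT TAT GGT ATG GAT GTT "
    "TGG GGT CAG GGT ACT ACT".split()
)
print(find_cdr3(toy))        # (0, 'ARDYGMDV')
print(find_cdr3("A" + toy))  # (1, 'ARDYGMDV')
```

A production pipeline would also need to handle the reverse strand and CDR3 sequences that themselves contain a cysteine, which this sketch does not attempt.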
Figure 3
Optimization of the procedure. A. Accuracy of KNN classification (k = 1–10) in a random split of the dataset between training and test sets. B. Scatter plot of the KNN test classification with clonal and polyclonal labels. C. F1-micro and F1-macro average accuracy scores for 10-fold cross-validation for KNN classification (k = 1–10). D. Box plot of MAX_DIFF values per sample grouped by polyclonal, 1CLONE, and >1CLONE. Bonferroni-corrected P values from Mann-Whitney U tests are annotated to show differences between group distributions (polyclonal vs. 1CLONE: P = 1.303E−27; polyclonal vs. >1CLONE: P = 5.562E−13; 1CLONE vs. >1CLONE: P = 3.213E−03). ns, not significant (P > 0.05); *, 0.01 < P ≤ 0.05; **, 0.001 < P ≤ 0.01; ***, 0.0001 < P ≤ 0.001; ****, P ≤ 0.0001. The scipy.stats Python module was used to perform the statistical test. MAX_DIFF, maximum clonal difference within a sample.
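To make the KNN step in panels A–C concrete, the sketch below implements a plain K-nearest-neighbors vote and scores it on a held-out set for several values of k. Everything specific here is an assumption for illustration: the two features per sample, the toy values, and the helper names are hypothetical, and the caption does not state which features or library the authors used.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k):
    """Label query by majority vote among its k nearest training points."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    # On a tied vote, the label seen first (the nearer neighbor) wins.
    return votes.most_common(1)[0][0]


def accuracy(train_points, train_labels, test_points, test_labels, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_points, train_labels, point, k) == label
        for point, label in zip(test_points, test_labels)
    )
    return hits / len(test_points)


# Hypothetical features: (top clone percentage, largest consecutive ratio).
train_points = [(90, 5.0), (85, 8.0), (95, 3.0), (5, 1.2), (8, 1.5), (3, 1.1)]
train_labels = ["clonal"] * 3 + ["polyclonal"] * 3
test_points = [(88, 6.0), (6, 1.3)]
test_labels = ["clonal", "polyclonal"]

for k in range(1, 6):
    print(k, accuracy(train_points, train_labels, test_points, test_labels, k))
```

On real data the features would normally be scaled before computing distances, and k would be chosen from cross-validated scores rather than a single split.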
Figure 4
Representation of clonal ratios in polyclonal (healthy), 1CLONE, and 2CLONE groups from the test samples. Bars on the positive Y-axis represent the clonal percentage of the different IG rearrangements detected per individual, with different colors per donor. Bars on the negative Y-axis represent clonal percentage ratios between consecutive clones in a sample, ordered by abundance. The maximum clonal ratio is highlighted in red and the remaining clonal ratios in black. A. Polyclonal repertoire (five healthy donors). B. 1CLONE repertoire (24 samples with a single predominant clone). C. 2CLONE repertoire (10 samples with double predominant rearrangements).
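The consecutive clonal ratios plotted on the negative Y-axis can be illustrated with a short sketch. The ratio direction (more abundant over less abundant) and the idea of splitting the ranked clones at the largest ratio are assumptions made here for illustration; the caption does not spell out the exact cut-off rule, and the function names and toy percentages are hypothetical.

```python
def clonal_ratios(percentages):
    """Ratios between consecutive clone percentages, ranked by abundance."""
    ordered = sorted(percentages, reverse=True)
    if any(value <= 0 for value in ordered):
        raise ValueError("clone percentages must be positive")
    return [high / low for high, low in zip(ordered, ordered[1:])]


def split_at_max_ratio(percentages):
    """Split ranked clones at the largest consecutive ratio (assumed rule)."""
    ordered = sorted(percentages, reverse=True)
    ratios = clonal_ratios(ordered)
    if not ratios:
        return ordered, []
    cut = ratios.index(max(ratios)) + 1
    return ordered[:cut], ordered[cut:]


# Toy repertoires (made-up clone percentages).
single_clone = [60.0, 2.0, 1.0, 0.5]
double_clone = [40.0, 35.0, 1.0, 0.5]

print(clonal_ratios(single_clone))       # [30.0, 2.0, 2.0]
print(split_at_max_ratio(single_clone))  # ([60.0], [2.0, 1.0, 0.5])
print(split_at_max_ratio(double_clone))  # ([40.0, 35.0], [1.0, 0.5])
```

In a polyclonal repertoire the consecutive ratios stay close to one another, so no single gap stands out the way it does in the toy 1CLONE and 2CLONE examples.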