Genome Med. 2018 Mar 20;10(1):20.
doi: 10.1186/s13073-018-0528-3.

BALDR: a computational pipeline for paired heavy and light chain immunoglobulin reconstruction in single-cell RNA-seq data

Amit A Upadhyay et al. Genome Med. 2018.

Abstract

B cells play a critical role in the immune response by producing antibodies, which display remarkable diversity. Here we describe a bioinformatic pipeline, BALDR (BCR Assignment of Lineage using De novo Reconstruction), that accurately reconstructs the paired heavy and light chain immunoglobulin gene sequences from Illumina single-cell RNA-seq data. BALDR was accurate for clonotype identification in human and rhesus macaque influenza vaccine-induced and simian immunodeficiency virus vaccine-induced plasmablasts, and in naïve and antigen-specific memory B cells. BALDR enables matching of clonotype identity with single-cell transcriptional information in B cell lineages and will have broad application in the fields of vaccines, human immunodeficiency virus broadly neutralizing antibody development, and cancer. BALDR is available at https://github.com/BosingerLab/BALDR.

Conflict of interest statement

Ethics approval and consent to participate

Two healthy individuals were vaccinated with the 2016 Fluarix quadrivalent seasonal influenza vaccine. Vaccinated individuals who participated in this study provided informed consent in writing in accordance with the protocols approved by the IRB of Emory University IRB#00089789, entitled “sc-RNA-seq for clinical samples.” Peripheral blood CD19+ B cells were obtained from a healthy, unvaccinated individual who provided informed consent and was recruited under the auspices of Emory IRB#00045821, entitled “Phlebotomy of healthy adults for the purpose of evaluation and validation of immune response assays.” These protocols adhere to international guidelines established in the Declaration of Helsinki by the World Medical Association.

All rhesus macaque samples were obtained from animals undergoing vaccine studies housed at the Yerkes National Primate Research Center, which is accredited by the American Association of Accreditation of Laboratory Animal Care. This study was performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health, a national set of guidelines in the USA, and with international recommendations detailed in the Weatherall Report (2006). This work received prior approval by the Institutional Animal Care and Use Committees (IACUC) of Emory University (IACUC protocol #YER-2002353-061916GA, entitled "Center for HIV/AIDS Vaccine Immunology and Immunogen Discovery-Parent Project," and #2000936, entitled "B-cell Biology of Mucosal Immune Protection from SIV Challenge"). Appropriate procedures were performed to ensure that potential distress, pain, discomfort, and/or injury were limited to that unavoidable in the conduct of the research plan. The sedative ketamine (10 mg/kg) and/or tiletamine/zolazepam (Telazol, 4 mg/kg) was applied as necessary for blood draws, and analgesics were used when determined appropriate by veterinary medical staff.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Experimental design. a A healthy individual was vaccinated with Fluarix Quad 2016–2017 vaccine and after 7 days CD38+ CD27+ plasmablasts were single-cell sorted into 96-well plates using flow cytometry. 10 μL lysates were aliquoted to single-cell RNA-seq (9 μL) and nested RT-PCR (1 μL) to sequence the immunoglobulin heavy (IgH) and light (IgL) chain genes. b ELISPOT assay of day 7 post-vaccination plasmablasts that shows IgH isotype usage and specificity of the plasmablast population for influenza vaccine. c Bioanalyzer plots of single-cell sequencing libraries after SMART-Seq v4 amplification for a plasmablast and a peripheral blood CD19+ B cell. The peaks in the plasmablast plot match in nt sequence length to the full-length heavy and light chain genes. Ig immunoglobulin gene, IgH immunoglobulin heavy chain gene, IgL immunoglobulin light chain gene
Fig. 2
Pipeline for immunoglobulin gene reconstruction in human samples. The pipeline reconstructs IgH and IgL genes from sc-RNA-seq data using either all sequencing reads (Unfiltered) or bioinformatically filtered reads (IG_mapped, IG_mapped+Unmapped, Recombinome_mapped, and IMGT_mapped). Details for each filter are described in Methods and in the text. In the initial step, adapter sequences are trimmed from the fastq files using Trimmomatic. Reads are then filtered to enrich those containing partial sequences from the IgH or IgL variable and constant regions, and to exclude reads mapping to conventional protein coding genes. Filtered (or total) reads are then assembled using the Trinity algorithm without normalization. The assembled transcript models are annotated using IgBLAST. The reads used for assembly are mapped to the assembled transcript models using bowtie2. The models are ranked according to the number of reads mapped. Transcript models that are not productive, or that have the same V(D)J and CDR nucleotide sequence as a higher ranked model, are filtered out. The top model from the remaining set is selected as the putative heavy or light chain
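The final ranking and selection step in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BALDR's actual code: the `TranscriptModel` record, its field names, and `select_top_model` are hypothetical, and the sketch assumes the productivity flag, the V(D)J and CDR nucleotide sequence, and the mapped read count have already been extracted from the annotation and read-mapping outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptModel:
    """Hypothetical record for one assembled transcript model."""
    model_id: str
    productive: bool   # assumed to come from the annotation step
    vdj_cdr_nt: str    # assumed: V(D)J and CDR nucleotide sequence
    mapped_reads: int  # reads mapped back to this model


def select_top_model(models):
    """Return the best-supported productive, non-redundant model, or None."""
    # Rank by read support, highest first (ties keep input order).
    ranked = sorted(models, key=lambda m: m.mapped_reads, reverse=True)
    seen_sequences = set()
    for model in ranked:
        if not model.productive:
            continue  # drop non-productive models
        if model.vdj_cdr_nt in seen_sequences:
            continue  # same sequence as a higher-ranked model
        seen_sequences.add(model.vdj_cdr_nt)
        return model  # first survivor is the top model
    return None


# Illustrative data only; the sequences and counts are made up.
models = [
    TranscriptModel("m1", False, "AAA", 900),
    TranscriptModel("m2", True, "CCC", 500),
    TranscriptModel("m3", True, "CCC", 300),
    TranscriptModel("m4", True, "GGG", 100),
]
top = select_top_model(models)
print(top.model_id if top else "no productive model")
```

Returning `None` when nothing survives the filters makes it explicit that a cell may yield no putative chain, which a caller would need to handle separately.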
Fig. 3
De novo reconstruction of sc-RNA-seq data yields a single dominant transcript model for IgH and IgL. The numbers of sequencing reads mapping to the reconstructed Ig transcript models (IG_mapped+Unmapped method), quantified using bowtie2, are shown for 176 flu vaccine-induced human plasmablasts (AW2-AW3 dataset). a IgH transcript models using Unfiltered reconstruction. b IgL models from Unfiltered reconstruction. c Ratio of reads mapping to the top and second-most abundant transcript models from Unfiltered reconstruction for IgH and IgL. The dashed line indicates a twofold ratio between the top and runner-up models. Red lines represent medians of each dataset
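The ratio plotted in panel c is a simple quotient of read counts. The sketch below shows one way to compute it for a single cell; the function names are hypothetical, and treating the twofold line as a cutoff for calling a model "dominant" is an assumption made here for illustration, since the caption only describes it as a reference line.

```python
def top_to_runner_up_ratio(read_counts):
    """Ratio of reads on the best-supported model to reads on the runner-up.

    Returns None when fewer than two models have any reads, because the
    ratio is undefined in that case.
    """
    counts = sorted((c for c in read_counts if c > 0), reverse=True)
    if len(counts) < 2:
        return None
    return counts[0] / counts[1]


def is_dominant(read_counts, fold=2.0):
    """Assumed rule: the top model is dominant if it is alone or leads by `fold`."""
    counts = [c for c in read_counts if c > 0]
    if not counts:
        return False  # no supported model at all
    ratio = top_to_runner_up_ratio(counts)
    return ratio is None or ratio >= fold


# Illustrative read counts for the models of one cell (made-up numbers).
print(top_to_runner_up_ratio([1200, 300, 50]))
print(is_dominant([500, 400]))
```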
Fig. 4
Reconstruction of Ig transcripts by BALDR is highly accurate. The fidelity of bioinformatic reconstruction of immunoglobulin variable regions was assessed by sequence comparison to a “gold-standard” sequence obtained independently from an aliquot of the single B cell lysate prior to amplification. a Accuracy, defined as correct identification of clonotype (V(D)J gene segment and CDR3 sequence) of NGS-reconstructed IgH and IgL relative to 115 IgH and 140 IgL sequences obtained from nested RT-PCR and Sanger sequencing, for all filtering methods. b Clonal distribution of single cells. The cells were assigned into families based on V, J, and CDR3 length of IgH and IgL. c Assessment of NGS-reconstruction fidelity at the nt level. Nucleotide sequences of reconstructed IgH chains determined to be accurate at the clonotype level were compared to matched sequences obtained by Sanger sequencing by blastn alignment. d SHMs in V region compared to germline IMGT sequences
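The two definitions in panels a and b, clonotype-level accuracy and family assignment, can be made concrete with a short Python sketch. It is a hedged illustration rather than the authors' implementation: the `Chain` record and all function names are hypothetical, exact string equality of gene calls and CDR3 nucleotides is an assumed matching rule, and the gene names in the usage lines are example values only.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Chain(NamedTuple):
    """Hypothetical summary of one annotated heavy or light chain."""
    v_gene: str
    j_gene: str
    cdr3_nt: str
    d_gene: Optional[str] = None  # light chains have no D segment


def same_clonotype(reconstructed, reference):
    """Assumed rule: identical V, D, J calls and identical CDR3 nucleotides."""
    def key(chain):
        return (chain.v_gene, chain.d_gene, chain.j_gene, chain.cdr3_nt.upper())
    return key(reconstructed) == key(reference)


def accuracy(pairs):
    """Fraction of (reconstructed, reference) pairs with matching clonotype."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no sequence pairs to compare")
    return sum(same_clonotype(rec, ref) for rec, ref in pairs) / len(pairs)


def family_key(heavy, light):
    """Family identity from V gene, J gene, and CDR3 length of both chains."""
    return (heavy.v_gene, heavy.j_gene, len(heavy.cdr3_nt),
            light.v_gene, light.j_gene, len(light.cdr3_nt))


def group_into_families(cells):
    """Map each family key to the list of cell IDs that share it."""
    families = defaultdict(list)
    for cell_id, (heavy, light) in cells.items():
        families[family_key(heavy, light)].append(cell_id)
    return dict(families)


# Example values only; not taken from the study's data.
h1 = Chain("IGHV3-23", "IGHJ4", "TGTGCGAAAGATTGG", "IGHD3-10")
h2 = Chain("IGHV1-69", "IGHJ6", "TGTGCGAGAGGGTGG", "IGHD2-2")
l1 = Chain("IGKV1-39", "IGKJ1", "TGTCAACAGTTT")
print(accuracy([(h1, h1), (h1, h2)]))
print(group_into_families({"cell_1": (h1, l1), "cell_2": (h2, l1)}))
```

Note that the family key is deliberately looser than the clonotype match: it uses CDR3 length rather than CDR3 sequence, so cells whose CDR3s differ by point mutations still group together.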
Fig. 5
Accurate Ig reconstruction in conventional human CD19+ B cells. a Accuracy of Ig reconstruction for peripheral blood total CD19+ B cells (VH dataset) determined by comparison to 31 IgH and 31 IgL sequences obtained from nested RT-PCR and Sanger sequencing. b Somatic hypermutations in V region compared to germline IMGT sequences
Fig. 6
BALDR maintains accuracy across diverse sequencing parameters. Accuracy of Ig reconstruction for 51 plasmablasts (AW1 dataset) for different sequencing conditions (PE/SE and read lengths of 50, 75, and 101) determined by comparison to 34 IgH (a) and 41 IgL (b) sequences obtained from nested RT-PCR and Sanger sequencing. PE paired end, SE single-end sequencing
Fig. 7
Ig transcript reconstruction in rhesus macaques with poor immunoglobulin reference annotation. a Pipeline for Ig assembly using unfiltered and filtered approaches (Filter-Non-IG: discard reads mapping to non-Ig annotated regions of the rhesus genome; IG_mapped: select reads mapped to the Ig coordinates; and IG_mapped+Unmapped: combine IG_mapped reads and Unmapped reads for assembly). Ig reconstruction was carried out for 42 plasmablasts, 33 memory B cells, and 33 germinal center (GC) B cells. b Concordance of V(D)J gene annotation and CDR3 nucleotide sequence of Filter-Non-IG method with nested RT-PCR sequences from plasmablast and GC B cells
