[Preprint]. 2023 Oct 2:2023.09.29.560187.
doi: 10.1101/2023.09.29.560187.

Inferring B cell phylogenies from paired heavy and light chain BCR sequences with Dowser


Cole G Jensen et al. bioRxiv.

Abstract

Antibodies are vital to human immune responses and are composed of genetically variable heavy and light chains. These structures are initially expressed as B cell receptors (BCRs). BCR diversity is shaped through somatic hypermutation and selection during immune responses. This evolutionary process produces B cell clones, cells that descend from a common ancestor but differ by mutations. Phylogenetic trees inferred from BCR sequences can reconstruct the history of mutations within a clone. Until recently, BCR sequencing technologies separated heavy and light chains, but advancements in single cell sequencing now pair heavy and light chains from individual cells. However, it is unclear how these separate genes should be combined to infer B cell phylogenies. In this study, we investigated strategies for using paired heavy and light chain sequences to build phylogenetic trees. We found incorporating light chains significantly improved tree accuracy and reproducibility across all methods tested. This improvement was greater than the difference between tree building methods and persisted even when mixing bulk and single cell sequencing data. However, we also found that many phylogenetic methods estimated significantly biased branch lengths when some light chains were missing, such as when mixing single cell and bulk BCR data. This bias was eliminated using maximum likelihood methods with separate branch lengths for heavy and light chain gene partitions. Thus, we recommend using maximum likelihood methods with separate heavy and light chain partitions, especially when mixing data types. We implemented these methods in the R package Dowser: https://dowser.readthedocs.io.

Conflict of interest statement

Competing interests: S.H.K. receives consulting fees from Peraton. K.B.H. receives consulting fees from Prellis Biologics. The remaining authors have no competing interests.

Figures

Figure 1.
Comparison of heavy and light chain somatic hypermutation (SHM) among datasets. Each point shows the mean SHM frequency along the heavy and light chain V-gene for each clone. This is shown for a dataset of subjects with COVID-19 (left), a subject who recently received an influenza vaccination (middle), and 20 simulated datasets (right). For empirical datasets, only clones with at least three or five (COVID-19 and influenza datasets, respectively) distinct sequences were included. The line partitioning each plot indicates equally mutated heavy and light chains. Marginal distributions for heavy or light chain SHM are shown outside of each plot.
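The per-clone mean SHM frequency plotted in this figure can be illustrated with a minimal Python sketch. It assumes that each sequence is already aligned to its germline V-gene, and that positions carrying N, gap, or dot characters are excluded from the comparison. The function names `shm_frequency` and `mean_clone_shm` are hypothetical and are not taken from Dowser or any related package.

```python
def shm_frequency(sequence: str, germline: str) -> float:
    """Fraction of comparable aligned positions where sequence differs from germline."""
    if len(sequence) != len(germline):
        raise ValueError("sequence and germline must be aligned to the same length")
    skip = set("N-.")  # assumption: ambiguous or gapped positions are not counted
    compared = 0
    mismatches = 0
    for s, g in zip(sequence.upper(), germline.upper()):
        if s in skip or g in skip:
            continue
        compared += 1
        if s != g:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def mean_clone_shm(sequences: list[str], germline: str) -> float:
    """Mean SHM frequency over all sequences in one clone (one point in the plot)."""
    if not sequences:
        raise ValueError("clone has no sequences")
    freqs = [shm_frequency(s, germline) for s in sequences]
    return sum(freqs) / len(freqs)
```

Calling `mean_clone_shm` once with a clone's heavy chain sequences and once with its light chain sequences gives the two coordinates of that clone's point.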
Figure 2.
The accuracy and reproducibility of tree topology estimates are improved using paired heavy and light chains. A) Robinson-Foulds (RF) cluster distance between estimated and true tree topologies for trees built using only the heavy chain (H) and paired heavy and light chain sequences (H+L) using a representative maximum parsimony, single-partition maximum likelihood, and multi-partition models. Additionally, the right column shows the average RF distance for all methods. B) Mean bootstrap values for trees built using each dataset. P values were calculated using a Wilcoxon test. For comparisons with all methods, see Supplementary Fig. 1.
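As a rough illustration of the RF cluster distance used here, the sketch below counts the clusters (sets of tips below an internal node) found in one rooted tree but not the other. It assumes trees are written as nested Python tuples with string tip names, and it returns the unnormalized count. The exact variant and any normalization used in the study may differ.

```python
def clusters(tree) -> set:
    """Return the non-trivial clusters of a rooted tree given as nested tuples."""
    found = set()

    def tips(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        below = frozenset().union(*(tips(child) for child in node))
        found.add(below)
        return below

    all_tips = tips(tree)
    found.discard(all_tips)  # the root cluster is shared by every tree on these tips
    return found


def rf_cluster_distance(tree_a, tree_b) -> int:
    """Number of clusters present in exactly one of the two trees."""
    return len(clusters(tree_a) ^ clusters(tree_b))


# Hypothetical four-tip example: the trees disagree on one cluster each.
true_tree = ((("A", "B"), "C"), "D")
estimated_tree = ((("A", "C"), "B"), "D")
distance = rf_cluster_distance(true_tree, estimated_tree)
```

A distance of zero means the estimated topology matches the true one; larger values mean more clusters are wrong.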
Figure 3.
Effect of missing light chains on tree topology and branch length estimates. A) Average RF cluster distance for trees built using paired heavy and light chain sequences (H+L, left) with an incrementally greater proportion of light chains missing. The top right panel shows the mean RF distance for trees built using only the heavy chain (H). B) Mean branch length error of trees built using H+L sequences (left) and H sequences (right). Points show the means of all 20 simulation replicates.
Figure 4.
Branch length estimates from many methods are affected by missing light chains. A) Schematic of simulation strategy. Heavy chains were simulated with approximately twice as many mutations as light chains. For single cell (SC) data, heavy and light (H+L) sequences were retained for all three tips. For mixed single cell and bulk (SC+bulk) data, the light chain of tip B was removed. B) For each replicate, branch lengths were estimated using different methods and compared to the true branch length. In SC+bulk data, most methods incorrectly estimated the length of branch 3, which immediately precedes tip B.
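One possible intuition for this effect is that per-site mutation rates differ between chains, so a branch supported only by heavy chain data sits on a different scale from one supported by both chains. The toy calculation below uses made-up sequence lengths and mutation counts (heavy chain with twice as many mutations as light chain). It is not the simulation from the study and makes no claim about the size or direction of the bias reported there.

```python
def branch_rates(heavy_mut: int, heavy_sites: int,
                 light_mut: int, light_sites: int) -> dict:
    """Mutations per site along one branch under three views of the data."""
    return {
        # single shared rate over the concatenated heavy + light alignment
        "paired": (heavy_mut + light_mut) / (heavy_sites + light_sites),
        # what is observable when the light chain is missing (bulk-like tip)
        "heavy_only": heavy_mut / heavy_sites,
        # light chain partition on its own
        "light_only": light_mut / light_sites,
    }


# Hypothetical numbers chosen only for illustration.
rates = branch_rates(heavy_mut=12, heavy_sites=360, light_mut=6, light_sites=320)
```

With these numbers the heavy-only rate is higher than the paired rate, which is higher than the light-only rate. Estimating a separate branch length for each partition avoids forcing the three onto one scale.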
Figure 5.
Choice of tree building method significantly impacts conclusions with mixed single cell and bulk data. Data were obtained from blood (48.40% bulk BCR, 51.60% single cell) and germinal center (GC, 100% single cell). Mean divergence (sum of branch lengths from germline to each tip) is shown for blood and GC sequences in shared clones, as estimated using different methods. Only data from clones sampled both in the GC at the specified time point and in blood at day 5 are shown. Above each plot is a representative tree. For heavy chain only comparisons, see Supplementary Fig. 2.
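Divergence as defined in this caption (sum of branch lengths from germline to each tip) can be sketched as a walk from a tip up to the root. The tree representation below, a dictionary mapping each node to its parent and branch length, and all node names and lengths are hypothetical and chosen only for illustration.

```python
def divergence(parents: dict, tip: str, root: str = "germline") -> float:
    """Sum of branch lengths on the path from the root (germline) to one tip."""
    total = 0.0
    node = tip
    while node != root:
        parent, length = parents[node]
        total += length
        node = parent
    return total


def mean_divergence(parents: dict, tips: list[str], root: str = "germline") -> float:
    """Mean divergence over a group of tips, e.g. all blood or all GC sequences."""
    return sum(divergence(parents, t, root) for t in tips) / len(tips)


# Hypothetical clone: node -> (parent, branch length to parent).
tree = {
    "n1": ("germline", 0.01),
    "blood_1": ("n1", 0.02),
    "gc_1": ("n1", 0.05),
    "gc_2": ("n1", 0.03),
}
blood_mean = mean_divergence(tree, ["blood_1"])
gc_mean = mean_divergence(tree, ["gc_1", "gc_2"])
```

Because every tip's divergence depends on the branch lengths along its path, any systematic bias in branch length estimates carries directly into a blood versus GC comparison.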
