PLoS Comput Biol. 2020 Aug 17;16(8):e1008030.

doi: 10.1371/journal.pcbi.1008030. eCollection 2020 Aug.

A Bayesian phylogenetic hidden Markov model for B cell receptor sequence analysis

Amrit Dhar¹,², Duncan K Ralph², Vladimir N Minin³, Frederick A Matsen 4th²

Affiliations

¹ Department of Statistics, University of Washington, Seattle, Washington, United States of America.
² Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America.
³ Department of Statistics, University of California, Irvine, California, United States of America.

PMID: 32804924
PMCID: PMC7451993
DOI: 10.1371/journal.pcbi.1008030

Amrit Dhar et al. PLoS Comput Biol. 2020.

Abstract

The human body generates a diverse set of high-affinity antibodies, the soluble form of B cell receptors (BCRs), that bind to and neutralize invading pathogens. The natural development of BCRs must be understood in order to design vaccines for highly mutable pathogens such as influenza and HIV. BCR diversity is induced by naturally occurring combinatorial "V(D)J" rearrangement, mutation, and selection processes. Most current methods for BCR sequence analysis model these processes separately. Statistical phylogenetic methods are often used to model the mutational dynamics of BCR sequence data, but these techniques do not consider all the complexities associated with B cell diversification, such as the V(D)J rearrangement process. In particular, standard phylogenetic approaches assume the DNA bases of the progenitor (or "naive") sequence arise independently and according to the same distribution, ignoring the complexities of V(D)J rearrangement. In this paper, we introduce a novel approach to Bayesian phylogenetic inference for BCR sequences that is based on a phylogenetic hidden Markov model (phylo-HMM). This technique not only integrates a naive rearrangement model with a phylogenetic model for BCR sequence evolution but also naturally accounts for uncertainty in all unobserved variables, including the phylogenetic tree, via posterior distribution sampling.
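The central idea of the abstract, an HMM that generates the naive sequence at the root of a phylogenetic tree, can be illustrated with a minimal sketch of the likelihood computation. This is not linearham's implementation. It assumes a Jukes-Cantor substitution model in place of GTR with gamma rate variation, a fixed tree, and a hypothetical toy HMM (germline-position states `g1`, `g2`, `g3` plus an untemplated insertion state `N`) in place of a real naive rearrangement HMM. At each site, the emission for a hidden state is the sum over naive bases of P(base | state) times the Felsenstein pruning likelihood of that alignment column given the naive base at the root; the forward algorithm then sums over hidden-state paths. The toy tree mirrors the m = 3, n = 3 example of Fig 6.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
Tree = Dict[str, List[Tuple[str, float]]]  # parent -> [(child, branch length)]


def jc69(a: str, b: str, t: float) -> float:
    """Jukes-Cantor probability of ending in base b after a branch of length t starting from a."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def pruning(node: str, tree: Tree, leaves: Dict[str, str], site: int) -> List[float]:
    """Felsenstein pruning: P(observed bases below `node` | base at `node`) for A, C, G, T."""
    if node in leaves:
        return [1.0 if b == leaves[node][site] else 0.0 for b in BASES]
    lik = [1.0] * 4
    for child, t in tree[node]:
        below = pruning(child, tree, leaves, site)
        for i, a in enumerate(BASES):
            lik[i] *= sum(jc69(a, b, t) * below[k] for k, b in enumerate(BASES))
    return lik


def phylo_hmm_loglik(init, trans, emit, tree, leaves, root, n_sites) -> float:
    """Scaled forward algorithm; emission at site j is sum_b P(naive base b | state) * L_j(b)."""
    alpha: Dict[str, float] = {}
    loglik = 0.0
    for j in range(n_sites):
        site_lik = pruning(root, tree, leaves, j)  # L_j(b), with the naive sequence at the root
        e = {s: sum(emit[s][b] * site_lik[i] for i, b in enumerate(BASES)) for s in init}
        if j == 0:
            new = {s: init[s] * e[s] for s in init}
        else:
            new = {s: sum(alpha[r] * trans[r].get(s, 0.0) for r in init) * e[s] for s in init}
        norm = sum(new.values())
        loglik += math.log(norm)  # accumulate scaling factors to avoid underflow
        alpha = {s: v / norm for s, v in new.items()}
    return loglik


def peaked(base: str) -> Dict[str, float]:
    """Hypothetical emission favoring one germline base."""
    return {b: 0.97 if b == base else 0.01 for b in BASES}


# Hypothetical toy inputs: 3 observed sequences, 3 sites, 4-tip tree rooted at the naive tip.
TREE: Tree = {"naive": [("Y1", 0.1)], "Y1": [("seq1", 0.1), ("Y2", 0.05)],
              "Y2": [("seq2", 0.1), ("seq3", 0.2)]}
LEAVES = {"seq1": "ACG", "seq2": "ATG", "seq3": "ATG"}
INIT = {"g1": 1.0, "g2": 0.0, "N": 0.0, "g3": 0.0}
TRANS = {"g1": {"g2": 0.7, "N": 0.3}, "g2": {"g3": 1.0}, "N": {"g3": 1.0}, "g3": {}}
EMIT = {"g1": peaked("A"), "g2": peaked("C"), "N": {b: 0.25 for b in BASES}, "g3": peaked("G")}

print(phylo_hmm_loglik(INIT, TRANS, EMIT, TREE, LEAVES, "naive", 3))
```

Because the forward recursion only needs the per-site likelihood vector, it works unchanged with any substitution model that supplies P(data | naive base). A Bayesian treatment would additionally place priors on the tree and model parameters and sample them, rather than fixing them as in this sketch.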

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Naive sequence validation plot.**
The Hamming distances between the simulated naive DNA sequences and their corresponding linearham, partis, and ARPP estimates versus the tree imbalance values of the simulated trees. Linear regression lines are superimposed for each method to indicate how the results vary as trees become more imbalanced. For reference, we plot the tree imbalance values for the PC64 and VRC01 trees.
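The validation metric in this caption can be sketched as a Hamming distance between each simulated naive sequence and its estimate, followed by an ordinary least-squares line of that error against tree imbalance. The function names are hypothetical, and the caption does not say how tree imbalance is computed, so imbalance values are simply taken as inputs here.

```python
from typing import Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def ols_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def naive_error_trend(true_naives, estimated_naives, imbalances):
    """Per-tree Hamming errors plus the regression line of error versus tree imbalance."""
    errors = [hamming(t, e) for t, e in zip(true_naives, estimated_naives)]
    return errors, ols_line(imbalances, errors)
```

A positive slope from `naive_error_trend` would indicate that a method's naive-sequence error grows as the simulated trees become more imbalanced.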

**Fig 2. Intermediate ancestral sequence validation plot.**
The positive predictive values and true positive rates versus the tree imbalance values of the simulated trees, stratified by decision boundary ρ. Positive predictive values and true positive rates are computed on the DNA sequences for the linearham, RevBayes, and dnaml programs. Linear regression lines are superimposed for each program to indicate how the results vary as trees become more imbalanced. For reference, we plot the tree imbalance values for the PC64 and VRC01 trees (vertical dashed lines).
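The caption does not spell out how the two rates are computed, so the sketch below assumes one plausible definition: ancestral sequences whose posterior probability is at least ρ count as predictions, and they are scored against the set of true intermediate ancestors from the simulation. `ppv_tpr` is a hypothetical helper, not part of linearham, RevBayes, or dnaml.

```python
from typing import Dict, Set, Tuple


def ppv_tpr(posterior: Dict[str, float], true_ancestors: Set[str], rho: float) -> Tuple[float, float]:
    """PPV and TPR of the ancestral sequences whose posterior probability is at least rho.

    Assumed definitions: predicted = {s : posterior[s] >= rho};
    PPV = |predicted & true| / |predicted|, TPR = |predicted & true| / |true|.
    Returns NaN for a rate whose denominator is empty.
    """
    predicted = {s for s, p in posterior.items() if p >= rho}
    hits = len(predicted & true_ancestors)
    ppv = hits / len(predicted) if predicted else float("nan")
    tpr = hits / len(true_ancestors) if true_ancestors else float("nan")
    return ppv, tpr
```

Raising ρ shrinks the predicted set, which under these definitions tends to trade true positive rate for positive predictive value.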

**Fig 3. Naive sequence posterior probability logos.**
The linearham-inferred (top) and ARPP-inferred (middle) amino acid naive sequence posterior probability logos for (a) the pruned PC64 dataset of 100 sequences and (b) the trimmed VRC01 alignment of 268 sequences. We also display the empirical sequence logo (bottom) for each dataset and highlight the inferred CDR3 regions (black lines).

**Fig 4. Naive-to-tip sequence trajectory graphics.**
The linearham-inferred naive-to-tip amino acid sequence trajectories for the pruned PC64 dataset of 100 sequences and the trimmed VRC01 alignment of 268 sequences, displaying only the edges that satisfy the given posterior probability threshold, and only the nodes that contact edges above the threshold. The tip sequences of interest for the PC64 and VRC01 datasets are chosen to be PCT64-35M and NIH45-46, respectively, and we use 0.04 probability cutoffs for these lineage graphics (such that any edge with probability less than this threshold is discarded). The nodes correspond to unique ancestral sequences filled with red color, where the opacity is proportional to the posterior probability of the associated sequence. The directed edges connecting nodes represent ancestral sequence transitions and are shaded blue with an opacity proportional to the posterior probability of the associated sequence transition. Nodes without any probable edges connecting them are not displayed in these graphics. The absence of many nodes for VRC01 indicates that these naive-to-tip sequence trajectories are highly uncertain. A more detailed version of this graphic, including predicted lineage mutations, is included as S1 Fig.
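The edge and node filtering described in this caption can be sketched as follows, assuming that each posterior sample yields one naive-to-tip path of sequences and that posterior probabilities are estimated as the fraction of samples containing a given node or transition. The function name and input format are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def trajectory_graph(paths: List[List[str]], cutoff: float = 0.04) -> Tuple[Dict[str, float], Dict[Edge, float]]:
    """Keep edges whose estimated probability is at least `cutoff`, and only the nodes touching them.

    Each path is one posterior sample of the naive-to-tip trajectory (naive first, tip last).
    Probabilities are Monte Carlo frequencies over the samples.
    """
    n = len(paths)
    node_counts: Counter = Counter()
    edge_counts: Counter = Counter()
    for path in paths:
        node_counts.update(set(path))                # count each node once per sample
        edge_counts.update(set(zip(path, path[1:])))  # consecutive sequences form transitions
    edges = {e: c / n for e, c in edge_counts.items() if c / n >= cutoff}
    touched: Set[str] = {s for e in edges for s in e}
    nodes = {s: node_counts[s] / n for s in touched}  # node opacity ~ node probability
    return nodes, edges
```

Returning the probabilities alongside the kept nodes and edges makes it straightforward to map them to fill and edge opacities when drawing the graph.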

**Fig 5. Model overview.**
(A) A schematic representation of the naive rearrangement process from [11]. First, V (green), D (orange), and J (purple) genes are randomly selected from the respective gene pools in the body. Then, nucleotides are randomly deleted (red X’s) from both ends of the V-D and D-J junction regions, and random bases (blue) are added to the same junction regions before the V, D, and J germline genes are joined together. The BCR sequences can be partitioned into framework (FWK) and complementarity-determining (CDR) regions. (B) Our Bayesian phylo-HMM jointly models V(D)J recombination at the root of the tree (using an HMM) and subsequent diversification (via a phylogenetic tree). We perform posterior inference conditioned on the observed sequence alignment of a clonal family, but not on a fixed inferred naive sequence.
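The rearrangement steps in panel (A) can be mimicked with a toy generator. This is a hypothetical sketch: the gene pools, the uniform choice of genes, and the uniform deletion and insertion lengths are illustrative assumptions, whereas real rearrangement models fit these distributions to data.

```python
import random
from typing import Dict


def rearrange(v: str, d: str, j: str, v3_del: int, d5_del: int, d3_del: int,
              j5_del: int, vd_ins: str, dj_ins: str) -> str:
    """Trim germline genes at the junction ends and join them with non-templated insertions."""
    v_trim = v[: len(v) - v3_del]        # delete bases from the 3' end of V
    d_trim = d[d5_del: len(d) - d3_del]  # delete bases from both ends of D
    j_trim = j[j5_del:]                  # delete bases from the 5' end of J
    return v_trim + vd_ins + d_trim + dj_ins + j_trim


def simulate_naive(v_pool: Dict[str, str], d_pool: Dict[str, str], j_pool: Dict[str, str],
                   rng: random.Random, max_del: int = 3, max_ins: int = 4) -> str:
    """Toy process: uniform gene choice, uniform deletion lengths, uniform random insertions."""
    v = rng.choice(list(v_pool.values()))
    d = rng.choice(list(d_pool.values()))
    j = rng.choice(list(j_pool.values()))
    dels = [rng.randint(0, max_del) for _ in range(4)]  # V3', D5', D3', J5' deletions
    ins = ["".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_ins))) for _ in range(2)]
    return rearrange(v, d, j, *dels, *ins)
```

For example, `rearrange("ACGTAC", "GGTT", "TTCCAA", 2, 1, 1, 2, "AA", "C")` keeps `ACGT` from V, `GT` from D, and `CCAA` from J, and joins them with the `AA` and `C` insertions.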

**Fig 6. A graphical model representation of our phylo-HMM for an example alignment with m = 3 sequences and n = 3 sites.**
The τ, t, π, and e nodes represent the 4-tip unrooted tree topology, the associated 5 branch lengths, the GTR exchangeability rates, and the GTR equilibrium base frequencies, respectively. The parameter α denotes the gamma shape parameter of the K-class discrete gamma distribution, which is used to model phylogenetic rate variation among sites; r denotes the vector of K discrete rates that is deterministically induced by α. The set of nodes $r^{*} = \{r^{*}_{(1)}, r^{*}_{(2)}, r^{*}_{(3)}\}$ defines the rates drawn from r at each site. The “hidden state” node collection $Y_{\text{naive}} = \{Y_{\text{naive}}^{(1)}, Y_{\text{naive}}^{(2)}, Y_{\text{naive}}^{(3)}\}$ represents the Markov process that stochastically generates the naive sequence in our phylo-HMM. The node sets $Y_{\text{int}} = \{Y_{i}^{(j)}\}_{i = 1:2,\, j = 1:3}$ and $D = \{D_{i}^{(j)}\}_{i = 1:3,\, j = 1:3}$ denote the internal nodes of τ (excluding the naive sequence $Y_{\text{naive}}$) and the observed MSA, respectively. We draw plates around the $Y_{\text{int}}^{(j)}$ and $D^{(j)}$ node sets for $j \in \{1, 2, 3\}$ to indicate that any directed edges touching a plate apply to all nodes in the plate (except for edges that originate from t, which apply element-wise to the nodes in the plate).
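The deterministic map from α to r can be sketched with the mean-of-category discretization of Yang (1994), a common way to build a K-class discrete gamma; whether linearham uses this exact variant rather than, for example, category medians is an assumption here. The gamma has shape α and rate α so that its mean is 1, the K categories have equal probability 1/K, and each rate is the mean of the gamma within its category. Quantiles are found by bisection on a series expansion of the regularized incomplete gamma function, which is adequate for moderate α but is not a production-grade implementation. All function names are hypothetical.

```python
import math
from typing import List


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def gamma_quantile(a: float, p: float) -> float:
    """y such that P(a, y) = p for a Gamma(shape=a, rate=1), found by bisection."""
    lo, hi = 0.0, max(1.0, a)
    while reg_lower_gamma(a, hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int) -> List[float]:
    """Mean rate within each of k equal-probability categories of Gamma(shape=alpha, rate=alpha)."""
    # Category boundaries on the scaled axis y = alpha * x, where x ~ Gamma(alpha, rate=alpha).
    cuts = [gamma_quantile(alpha, i / k) for i in range(1, k)]
    # For this mean-1 gamma, x f(x; alpha, alpha) = f(x; alpha + 1, alpha), so the cumulative
    # first moment up to x equals P(alpha + 1, alpha * x).
    cum = [0.0] + [reg_lower_gamma(alpha + 1.0, y) for y in cuts] + [1.0]
    # Each category has probability 1/k, so its conditional mean is k times its first-moment mass.
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]


rates = discrete_gamma_rates(0.5, 4)
print([round(r, 4) for r in rates])
```

By construction the K rates average to 1, so α controls only how unevenly rates are spread across sites: small α gives a few fast sites and many slow ones.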

Cited by

Rationalizing Random Walks: Replicating Protective Antibody Trajectories.
Remmel JL, Ackerman ME. Trends Immunol. 2021 Mar;42(3):186-197. doi: 10.1016/j.it.2021.01.001. Epub 2021 Jan 26. PMID: 33514459. Review.
Phylogenetic analysis of migration, differentiation, and class switching in B cells.
Hoehn KB, Pybus OG, Kleinstein SH. PLoS Comput Biol. 2022 Apr 25;18(4):e1009885. doi: 10.1371/journal.pcbi.1009885. eCollection 2022 Apr. PMID: 35468128.
Inference of B cell clonal families using heavy/light chain pairing information.
Ralph DK, Matsen FA 4th. PLoS Comput Biol. 2022 Nov 28;18(11):e1010723. doi: 10.1371/journal.pcbi.1010723. eCollection 2022 Nov. PMID: 36441808.
Ecological Processes Shaping Microbiomes of Extremely Low Birthweight Infants.
Zioutis C, Seki D, Bauchinger F, Herbold C, Berger A, Wisgrill L, Berry D. Front Microbiol. 2022 Feb 28;13:812136. doi: 10.3389/fmicb.2022.812136. eCollection 2022. PMID: 35295290.
Adaptive Immune Receptor Repertoire (AIRR) Community Guide to Repertoire Analysis.
Marquez S, Babrak L, Greiff V, Hoehn KB, Lees WD, Luning Prak ET, Miho E, Rosenfeld AM, Schramm CA, Stervbo U; AIRR Community. Methods Mol Biol. 2022;2453:297-316. doi: 10.1007/978-1-0716-2115-8_17. PMID: 35622333.

References

1. Mascola JR, Haynes BF. HIV-1 neutralizing antibodies: understanding nature’s pathways. Immunological Reviews. 2013;254(1):225–244. doi: 10.1111/imr.12075
2. Stamatatos L, Pancera M, McGuire AT. Germline-targeting immunogens. Immunological Reviews. 2017;275(1):203–216. doi: 10.1111/imr.12483
3. Liao HX, Lynch R, Zhou T, Gao F, Alam SM, Boyd SD, et al. Co-evolution of a broadly neutralizing HIV-1 antibody and founder virus. Nature. 2013;496(7446):469. doi: 10.1038/nature12053
4. Doria-Rose NA, Schramm CA, Gorman J, Moore PL, Bhiman JN, DeKosky BJ, et al. Developmental pathway for potent V1V2-directed HIV-neutralizing antibodies. Nature. 2014;509(7498):55. doi: 10.1038/nature13036
5. Doria-Rose NA, Bhiman JN, Roark RS, Schramm CA, Gorman J, Chuang GY, et al. New Member of the V1V2-Directed CAP256-VRC26 Lineage That Shows Increased Breadth and Exceptional Potency. Journal of Virology. 2016;90(1):76. doi: 10.1128/JVI.01791-15

Grants and funding

R01 GM113246/GM/NIGMS NIH HHS/United States
