PLoS Comput Biol. 2020 Aug 17;16(8):e1008030.
doi: 10.1371/journal.pcbi.1008030. eCollection 2020 Aug.

A Bayesian phylogenetic hidden Markov model for B cell receptor sequence analysis

Amrit Dhar et al. PLoS Comput Biol. 2020.

Abstract

The human body generates a diverse set of high affinity antibodies, the soluble form of B cell receptors (BCRs), that bind to and neutralize invading pathogens. The natural development of BCRs must be understood in order to design vaccines for highly mutable pathogens such as influenza and HIV. BCR diversity is induced by naturally occurring combinatorial "V(D)J" rearrangement, mutation, and selection processes. Most current methods for BCR sequence analysis focus on separately modeling the above processes. Statistical phylogenetic methods are often used to model the mutational dynamics of BCR sequence data, but these techniques do not consider all the complexities associated with B cell diversification such as the V(D)J rearrangement process. In particular, standard phylogenetic approaches assume the DNA bases of the progenitor (or "naive") sequence arise independently and according to the same distribution, ignoring the complexities of V(D)J rearrangement. In this paper, we introduce a novel approach to Bayesian phylogenetic inference for BCR sequences that is based on a phylogenetic hidden Markov model (phylo-HMM). This technique not only integrates a naive rearrangement model with a phylogenetic model for BCR sequence evolution but also naturally accounts for uncertainty in all unobserved variables, including the phylogenetic tree, via posterior distribution sampling.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Naive sequence validation plot.
The Hamming distances between the simulated naive DNA sequences and their corresponding linearham, partis, and ARPP estimates, plotted against the tree imbalance values of the simulated trees. Linear regression lines are superimposed for each method to indicate how the results vary as trees become more imbalanced. For reference, we plot the tree imbalance values for the PC64 and VRC01 trees.
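The Hamming distance in this comparison counts the aligned positions at which two equal-length sequences differ. A minimal Python sketch follows; the function name and the toy sequences are hypothetical and are not taken from the paper or from linearham.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Toy example (made-up sequences): a "true" naive sequence and an estimate.
true_naive = "CAGGTGCAGCTG"
estimate = "CAGGTACAGCTC"
distance = hamming_distance(true_naive, estimate)
```

A smaller distance means the estimated naive sequence is closer to the simulated truth.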
Fig 2. Intermediate ancestral sequence validation plot.
The positive predictive values and the true positive rates versus the tree imbalance values of the simulated trees, stratified by decision boundary ρ. Positive predictive values and true positive rates are computed on the DNA sequences and for the linearham, RevBayes, and dnaml programs. Linear regression lines are superimposed for each package to indicate how the results vary as trees get more imbalanced. For reference, we plot the tree imbalance values for the PC64 and VRC01 trees (vertical dashed lines).
Fig 3. Naive sequence posterior probability logos.
The linearham-inferred (top) and ARPP-inferred (middle) amino acid naive sequence posterior probability logos for (a) the pruned PC64 dataset of 100 sequences and (b) the trimmed VRC01 alignment of 268 sequences. We also display the empirical sequence logo (bottom) for each dataset and highlight the inferred CDR3 regions (black lines).
Fig 4. Naive-to-tip sequence trajectory graphics.
The linearham-inferred naive-to-tip amino acid sequence trajectories for the pruned PC64 dataset of 100 sequences and the trimmed VRC01 alignment of 268 sequences, displaying only the edges that satisfy the given posterior probability threshold, and only the nodes that contact edges above the threshold. The tip sequences of interest for the PC64 and VRC01 datasets are chosen to be PCT64-35M and NIH45-46, respectively, and we use 0.04 probability cutoffs for these lineage graphics (such that any edge with probability less than this threshold is discarded). The nodes correspond to unique ancestral sequences filled with red color, where the opacity is proportional to the posterior probability of the associated sequence. The directed edges connecting nodes represent ancestral sequence transitions and are shaded blue with an opacity proportional to the posterior probability of the associated sequence transition. Nodes without any probable edges connecting them are not displayed in these graphics. The absence of many nodes for VRC01 indicates that these naive-to-tip sequence trajectories are highly uncertain. A more detailed version of this graphic, including predicted lineage mutations, is included as S1 Fig.
Fig 5. Model overview.
(A) A schematic representation of the naive rearrangement process from [11]. First, V (green), D (orange), and J (purple) genes are randomly selected from the respective gene pools in the body. Then, nucleotides are randomly deleted (red X’s) from both ends of the V-D and D-J junction regions and random bases (blue) are added to the same junction regions before the V, D, and J germline genes can be joined together. The BCR sequences can be partitioned into framework (FWK) and complementarity-determining (CDR) regions. (B) Our Bayesian phylo-HMM jointly models V(D)J recombination at the root of the tree (using an HMM) and then subsequent diversification (via a phylogenetic tree). We do posterior inference conditioning on the observed sequence alignment in a clonal family, but not on a fixed inferred naive sequence.
Fig 6. A graphical model representation of our phylo-HMM for an example alignment with m = 3 sequences and n = 3 sites.
The τ, t, π, and e nodes represent the 4-tip unrooted tree topology, the associated 5 branch lengths, the GTR exchangeability rates, and GTR equilibrium base frequencies, respectively. The parameter α denotes the gamma shape parameter associated with the K-class discrete gamma distribution, which is used to model phylogenetic rate variation among sites; r symbolizes the vector of K discrete rates that is deterministically induced by α. The set of nodes $r^{*} = \{r_{(1)}^{*}, r_{(2)}^{*}, r_{(3)}^{*}\}$ defines the rates that are drawn from r at each particular site. The $Y_{naive} = \{Y_{naive}^{(1)}, Y_{naive}^{(2)}, Y_{naive}^{(3)}\}$ “hidden state” node collection represents the Markov process that stochastically generates the naive sequence in our phylo-HMM. The node sets $\{Y_{i}^{(j)}\}_{i=1:2,\, j=1:3}$ and $D = \{D_{i}^{(j)}\}_{i=1:3,\, j=1:3}$ denote the internal nodes of τ excluding the naive sequence $Y_{naive}$ and the observed MSA, respectively. We draw plates around the $Y_{int}^{(j)}$ and $D^{(j)}$ node sets for j ∈ {1, 2, 3} to indicate that any directed edges touching a plate apply to all nodes in the plate (except for edges that originate from t, which apply element-wise to the nodes in the plate).
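To illustrate how a vector of K discrete rates can be deterministically induced by the shape parameter α, the following Python sketch uses the median variant of the standard discrete gamma approximation. It takes the median of each of K equal-probability classes of a gamma distribution with shape α and mean 1, then rescales so the rates average to 1. The excerpt does not say whether the authors use the median or the mean variant, so treat this as an assumption. All function names are hypothetical, and the incomplete gamma function is evaluated with a plain power series rather than a library call.

```python
import math


def gamma_cdf(x: float, shape: float) -> float:
    """Regularized lower incomplete gamma P(shape, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= x / (shape + n)
        total += term
        n += 1
    return math.exp(shape * math.log(x) - x - math.lgamma(shape)) * total


def gamma_quantile(p: float, shape: float, tol: float = 1e-12) -> float:
    """Invert gamma_cdf (scale 1) by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape) < p:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int) -> list[float]:
    """K class medians of Gamma(shape=alpha, rate=alpha), rescaled to mean 1."""
    # Dividing a Gamma(alpha, 1) quantile by alpha gives the rate-alpha quantile.
    medians = [
        gamma_quantile((2 * i + 1) / (2 * k), alpha) / alpha for i in range(k)
    ]
    mean = sum(medians) / k
    return [m / mean for m in medians]


rates = discrete_gamma_rates(alpha=0.5, k=4)
```

Smaller values of α spread the K rates further apart (strong rate variation among sites), while large values of α pull all rates toward 1.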

References

    1. Mascola JR, Haynes BF. HIV-1 neutralizing antibodies: understanding nature’s pathways. Immunological Reviews. 2013;254(1):225–244. doi: 10.1111/imr.12075
    2. Stamatatos L, Pancera M, McGuire AT. Germline-targeting immunogens. Immunological Reviews. 2017;275(1):203–216. doi: 10.1111/imr.12483
    3. Liao HX, Lynch R, Zhou T, Gao F, Alam SM, Boyd SD, et al. Co-evolution of a broadly neutralizing HIV-1 antibody and founder virus. Nature. 2013;496(7446):469. doi: 10.1038/nature12053
    4. Doria-Rose NA, Schramm CA, Gorman J, Moore PL, Bhiman JN, DeKosky BJ, et al. Developmental pathway for potent V1V2-directed HIV-neutralizing antibodies. Nature. 2014;509(7498):55. doi: 10.1038/nature13036
    5. Doria-Rose NA, Bhiman JN, Roark RS, Schramm CA, Gorman J, Chuang GY, et al. New Member of the V1V2-Directed CAP256-VRC26 Lineage That Shows Increased Breadth and Exceptional Potency. Journal of Virology. 2016;90(1):76. doi: 10.1128/JVI.01791-15