Bioinformatics. 2010 Apr 1;26(7):867-72.
doi: 10.1093/bioinformatics/btq056. Epub 2010 Feb 9.

SoDA2: a Hidden Markov Model approach for identification of immunoglobulin rearrangements


Supriya Munshaw et al. Bioinformatics.

Abstract

Motivation: The inference of pre-mutation immunoglobulin (Ig) rearrangements is essential in the study of the antibody repertoires produced in response to infection, in B-cell neoplasms and in autoimmune disease. Often, there are several rearrangements that are nearly equivalent as candidates for a given Ig gene, but have different consequences in an analysis. Our aim in this article is to develop a probabilistic model of the rearrangement process and a Bayesian method for estimating posterior probabilities for the comparison of multiple plausible rearrangements.

Results: We have developed SoDA2, which is based on a Hidden Markov Model and is used to compute the posterior probabilities of candidate rearrangements and to find those with the highest values among them. We validated the software on a set of simulated data, a set of clonally related sequences, and a group of randomly selected Ig heavy chains from GenBank. In most tests, SoDA2 performed better than other available software for the task. Furthermore, the output format has been redesigned, in part, to facilitate comparison of multiple solutions.

Availability: SoDA2 is available online at https://hippocrates.duhs.duke.edu/soda. Simulated sequences are available upon request.
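
The comparison of candidate rearrangements by posterior probability, as described in the abstract, can be illustrated with a minimal sketch. This is not SoDA2's implementation: the function name, the candidate labels, the log-likelihood values, and the uniform prior are all assumptions made for illustration. It only shows how per-candidate log scores (such as an HMM might produce) could be normalized into posterior probabilities and ranked.

```python
import math


def posterior_probabilities(log_likelihoods, log_priors=None):
    """Normalize per-candidate log scores into posterior probabilities.

    log_likelihoods: dict mapping a candidate label to log P(sequence | candidate).
    log_priors: optional dict of log prior probabilities; a uniform prior is
    assumed when omitted (an illustrative choice, not taken from the paper).
    """
    names = list(log_likelihoods)
    if log_priors is None:
        log_priors = {name: 0.0 for name in names}
    joint = [log_likelihoods[name] + log_priors[name] for name in names]
    # Log-sum-exp keeps the normalization stable for very negative log scores.
    peak = max(joint)
    log_norm = peak + math.log(sum(math.exp(value - peak) for value in joint))
    return {name: math.exp(value - log_norm) for name, value in zip(names, joint)}


# Hypothetical log-likelihoods for three nearly equivalent candidates.
candidates = {"cand_a": -41.2, "cand_b": -41.9, "cand_c": -47.5}
posteriors = posterior_probabilities(candidates)
ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
for name, prob in ranked:
    print(f"{name}: {prob:.3f}")
```

Reporting the full ranked list, rather than only the top candidate, reflects the situation the abstract describes, in which several rearrangements are nearly equivalent.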


Figures

Fig. 1.
The basic topology of the HMM for (a) heavy chains and (b) kappa and lambda chains. The HMM starts at the last base of the invariant cysteine of all high-likelihood V segments, runs through all DH segments in the case of heavy chains, and through all high-likelihood J segments till the first base of the invariant tryptophan or phenylalanine.
Fig. 2.
Distributions for (a) VH gene recombination site choice, (b) n nucleotides in the VD junction, (c) 5′ DH recombination site choice, (d) 3′ DH recombination site choice, (e) n nucleotides in the DJ junction, (f) 5′ JH recombination site choice. All data are fit to negative binomial distributions with varying parameters derived from Jackson et al. (2004). These parameters are used for transition probabilities in the HMM.
Fig. 3.
A detailed topology of the HMM with all possible transitions. Each nucleotide in the observed sequence is treated as a separate state. The transition probabilities are derived from empirical data (Jackson et al., 2004). The star denotes the start (third position of invariant cysteine) of the HMM and the + denotes the end (first position of invariant tryptophan/phenylalanine). The I and D in every state stand for insertions and deletions, respectively.
Fig. 4.
(a) Top rearrangement as chosen by SoDA2 with a higher mutation frequency than the alternative, shown in (b). The different rearrangements represent a trade-off between mutation frequency and number of n nucleotides.
Fig. 5.
The alignment of CDR3H of sequence by 1154693 using IGHD1-21*01 by (a) SoDA2, (b) IMGT/V-QUEST, JOINSOLVER and iHHMune, (c) SoDA. Rearrangements (b) and (c) were also provided by SoDA2 at a slightly lower probability.
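
The caption of Fig. 2 states that negative binomial distributions supply the transition probabilities of the HMM. The sketch below shows one way such a length distribution could be turned into position-dependent "continue" probabilities for an insertion run. It is an illustration under stated assumptions, not the authors' procedure: the parameter values `R_ASSUMED` and `P_ASSUMED` are hypothetical and are not the values derived from Jackson et al. (2004), and the conversion to continue probabilities is one possible encoding.

```python
import math


def neg_binomial_pmf(k, r, p):
    """P(N = k) for a negative binomial with shape r > 0 and success probability p."""
    log_coeff = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coeff + r * math.log(p) + k * math.log(1.0 - p))


def continue_probabilities(r, p, max_len):
    """For k = 0..max_len, return P(N > k | N >= k).

    In a chain of insertion states, this is the probability of adding one more
    n nucleotide after k have already been added.
    """
    remaining = 1.0  # P(N >= k), updated as k grows
    result = []
    for k in range(max_len + 1):
        mass = neg_binomial_pmf(k, r, p)
        result.append(1.0 - mass / remaining)
        remaining -= mass
    return result


# Hypothetical parameters, chosen only to make the example concrete.
R_ASSUMED, P_ASSUMED = 2.0, 0.25
transitions = continue_probabilities(R_ASSUMED, P_ASSUMED, max_len=10)
for k, prob in enumerate(transitions):
    print(f"after {k} n nucleotides: continue with probability {prob:.4f}")
```

Keeping `max_len` small avoids numerical trouble in the running remainder, which approaches zero far out in the tail of the distribution.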

