Nucleic Acids Res. 2012 Sep 1;40(17):e134.
doi: 10.1093/nar/gks457. Epub 2012 May 27.

Quantifying selection in high-throughput Immunoglobulin sequencing data sets

Gur Yaari et al. Nucleic Acids Res. 2012.

Abstract

High-throughput immunoglobulin sequencing promises new insights into the somatic hypermutation and antigen-driven selection processes that underlie B-cell affinity maturation and adaptive immunity. The ability to estimate positive and negative selection from these sequence data is broadly useful for understanding the immune response to pathogens, and it is also critical to determining the role of somatic hypermutation in autoimmunity and B-cell cancers. Here, we develop a statistical framework for Bayesian estimation of Antigen-driven SELectIoN (BASELINe) based on the analysis of somatic mutation patterns. Our approach represents a fundamental advance over previous methods by shifting the problem from one of simply detecting selection to one of quantifying selection. Along with providing a more intuitive means to assess and visualize selection, our approach allows, for the first time, comparative analysis between groups of sequences derived from different germline V(D)J segments. Application of this approach to next-generation sequencing data demonstrates different selection pressures for memory cells of different isotypes. This framework can easily be adapted to analyze other types of DNA mutation patterns resulting from a mutator that displays hot/cold-spots, substitution preference or other intrinsic biases.

Figures

Figure 1.
BASELINe. (a) Summary of the basic workflow. (b and d) Posterior distributions for the frequency of replacement mutations (π) for hypothetical sequences with the indicated number of replacement (x) and total mutations (N). The shaded area indicates the fraction of the distribution that exceeds the expected frequency (formula image). (c and e) The posterior distributions that result after transforming to the Σ-space quantifying selection strength for the same sequences in (b) and (d), respectively.
Figure 2.
Fitting the hyperparameters of the β prior. The observed and expected selection strengths are compared for different choices of the hyperparameters for the β prior for (a) N = 1 and (b) N = 10. In both cases formula image.
Figure 3.
The interval of optimal estimation depends on formula image. The hyperparameters for the Bayesian prior were estimated for each value of N (N = 10 here) at formula image by fitting within the shaded region (b). Although the hyperparameters remain fixed, the interval of optimal estimation (shaded) will shift for different values of formula image [0.25 in (a) and 0.75 in (c)].
Figure 4.
Simulation-based validation of BASELINe. Ten thousand mutated sequences were generated using a sequence-based simulation starting from the IGHV3-23 germline segment. The mean estimated selection strength obtained by BASELINe was recorded for each sequence. (a) The mean of these values along with the 50 and 95% confidence intervals. (b) Tighter 95% confidence intervals are obtained by aggregating data from groups of G = 1, 2, 4, 8 or 16 sequences.
Figure 5.
Applications of BASELINe to estimate selection strength from real data. (a) Posterior probability distributions for Ig sequences from two mouse strains with moderate (B1-8) or low (V23) initial affinity for the immunizing antigen at different days post-immunization (10 and 16) (19). (b and c) Posterior probability distributions for different memory cell subsets (b) or the three most frequent IGHV families (c) for data in (2). The top half of each plot shows the estimated selection strength in the CDR, whereas the bottom half provides an estimate for the FWR.
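
The quantities named in the Figure 1 caption can be illustrated with a small sketch. It is not the BASELINe implementation. It assumes a binomial likelihood for x replacement mutations out of N total mutations, a uniform Beta(1, 1) prior (the paper fits its own hyperparameters, see Figure 2), and a log-odds-ratio form for the selection strength Σ. The function names and the parameter `expected` (standing in for the expected replacement frequency) are hypothetical.

```python
import math


def beta_pdf(pi, a, b):
    """Density of a Beta(a, b) distribution at pi, for 0 < pi < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(pi) + (b - 1) * math.log(1 - pi))


def prob_exceeds(x, n, expected, a=1.0, b=1.0, steps=10000):
    """Posterior probability that the replacement frequency pi exceeds `expected`.

    The posterior is Beta(x + a, n - x + b); a = b = 1 is an assumed uniform prior.
    The integral is approximated with the midpoint rule on a regular grid.
    """
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        pi = (i + 0.5) * width  # midpoint of the i-th grid cell
        if pi > expected:
            total += beta_pdf(pi, x + a, n - x + b) * width
    return total


def selection_strength(pi, expected):
    """Assumed form of the selection strength: log-odds of pi relative to `expected`."""
    return math.log(pi / (1 - pi)) - math.log(expected / (1 - expected))


if __name__ == "__main__":
    # Hypothetical sequence: 9 replacement mutations out of 10, expected frequency 0.5.
    print(prob_exceeds(9, 10, 0.5))
    print(selection_strength(0.75, 0.5))
```

Under these assumptions, a value of Σ above zero corresponds to more replacement mutations than expected and a value below zero to fewer, which is what makes sequences with different expected frequencies comparable on one scale.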

References

    1. Boyd SD, Marshall EL, Merker JD, Maniar JM, Zhang LN, Sahaf B, Jones CD, Simen BB, Hanczaruk B, Nguyen KD, et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci. Transl. Med. 2009;1:12ra23.
    2. Wu YC, Kipling D, Leong HS, Martin V, Ademokun AA, Dunn-Walters DK. High throughput immunoglobulin repertoire analysis distinguishes between human IgM memory and switched memory B cell populations. Blood. 2010;116
    3. Jiang N, Weinstein JA, Penland L, White RA, Fisher DS, Quake SR. Determinism and stochasticity during maturation of the zebrafish antibody repertoire. Proc. Natl. Acad. Sci. USA. 2011;108:5348–5353.
    4. Longo NS, Lipsky PE. Why do B cells mutate their immunoglobulin receptors? Trends Immunol. 2006;27:374–380.
    5. Shlomchik MJ, Marshak-Rothstein A, Wolfowicz CB, Rothstein TL, Weigert MG. The role of clonal selection and somatic mutation in autoimmunity. Nature. 1987;328:805–811.
