Nucleic Acids Res. 2012 Sep 1;40(17):e134.

doi: 10.1093/nar/gks457. Epub 2012 May 27.

Quantifying selection in high-throughput Immunoglobulin sequencing data sets

Gur Yaari¹, Mohamed Uduman, Steven H Kleinstein

PMID: 22641856
PMCID: PMC3458526
DOI: 10.1093/nar/gks457

Gur Yaari et al. Nucleic Acids Res. 2012.

Affiliation

¹ Department of Pathology, Yale University School of Medicine, New Haven, CT 06520, USA.

Abstract

High-throughput immunoglobulin sequencing promises new insights into the somatic hypermutation and antigen-driven selection processes that underlie B-cell affinity maturation and adaptive immunity. The ability to estimate positive and negative selection from these sequence data is not only broadly useful for understanding the immune response to pathogens but also critical for determining the role of somatic hypermutation in autoimmunity and B-cell cancers. Here, we develop a statistical framework for Bayesian estimation of Antigen-driven SELectIoN (BASELINe) based on the analysis of somatic mutation patterns. Our approach represents a fundamental advance over previous methods by shifting the problem from one of simply detecting selection to one of quantifying selection. Along with providing a more intuitive means to assess and visualize selection, our approach allows, for the first time, comparative analysis between groups of sequences derived from different germline V(D)J segments. Application of this approach to next-generation sequencing data demonstrates different selection pressures for memory cells of different isotypes. This framework can easily be adapted to analyze other types of DNA mutation patterns resulting from a mutator that displays hot/cold-spots, substitution preference or other intrinsic biases.
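
The abstract gives the approach only in outline. As a hedged reconstruction from the figure captions (π is the frequency of replacement mutations, x the number of replacement mutations, N the total number of mutations, with a β prior), and assuming a binomial likelihood, the conjugate update would read as follows. The hyperparameters a and b, the expected frequency \hat{\pi}, and the log-odds form of Σ are our notation and assumptions, not necessarily the paper's exact definitions; I denotes the regularized incomplete beta function.

```latex
p(\pi \mid x, N) \propto \pi^{x}(1-\pi)^{N-x}\,\pi^{a-1}(1-\pi)^{b-1}
  = \pi^{x+a-1}(1-\pi)^{N-x+b-1}
  \quad\Longrightarrow\quad
  \pi \mid x, N \sim \mathrm{Beta}(x+a,\; N-x+b)

P(\pi > \hat{\pi} \mid x, N) = 1 - I_{\hat{\pi}}(x+a,\; N-x+b)

\Sigma = \ln\frac{\pi}{1-\pi} - \ln\frac{\hat{\pi}}{1-\hat{\pi}}
```

Under this reading, the second line is the shaded area in Figure 1(b) and (d), and Σ = 0 corresponds to π = \hat{\pi}, i.e. no detectable selection.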

Figures

**Figure 1.**
BASELINe. (a) Summary of the basic workflow. (b and d) Posterior distributions for the frequency of replacement mutations (π) for hypothetical sequences with the indicated number of replacement (x) and total mutations (N). The shaded area indicates the fraction of the distribution that exceeds the expected frequency. (c and e) The posterior distributions obtained after transforming to Σ-space, which quantifies selection strength, for the same sequences as in (b) and (d), respectively.
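
A minimal Monte Carlo sketch of the quantities in Figure 1, assuming a binomial likelihood with a Beta prior (uniform Beta(1, 1) by default here, whereas the paper fits its hyperparameters) and assuming Σ is the log odds ratio of the observed versus expected replacement frequency. The function names, the default prior, and the example values x = 7, N = 10 and expected frequency 0.75 are hypothetical.

```python
import math
import random


def posterior_params(x, n, alpha=1.0, beta=1.0):
    """Beta posterior parameters for the replacement frequency pi after observing
    x replacement mutations out of n (binomial likelihood, Beta(alpha, beta) prior)."""
    if not 0 <= x <= n:
        raise ValueError("need 0 <= x <= n")
    return x + alpha, (n - x) + beta


def sample_posterior(x, n, alpha=1.0, beta=1.0, draws=100_000, seed=0):
    """Seeded Monte Carlo draws of pi from its Beta posterior."""
    a, b = posterior_params(x, n, alpha, beta)
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(draws)]


def tail_probability(pi_draws, expected):
    """Share of posterior draws above the expected frequency
    (the analogue of the shaded area in Figure 1b and 1d)."""
    return sum(p > expected for p in pi_draws) / len(pi_draws)


def logit(p):
    return math.log(p / (1.0 - p))


def to_sigma(pi_draws, expected, eps=1e-12):
    """Map pi draws to selection strength, ASSUMING Sigma = logit(pi) - logit(expected).
    Draws are clamped to [eps, 1 - eps] so the logarithm stays finite."""
    ref = logit(expected)
    return [logit(min(max(p, eps), 1.0 - eps)) - ref for p in pi_draws]


# Hypothetical example: 7 replacement mutations out of 10, expected frequency 0.75.
draws = sample_posterior(x=7, n=10)
p_above = tail_probability(draws, expected=0.75)
sigma = to_sigma(draws, expected=0.75)
mean_sigma = sum(sigma) / len(sigma)
```

Sampling keeps the sketch dependency-free; with SciPy available, the tail probability could instead be computed exactly from the Beta survival function.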

**Figure 2.**
Fitting the hyperparameters of the β prior. The observed and expected selection strengths are compared for different choices of the hyperparameters for the β prior for (a) N = 1 and (b) N = 10. In both cases .
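
Figure 2 compares observed and expected selection strengths across prior hyperparameters. The sketch below is one hypothetical way to run such a comparison for a given N: for each true Σ in a window, it averages the posterior-mean estimate over Binomial(N, π) outcomes and picks the Beta(a, b) pair whose estimates track the truth most closely. The squared-error criterion, the window [-1, 1], the hyperparameter grid, and the log-odds definition of Σ are all assumptions rather than the paper's documented procedure. It relies on the standard identity E[ln(π/(1-π))] = ψ(a) - ψ(b) for π ~ Beta(a, b), with the digamma function ψ implemented inline to avoid external dependencies.

```python
import math


def digamma(z):
    """Digamma function for z > 0: recurrence up to z >= 6, then asymptotic series."""
    result = 0.0
    while z < 6.0:
        result -= 1.0 / z
        z += 1.0
    f = 1.0 / (z * z)
    series = f * (1/12 - f * (1/120 - f * (1/252 - f * (1/240 - f / 132))))
    return result + math.log(z) - 0.5 / z - series


def logit(p):
    return math.log(p / (1.0 - p))


def binom_pmf(k, n, p):
    """Binomial probability computed in log space to avoid overflow."""
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(p) + (n - k) * math.log1p(-p))


def posterior_mean_sigma(x, n, a, b, expected):
    """Posterior mean of Sigma = logit(pi) - logit(expected) under Beta(x + a, n - x + b)."""
    return digamma(x + a) - digamma(n - x + b) - logit(expected)


def mean_estimate(true_sigma, n, a, b, expected):
    """Average estimated Sigma when x ~ Binomial(n, pi) and pi corresponds to true_sigma."""
    pi = 1.0 / (1.0 + math.exp(-(true_sigma + logit(expected))))
    return sum(binom_pmf(x, n, pi) * posterior_mean_sigma(x, n, a, b, expected)
               for x in range(n + 1))


def calibration_error(n, expected, a, b, window=(-1.0, 1.0), points=21):
    """Sum of squared gaps between estimated and true Sigma over a window of true values."""
    lo, hi = window
    targets = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    return sum((mean_estimate(s, n, a, b, expected) - s) ** 2 for s in targets)


def fit_prior(n, expected, grid=None, **kwargs):
    """Grid search for the Beta(a, b) hyperparameters with the smallest calibration error."""
    grid = grid or [0.1 * k for k in range(1, 31)]  # hypothetical grid 0.1 .. 3.0
    return min(((a, b) for a in grid for b in grid),
               key=lambda ab: calibration_error(n, expected, ab[0], ab[1], **kwargs))


# Hypothetical example: fit for a single mutation (N = 1) at expected frequency 0.5.
prior_n1 = fit_prior(1, 0.5)
```

Calling `fit_prior(10, 0.5)` runs the same search for N = 10. The prior matters most when N is small, because the posterior mean of π, (x + a)/(N + a + b), is then dominated by a and b.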

**Figure 3.**
The interval of optimal estimation depends on the expected frequency. The hyperparameters for the Bayesian prior were estimated for each value of N (N = 10 here) by fitting within the shaded region (b). Although the hyperparameters remain fixed, the interval of optimal estimation (shaded) shifts for different values of the expected frequency [0.25 in (a) and 0.75 in (c)].

**Figure 4.**
Simulation-based validation of BASELINe. Ten thousand mutated sequences were generated using a sequence-based simulation starting from the IGHV3-23 germline segment. The mean estimated selection strength obtained by BASELINe was recorded for each sequence. (a) The mean of these values along with the 50 and 95% confidence intervals. (b) Tighter 95% confidence intervals are obtained by aggregating data from groups of G = 1, 2, 4, 8 or 16 sequences.
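
Figure 4(b) shows intervals tightening as sequences are grouped. A minimal sketch of one way to aggregate, assuming the group-level Σ is the average of independent per-sequence Σ values (whose distribution is a rescaled convolution of the individual posteriors), approximated here by averaging Monte Carlo draws. The uniform prior, the log-odds Σ, and the hypothetical counts of 7 replacement mutations out of 10 per sequence are illustrative assumptions, not the IGHV3-23 simulation used in the paper.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def sigma_draws(x, n, expected, rng, draws, alpha=1.0, beta=1.0, eps=1e-12):
    """Posterior draws of Sigma for one sequence (Beta prior, binomial likelihood,
    Sigma ASSUMED to be logit(pi) - logit(expected)); pi is clamped to [eps, 1 - eps]."""
    ref = logit(expected)
    out = []
    for _ in range(draws):
        p = min(max(rng.betavariate(x + alpha, n - x + beta), eps), 1.0 - eps)
        out.append(logit(p) - ref)
    return out


def group_sigma_draws(counts, expected, draws=10_000, seed=0):
    """Draws of the group-average Sigma: average independent per-sequence draws."""
    rng = random.Random(seed)
    per_seq = [sigma_draws(x, n, expected, rng, draws) for x, n in counts]
    return [sum(vals) / len(vals) for vals in zip(*per_seq)]


def interval(samples, level=0.95):
    """Central interval taken from the sorted Monte Carlo draws."""
    s = sorted(samples)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


# Hypothetical example: 16 sequences, each with 7 replacement mutations out of 10.
counts = [(7, 10)] * 16
widths = {}
for g in (1, 2, 4, 8, 16):
    lo, hi = interval(group_sigma_draws(counts[:g], expected=0.75))
    widths[g] = hi - lo
```

Because each sequence contributes an independent draw, the interval width for identical sequences shrinks roughly in proportion to 1/√G.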

**Figure 5.**
Applications of BASELINe to estimate selection strength from real data. (a) Posterior probability distributions for Ig sequences from two mouse strains with moderate (B1-8) or low (V23) initial affinity for the immunizing antigen at different days post-immunization (10 and 16) (19). (b and c) Posterior probability distributions for different memory cell subsets (b) or the three most frequent IGHV families (c) for the data in (2). The top half of each plot shows the estimated selection strength in the CDR, whereas the bottom half shows the estimate for the FWR.

References

1. Boyd SD, Marshall EL, Merker JD, Maniar JM, Zhang LN, Sahaf B, Jones CD, Simen BB, Hanczaruk B, Nguyen KD, et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing. Sci. Transl. Med. 2009;1:12ra23.
2. Wu YC, Kipling D, Leong HS, Martin V, Ademokun AA, Dunn-Walters DK. High throughput immunoglobulin repertoire analysis distinguishes between human IgM memory and switched memory B cell populations. Blood. 2010;116.
3. Jiang N, Weinstein JA, Penland L, White RA, Fisher DS, Quake SR. Determinism and stochasticity during maturation of the zebrafish antibody repertoire. Proc. Natl. Acad. Sci. USA. 2011;108:5348–5353.
4. Longo NS, Lipsky PE. Why do B cells mutate their immunoglobulin receptors? Trends Immunol. 2006;27:374–380.
5. Shlomchik MJ, Marshak-Rothstein A, Wolfowicz CB, Rothstein TL, Weigert MG. The role of clonal selection and somatic mutation in autoimmunity. Nature. 1987;328:805–811.
