Am J Hum Genet. 2013 Apr 4;92(4):530-46.

doi: 10.1016/j.ajhg.2013.03.004. Epub 2013 Mar 28.

Complete haplotype sequence of the human immunoglobulin heavy-chain variable, diversity, and joining genes and characterization of allelic and copy-number variation

Corey T Watson¹, Karyn M Steinberg, John Huddleston, Rene L Warren, Maika Malig, Jacqueline Schein, A Jeremy Willsey, Jeffrey B Joy, Jamie K Scott, Tina A Graves, Richard K Wilson, Robert A Holt, Evan E Eichler, Felix Breden

PMID: 23541343
PMCID: PMC3617388
DOI: 10.1016/j.ajhg.2013.03.004

Corey T Watson et al. Am J Hum Genet. 2013.

Affiliation

¹ Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada.

Abstract

The immunoglobulin heavy-chain locus (IGH) encodes variable (IGHV), diversity (IGHD), joining (IGHJ), and constant (IGHC) genes and is responsible for antibody heavy-chain biosynthesis, which is vital to the adaptive immune response. Programmed V-(D)-J somatic rearrangement and the complex duplicated nature of the locus have impeded attempts to reconcile its genomic organization based on traditional B-lymphocyte derived genetic material. As a result, sequence descriptions of germline variation within IGHV are lacking, haplotype inference using traditional linkage disequilibrium methods has been difficult, and the human genome reference assembly is missing several expressed IGHV genes. By using a hydatidiform mole BAC clone resource, we present the most complete haplotype of IGHV, IGHD, and IGHJ gene regions derived from a single chromosome, representing an alternate assembly of ∼1 Mbp of high-quality finished sequence. From this we add 101 kbp of previously uncharacterized sequence, including functional IGHV genes, and characterize four large germline copy-number variants (CNVs). In addition to this germline reference, we identify and characterize eight CNV-containing haplotypes from a panel of nine diploid genomes of diverse ethnic origin, discovering previously unmapped IGHV genes and an additional 121 kbp of insertion sequence. We genotype four of these CNVs by using PCR in 425 individuals from nine human populations. We find that all four are highly polymorphic and show considerable evidence of stratification (Fst = 0.3-0.5), with the greatest differences observed between African and Asian populations. These CNVs exhibit weak linkage disequilibrium with SNPs from two commercial arrays in most of the populations tested.
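The weak linkage disequilibrium between CNVs and array SNPs reported above can be made concrete with the standard r² statistic. The sketch below is illustrative only: it assumes phased haplotypes coded as 0/1 pairs (CNV allele, SNP allele), which is a simplification, since the study genotyped diploid individuals by PCR and the abstract does not say how LD was computed. The function name `ld_r2` and the toy data are hypothetical.

```python
import math


def ld_r2(haplotypes: list[tuple[int, int]]) -> float:
    """Return r^2 between two biallelic loci from phased haplotypes.

    Each haplotype is (a, b) with a, b in {0, 1}: here a is the CNV allele
    (e.g. 1 = insertion present) and b is the SNP allele. This coding is an
    assumption made for illustration.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(a for a, _ in haplotypes) / n                       # freq of CNV allele 1
    p_b = sum(b for _, b in haplotypes) / n                       # freq of SNP allele 1
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # joint freq
    d = p_ab - p_a * p_b                                          # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan  # r^2 is undefined when either locus is monomorphic
    return d * d / denom


# Toy data: a SNP that tags the CNV perfectly, and one that is independent.
perfect_tag = [(1, 1)] * 5 + [(0, 0)] * 5
no_tag = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(perfect_tag), ld_r2(no_tag))
```

An r² near 0, as in the second toy case, means the SNP carries little information about CNV state, which is why array SNPs in weak LD with a CNV would tag it poorly.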

Figures

**Figure 1**
Schematic Comparisons of IGHV Haplotype between CH17 and the Human Reference Genome Assembly (GRCh37). (A) Functional and ORF IGHV genes annotated from each reference are depicted by filled boxes with corresponding IGHV gene locus and allele identifiers located above and below the haplotypes. (B) The positions of two insertions and two complex events characterized from the CH17 haplotype are shown mapped to the GRCh37 IGH human reference assembly sequence (black line; chr14:105,928,955–107,289,540). The locus is presented in the same orientation as that depicted by IMGT. Functional and ORF IGHV, IGHD, and IGHJ genes (not to scale) and IGHV pseudogenes are shown (to scale); the names of IGHV genes involved in the characterized structural variants are indicated. Segmental duplications downloaded from the UCSC genome browser are shown below GRCh37.

**Figure 2**
Map of IGHV CNVs Identified by Complete Sequencing of Fosmid Clones. The positions of three deletions, two insertions, one duplication, and one complex event characterized from fosmid alternative haplotypes are shown mapped to the GRCh37 IGH reference (black line; chr14:105,928,955–107,289,540) with the same parameters as in Figure 1B. The locus is presented in the same orientation as that depicted by IMGT. Functional and ORF IGHV, IGHD, and IGHJ genes (not to scale) as well as IGHV pseudogenes are shown (to scale); the names of IGHV genes involved in the characterized structural variants are indicated. Segmental duplications downloaded from the UCSC genome browser are shown below GRCh37. The large red box indicates a hotspot region of recurrent mutation (see Figure 4 for additional haplotypes associated with this hotspot). The deletion of *IGHV4-61* was identified by Mills et al. (chr14:107,084,861–107,096,738) and was also included in our analyses.

**Figure 3**
Breakpoint Analysis of the *IGHV3-23* Duplication. (A) Pairwise BLAST alignment of 38 kbp region surrounding *IGHV3-23* in GRCh37 (chr14:106700594–106738594). Brown and orange dotted arrows point to ∼5.3 kbp repeat sequences (extended homology) suspected to have mediated the *IGHV3-23* duplication. Repeat sequences show 86% sequence identity. (B) Sequence harboring the *IGHV3-23* duplication identified in individual NA18956 (clones AC244473 and AC206018) is compared to GRCh37 (chr14:106700594–106738594). Regions of similarity between the two haplotypes are connected by black lines. Segments colored in blue in both haplotypes indicate the locations of the 10.8 kbp duplicates. Brown and orange bars above the NA18956 haplotype and below the GRCh37 haplotype indicate ∼5.3 kbp repeat sequences identified by BLAST in (A). Labeled IGHV genes and pseudogenes are depicted by green and red chevrons, respectively. (C) A five-way alignment of ∼5.3 kbp repeat sequences from both haplotypes. Alignment of base positions is shown along the top of the diagram. Each repeat sequence (three from NA18956 and two from GRCh37) is represented by a single horizontal black line. Blue tick marks on each line indicate nucleotide (nt) differences and gaps observed between the aligned sequences. The red line tracks the most similar alignment of the middle NA18956 repeat sequence to the other four sequences. Based on nt similarity, the event breakpoint is presumed to have occurred within the 373 bp region (red box) in which all aligned sequences share 99.7% sequence identity (372/373 nt).
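The 99.7% figure in panel (C) is the share of alignment columns at which all five repeat sequences carry the same nucleotide (372 of 373). A minimal sketch of that arithmetic follows, assuming pre-aligned sequences of equal length with `-` for gaps. The function name `shared_identity` and the synthetic sequences are hypothetical; they are not the actual repeat sequences, and gap-containing columns are counted as non-identical by assumption.

```python
def shared_identity(aligned: list[str]) -> tuple[int, int]:
    """Count alignment columns where every sequence has the same base.

    Returns (identical_columns, alignment_length). Columns containing a gap
    are treated as non-identical, which is an assumption of this sketch.
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if length == 0 or any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be non-empty and of equal aligned length")
    identical = sum(
        1 for column in zip(*aligned)
        if column[0] != "-" and len(set(column)) == 1
    )
    return identical, length


# Synthetic 373-column alignment of five sequences with one differing column.
base = "ACGT" * 93 + "A"                      # 373 nt placeholder sequence
variant = base[:100] + "G" + base[101:]       # single substitution at index 100
identical, length = shared_identity([base, base, variant, base, base])
print(f"{identical}/{length} = {100 * identical / length:.1f}%")
```

With one mismatching column out of 373, the ratio is 372/373, which rounds to the 99.7% quoted in the caption.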

**Figure 4**
A Hotspot of IGH Structural Polymorphism. Each of the five identified haplotypes harboring diverse CNVs is shown relative to GRCh37. Each haplotype is labeled with a sample identifier, and the length (kbp) is indicated at the right of each haplotype in parentheses. Two haplotypes in this region were identified from the individual NA18555 (haplotypes A and B). One to four ∼25 kbp segmental duplication sequence blocks (or partial blocks), depending on the haplotype, are depicted by shaded blue bars. Deleted regions identified in five of the haplotypes, including GRCh37, are indicated by red dotted lines. The positions and names of functional IGHV genes (green boxes) are shown in each haplotype. The partial haplotype identified in this region from individual NA19240 (AC234301), which overlapped that of NA18555 haplotype A and included the genes *IGHV4-30-2*, *IGHV3-30-3*, *IGHV4-30-4*, and *IGHV3-30-5*, is not depicted but was included in the analysis and allowed for the placement of the NA18502 haplotype (also see Figures S1 and S12–S14).

**Figure 5**
Pairwise F_st of Copy-Number Variant Loci in Nine Human Populations. Heatmaps representing pairwise F_st values between populations calculated based on CNV allele frequency data generated at four loci are as follows: (A) *IGHV7-4-1* insertion; (B) *IGHV3-9*, *IGHV1-8*, *IGHV5-a*, and *IGHV3-64D* complex event; (C) *IGHV1-c*, *IGHV3-d*, *IGHV3-43D*, and *IGHV4-b* insertion; and (D) *IGHV1-69D*, *IGHV1-f*, *IGHV3-h*, and *IGHV2-70D* insertion. Population abbreviations are shown on both map axes. Colors in each square correspond to a given F_st value range as indicated by the key.
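To illustrate how pairwise F_st values like those in these heatmaps can be derived from CNV allele frequencies, here is a minimal sketch. It uses Wright's classic definition, F_st = (H_T − H_S) / H_T, for a single biallelic locus with two equally weighted populations. The text here does not state which estimator the authors used, so this choice is an assumption, as are the population labels and frequencies, which are made up.

```python
from itertools import combinations


def wright_fst(p1: float, p2: float) -> float:
    """Wright's F_st for one biallelic locus in two equally weighted populations.

    p1 and p2 are the frequencies of one allele (e.g. the insertion allele)
    in each population.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                               # pooled expected heterozygosity
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0     # mean within-population value
    if h_t == 0.0:
        return 0.0  # same allele fixed in both populations: no differentiation
    return (h_t - h_s) / h_t


def pairwise_fst(freqs: dict[str, float]) -> dict[tuple[str, str], float]:
    """F_st for every unordered pair of populations, keyed by sorted labels."""
    return {
        (a, b): wright_fst(freqs[a], freqs[b])
        for a, b in combinations(sorted(freqs), 2)
    }


# Hypothetical insertion-allele frequencies in three made-up populations.
example_freqs = {"POP_A": 0.2, "POP_B": 0.8, "POP_C": 0.5}
for pair, value in pairwise_fst(example_freqs).items():
    print(pair, round(value, 3))
```

Each value in the returned dictionary corresponds to one cell of a heatmap; the matrix is symmetric, so only one triangle needs computing. Sample-size-corrected estimators would give somewhat different numbers from the same genotype data.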

References

1. Tuzun E., Sharp A.J., Bailey J.A., Kaul R., Morrison V.A., Pertz L.M., Haugen E., Hayden H., Albertson D., Pinkel D. Fine-scale structural variation of the human genome. Nat. Genet. 2005;37:727–732.
2. Kidd J.M., Cooper G.M., Donahue W.F., Hayden H.S., Sampas N., Graves T., Hansen N., Teague B., Alkan C., Antonacci F. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008;453:56–64.
3. Kidd J.M., Sampas N., Antonacci F., Graves T., Fulton R., Hayden H.S., Alkan C., Malig M., Ventura M., Giannuzzi G. Characterization of missing human genome sequences and copy-number polymorphic insertions. Nat. Methods. 2010;7:365–371.
4. Sudmant P.H., Kitzman J.O., Antonacci F., Alkan C., Malig M., Tsalenko A., Sampas N., Bruhn L., Shendure J., Eichler E.E., 1000 Genomes Project. Diversity of human copy number variation and multicopy genes. Science. 2010;330:641–646.
5. Mills R.E., Walter K., Stewart C., Handsaker R.E., Chen K., Alkan C., Abyzov A., Yoon S.C., Ye K., Cheetham R.K., 1000 Genomes Project. Mapping copy number variation by population-scale genome sequencing. Nature. 2011;470:59–65.
