Am J Hum Genet. 2013 Apr 4;92(4):530-46.
doi: 10.1016/j.ajhg.2013.03.004. Epub 2013 Mar 28.

Complete haplotype sequence of the human immunoglobulin heavy-chain variable, diversity, and joining genes and characterization of allelic and copy-number variation

Corey T Watson et al. Am J Hum Genet. 2013.

Abstract

The immunoglobulin heavy-chain locus (IGH) encodes variable (IGHV), diversity (IGHD), joining (IGHJ), and constant (IGHC) genes and is responsible for antibody heavy-chain biosynthesis, which is vital to the adaptive immune response. Programmed V-(D)-J somatic rearrangement and the complex duplicated nature of the locus have impeded attempts to reconcile its genomic organization based on traditional B-lymphocyte derived genetic material. As a result, sequence descriptions of germline variation within IGHV are lacking, haplotype inference using traditional linkage disequilibrium methods has been difficult, and the human genome reference assembly is missing several expressed IGHV genes. By using a hydatidiform mole BAC clone resource, we present the most complete haplotype of IGHV, IGHD, and IGHJ gene regions derived from a single chromosome, representing an alternate assembly of ∼1 Mbp of high-quality finished sequence. From this we add 101 kbp of previously uncharacterized sequence, including functional IGHV genes, and characterize four large germline copy-number variants (CNVs). In addition to this germline reference, we identify and characterize eight CNV-containing haplotypes from a panel of nine diploid genomes of diverse ethnic origin, discovering previously unmapped IGHV genes and an additional 121 kbp of insertion sequence. We genotype four of these CNVs by using PCR in 425 individuals from nine human populations. We find that all four are highly polymorphic and show considerable evidence of stratification (Fst = 0.3-0.5), with the greatest differences observed between African and Asian populations. These CNVs exhibit weak linkage disequilibrium with SNPs from two commercial arrays in most of the populations tested.

Figures

Figure 1
Schematic Comparisons of IGHV Haplotype between CH17 and the Human Reference Genome Assembly (GRCh37)
(A) Functional and ORF IGHV genes annotated from each reference are depicted by filled boxes with corresponding IGHV gene locus and allele identifiers located above and below the haplotypes. (B) The positions of two insertions and two complex events characterized from the CH17 haplotype are shown mapped to the GRCh37 IGH human reference assembly sequence (black line; chr14:105,928,955–107,289,540). The locus is presented in the same orientation as that depicted by IMGT. Functional and ORF IGHV, IGHD, and IGHJ genes (not to scale) and IGHV pseudogenes are shown (to scale); the names of IGHV genes involved in the characterized structural variants are indicated. Segmental duplications downloaded from the UCSC genome browser are shown below GRCh37.
Figure 2
Map of IGHV CNVs Identified by Complete Sequencing of Fosmid Clones
The positions of three deletions, two insertions, one duplication, and one complex event characterized from fosmid alternative haplotypes are shown mapped to the GRCh37 IGH reference (black line; chr14:105,928,955–107,289,540) with the same parameters as in Figure 1B. The locus is presented in the same orientation as that depicted by IMGT. Functional and ORF IGHV, IGHD, and IGHJ genes (not to scale) as well as IGHV pseudogenes are shown (to scale); the names of IGHV genes involved in the characterized structural variants are indicated. Segmental duplications downloaded from the UCSC genome browser are shown below GRCh37. The large red box indicates a hotspot region of recurrent mutation (see Figure 4 for additional haplotypes associated with this hotspot). The deletion of IGHV4-61 was identified by Mills et al. (chr14:107,084,861–107,096,738) and was also included in our analyses.
Figure 3
Breakpoint Analysis of the IGHV3-23 Duplication
(A) Pairwise BLAST alignment of 38 kbp region surrounding IGHV3-23 in GRCh37 (chr14:106700594–106738594). Brown and orange dotted arrows point to ∼5.3 kbp repeat sequences (extended homology) suspected to have mediated the IGHV3-23 duplication. Repeat sequences show 86% sequence identity. (B) Sequence harboring the IGHV3-23 duplication identified in individual NA18956 (clones AC244473 and AC206018) is compared to GRCh37 (chr14:106700594–106738594). Regions of similarity between the two haplotypes are connected by black lines. Segments colored in blue in both haplotypes indicate the locations of the 10.8 kbp duplicates. Brown and orange bars above the NA18956 haplotype and below the GRCh37 haplotype indicate ∼5.3 kbp repeat sequences identified by BLAST in (A). Labeled IGHV genes and pseudogenes are depicted by green and red chevrons, respectively. (C) A five-way alignment of ∼5.3 kbp repeat sequences from both haplotypes. Alignment of base positions is shown along the top of the diagram. Each repeat sequence (three from NA18956 and two from GRCh37) is represented by a single horizontal black line. Blue tick marks on each line indicate nucleotide (nt) differences and gaps observed between the aligned sequences. The red line tracks the most similar alignment of the middle NA18956 repeat sequence to the other four sequences. Based on nt similarity, the event breakpoint is presumed to have occurred within the 373 bp region (red box) in which all aligned sequences share 99.7% sequence identity (372/373 nt).
Figure 4
A Hotspot of IGH Structural Polymorphism
Each of the five identified haplotypes harboring diverse CNVs is shown relative to GRCh37. Each haplotype is labeled with a sample identifier, and the length (kbp) is indicated at the right of each haplotype in parentheses. Two haplotypes in this region were identified from the individual NA18555 (haplotypes A and B). One to four ∼25 kbp segmental duplication sequence blocks (or partial blocks), depending on the haplotype, are depicted by shaded blue bars. Deleted regions identified in five of the haplotypes, including GRCh37, are indicated by red dotted lines. The positions and names of functional IGHV genes (green boxes) are shown in each haplotype. The partial haplotype identified in this region from individual NA19240 (AC234301), which overlapped that of NA18555 haplotype A and included the genes IGHV4-30-2, IGHV3-30-3, IGHV4-30-4, and IGHV3-30-5, is not depicted but was included in the analysis and allowed for the placement of the NA18502 haplotype (also see Figures S1 and S12–S14).
Figure 5
Pairwise Fst of Copy-Number Variant Loci in Nine Human Populations
Heatmaps representing pairwise Fst values between populations calculated based on CNV allele frequency data generated at four loci are as follows: (A) IGHV7-4-1 insertion; (B) IGHV3-9, IGHV1-8, IGHV5-a, and IGHV3-64D complex event; (C) IGHV1-c, IGHV3-d, IGHV3-43D, and IGHV4-b insertion; and (D) IGHV1-69D, IGHV1-f, IGHV3-h, and IGHV2-70D insertion. Population abbreviations are shown on both map axes. Colors in each square correspond to a given Fst value range as indicated by the key.
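
The pairwise Fst values summarized in the abstract and in Figure 5 are derived from CNV allele frequencies. The abstract does not say which Fst estimator was used, so the following is only a minimal sketch under stated assumptions: each CNV is treated as a biallelic locus (insertion present or absent), the two populations are weighted equally, and Fst is computed as Wright's fixation index, (H_T - H_S) / H_T, from expected heterozygosities. The function names, population labels, and frequencies are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_fst(p1: float, p2: float) -> float:
    """Wright's Fst for one biallelic locus and two equally weighted populations.

    p1 and p2 are the frequencies of the same allele (for example, the
    insertion allele of a CNV) in each population. This is an assumed
    estimator, not necessarily the one used in the paper.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_total == 0:
        return 0.0  # allele fixed or absent in both populations: no differentiation
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return (h_total - h_within) / h_total


def fst_matrix(freqs: dict[str, float]) -> dict[tuple[str, str], float]:
    """Pairwise Fst for every unordered pair of populations."""
    return {
        (a, b): pairwise_fst(freqs[a], freqs[b])
        for a, b in combinations(sorted(freqs), 2)
    }


# Hypothetical allele frequencies, for illustration only (not data from the paper).
example_freqs = {"POP_A": 0.2, "POP_B": 0.8, "POP_C": 0.5}

if __name__ == "__main__":
    for (a, b), fst in fst_matrix(example_freqs).items():
        print(f"{a} vs {b}: Fst = {fst:.3f}")
```

Under this definition, two populations with identical allele frequencies give Fst = 0, and populations fixed for opposite alleles give Fst = 1. A sample-size-corrected estimator would be more appropriate for real genotype counts.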

References

    1. Tuzun E., Sharp A.J., Bailey J.A., Kaul R., Morrison V.A., Pertz L.M., Haugen E., Hayden H., Albertson D., Pinkel D. Fine-scale structural variation of the human genome. Nat. Genet. 2005;37:727–732.
    2. Kidd J.M., Cooper G.M., Donahue W.F., Hayden H.S., Sampas N., Graves T., Hansen N., Teague B., Alkan C., Antonacci F. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008;453:56–64.
    3. Kidd J.M., Sampas N., Antonacci F., Graves T., Fulton R., Hayden H.S., Alkan C., Malig M., Ventura M., Giannuzzi G. Characterization of missing human genome sequences and copy-number polymorphic insertions. Nat. Methods. 2010;7:365–371.
    4. Sudmant P.H., Kitzman J.O., Antonacci F., Alkan C., Malig M., Tsalenko A., Sampas N., Bruhn L., Shendure J., Eichler E.E., 1000 Genomes Project. Diversity of human copy number variation and multicopy genes. Science. 2010;330:641–646.
    5. Mills R.E., Walter K., Stewart C., Handsaker R.E., Chen K., Alkan C., Abyzov A., Yoon S.C., Ye K., Cheetham R.K., 1000 Genomes Project. Mapping copy number variation by population-scale genome sequencing. Nature. 2011;470:59–65.