BMC Genomics. 2015 Nov 2;16:891.
doi: 10.1186/s12864-015-2123-y.

Determining multiallelic complex copy number and sequence variation from high coverage exome sequencing data

Diego Forni et al. BMC Genomics. 2015.

Abstract

Background: Copy number variation (CNV) is a major component of genomic variation, yet methods to accurately type genomic CNV lag behind methods that type single nucleotide variation. High-throughput sequencing can contribute to these methods by using sequence read depth, which takes the number of reads that map to a given part of the reference genome as a proxy for copy number of that region, and compares across samples. Furthermore, high-throughput sequencing also provides information on the sequence differences between copies within and between individuals.

Methods: In this study we use high-coverage phase 3 exome sequences of the 1000 Genomes project to infer diploid copy number of the beta-defensin genomic region, a well-studied CNV that carries several beta-defensin genes involved in the antimicrobial response, signalling, and fertility. We also use these data to call sequence variants, a particular challenge given the multicopy nature of the region.

Results: We confidently call copy number and sequence variation of the beta-defensin genes in 1285 samples from 26 global populations, and validate copy number using Nanostring nCounter and triplex paralogue ratio test data. We use the copy number calls to verify the genomic extent of the CNV and validate sequence calls using analysis of cloned PCR products. We identify novel variation, mostly individually rare, predicted to alter amino-acid sequence in the beta-defensin genes. Such novel variants may alter antimicrobial properties or have off-target receptor interactions, and may contribute to individuality in immunological response and fertility.

Conclusions: Given that 81% of identified sequence variants were not previously in dbSNP, we show that sequence variation in multiallelic CNVs represents an unappreciated source of genomic diversity.
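
The read-depth idea described in the Background can be illustrated with a minimal sketch. It assumes the standard reads-per-kilobase-per-million-reads (RPKM) definition and compares one sample against the cohort median. The function names `rpkm` and `relative_depth` are hypothetical, and this is not the SVD-ZRPKM normalisation referred to in the figure captions, only a simplified stand-in for it.

```python
import statistics


def rpkm(reads_in_region, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return reads_in_region * 1_000_000_000 / (region_length_bp * total_mapped_reads)


def relative_depth(sample_rpkm, cohort_rpkms):
    """Depth of one sample relative to the cohort median for the same region.

    Values above 1 suggest more copies than the typical sample,
    values below 1 suggest fewer (illustrative assumption only).
    """
    median = statistics.median(cohort_rpkms)
    if median <= 0:
        raise ValueError("cohort median RPKM must be positive")
    return sample_rpkm / median


if __name__ == "__main__":
    # Made-up numbers: 500 reads on a 1 kb region, 10 million mapped reads.
    print(rpkm(500, 1000, 10_000_000))
    print(relative_depth(60.0, [40.0, 50.0, 60.0]))
```

Comparing against a cohort median rather than a fixed reference is one simple way to express the "compares across samples" step; the study itself may normalise differently.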

Figures

Fig. 1
Distribution of reads-per-kilobase-per-million-reads (RPKM) values of different samples stratified by sequencing centre. The kernel density plot shows density of RPKM values from mrsFAST alignments for four different sequencing centres, distinguished by the different colours. The vertical dotted line indicates the cutoff value at RPKM = 50, with samples above that threshold taken forward for copy number calling
Fig. 2
Effects of continental group batch of origin on copy number clustering. The histograms show normalised sequence depth coverage data for the beta-defensin region generated by the BGI sequencing centre. X-axis values represent raw mean SVD-ZRPKM values, and the y-axis represents number of samples. Curved lines indicate the Gaussian curves used to call integer copy number. a) samples from East Asian populations (n = 269), b) samples from South Asian populations (n = 165), c) samples from South Asian and East Asian populations analysed as one batch (n = 434)
Fig. 3
Effects of sequencing centre and batch size on copy number clustering. The histograms show normalised sequence depth coverage data for the beta-defensin region for sub-Saharan African samples. X-axis values represent raw mean SVD-ZRPKM values. a) BCM sequencing centre, n = 81 (15 YRI, 57 LWK, 9 ASW). b) BGI sequencing centre, n = 172 (26 YRI, 3 LWK, 25 GWD, 43 MSL, 47 ESN, 5 ASW, 23 ACB), with curved lines indicating the Gaussian mixture model used to call integer copy number
Fig. 4
Validation of beta-defensin copy number calling. The plots show comparisons between two methods of calling integer beta-defensin copy number. a) comparison with triplex paralogue ratio test and Nanostring nCounter. b) comparison with integer calls from phase 1 low coverage whole genome data [4]. The numbers in red indicate the numbers of samples concordant for that particular copy number. The numbers in blue indicate the numbers of discordant samples
Fig. 5
Correlation of SVD-ZRPKM values between genes at 8p23.1. Plot of pairwise correlation between SVD-ZRPKM values among genes at chromosome region 8p23.1. The SVD-ZRPKM mean for all exons belonging to each gene was calculated and the pairwise correlation for each pair of genes was evaluated by the r² metric (correlation increases with gray shading). Gene presence and location are based on the annotation of the hg19 human genome assembly. Complex repeat-rich regions REPP and REPD are indicated, and several genes between REPP and REPD are omitted to save space, as indicated by the red dashed line
Fig. 6
Summary of predicted amino acid changes inferred from sequence variation. The six beta-defensin proteins encoded by the genes analysed in this study are shown. The prepro region, which is cleaved during processing, is shown under the blue bar; the mature peptide sequence is shown under the red bar. The canonical six cysteines are highlighted in red, with sequence variants identified in this study shown in green. X represents a stop codon, and hbd2, hbd3, hbd4, hbd5, hbd6, and hbd7 are the proteins encoded by DEFB4, DEFB103, DEFB104, DEFB105, DEFB106 and DEFB107 respectively
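
The captions for Figs. 2 and 3 mention Gaussian curves fitted to normalised depth values and used to call integer copy number. A minimal sketch of the assignment step follows, assuming the mixture components have already been fitted. The function names and every weight, mean and standard deviation in `EXAMPLE_COMPONENTS` are made-up illustrative values, not parameters from the study.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def call_copy_number(value, components):
    """Assign a depth value to the most probable mixture component.

    components maps integer copy number -> (weight, mean, sd).
    Returns (copy_number, posterior); copy_number is None when the
    value is too far from every component to be assigned.
    """
    weighted = {
        cn: weight * gaussian_pdf(value, mean, sd)
        for cn, (weight, mean, sd) in components.items()
    }
    total = sum(weighted.values())
    if total == 0.0:
        return None, 0.0
    best = max(weighted, key=weighted.get)
    return best, weighted[best] / total


# Hypothetical fitted components: copy number -> (weight, mean, sd).
EXAMPLE_COMPONENTS = {
    3: (0.2, -1.0, 0.25),
    4: (0.5, 0.0, 0.25),
    5: (0.3, 1.0, 0.25),
}

if __name__ == "__main__":
    for depth in (-0.9, 0.05, 0.5):
        print(depth, call_copy_number(depth, EXAMPLE_COMPONENTS))
```

Returning the posterior alongside the call lets a caller discard low-confidence samples, for example those falling midway between two components.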

References

    1. Hollox EJ, Hoh B-P. Human gene copy number variation and infectious disease. Hum Genet. 2014;133(10):1217–33. doi: 10.1007/s00439-014-1457-x.
    2. Schrider DR, Hahn MW. Gene copy-number polymorphism in nature. Proc R Soc B Biol Sci. 2010;277(1698):3213–3221. doi: 10.1098/rspb.2010.1180.
    3. Wain LV, Armour JAL, Tobin MD. Genomic copy number variation, human health, and disease. Lancet. 2009;374(9686):340–350. doi: 10.1016/S0140-6736(09)60249-X.
    4. Handsaker RE, Van Doren V, Berman JR, Genovese G, Kashin S, Boettger LM, et al. Large multiallelic copy number variations in humans. Nat Genet. 2015;47(3):296–303. doi: 10.1038/ng.3200.
    5. Polley S, Louzada S, Forni D, Sironi M, Balsakas T, Hains D, et al. Evolution of the rapidly-mutating human salivary agglutinin gene (DMBT1) and population subsistence strategy. Proc Natl Acad Sci. 2015;112(15):5105–5110. doi: 10.1073/pnas.1416531112.

MeSH terms