Genome Res. 2007 Nov;17(11):1665-74.
doi: 10.1101/gr.6861907. Epub 2007 Oct 5.

PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data

Kai Wang et al. Genome Res. 2007 Nov.

Abstract

Comprehensive identification and cataloging of copy number variations (CNVs) is required to provide a complete view of human genetic variation. The resolution of CNV detection in previous experimental designs has been limited to tens or hundreds of kilobases. Here we present PennCNV, a hidden Markov model (HMM) based approach, for kilobase-resolution detection of CNVs from Illumina high-density SNP genotyping data. This algorithm incorporates multiple sources of information, including total signal intensity and allelic intensity ratio at each SNP marker, the distance between neighboring SNPs, the allele frequency of SNPs, and the pedigree information where available. We applied PennCNV to genotyping data generated for 112 HapMap individuals; on average, we detected approximately 27 CNVs for each individual with a median size of approximately 12 kb. Excluding common rearrangements in lymphoblastoid cell lines, the fraction of CNVs in offspring not detected in parents (CNV-NDPs) was 3.3%. Our results demonstrate the feasibility of whole-genome fine-mapping of CNVs via high-density SNP genotyping.
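The way an HMM can combine signal intensity with the distance between neighboring SNPs can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the PennCNV implementation: it uses only three hypothetical copy-number states, only the log R Ratio as an observation (no allelic ratio, allele frequency, or pedigree information), and made-up values for the state means, standard deviation, and transition parameters (`LRR_MEAN`, `LRR_SD`, `scale_bp`, `base_change` are all illustrative names and numbers).

```python
import math

# Hypothetical toy states: copy number 1 (loss), 2 (normal), 3 (gain).
STATES = (1, 2, 3)
# Assumed mean log R Ratio per state; illustrative, not published parameters.
LRR_MEAN = {1: -0.6, 2: 0.0, 3: 0.4}
LRR_SD = 0.15


def log_gauss(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_transition(prev, cur, distance_bp, scale_bp=100_000.0, base_change=0.01):
    """Log transition probability; state changes get likelier as SNPs lie farther apart."""
    # Assumed functional form: the change probability saturates at base_change.
    p_change = max(1e-9, base_change * (1.0 - math.exp(-distance_bp / scale_bp)))
    if prev == cur:
        return math.log(1.0 - p_change)
    return math.log(p_change / (len(STATES) - 1))


def viterbi(lrr, positions):
    """Return the most likely copy-number state for each SNP."""
    if not lrr:
        return []
    prior = {1: 0.005, 2: 0.99, 3: 0.005}  # assumed: most of the genome is normal
    score = {s: math.log(prior[s]) + log_gauss(lrr[0], LRR_MEAN[s], LRR_SD)
             for s in STATES}
    back = []
    for i in range(1, len(lrr)):
        dist = positions[i] - positions[i - 1]
        new_score, ptr = {}, {}
        for cur in STATES:
            # Pick the best predecessor state for `cur` at this SNP.
            best = max(STATES, key=lambda p: score[p] + log_transition(p, cur, dist))
            ptr[cur] = best
            new_score[cur] = (score[best] + log_transition(best, cur, dist)
                              + log_gauss(lrr[i], LRR_MEAN[cur], LRR_SD))
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    lrr = [0.0, 0.02, -0.6, -0.62, -0.58, -0.61, -0.59, 0.01, 0.0]
    positions = [1000 * (i + 1) for i in range(len(lrr))]
    print(viterbi(lrr, positions))
```

In this toy, a run of SNPs with strongly negative log R Ratio values is called as a loss only when the summed emission evidence outweighs the cost of entering and leaving the state, which is why isolated noisy markers do not produce calls.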

Figures

Figure 1.
An illustration of log R Ratio (LRR) and B Allele Freq (BAF) values for the chromosome 15 q-arm of an individual. A normal chromosome region has three BAF genotype clusters, shown in boxes as the AA, AB, and BB genotypes, with LRR values centered around zero. The copy-neutral LOH region has normal LRR values but lacks the AB genotype cluster. An increased copy number in a CNV region can be detected from an increased number of peaks in the BAF distribution, as well as from increased LRR values. The patterns of LRR and BAF for CNV regions, normal regions, and copy-neutral LOH regions are distinct from each other; thus the combination of LRR and BAF can be used to generate CNV calls.
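The qualitative patterns in this caption can be turned into a small rule-based sketch. This is only an illustration of how LRR and BAF complement each other, not the HMM used by PennCNV: the function name `classify_region`, the LRR tolerance, and the heterozygous BAF band are hypothetical, and the "copy number loss" branch (decreased LRR) is an assumption added for symmetry rather than something stated in the caption.

```python
from statistics import mean


def classify_region(lrr, baf, lrr_tol=0.15, het_band=(0.4, 0.6)):
    """Label a region from per-SNP LRR and BAF values using illustrative thresholds."""
    avg_lrr = mean(lrr)
    # An AB cluster shows up as BAF values near 0.5.
    has_ab_cluster = any(het_band[0] <= b <= het_band[1] for b in baf)
    if avg_lrr > lrr_tol:
        return "copy number gain"
    if avg_lrr < -lrr_tol:
        return "copy number loss"  # assumed rule, not from the caption
    # Normal intensity: the AB cluster separates normal from copy-neutral LOH.
    return "normal" if has_ab_cluster else "copy-neutral LOH"


if __name__ == "__main__":
    print(classify_region([0.0, 0.02, -0.01], [0.0, 0.5, 1.0]))
    print(classify_region([0.01, -0.02, 0.0], [0.0, 1.0, 0.02]))
```

A short stretch of homozygous SNPs with normal LRR can also occur by chance, so a rule like this would only be meaningful over regions with many markers.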
Figure 2.
A flowchart outlining the procedure for CNV calling from genotyping data. The first step for LRR and BAF calculation can be alternatively performed by the BeadStudio software, given a clustering file containing canonical genotype cluster positions. The HMM integrates several sources of information to give CNV calls. When genotype data are available for family members, the pedigree information can be incorporated to model CNV events more accurately.
Figure 3.
(A) A predicted ∼700-bp CNV within an intronic region of the FBXL7 gene; (B) a predicted ∼1-kb CNV within an intronic region of the EYA1 gene; and (C) a predicted ∼4-kb CNV within an intronic region of the CTDSPL gene are inherited from parent to offspring. The scatterplots for log R Ratio and B Allele Frequency are shown for the father, mother, and offspring; (red dots) the SNPs within the CNVs. The presence of CNVs and their copy numbers are validated by PCR amplification of the region encompassing breakpoints for FBXL7 and EYA1, or by PCR primer walking for CTDSPL (see Fig. 4 for more detail on primer locations).
Figure 4.
UCSC Genome Browser (Kuhn et al. 2007) screenshots of the CNVs within the FBXL7 (A), EYA1 (B), and CTDSPL (C) genes, as well as the location of SNPs and PCR primers. The predicted CNV regions are shown on the “CNV calls” track as (gray solid boxes) deletion of one copy or (black solid boxes) deletion of two copies; the actual CNV breakpoints identified by resequencing are shown in the “BLAT Search” track. For the CNV within FBXL7, a pair of PCR primers (P1 and P2) generates two PCR products; resequencing of the shorter PCR product identifies the CNV breakpoint. For the CNV within EYA1, the primer pair P1–P2, but not P1–P3, generates two PCR products, indicating that the breakpoint is between P2 and P3; thus resequencing by P2 identifies the exact breakpoint. For the CNV within CTDSPL, the primer pairs P1–P2, P1–P3, and P1–P4 all generate two PCR products, indicating that the breakpoint is between P1 and P4; thus resequencing of the shortest PCR product in Figure 3C by P1 and P4 from both ends identifies the breakpoint. These examples illustrate that the combined PCR-resequencing approach can pinpoint the exact location of predicted CNVs in the human genome.

