Nucleic Acids Res. 2007;35(6):2013-25.
doi: 10.1093/nar/gkm076. Epub 2007 Mar 6.

QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data

Stefano Colella et al. Nucleic Acids Res. 2007.

Abstract

Array-based technologies have been used to detect chromosomal copy number changes (aneuploidies) in the human genome. Recent studies identified numerous copy number variants (CNVs), some of which are common polymorphisms that may contribute to disease susceptibility. We developed, and experimentally validated, a novel computational framework (QuantiSNP) for detecting regions of copy number variation from BeadArray SNP genotyping data using an Objective Bayes Hidden-Markov Model (OB-HMM). Objective Bayes measures are used to set certain hyperparameters in the priors, using a novel re-sampling framework that calibrates the model to a fixed Type I (false positive) error rate. Other parameters are set by maximizing the marginal likelihood on prior training data of known structure. QuantiSNP provides probabilistic quantification of state classifications and, as demonstrated by validation of breakpoint boundaries, significantly improves the accuracy of segmental aneuploidy identification and mapping relative to existing analytical tools (BeadStudio, Illumina). QuantiSNP identified both novel and validated CNVs. QuantiSNP was developed using BeadArray SNP data, but it can be adapted to other platforms, and we believe that the OB-HMM framework has widespread applicability in genomic research. In conclusion, QuantiSNP is a novel algorithm for high-resolution CNV/aneuploidy detection with applications in clinical genetics, cancer and disease association studies.
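To illustrate only the general idea of decoding copy-number states with a hidden Markov model, the sketch below runs plain Viterbi decoding over three hypothetical copy-number states, using Gaussian emissions on the Log R Ratio alone. It is not QuantiSNP's Objective Bayes HMM: it ignores B allele frequency, and the state means, standard deviation, initial and transition probabilities are all illustrative assumptions rather than calibrated values.

```python
import math

# Hypothetical states and parameters (assumptions, not QuantiSNP's calibrated values).
STATES = [1, 2, 3]                        # copy number: deletion, normal, duplication
LRR_MEAN = {1: -0.45, 2: 0.0, 3: 0.3}     # assumed mean Log R Ratio per state
LRR_SD = 0.18                             # assumed shared noise level
STAY = 0.999                              # assumed probability of staying in a state


def log_normal_pdf(x, mean, sd):
    """Log density of a Gaussian emission."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_transition(a, b):
    """Stay with probability STAY; otherwise switch uniformly to another state."""
    return math.log(STAY) if a == b else math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(lrr):
    """Return the most likely copy-number state for each SNP, in order."""
    # Assume the first SNP is most likely in the normal state.
    log_init = {s: math.log(0.98 if s == 2 else 0.01) for s in STATES}
    score = {s: log_init[s] + log_normal_pdf(lrr[0], LRR_MEAN[s], LRR_SD) for s in STATES}
    back = []  # back-pointers, one dict per SNP after the first
    for x in lrr[1:]:
        ptr, new = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_transition(p, s))
            ptr[s] = prev
            new[s] = score[prev] + log_transition(prev, s) + log_normal_pdf(x, LRR_MEAN[s], LRR_SD)
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For example, `viterbi([0.0] * 10 + [-0.5] * 10 + [0.0] * 10)` labels the middle ten SNPs with state 1. The high `STAY` value penalizes state switches, which is why isolated noisy SNPs tend not to be called as aberrant on their own.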

Figures

Figure 1.
Chromosome-wide data. Log R Ratio values (top) and B allele frequencies (bottom) plotted for each SNP from one individual on chromosome 6. A deletion on the p-arm can be identified by the downward shift in the Log R Ratio and by the loss of heterozygosity, indicated by the disappearance of the heterozygous state (0.5) from the B allele frequencies (arrows).
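The two signals described in this caption can be summarized per window of SNPs: a lowered mean Log R Ratio and an absence of B allele frequencies near 0.5. The sketch below is a simple threshold rule over one window, not QuantiSNP's model; all thresholds and names are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical thresholds for illustration only.
LRR_SHIFT = -0.3               # mean LRR below this suggests copy loss
HET_LOW, HET_HIGH = 0.2, 0.8   # BAF band treated as "heterozygous"
MIN_HET_FRACTION = 0.05        # fewer heterozygous SNPs than this suggests LOH


def window_signals(lrr, baf):
    """Return (mean LRR, fraction of SNPs whose BAF falls in the heterozygous band)."""
    het = sum(1 for b in baf if HET_LOW < b < HET_HIGH)
    return mean(lrr), het / len(baf)


def looks_like_deletion(lrr, baf):
    """Flag a window showing both a downward LRR shift and loss of heterozygosity."""
    mean_lrr, het_fraction = window_signals(lrr, baf)
    return mean_lrr < LRR_SHIFT and het_fraction < MIN_HET_FRACTION
```

Requiring both signals mirrors the caption's reasoning: a duplication raises the LRR and splits heterozygous BAF values towards 1/3 and 2/3, so it is not flagged by this rule.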
Figure 2.
(a) and (b) QuantiSNP is able to detect as many as 3–4 SNPs in a simulated 5-SNP aberrant region, but only if we accept false call rates of around 10 in 100 000 SNPs. However, in (c) and (d), when the length of the event increases to 10 SNPs, QuantiSNP detects nearly all affected SNPs in the deletion and duplication events, even at very stringent false call rates of less than 1 in 100 000 SNPs. In all cases, the true boundary is well localized, with less than one extra SNP called outside the true aberrant region.
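The quantities compared in this figure (how many SNPs of a simulated aberration are detected, and the false call rate per 100 000 SNPs) can be computed from per-SNP calls and the simulated truth. A minimal sketch, assuming boolean per-SNP labels and normalizing false calls by the number of truly normal SNPs; that normalization and the function name are assumptions, since the caption does not state the exact denominator.

```python
def detection_summary(called, truth):
    """Compare per-SNP calls with simulated truth labels (True = aberrant)."""
    if len(called) != len(truth):
        raise ValueError("called and truth must be the same length")
    true_hits = sum(c and t for c, t in zip(called, truth))
    false_calls = sum(c and not t for c, t in zip(called, truth))
    n_normal = sum(not t for t in truth)
    return {
        "aberrant_snps_detected": true_hits,
        # Assumed denominator: SNPs that are truly normal in the simulation.
        "false_calls_per_100k": 100_000 * false_calls / n_normal if n_normal else 0.0,
    }
```

For a 5-SNP simulated region in which 4 SNPs are called and one normal SNP elsewhere is miscalled, `aberrant_snps_detected` is 4.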
Figure 3.
Multi-sample detection rates. Comparison of single-sample (red) and multi-sample (blue) analysis performance for (a,c) duplications and (b,d) deletions. In (a) and (b), multi-sample analysis greatly improves the detection capability of QuantiSNP for a 5-SNP duplication and deletion event, respectively, by increasing the number of SNPs called aberrant. In (c) and (d), multi-sample analysis reduces the number of SNPs falsely called aberrant towards zero.
Figure 4.
QuantiSNP output. An example of QuantiSNP output showing the Log R Ratio, B allele frequency, HMM copy number estimate and associated log Bayes factor. (a) Sample No. 4, deletion on chromosome 6; (b) Sample No. 15, duplication on chromosome 17. Supplementary Figure S1 shows the same data visualized as a custom track in the UCSC Genome Browser.
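This record does not say how QuantiSNP computes its log Bayes factor from the OB-HMM. As a simplified illustration of what a log Bayes factor for a segment expresses, the sketch below compares two point hypotheses ("aberrant" vs "normal") with fixed Gaussian Log R Ratio parameters; with point hypotheses the Bayes factor reduces to a likelihood ratio. The natural logarithm, the parameter values and the function names are assumptions.

```python
import math


def gaussian_loglik(xs, mean, sd):
    """Total log-likelihood of values xs under a Gaussian(mean, sd)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)
        for x in xs
    )


def segment_log_bf(lrr_segment, aberrant_mean=-0.45, normal_mean=0.0, sd=0.18):
    """Natural-log Bayes factor for 'aberrant' vs 'normal' under fixed (point) parameters.

    Positive values favour the aberrant hypothesis; the defaults are illustrative only.
    """
    return gaussian_loglik(lrr_segment, aberrant_mean, sd) - gaussian_loglik(lrr_segment, normal_mean, sd)
```

Because each SNP contributes additively, longer segments accumulate stronger evidence, which is consistent with the better detection of 10-SNP than 5-SNP events in Figure 2.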
Figure 5.
Calibration of false call rates. The false call rates obtained by simulation (red) fall within the bounds of the empirical false call rate derived from the analysis of the experimental samples (black). We chose [formula not available] for the analysis of the (a) Human-1 and (b) HumanHap300 datasets. Sample 10 was excluded from the analysis because this dataset showed unusually high levels of noise. Errors were derived from bootstrap simulations using the empirical and simulated datasets. The two sets of bounds agree well for the Human-1 dataset. The agreement for the HumanHap300 data is weaker, possibly because the number of probes per SNP differed between the versions of the HumanHap300 used in the experiments (see Supplementary Data, Materials and Methods S1 for details).
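The caption states only that errors were derived from bootstrap simulations. One common way to do this is a percentile bootstrap that resamples samples with replacement and recomputes a pooled false call rate; the sketch below shows that approach under the assumption that per-sample false-call counts and numbers of SNPs tested are available. The function name, the 95% level and the resampling unit are assumptions, not the authors' documented procedure.

```python
import random


def bootstrap_false_call_rate(false_calls, snps_tested, n_boot=2000, seed=0):
    """Pooled false call rate per 100 000 SNPs with a 95% percentile bootstrap interval.

    false_calls[i] and snps_tested[i] belong to sample i; whole samples are resampled.
    """
    rng = random.Random(seed)
    n = len(false_calls)
    rates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample samples with replacement
        total_calls = sum(false_calls[i] for i in idx)
        total_snps = sum(snps_tested[i] for i in idx)
        rates.append(100_000 * total_calls / total_snps)
    rates.sort()
    lo = rates[int(0.025 * n_boot)]
    hi = rates[int(0.975 * n_boot) - 1]
    point = 100_000 * sum(false_calls) / sum(snps_tested)
    return point, (lo, hi)
```

Resampling whole samples, rather than individual SNPs, keeps sample-level noise differences (such as those that led to excluding Sample 10) inside the interval.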
Figure 6.
Breakpoint mapping. Comparison of breakpoint mapping by BeadStudio (orange arrows) and QuantiSNP (blue arrows) on HumanHap300 data, shown in the context of previous data (full data in Supplementary Tables S1B and S2C and Table 1, respectively). A star indicates that the event was detected in multiple fragments. The chromosome schematic is not to scale: deletion/duplication boundaries defined by other technologies are shown in black, and the deleted/duplicated area is shown in grey (see Table 1 for details). (a) Samples characterized by FISH (boundaries mapped with ±100 000 bp confidence). (b) Samples characterized by molecular genetics; the breakpoint in sample No. 18 was identified above the significance threshold (log Bayes factor = 37.5) only in the combined data (light blue arrows).
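The comparison in this figure amounts to measuring how far each tool's called breakpoint lies from a reference boundary, and whether it falls inside the reference's confidence window (±100 000 bp for FISH). A minimal sketch, assuming each breakpoint is a single base-pair position; the function name, tool labels and coordinates are hypothetical, and this is not necessarily how the authors scored concordance.

```python
def compare_breakpoints(reference_bp, calls, tolerance_bp=100_000):
    """Map each tool to (distance from reference in bp, within tolerance?).

    calls: dict of tool name -> called breakpoint position (bp).
    """
    results = {}
    for tool, bp in calls.items():
        distance = abs(bp - reference_bp)
        results[tool] = (distance, distance <= tolerance_bp)
    return results
```

For example, `compare_breakpoints(1_000_000, {"tool_A": 1_050_000, "tool_B": 1_250_000})` reports `tool_A` as 50 000 bp away and within the window, and `tool_B` as outside it.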
Figure 7.
DMD deletion mapping in Sample 14. (a) QuantiSNP output for sample 14, identifying the chromosome X deletion; (b) sequence results across the deletion; (c) mapping of the sequence to its genomic location on chromosome X; (d) BLAT results for the sequence (in panel c), visualized in the UCSC Genome Browser. The orange custom track shows the QuantiSNP (QS) log Bayes factor, and the red (deletions)/green (duplications) track shows the QS copy number (0 corresponds to the normal state). RefSeq genes and SNPs present on different array platforms (including the HumanHap300, labelled ‘Illumina_300 K’) are also shown.
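Results like these can be loaded into the UCSC Genome Browser as a custom track. One simple option is the bedGraph format: a `track type=bedGraph` header line followed by tab-separated chromosome, 0-based start, end and value lines. The sketch below assumes per-segment copy-number values already encoded relative to normal (0 = normal state, as in the caption); the function name, track name and example coordinates are hypothetical, and QuantiSNP's own export format is not described in this record.

```python
def to_bedgraph(segments, track_name="QS_copy_number"):
    """Render (chrom, start, end, value) segments as a bedGraph custom track.

    Coordinates are assumed 0-based, half-open, as in BED/bedGraph.
    """
    lines = [f'track type=bedGraph name="{track_name}"']
    for chrom, start, end, value in segments:
        lines.append(f"{chrom}\t{start}\t{end}\t{value}")
    return "\n".join(lines) + "\n"
```

For example, `to_bedgraph([("chrX", 100, 200, -1)])` produces a header line followed by one data line marking a single-copy loss relative to normal. A second track built the same way from per-segment log Bayes factors would correspond to the orange track in panel (d).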
