Nucleic Acids Res. 2011 Jul;39(12):e79.
doi: 10.1093/nar/gkr197. Epub 2011 Apr 12.

cn.FARMS: a latent variable model to detect copy number variations in microarray data with a low false discovery rate

Djork-Arné Clevert et al. Nucleic Acids Res. 2011 Jul.

Abstract

Cost-effective oligonucleotide genotyping arrays like the Affymetrix SNP 6.0 are still the predominant technique to measure DNA copy number variations (CNVs). However, CNV detection methods for microarrays overestimate both the number and the size of CNV regions and, consequently, suffer from a high false discovery rate (FDR). A high FDR means that many CNVs are wrongly detected and are therefore not associated with a disease in a clinical study; correction for multiple testing nevertheless takes them into account, which decreases the study's discovery power. For controlling the FDR, we propose a probabilistic latent variable model, 'cn.FARMS', which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples, from which the posterior can deviate only through strong and consistent signals in the data. On HapMap data, cn.FARMS clearly outperformed the two most prevalent methods with respect to sensitivity and FDR. The software cn.FARMS is publicly available as an R package at http://www.bioinf.jku.at/software/cnfarms/cnfarms.html.
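The idea of an information gain of the posterior over the prior can be illustrated with a Kullback-Leibler divergence between two Gaussians. The sketch below is only an illustration: it assumes a univariate latent variable with a standard normal prior, and the function name `gaussian_information_gain` and the example values are hypothetical. It is not the formula used inside the cn.FARMS package.

```python
import math


def gaussian_information_gain(post_mean: float, post_var: float) -> float:
    """KL( N(post_mean, post_var) || N(0, 1) ) in nats.

    Assumption for illustration: the prior is a standard normal,
    standing in for the null hypothesis of no copy number change.
    """
    if post_var <= 0:
        raise ValueError("posterior variance must be positive")
    return 0.5 * (post_var + post_mean ** 2 - 1.0 - math.log(post_var))


# Posterior identical to the prior: the data carried no information.
no_signal = gaussian_information_gain(0.0, 1.0)

# Posterior shifted and sharpened by the data (hypothetical values).
strong_signal = gaussian_information_gain(2.0, 0.1)

print(f"no signal:     {no_signal:.3f} nats")
print(f"strong signal: {strong_signal:.3f} nats")
```

Under these assumptions, a region would be reported only when the gain exceeds some threshold, so noisy or inconsistent probes that leave the posterior close to the prior produce no call.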

Figures

Figure 1.
The copy number hierarchy probes-fragment-region. Fragment copy numbers serve as meta-probes used for ‘multi-loci modeling’, which yields region copy numbers. Inner boxes: the probes which target a fragment (often at a SNP position) are summarized to a raw copy number of this fragment. Note that, instead of fragments, DNA probe loci can be summarized. Outer box: the raw fragment copy numbers are the meta-probes for a DNA region and are summarized to a raw region copy number.
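The two-level summarization described in this caption can be sketched in a few lines. This is a toy illustration only: the median is used as a stand-in summarizer (cn.FARMS itself uses a latent variable model at both levels), and the fragment names and probe values are hypothetical.

```python
from statistics import median


def summarize(values):
    """Stand-in summarizer (assumption): median instead of the FARMS model."""
    return median(values)


# Hypothetical log-ratio intensities of the probes targeting each fragment.
probes_by_fragment = {
    "frag1": [0.1, -0.1, 0.0],
    "frag2": [0.9, 1.1, 1.0],
    "frag3": [1.0, 1.2, 0.8],
}

# Inner boxes: probes -> raw fragment copy number.
fragment_cn = {name: summarize(p) for name, p in probes_by_fragment.items()}

# Outer box: the fragment values act as meta-probes for the region.
region_cn = summarize(list(fragment_cn.values()))

print(fragment_cn)
print(region_cn)
```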
Figure 2.
Copy number analysis for (Affymetrix) DNA genotyping arrays as a three-step pipeline: (i) normalization, (ii) modeling and (iii) segmentation. Modeling is divided into ‘single-locus modeling’ and ‘multi-loci modeling’ with ‘fragment length correction’ as an optional intermediate step. As described in subsection ‘cn.FARMS: FARMS for CNV Detection’, cn.FARMS' pipeline is as follows: normalization by sparse overcomplete representation, single-locus modeling by FARMS, fragment length correction and multi-loci modeling by FARMS.
Figure 3.
Sparse overcomplete representation of allele A and B probes. The smooth scatter plot for a HapMap Affymetrix 250K_NSP array sample (CEU_NA12878, G/A allele probes). The three clouds going outwards from the origin correspond to genotypes AA (upper left cloud), AB (middle cloud), and BB (lower right cloud). For the genotype AA, allele A probes show a strong signal and allele B probes show a weak signal due to cross-hybridization (analogously for genotype BB). Note that the middle cloud is closer to the left cloud than to the right (violating CRMA's ACC assumptions). The lines are the estimates of the sparse overcomplete representation. They are used to correct for cross-hybridization by moving the left cloud to be vertical, the middle cloud to be at the 45° line and the lower right cloud to be horizontal.
Figure 4.
ROC curves for cn.FARMS, CRMA_v2 and dChip at the sex classification task for 59 HapMap CEU founders based on the X chromosome copy numbers. Panels (A) and (B) show single-locus and three-loci modeling of Affymetrix Mapping250K_NSP arrays, respectively. Panels (C) and (D) show single-locus and three-loci modeling of Affymetrix SNP 6.0 arrays, respectively. ROC curves closer to the upper left indicate better performing methods (AUC values for Affymetrix Mapping250K_NSP and Affymetrix SNP 6.0 are given in Table 1). cn.FARMS performs better than CRMA_v2 and dChip.
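For readers unfamiliar with the AUC summarizing such ROC curves, it can be computed directly as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below assumes, purely for illustration, that females (two X chromosomes) are the positive class and that the score is an estimated X chromosome copy number; the values are invented toy data, not results from the paper.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one sample")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical X chromosome copy number estimates.
female = [2.1, 1.9, 2.0]   # expected copy number 2
male = [1.0, 1.2, 1.95]    # expected copy number 1; the last one is noisy

toy_auc = auc(female, male)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 1 means the two sexes are perfectly separated by the copy number estimate; 0.5 corresponds to random guessing.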
Figure 5.
Precision-recall curves (PRCs) on HapMap SNP 6.0 arrays for cn.FARMS, CRMA_v2, and dChip at detecting previously multiply confirmed CNVs reported in Conrad et al. (38). The cn.FARMS detection criterion is the I/NI call, whereas CRMA_v2 and dChip use the variance of raw copy numbers. A PRC closer to the upper right-hand corner indicates better performance. Note that precision is (1−FDR); thus the FDR is the distance of the curve to the upper limit. Panels (A–D) give the PRCs for chromosome 4, chromosome 8, chromosome X and the whole genome for 3 loci. Panels (E–H) show the same for 5 loci. cn.FARMS (solid green) has a clear advantage over dChip (dashed purple) and CRMA_v2 (dotted blue). cn.FARMS has a considerably lower FDR than the other methods.
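The relation precision = 1 − FDR used in this caption follows directly from the definitions. A minimal sketch with hypothetical region identifiers (not data from the study) shows how one point of a precision-recall curve is obtained from a set of calls and a set of confirmed CNVs:

```python
def precision_recall(called, true_cnvs):
    """Return (precision, recall) for a set of calls against a truth set."""
    called, true_cnvs = set(called), set(true_cnvs)
    true_positives = len(called & true_cnvs)
    # Convention (assumption): with no calls, precision is taken as 1.
    precision = true_positives / len(called) if called else 1.0
    recall = true_positives / len(true_cnvs) if true_cnvs else 0.0
    return precision, recall


# Hypothetical region identifiers.
called = {"r1", "r2", "r3", "r4"}   # regions called at one threshold
truth = {"r1", "r2", "r5"}          # confirmed CNV regions

precision, recall = precision_recall(called, truth)
fdr = 1.0 - precision               # fraction of calls that are false

print(f"precision={precision:.2f} recall={recall:.2f} FDR={fdr:.2f}")
```

Sweeping the calling threshold and repeating this computation yields the full curve.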
Figure 6.
(A) CNV calling plots across chromosome 4 for 3 loci regions (each point in the plot summarizes 3 loci). The y-axis gives the I/NI call estimated by cn.FARMS; for both CRMA_v2 and dChip it gives the variance. Calling values are scaled such that the maximum is one. Local calling densities are encoded by blue color shades. True CNVs [reported in Conrad et al. (38)] are marked as light rose bars and calls at these loci by red circles. A perfect calling method would call all true CNVs (red circles at 1) and no others (dark blue background at 0). cn.FARMS separates called true positives (true CNVs) from true negatives better than the other methods, as can be seen from the lower variance among true negatives, indicated by the dark blue density at the bottom. The red arrows, e.g. at positions 65 or 85 Mb in the upper cn.FARMS panel, indicate verified CNVs which were detected by one method, in this case cn.FARMS, but by neither of the other two. cn.FARMS identifies true CNVs with a lower FDR than CRMA_v2 and dChip. (B) The same plot for 5 loci (each point in the plot summarizes 5 loci). The FDR is further reduced, as can be seen from the lower variance of non-call values at the bottom. Again, cn.FARMS identifies true CNVs with a lower FDR than CRMA_v2 and dChip.

References

    1. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, et al. Global variation in copy number in the human genome. Nature. 2006;444:444–454.
    2. Conrad DF, Hurles ME. The population genetics of structural variation. Nat. Genet. 2007;39:S30–S36.
    3. Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung H-C, Szpiech ZA, Degnan JH, Wang K, Guerreiro R, et al. Genotype, haplotype and copy-number variation in worldwide human populations. Nature. 2008;451:998–1003.
    4. Fanciulli M, Norsworthy PJ, Petretto E, Dong R, Harper L, Kamesh L, Heward JM, Gough SC, deSmith A, Blakemore AI, et al. FCGR3B copy number variation is associated with susceptibility to systemic, but not organ-specific, autoimmunity. Nat. Genet. 2007;39:721–723.
    5. Gonzalez E, Kulkarni H, Bolivar H, Mangano A, Sanchez R, Catano G, Nibbs RJ, Freedman BI, Quinones MP, Bamshad MJ, et al. The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility. Science. 2005;307:1434–1440.