2010 Mar 22;11:147.
doi: 10.1186/1471-2105-11-147

Identification of recurrent regions of Copy-Number Variants across multiple individuals

Teo Shu Mei et al. BMC Bioinformatics. 2010.

Abstract

Background: Algorithms and software for CNV detection have been developed, but they detect CNV regions sample by sample with individual-specific breakpoints, whereas common CNV regions are likely to occur at the same genomic locations across different individuals in a homogeneous population. Current algorithms for detecting common CNV regions do not account for the varying reliability of the individual CNVs, which SNP-based CNV detection algorithms typically report as confidence scores. General methodologies for identifying these recurrent regions, especially those directed at SNP arrays, are still needed.

Results: In this paper, we describe two new approaches for identifying common CNV regions based on (i) the frequency of occurrence of reliable CNVs, where reliability is determined by high confidence scores, and (ii) a weighted frequency of occurrence of CNVs, where the weights are determined by the confidence scores. In addition, because partially overlapping CNV regions are often observed as a mixture of two or more distinct subregions, regions identified using the two approaches can be refined into smaller subregions using a clustering algorithm. We compared the performance of the methods with sequencing-based results in terms of discordance rates, rates of departure from Hardy-Weinberg equilibrium (HWE), and the average frequency and size of the identified regions. Both the discordance rates and the rates of departure from HWE decrease when we select CNVs with higher confidence scores. We also performed comparisons with two previously published methods, STAC and GISTIC, and showed that the methods we consider are better at identifying low-frequency but high-confidence CNV regions.
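The two frequency-based approaches can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact definitions, so the data layout (`(start, end, confidence)` tuples per individual), the function names, the use of a raw confidence cutoff rather than a percentile, and the rule that each individual contributes at most once per position are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical layout: one CNV is (start, end, confidence), inclusive coordinates.
CNV = Tuple[int, int, float]


def position_scores(samples: List[List[CNV]], positions: List[int],
                    weight: Callable[[float], float]) -> List[float]:
    """Sum a per-individual weight over all individuals covering each position."""
    scores = []
    for pos in positions:
        total = 0.0
        for cnvs in samples:
            covering = [weight(conf) for start, end, conf in cnvs
                        if start <= pos <= end]
            if covering:
                # Assumption: an individual counts at most once per position.
                total += max(covering)
        scores.append(total)
    return scores


def runs_above(positions: List[int], scores: List[float],
               threshold: float) -> List[Tuple[int, int]]:
    """Return maximal runs of consecutive positions whose score >= threshold."""
    regions, run_start, last = [], None, None
    for pos, score in zip(positions, scores):
        if score >= threshold:
            if run_start is None:
                run_start = pos
            last = pos
        elif run_start is not None:
            regions.append((run_start, last))
            run_start = None
    if run_start is not None:
        regions.append((run_start, last))
    return regions


def cover_regions(samples, positions, u, conf_cutoff):
    """Approach (i): count only CNVs whose confidence reaches the cutoff."""
    scores = position_scores(samples, positions,
                             lambda c: 1.0 if c >= conf_cutoff else 0.0)
    return runs_above(positions, scores, u)


def composite_regions(samples, positions, score_cutoff):
    """Approach (ii): weight every CNV by its confidence score."""
    scores = position_scores(samples, positions, lambda c: c)
    return runs_above(positions, scores, score_cutoff)


# Toy data: three individuals, probes every 10 units.
samples = [[(10, 50, 0.9)], [(30, 70, 0.8)], [(40, 60, 0.2)]]
positions = list(range(0, 101, 10))
print(cover_regions(samples, positions, u=2, conf_cutoff=0.5))
print(composite_regions(samples, positions, score_cutoff=1.8))
```

In this toy example the low-confidence CNV of the third individual is ignored by the count-based variant but still adds a small weight in the weighted variant, which is the practical difference between the two approaches.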

Conclusions: The proposed methods for identifying common CNV regions in multiple individuals perform well compared to existing methods. The identified common regions can be used for downstream analyses such as group comparisons in association studies.

Figures

Figure 1
An example of a common CNV region found with the COVER method (u = 2, c = 60). The figure shows a common CNV region in part of chromosome 22, found using the COVER method with threshold u = 2 and the confidence cutoff at the 60th percentile. Of 112 individuals, 41 have CNVs that overlap this common region, indicated by the horizontal lines. Although the region is identified as common, the individual CNVs still appear as a mixture of several distinct subregions.
Figure 2
Hypothetical example of an identified common CNV region with two distinct clusters. The identified common CNV region is shared by four individuals but consists of two partially overlapping regions: the first two individual regions cluster to the left of the common region, while the last two cluster to the right.
Figure 3
Results of the COVER method. (a) Discordance rates, (b) proportion of diallelic CNVs that failed HWE, (c) mean minor allele frequency (MAF) of diallelic CNVs and (d) mean CNV size (kb) as a function of confidence score cut-off points and minimum number of individuals.
Figure 4
Results of the COMPOSITE method. (a) Discordance rates, (b) proportion of diallelic CNVs that failed HWE, (c) mean minor allele frequency (MAF) of diallelic CNVs and (d) median size of CNV regions (kb) as a function of composite confidence score cut-off points. The solid line is the median CNV size found by Kidd et al.
Figure 5
Results of applying CLUSTER to common regions identified by the COVER method. (a) Average number of clusters, (b) rates of departure from HWE, (c) first and second components of PCA based on subjects' integer copy-number calls at common regions found using COVER (with u = 3 and c = 60), (d) first and second components of PCA based on subjects' integer copy-number calls at common regions found using complete-linkage CLUSTER (with cluster.limit = 0.6).
Figure 6
Comparison to McCarroll's CNVs. (a) Discordance rates when comparing regions found using COVER and those found by McCarroll et al., plotted against confidence score thresholds for different values of u. (b) Discordance rates when comparing regions found using COMPOSITE and those found by McCarroll et al., plotted against composite score thresholds.
Figure 7
The third and fourth principal components. (a) Using COVER (with u = 3 and c = 60). (b) The same as (a), but using the output of complete-linkage CLUSTER (with cluster.limit = 0.6).
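Several of the figures report the proportion of diallelic CNVs that depart from HWE. The following Python sketch shows one standard way such a check can be done, a Pearson chi-square test with one degree of freedom. The abstract does not state which HWE test the authors used, so the test itself, the function name `hwe_chi_square`, and the assumption of a deletion polymorphism with copy numbers 0, 1 and 2 are illustrative only.

```python
import math
from typing import Tuple


def hwe_chi_square(n0: int, n1: int, n2: int) -> Tuple[float, float]:
    """Pearson chi-square HWE test for a diallelic CNV.

    n0, n1, n2 are the numbers of individuals with copy number 0, 1 and 2
    (assumed here to be a simple deletion polymorphism).
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n0 + n1 + n2
    if n == 0:
        raise ValueError("at least one individual is required")
    p = (2 * n0 + n1) / (2 * n)  # frequency of the deletion allele
    q = 1.0 - p
    observed = (n0, n1, n2)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expected count (monomorphic loci).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts close to HWE proportions, then counts with no heterozygotes at all.
print(hwe_chi_square(1, 18, 81))
print(hwe_chi_square(10, 0, 90))
```

A region would count as departing from HWE when the p-value falls below a chosen significance level; with small expected counts an exact test is usually preferred over the chi-square approximation.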
