J Comput Biol. 2019 Apr;26(4):295-304.
doi: 10.1089/cmb.2018.0143. Epub 2019 Feb 21.

Measuring DNA Copy Number Variation Using High-Density Methylation Microarrays

Soonweng Cho et al. J Comput Biol. 2019 Apr.

Abstract

Genetic and epigenetic changes drive carcinogenesis, and their integrated analysis provides insights into the mechanisms of cancer development. Computational methods have been developed to measure copy number variation (CNV) from methylation array data, including ChAMP-CNV, CN450K, and, introduced here, Epicopy. Using paired single nucleotide polymorphism (SNP) and methylation array data from the public repository of The Cancer Genome Atlas (TCGA), we optimized CNV calling and benchmarked the performance of these methods. After optimizing the thresholds of all three methods, we found comparable performance across them. Using Epicopy as a representative method for Illumina 450K array analysis, we show that CNV methods based on the Illumina 450K array achieve a sensitivity of 0.7 and a positive predictive value of 0.75 in identifying CNVs, similar to the results obtained when competing SNP microarray platforms are compared with each other.
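The sensitivity and positive predictive value (PPV) quoted above are computed by treating the SNP-array calls as the reference. The Python sketch below is illustrative only: it assumes that gene-level calls are stored as sets of (sample, gene, state) tuples, and the function name `sensitivity_ppv` and the toy calls are hypothetical, not part of Epicopy or the authors' benchmarking code.

```python
# Sketch: platform-agreement metrics for CNV calls, with SNP-array calls as truth.
# Assumption: each call is a (sample, gene, state) tuple, state in {"gain", "loss"}.

def sensitivity_ppv(reference_calls, test_calls):
    """Return (sensitivity, PPV) of test_calls against reference_calls.

    sensitivity = shared calls / reference calls
    PPV         = shared calls / test calls
    Both are reported as 0.0 when their denominator is empty (a convention).
    """
    shared = len(reference_calls & test_calls)  # same sample, gene and state
    sensitivity = shared / len(reference_calls) if reference_calls else 0.0
    ppv = shared / len(test_calls) if test_calls else 0.0
    return sensitivity, ppv


# Toy, made-up calls for illustration only.
snp_calls = {("S1", "MYC", "gain"), ("S1", "CDKN2A", "loss"),
             ("S2", "ERBB2", "gain"), ("S2", "PTEN", "loss")}
epi_calls = {("S1", "MYC", "gain"), ("S1", "CDKN2A", "loss"),
             ("S2", "ERBB2", "gain"), ("S2", "TP53", "loss"),
             ("S2", "RB1", "loss")}
print(sensitivity_ppv(snp_calls, epi_calls))  # (0.75, 0.6)
```

Here three of the four reference calls are recovered (sensitivity 0.75) and three of the five test calls are confirmed (PPV 0.6); requiring the state to match means a gain called as a loss counts against both metrics.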

Keywords: CNV; TCGA; copy number variation; methylation microarray; microarray.

Conflict of interest statement

The authors declare that no competing financial interests exist.

Figures

FIG. 1.
Probe coverage of HM450K. Probe coverage of the Illumina Infinium Human Methylation 450K microarray (450K) across the human genome. To better highlight the distribution of probes, chromosomes are not drawn to scale relative to each other. Despite having only 485,577 probes across the genome, there is good coverage of all but one autosome (chromosome 21) and chromosome X. This suggests that CNV can be estimated well for most of the genome. Epicopy is written to profile CNV on the autosomes, excluding chromosomes X and Y. CNV, copy number variation; HM450K, Illumina Human Methylation 450K microarray.
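Restricting CNV profiling to the autosomes, as described for Epicopy, amounts to filtering the probe annotation by chromosome before counting or segmenting. This Python sketch assumes a hypothetical manifest of (probe ID, chromosome) pairs with UCSC-style chromosome names; it does not read the real Illumina manifest file, and the function names and the sparsity cutoff are illustrative choices, not Epicopy's own code.

```python
from collections import Counter

# Autosomes only: chrX and chrY are excluded, mirroring the scope described above.
AUTOSOMES = [f"chr{i}" for i in range(1, 23)]


def autosomal_probe_counts(manifest):
    """Count probes per autosome.

    manifest: iterable of (probe_id, chromosome) pairs (hypothetical format).
    Returns a dict covering all 22 autosomes; sex chromosomes and any other
    contigs are dropped, and autosomes without probes get a count of 0.
    """
    counts = Counter(chrom for _probe_id, chrom in manifest)
    return {chrom: counts[chrom] for chrom in AUTOSOMES}


def sparse_autosomes(counts, min_probes):
    """Autosomes with fewer than min_probes probes (cutoff chosen by the caller)."""
    return [chrom for chrom, n in counts.items() if n < min_probes]


# Made-up probe IDs for illustration only.
toy_manifest = [("cg_a", "chr1"), ("cg_b", "chr1"), ("cg_c", "chr21"),
                ("cg_d", "chrX"), ("cg_e", "chrY")]
counts = autosomal_probe_counts(toy_manifest)
print(counts["chr1"], counts["chr21"], "chrX" in counts)  # 2 1 False
```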
FIG. 2.
Epicopy results and metrics. (a) Representative copy number profiles of the same TCGA breast cancer sample from the SNP array (top) and from Epicopy-generated segmentation values (bottom). Note the lower copy number values in the Epicopy-derived profile. The y-axis represents copy number or LRR. The x-axis represents genomic location, with each dotted red line marking a transition between chromosomes. The horizontal blue and black bars represent the segments of that sample, with alternating colors signifying chromosome transitions. The dotted horizontal line represents the threshold for making a CNV call. (b) Performance of gene-level Epicopy calls against SNP analysis in three TCGA datasets (THCA, BRCA, and LUSC), using a copy number (CN) threshold of 0.15 and 200 probes per segment. In the top panels, the tan line represents specificity and the black line represents sensitivity. The bottom panel shows the concordance, or accuracy, of gene-level data. (c) Reproducibility Index as a function of the number of CN-altered genes in a given sample across three TCGA datasets. Each point is a sample, and the blue line represents the local regression line generated using the locfit function in R. The vertical dotted grey line represents the average number of CN-altered genes per sample for a tumor type. BRCA, breast carcinoma; CN, copy number; LRR, log R ratio; LUSC, lung squamous cell carcinoma; SNP, single nucleotide polymorphism; TCGA, The Cancer Genome Atlas; THCA, thyroid carcinoma.
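Panel (b) applies a CN threshold of 0.15 and a minimum of 200 probes per segment. One plausible reading of those two parameters is sketched below in Python: segments supported by fewer than 200 probes are left uncalled, and the remaining segments are called as gains or losses when their mean is at least 0.15 in absolute value. The segment representation, the inclusive comparison, the first-overlap rule for assigning genes, and all names are assumptions for illustration, not Epicopy's implementation.

```python
from dataclasses import dataclass

# Parameter values from the Fig. 2 caption; how they are applied below is assumed.
CN_THRESHOLD = 0.15  # minimum absolute segment mean to call a gain or loss
MIN_PROBES = 200     # segments supported by fewer probes are left uncalled


@dataclass
class Segment:
    chrom: str
    start: int
    end: int
    n_probes: int
    mean: float  # segment mean copy-number value (log-ratio scale assumed)


def call_segment(seg):
    """Classify one segment as 'gain', 'loss' or 'neutral'."""
    if seg.n_probes < MIN_PROBES:
        return "neutral"
    if seg.mean >= CN_THRESHOLD:
        return "gain"
    if seg.mean <= -CN_THRESHOLD:
        return "loss"
    return "neutral"


def gene_level_calls(segments, genes):
    """Give each gene (name -> (chrom, start, end)) the call of the first
    segment overlapping it; genes with no overlapping segment are 'neutral'."""
    calls = {}
    for name, (chrom, g_start, g_end) in genes.items():
        calls[name] = "neutral"
        for seg in segments:
            if seg.chrom == chrom and seg.start <= g_end and g_start <= seg.end:
                calls[name] = call_segment(seg)
                break
    return calls


def concordance(reference, test):
    """Fraction of shared genes whose call (gain/loss/neutral) agrees."""
    shared = reference.keys() & test.keys()
    if not shared:
        return float("nan")
    return sum(reference[g] == test[g] for g in shared) / len(shared)


# Made-up segments, genes and reference calls for illustration only.
segments = [
    Segment("chr8", 100, 5000, 250, 0.30),    # gain
    Segment("chr9", 100, 5000, 150, -0.40),   # too few probes -> neutral
    Segment("chr17", 100, 5000, 300, -0.20),  # loss
]
genes = {"GENE_A": ("chr8", 200, 300), "GENE_B": ("chr9", 200, 300),
         "GENE_C": ("chr17", 4000, 6000), "GENE_D": ("chr2", 1, 10)}
calls = gene_level_calls(segments, genes)
print(calls)
reference_calls = {"GENE_A": "gain", "GENE_B": "loss",
                   "GENE_C": "loss", "GENE_D": "neutral"}
print(concordance(reference_calls, calls))  # 0.75
```

Note how GENE_B illustrates the trade-off in the probe minimum: a strong loss supported by only 150 probes is discarded, which protects specificity at some cost to sensitivity.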
FIG. 3.
Percentage of SNP6-array CNVs detected by Epicopy as a function of SNP6 LRR. (a) Each point represents the average of all segments identified by the SNP array in the THCA dataset, regardless of segment length or the number of probes within the segment. The x-axis represents the LRR of the segment on the SNP array, and the y-axis represents the percentage of those segments identified by Epicopy. The blue line is the local regression line fitted using the locfit function in R. (b) Comparison of GISTIC results obtained from SNP analysis and from Epicopy-derived values. Left panel: frequent (recurrent) amplifications identified from SNP-derived (left) and Epicopy-derived (right) results. Right panel: frequent deletions identified from SNP-derived (left) and Epicopy-derived (right) CNV results. There was 72% overlap between the recurrently altered peaks identified on the two platforms. GISTIC, Genomic Identification of Significant Targets in Cancer.
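Panel (a) relates detection to SNP-array LRR with a locfit regression; the Python sketch below swaps in a simpler fixed-bin summary instead, and adds one possible definition of peak overlap for panel (b): the fraction of reference peaks hit by at least one test peak. Both the binning and the overlap definition are assumptions, since the caption does not specify how the 72% figure was computed, and all function names and data are hypothetical.

```python
from bisect import bisect_right


def detection_rate_by_lrr(segments, edges):
    """Fraction of reference (SNP-array) segments also called by the test method,
    per LRR bin [edges[i], edges[i+1]).

    segments: iterable of (lrr, detected) pairs; edges: sorted bin boundaries.
    Segments outside [edges[0], edges[-1]) are ignored. Returns a list of
    (bin_start, bin_end, fraction) tuples, with fraction None for empty bins.
    """
    n_bins = len(edges) - 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for lrr, detected in segments:
        i = bisect_right(edges, lrr) - 1  # index of the bin containing lrr
        if 0 <= i < n_bins:
            totals[i] += 1
            hits[i] += int(detected)
    return [(edges[i], edges[i + 1], hits[i] / totals[i] if totals[i] else None)
            for i in range(n_bins)]


def peak_overlap_fraction(reference_peaks, test_peaks):
    """Fraction of reference peaks (chrom, start, end) overlapped by at least
    one test peak (an assumed definition of 'overlap')."""
    if not reference_peaks:
        return float("nan")

    def overlaps(a, b):
        return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

    hit = sum(any(overlaps(r, t) for t in test_peaks) for r in reference_peaks)
    return hit / len(reference_peaks)


# Made-up values for illustration only.
toy_segments = [(-0.8, True), (-0.7, False), (-0.2, False),
                (0.1, False), (0.6, True), (0.9, True)]
print(detection_rate_by_lrr(toy_segments, [-1.0, -0.5, 0.0, 0.5, 1.0]))
snp_peaks = [("chr8", 100, 200), ("chr17", 50, 80), ("chr9", 10, 20), ("chr3", 1, 5)]
epi_peaks = [("chr8", 150, 300), ("chr9", 15, 16), ("chr1", 1, 5)]
print(peak_overlap_fraction(snp_peaks, epi_peaks))  # 0.5
```

Because this overlap measure is asymmetric, swapping the roles of the two platforms can give a different percentage; a symmetric alternative would count overlapping peaks against the union of both peak sets.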
