2017 Aug 1;33(15):2266-2272.
doi: 10.1093/bioinformatics/btx156.

cnAnalysis450k: an R package for comparative analysis of 450k/EPIC Illumina methylation array derived copy number data

Maximilian Knoll et al. Bioinformatics.

Abstract

Motivation: Detailed copy number (CN) variation data can be obtained from 450k or EPIC Illumina methylation assays. However, the effects of different preprocessing strategies (normalization, transformation and selection of gain/loss cutoff values) on variant calling have not been evaluated systematically.

Results: We provide an R package that allows direct comparison of any preprocessed CN data. It provides its own CN alteration detection methodology: segments are identified by detecting changes in the variance of CN data and are subsequently filtered for significance. Meaningful cutoffs for gain/loss definition can be identified automatically by analyzing the resulting ΔCN distributions of all analyzed samples. Three exemplary datasets (2x450k, 1xEPIC) were selected for comparative analyses of Raw, Illumina, SWAN, Quantile, Noob, Funnorm and Dasen normalizations. Importantly, all CN data distributions were skewed (skewness -0.66 to -1.2) and therefore required different gain/loss cutoffs. Depending on the normalization method, prominent baseline differences between samples could be observed. We present a workflow that alleviates both issues: Z-transformation removes baseline differences between samples, and automatic cutoff selection circumvents the problems accompanying the skewed distributions. Additional filtering of candidates by significance yields comparable results for most of the enumerated normalization methods, with the exception of SWAN. In contrast, manual cutoff determination results in highly variable numbers of variant calls that depend strongly on the selected normalization method. Taken together, we present a workflow that allows robust identification of copy number alterations in methylation array data, fairly independent of the applied normalization.
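
The two statistical ideas named above, Z-transformation and skewness, can be illustrated with a minimal sketch. This is not code from the cnAnalysis450k package (which is written in R); the function names `z_transform` and `skewness` and the toy values are assumptions made purely for illustration. The sketch shows that a per-sample Z-transformation removes a constant baseline shift between two samples, while the skewness of the distribution is left unchanged, which is consistent with the abstract treating cutoff selection as a separate step.

```python
import statistics


def z_transform(values):
    """Center one sample's CN values at 0 and scale them to unit standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("constant input has no spread to scale by")
    return [(v - mean) / sd for v in values]


def skewness(values):
    """Moment-based skewness; a negative value indicates a longer left tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("skewness is undefined for constant input")
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical toy data: two samples with the same shape but different baselines.
sample_a = [-1.0, -0.2, -0.1, 0.0, 0.1, 0.2]
sample_b = [v + 0.5 for v in sample_a]

z_a = z_transform(sample_a)
z_b = z_transform(sample_b)

# After the transformation both samples share the same baseline (mean 0),
# but the left-skewed shape of the distribution is preserved.
print("max difference after z-transform:", max(abs(a - b) for a, b in zip(z_a, z_b)))
print("skewness before:", skewness(sample_a))
print("skewness after: ", skewness(z_a))
```

Because skewness is invariant under shifting and positive scaling, the transformation alone cannot make the distribution symmetric; only the baseline offset disappears.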

Availability and implementation: The cnAnalysis450k package is available on GitHub (https://github.com/mknoll/cnAnalysis450k).

Contact: m.knoll@dkfz.de.

Supplementary information: Supplementary data are available at Bioinformatics online.
