BMC Genomics. 2016 Oct 12;17(1):799.
doi: 10.1186/s12864-016-3074-7.

CNARA: reliability assessment for genomic copy number profiles

Ni Ai et al. BMC Genomics. 2016.

Abstract

Background: DNA copy number profiles from microarray and sequencing experiments sometimes contain wave artefacts, which may be introduced during sample preparation and cannot be removed completely by existing preprocessing methods. In addition, a large derivative log ratio spread (DLRS) of the probes, which correlates with poor DNA quality, is sometimes observed in genome screening experiments and may lead to unreliable copy number profiles. Depending on the extent of these artefacts and the resulting misidentification of copy number alterations/variations (CNA/CNV), it may be desirable to exclude such samples from analyses or to adapt the downstream data analysis strategy accordingly.
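The abstract does not give a formula for DLRS. As a minimal sketch, the code below assumes one commonly used robust estimate: the interquartile range of the differences between adjacent probe log ratios, divided by 1.349·√2 so that the result approximates the per-probe noise standard deviation. The function name `dlrs` and this exact definition are assumptions for illustration, not necessarily what CNARA computes.

```python
import math
import statistics


def dlrs(log_ratios):
    """Estimate the derivative log ratio spread of one profile.

    Assumed definition (not taken from the paper): IQR of adjacent
    differences, rescaled so that pure Gaussian noise with standard
    deviation sigma yields a value close to sigma.
    """
    if len(log_ratios) < 3:
        raise ValueError("need at least three probes")
    # Differences between neighbouring probes along the genome.
    diffs = [b - a for a, b in zip(log_ratios, log_ratios[1:])]
    q1, _, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    iqr = q3 - q1
    # 1.349 converts an IQR to a standard deviation for a normal
    # distribution; sqrt(2) undoes the variance doubling of a difference.
    return iqr / (1.349 * math.sqrt(2))
```

Because the estimate is based on quartiles of adjacent differences, a handful of genuine copy number breakpoints barely affects it, whereas uniformly noisy probes raise it.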

Results: Here, we propose a method to distinguish reliable genomic copy number profiles from those containing heavy wave artefacts and/or large DLRS. We define four features that adequately summarize the copy number profiles for reliability assessment, and train a classifier on a dataset of 1522 copy number profiles from various microarray platforms. The method can be applied to predict the reliability of copy number profiles irrespective of the underlying microarray platform and may be adapted for sequencing platforms from which copy number estimates can be computed as a piecewise constant signal. Further details can be found at https://github.com/baudisgroup/CNARA.
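To make "piecewise constant signal" concrete, the following sketch collapses a per-probe (or per-bin) fitted copy number signal into segments and counts its change-points. The function names `to_segments` and `count_change_points` and the tolerance parameter `tol` are hypothetical; the four CNARA features are not defined in this abstract and are not reproduced here.

```python
def to_segments(fitted, tol=1e-9):
    """Collapse a piecewise constant fit into (start, end, level) runs.

    `fitted` holds one fitted log ratio per probe or bin; indices are
    0-based and `end` is inclusive. Values within `tol` of the level at
    the start of the current run are treated as the same segment.
    """
    segments = []
    start = 0
    for i in range(1, len(fitted) + 1):
        # Close the current run at the end of the data or at a level change.
        if i == len(fitted) or abs(fitted[i] - fitted[start]) > tol:
            segments.append((start, i - 1, fitted[start]))
            start = i
    return segments


def count_change_points(fitted, tol=1e-9):
    """Number of level changes in a piecewise constant fit."""
    return max(len(to_segments(fitted, tol)) - 1, 0)
```

Any platform whose segmentation output can be expanded into such a per-position signal could, in principle, be summarized in the same way.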

Conclusions: We have developed a method for the assessment of genomic copy number profiling data, and suggest applying the method in addition to, and after, other state-of-the-art noise correction and quality control procedures. CNARA could be instrumental in improving the assessment of data used for genomic data mining experiments and could support the reliable functional attribution of copy number aberrations, especially in cancer research.

Keywords: CNA; Copy number profile; Reliability assessment.

Figures

Fig. 1
An example of simulated copy number profiles. Samples of 10,000 dimensions (n=10,000) were generated for three reliability groups: A, reliable copy number profiles containing many CNAs; B, unreliable copy number profiles with indiscernible CNAs due to wave artefacts; C, hyper-segmented copy number profiles. In group A, both CBS and step-fitting recovered the majority of copy number segments well. In group B, both methods fitted noise, and CBS detected more change-points than step-fitting. In group C, CBS detected too many change-points, whereas step-fitting recovered the majority of segments well.
Fig. 2
CBS versus step-fitting for 200 sets of simulated copy number profiles. Scatter plot of the number of segments detected by CBS (purple) and step-fitting (green), plotted against the number of true CNA segments. In group A, the number of segments recovered by CBS approximates the number of true CNA segments very well and is close to that of step-fitting. In group B, both methods fitted noise, with CBS generally finding more change-points than step-fitting. In group C, CBS recovered many more segments than the number of true segments, whereas step-fitting recovered the majority of CNA segments well.
Fig. 3
Five example copy number profiles representing the reliability groups. 3a: Case 1, hyper-segmented, discernible CNAs with some waves; 3b: Case 2, reliable, discernible CNAs with few waves; 3c: Case 3, unreliable, indiscernible CNAs with heavy waves; 3d: Case 4, unreliable, large DLRS, undetectable CNAs; 3e: Case 5, reliable, control sample or sample without many CNAs. In each of the subgraphs 3a to 3e, the upper panel shows the copy number profile segmented by the CBS algorithm, and the lower panel displays the same copy number profile segmented by step-fitting at the optimal iteration, where the peak S value was attained; the red line is the fit and the blue line is the counter-fit. Fig. 3f shows the S values for the same five copy number profiles over 120 iterations, one curve per profile. The GEO accession numbers [29] for the five cases are: Case 1, GSM360756 [30]; Case 2, GSM491138 [31]; Case 3, GSM360643 [30]; Case 4, GSM187938 [32]; and Case 5, GSM182894 [33].
Fig. 4
CNARA versus medASP. ROC plots of CNARA and medASP on the validation set. The 1522 samples were split at random into training and validation sets (50:50), and the SVM classifier of CNARA was trained on the training set. The AUC of CNARA is 0.9994, compared with 0.7372 for medASP.
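The evaluation protocol behind Fig. 4 (a random 50:50 split followed by an AUC comparison on the validation half) can be sketched as below. This is an illustration only: the SVM itself is omitted, `scores` stands for any real-valued reliability score produced by a classifier, labels are assumed to be 1 (reliable) or 0 (unreliable), and the helper names `split_half` and `roc_auc` are hypothetical.

```python
import random


def split_half(items, seed=0):
    """Shuffle items reproducibly and split them into two halves."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise-ranking definition.

    Equals the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one; ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a validation set of a few hundred profiles; a rank-based implementation would be preferable for much larger sets.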
