Bioinformatics. 2013 Nov 1;29(21):2678-82.
doi: 10.1093/bioinformatics/btt479. Epub 2013 Sep 16.

Use of autocorrelation scanning in DNA copy number analysis

Liangcai Zhang et al. Bioinformatics. 2013.

Abstract

Motivation: Data quality is a critical issue in the analysis of DNA copy number alterations measured with microarrays. It is commonly assumed that copy number alteration data can be modeled as piecewise constant and that the measurement errors of different probes are independent. However, these assumptions do not always hold in practice. In some published datasets, we found that measurement errors are highly correlated between probes that interrogate nearby genomic loci and that the piecewise-constant model fits the data poorly. Such correlated errors distort downstream analysis, causing a large number of DNA segments to be falsely identified as having copy number gains or losses.

Method: We developed a simple tool, the autocorrelation scanning profile (ASP), to assess the dependence of measurement errors between neighboring probes.

Results: The autocorrelation scanning profile can be used to check data quality and to refine the analysis of DNA copy number data, as we demonstrate on several typical datasets.

Contact: lzhangli@mdanderson.org.

Supplementary information: Supplementary data are available at Bioinformatics online.
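The abstract does not spell out how the autocorrelation scanning profile is computed, so the following Python sketch is only one plausible reading under stated assumptions. It assumes the underlying copy number trend is removed with a centered running median (a simplified stand-in for the Tukey running-median smoother mentioned in the Fig. 1 caption) and that the ASP is the lag-1 autocorrelation of the residuals in sliding windows along the probe order. The function names and the default smoothing width, window size and step are hypothetical choices for illustration, not values from the paper.

```python
import random
import statistics
from typing import List


def running_median(x: List[float], k: int = 31) -> List[float]:
    """Centered running median of odd width k; windows shrink at the ends.
    Assumption: a simple stand-in for Tukey's running-median smoother,
    without his exact end-point rules."""
    if k < 1 or k % 2 == 0:
        raise ValueError("window width k must be a positive odd integer")
    h = k // 2
    n = len(x)
    return [statistics.median(x[max(0, i - h):min(n, i + h + 1)]) for i in range(n)]


def lag1_autocorr(r: List[float]) -> float:
    """Pearson correlation between consecutive values r[i] and r[i + 1]."""
    a, b = r[:-1], r[1:]
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0  # constant window: no measurable dependence
    return sab / (saa * sbb) ** 0.5


def autocorrelation_scan(x: List[float], smooth_k: int = 31,
                         window: int = 50, step: int = 25) -> List[float]:
    """Hypothetical ASP: lag-1 autocorrelation of the residuals (data minus
    running-median fit) in sliding windows along the probe order."""
    fitted = running_median(x, smooth_k)
    resid = [xi - fi for xi, fi in zip(x, fitted)]
    return [lag1_autocorr(resid[s:s + window])
            for s in range(0, len(resid) - window + 1, step)]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 2000
    # Piecewise-constant log ratios (one gain segment) plus independent noise.
    signal = [0.5 if 800 <= i < 1200 else 0.0 for i in range(n)]
    iid = [s + rng.gauss(0.0, 0.2) for s in signal]
    # Same signal plus AR(1) noise, i.e. errors correlated between neighbors.
    ar, e = [], 0.0
    for s in signal:
        e = 0.6 * e + rng.gauss(0.0, 0.2)
        ar.append(s + e)
    print("median ASP, independent errors:", statistics.median(autocorrelation_scan(iid)))
    print("median ASP, correlated errors: ", statistics.median(autocorrelation_scan(ar)))
```

With independent errors the median ASP stays close to zero, whereas errors correlated between neighboring probes push it well above zero. The wide smoothing window is a deliberate choice in this sketch: the running median follows step changes in copy number but not short-range noise, so the residuals retain the error dependence the ASP is meant to detect.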

Figures

Fig. 1.
Typical autocorrelation scanning profile (ASP) patterns. (A) Piecewise-constant copy number profile (CNP); (B) high ASP throughout the genome; (C) gradual changes in the CNP; (D) rapidly fluctuating CNP. Data sources: Case A, sample GSM315239 in GEO dataset GSE12532 (Hallor et al., 2009); Case B, sample GSM487724 in GSE19574 (Uchida et al., 2010); Case C, sample GSM315235 in GSE12532 (Hallor et al., 2009); Case D, sample GSM535545 in GSE21420 (Barrow et al., 2011). In each case, the top panel shows the log-transformed CNP: red and black points show the copy number data of individual SNP sites, and the green curve shows the CNP denoised with Tukey's running median smoothing. The bottom panel shows the ASP. The horizontal black line at about 0.2 marks the threshold obtained from randomly permuted data; points above the line have P < 0.01. All data in this figure come from the same microarray platform, the CGH 244A array manufactured by Agilent Technologies.
Fig. 2.
Boxplots of ASPs. (A) Formalin-fixed, paraffin-embedded (FFPE) samples; (B) fresh-frozen samples. Data source: GEO accession GSE17047 (stage II colorectal cancer tissue samples) for both sets. The data were generated on the Agilent HD CGH 2 × 105K microarray. The boxes show the interquartile ranges.
Fig. 3.
Simulation results. (A) Relationship between the median ASP and the number of segments identified by the circular binary segmentation (CBS) algorithm. (B) Relationship between the median ASP and the false-positive (FP), false-negative (FN), true-positive (TP) and true-negative (TN) rates. The CNP data were generated from random values with no significant copy number changes; each CNP contains 100 000 data points. Autocorrelation was introduced by coupling the signals of neighboring probes (see Methods).
Fig. 4.
High ASP corresponds to hypersegmentation. The plot shows the relationship between the median ASP and the number of segments identified by the CBS algorithm. Data source: Thompson et al. (2011).
Fig. 5.
Distribution of correlations of measurement errors between neighboring probes. The raw data are from GEO (accession GSE5173), a dataset from a healthy population that is used as normal controls for normalization. Array platform: Affymetrix Mapping 250K Nsp SNP array. The black line shows the distribution of correlations of errors in the copy number estimates between neighboring SNP sites; the red line shows the distribution obtained from randomly permuted residuals.
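The permutation threshold described for Fig. 1 and the neighbor-coupled simulation described for Fig. 3 can be sketched in Python as follows. This is an illustration under assumptions, not the paper's Methods: neighboring probes are coupled here as a first-order moving average, e[i] = z[i] + c·z[i−1], whose lag-1 autocorrelation is c/(1 + c²), and the threshold is taken as the 99th percentile (α = 0.01) of windowed lag-1 autocorrelations computed on randomly shuffled copies of the residuals. All function names, window sizes and defaults are hypothetical.

```python
import random
import statistics
from typing import List, Optional


def lag1_autocorr(r: List[float]) -> float:
    """Pearson correlation between consecutive values r[i] and r[i + 1]."""
    a, b = r[:-1], r[1:]
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab / (saa * sbb) ** 0.5


def windowed_acf(r: List[float], window: int = 50, step: int = 25) -> List[float]:
    """Lag-1 autocorrelation in sliding windows along the probe order."""
    return [lag1_autocorr(r[s:s + window]) for s in range(0, len(r) - window + 1, step)]


def coupled_noise(n: int, coupling: float, rng: random.Random) -> List[float]:
    """Assumed coupling scheme: e[i] = z[i] + coupling * z[i-1], z iid N(0, 1).
    Theoretical lag-1 autocorrelation: coupling / (1 + coupling**2)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [z[i + 1] + coupling * z[i] for i in range(n)]


def permutation_threshold(resid: List[float], window: int = 50, step: int = 25,
                          n_perm: int = 20, alpha: float = 0.01,
                          rng: Optional[random.Random] = None) -> float:
    """Upper (1 - alpha) quantile of windowed lag-1 autocorrelations of
    randomly permuted residuals (same values, order destroyed)."""
    if rng is None:
        rng = random.Random()
    null: List[float] = []
    for _ in range(n_perm):
        shuffled = list(resid)
        rng.shuffle(shuffled)
        null.extend(windowed_acf(shuffled, window, step))
    null.sort()
    return null[min(len(null) - 1, int((1 - alpha) * len(null)))]


if __name__ == "__main__":
    rng = random.Random(1)
    for c in (0.0, 0.8):
        e = coupled_noise(5000, c, rng)
        thr = permutation_threshold(e, rng=rng)
        acf = windowed_acf(e)
        frac = sum(v > thr for v in acf) / len(acf)
        print(f"coupling={c}: threshold={thr:.3f}, windows above threshold={frac:.2f}")
```

Shuffling keeps the marginal distribution of the residuals but destroys their order, so windows that exceed the threshold point to dependence between neighboring probes rather than to unusually large errors. With no coupling, only about 1% of windows should exceed the threshold by construction.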

References

    1. Abyzov A, et al. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011;21:974–984.
    2. Ahmad A, Iqbal MA. Significance of genome-wide analysis of copy number alterations and UPD in myelodysplastic syndromes using combined CGH-SNP arrays. Curr. Med. Chem. 2012;19:3739–3747.
    3. Baross A, et al. Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data. BMC Bioinformatics. 2007;8:368.
    4. Barrow J, et al. Homozygous loss of ADAM3A revealed by genome-wide analysis of pediatric high-grade glioma and diffuse intrinsic pontine gliomas. Neuro. Oncol. 2011;13:212–222.
    5. Broet P, Richardson S. Detection of gene copy number changes in CGH microarrays using a spatially correlated mixture model. Bioinformatics. 2006;22:911–918.