Bioinformatics. 2011 Sep 15;27(18):2601-2.
doi: 10.1093/bioinformatics/btr446. Epub 2011 Jul 29.

ContEst: estimating cross-contamination of human samples in next-generation sequencing data

Kristian Cibulskis et al. Bioinformatics.

Abstract

Summary: Here, we present ContEst, a tool for estimating the level of cross-individual contamination in next-generation sequencing data. We demonstrate the accuracy of ContEst across a range of contamination levels, sources and read depths using sequencing data mixed in silico at known concentrations. We applied our tool to published cancer sequencing datasets and report their estimated contamination levels.

Availability and implementation: ContEst is a GATK module and is distributed under a BSD-style license at http://www.broadinstitute.org/cancer/cga/contest

Contact: kcibul@broadinstitute.org; gadgetz@broadinstitute.org

Supplementary information: Supplementary data is available at Bioinformatics online.
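
The abstract does not describe how the estimate is computed, so the following is only a minimal sketch of one plausible approach, not ContEst's published algorithm. It assumes that the sample is known to be homozygous for the reference allele at a set of SNP sites, that population alternate-allele frequencies are available for those sites, and that sequencing errors occur at a fixed rate. The function names, the input tuple layout and the grid search are all hypothetical choices made for illustration.

```python
import math


def log_likelihood(c, sites, error_rate=0.01):
    """Log-likelihood of contamination fraction c (hypothetical model).

    sites: list of (alt_reads, depth, pop_alt_freq) tuples, one per SNP
    site where the sample itself is assumed homozygous reference.
    """
    ll = 0.0
    for alt_reads, depth, pop_alt_freq in sites:
        # A read shows the alternate allele either because it came from
        # the contaminant and carries that allele, or because of an error.
        true_alt = c * pop_alt_freq
        p_alt = true_alt * (1 - error_rate) + (1 - true_alt) * error_rate
        ll += alt_reads * math.log(p_alt)
        ll += (depth - alt_reads) * math.log(1 - p_alt)
    return ll


def estimate_contamination(sites, error_rate=0.01, grid_steps=1000):
    """Return the grid value of c in [0, 1] with the highest likelihood."""
    grid = [i / grid_steps for i in range(grid_steps + 1)]
    return max(grid, key=lambda c: log_likelihood(c, sites, error_rate))


if __name__ == "__main__":
    # Synthetic counts built by hand for a 5% contaminant; not real data.
    example_sites = [(345, 10000, 0.5), (198, 10000, 0.2)]
    print(estimate_contamination(example_sites))
```

A grid search is used here only because it is easy to read; the model requires a non-zero error rate so that the logarithms stay finite when no contamination is present.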

Figures

Fig. 1.
(A) False-positive somatic mutations detected per megabase on in silico contaminated data; most cancers have ~1 true event per megabase. (B) Accuracy with a single contaminating sample. (C) Accuracy with multiple contaminating samples. (D) Accuracy with respect to read depth; shaded areas indicate the 95% confidence interval. (E) Contamination estimates for the TCGA Ovarian dataset.
