Stat Appl Genet Mol Biol. 2011 Nov 8;10(1):Article 52.

doi: 10.2202/1544-6115.1732.

Modeling read counts for CNV detection in exome sequencing data

Michael I Love¹, Alena Myšičková, Ruping Sun, Vera Kalscheuer, Martin Vingron, Stefan A Haas


PMID: 23089826
PMCID: PMC3517018
DOI: 10.2202/1544-6115.1732

Michael I Love et al. Stat Appl Genet Mol Biol. 2011.


Affiliation

¹ Max Planck Institute for Molecular Genetics.


Abstract

Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project, identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.
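The general technique described in the abstract, a hidden Markov model whose hidden states are copy numbers and whose observations are raw read counts scaled by a covariate-based expected depth, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the exomeCopy implementation: the negative binomial emission, the log-linear mean using log background depth and GC-content, the fixed coefficients `beta`, the overdispersion `phi`, the normal copy number `d = 2`, and the simple "sticky" transition matrix controlled by `p_switch` are all hypothetical choices made for illustration. In a real analysis the coefficients and overdispersion would be estimated from the data rather than fixed by hand.

```python
import math

def nb_logpmf(k, mu, phi):
    """Log-probability of count k under a negative binomial with mean mu and
    overdispersion phi, so variance = mu + phi * mu**2 (assumed emission)."""
    r = 1.0 / phi  # size parameter
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))

def baseline_means(background, gc, beta):
    """Expected count at the normal copy number for each window, from a
    hypothetical log-linear model: log(mu) = b0 + b1*log(background) + b2*gc."""
    b0, b1, b2 = beta
    return [math.exp(b0 + b1 * math.log(bg) + b2 * g) for bg, g in zip(background, gc)]

def viterbi(counts, baseline, states, d=2, phi=0.05, p_switch=1e-4, eps=1e-3):
    """Most likely copy-number path. The emission mean for copy number c is
    (c / d) * baseline; eps keeps the mean positive for copy number 0."""
    n = len(states)
    log_stay = math.log(1.0 - p_switch)
    log_move = math.log(p_switch / (n - 1))  # switches spread evenly over other states

    def emit(t, s):
        mu = max(states[s] / d, eps) * baseline[t]
        return nb_logpmf(counts[t], mu, phi)

    # Uniform initial distribution over states.
    score = [math.log(1.0 / n) + emit(0, s) for s in range(n)]
    backptrs = []
    for t in range(1, len(counts)):
        new_score, ptr = [], []
        for s in range(n):
            # Best predecessor of state s at window t.
            prev = max(range(n), key=lambda p: score[p] + (log_stay if p == s else log_move))
            trans = log_stay if prev == s else log_move
            new_score.append(score[prev] + trans + emit(t, s))
            ptr.append(prev)
        score = new_score
        backptrs.append(ptr)

    # Trace back from the best final state.
    s = max(range(n), key=lambda i: score[i])
    path = [s]
    for ptr in reversed(backptrs):
        s = ptr[s]
        path.append(s)
    path.reverse()
    return [states[i] for i in path]

if __name__ == "__main__":
    # Toy data: flat background of 50 reads, constant GC, a half-depth run in the middle.
    counts = [50] * 10 + [25] * 10 + [50] * 10
    base = baseline_means([50.0] * 30, [0.5] * 30, beta=(0.0, 1.0, 0.0))
    print(viterbi(counts, base, states=[0, 1, 2, 3, 4], phi=0.01))
```

On this toy input the half-depth windows are labeled copy number 1 (a heterozygous deletion) and the flanking windows stay at the normal state 2. Dividing the expected depth by `d` makes each state's mean proportional to its copy number, so copy number 0 (a homozygous deletion) needs the small `eps` floor to keep the likelihood finite.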


Figures

Figure 1: Distribution of read counts in windows covering the CCDS regions of chromosome 1 for one exome sequencing sample, cropped at 100 reads per window.
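Per-window counts of the kind summarized in Figure 1 can be tabulated with a short Python sketch. It assumes that each read is assigned to a window by its start position and that windows are half-open intervals on a single chromosome; the function names, the toy coordinates, and the reading of "cropped at 100" as excluding larger counts from the plotted range are assumptions, not details taken from the paper.

```python
import bisect
from collections import Counter

def window_counts(read_starts, windows):
    """Number of reads whose start position lies in each half-open window
    [start, end). read_starts must be sorted in ascending order."""
    return [bisect.bisect_left(read_starts, end) - bisect.bisect_left(read_starts, start)
            for start, end in windows]

def cropped_histogram(counts, max_count=100):
    """Frequencies of per-window counts 0..max_count, plus the number of
    windows with more than max_count reads (outside the plotted range)."""
    freq = Counter(c for c in counts if c <= max_count)
    above = sum(1 for c in counts if c > max_count)
    return [freq[k] for k in range(max_count + 1)], above

if __name__ == "__main__":
    # Toy coordinates, not real CCDS windows.
    starts = sorted([5, 12, 15, 120, 130, 131, 250])
    windows = [(0, 100), (100, 200), (200, 300)]
    counts = window_counts(starts, windows)
    print(counts)                        # [3, 3, 1]
    print(cropped_histogram(counts, 2))  # ([0, 1, 0], 2)
```

Binary search on sorted start positions keeps each window lookup logarithmic in the number of reads, which matters when tens of thousands of exon windows are counted per sample.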


