Bioinformatics. 2010 Oct 15;26(20):2501-8.
doi: 10.1093/bioinformatics/btq460. Epub 2010 Sep 24.

A Gibbs sampling strategy applied to the mapping of ambiguous short-sequence tags

Jianrong Wang et al. Bioinformatics. 2010.

Abstract

Motivation: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is widely used in biological research. ChIP-seq experiments yield many ambiguous tags that can be mapped with equal probability to multiple genomic sites. Such ambiguous tags are typically eliminated from consideration, resulting in a potential loss of important biological information.

Results: We have developed a Gibbs sampling-based algorithm for the genomic mapping of ambiguous sequence tags. Our algorithm relies on the local genomic tag context to guide the mapping of ambiguous tags. The Gibbs sampling procedure we use simultaneously maps ambiguous tags and updates the probabilities used to infer correct tag map positions. We show that our algorithm is able to correctly map more ambiguous tags than existing mapping methods. Our approach is also able to uncover mapped genomic sites from highly repetitive sequences that cannot be detected based on unique tags alone, including transposable elements, segmental duplications and peri-centromeric regions. This mapping approach should prove useful for increasing biological knowledge of the too-often-neglected repetitive genomic regions.
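
The idea of jointly mapping ambiguous tags and updating site probabilities can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`local_count`, `gibbs_map`), the parameters (`window`, `pseudo`, `n_iter`, `burn_in`) and the use of a raw local tag count plus a pseudocount as the sampling weight are all assumptions made for illustration, standing in for the likelihood ratios the abstract and Fig. 1 refer to. Sites are assumed to be integer positions on a single chromosome.

```python
import random
from collections import Counter


def local_count(counts, site, window):
    """Number of tags currently mapped within `window` bp of `site`."""
    return sum(c for pos, c in counts.items() if abs(pos - site) <= window)


def gibbs_map(unique_counts, ambiguous, window=100, pseudo=1.0,
              n_iter=200, burn_in=50, seed=0):
    """Assign each ambiguous tag to one of its candidate sites.

    unique_counts: dict mapping position -> number of uniquely mapped tags.
    ambiguous: list of candidate-position lists, one list per ambiguous tag.
    Returns the most frequently sampled position for each ambiguous tag.
    """
    rng = random.Random(seed)
    counts = Counter(unique_counts)

    # Start from a random assignment of every ambiguous tag.
    assign = []
    for cands in ambiguous:
        site = rng.choice(cands)
        assign.append(site)
        counts[site] += 1

    tallies = [Counter() for _ in ambiguous]
    for it in range(n_iter):
        for i, cands in enumerate(ambiguous):
            # Remove tag i so its own placement does not influence the weights.
            counts[assign[i]] -= 1
            # Hypothetical weight: local tag density plus a pseudocount.
            weights = [local_count(counts, s, window) + pseudo for s in cands]
            # Stochastic mapping step: sample a site, then update the counts
            # that later tags (and later iterations) will condition on.
            assign[i] = rng.choices(cands, weights=weights, k=1)[0]
            counts[assign[i]] += 1
            if it >= burn_in:
                tallies[i][assign[i]] += 1

    return [t.most_common(1)[0][0] for t in tallies]


if __name__ == "__main__":
    # Five tags that match both a tag-rich region and an empty region.
    unique = {1000: 20, 1010: 15}
    tags = [[1005, 50005]] * 5
    print(gibbs_map(unique, tags))
```

In this toy input, the candidate near positions 1000 and 1010 sits among many unique tags, so the sampler favours it heavily over the isolated candidate. A real implementation would need per-chromosome coordinates, an indexed lookup instead of the linear scan in `local_count`, and a background model for the weights.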

Availability: http://esbg.gatech.edu/jordan/software/map

Contact: king.jordan@biology.gatech.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Scheme of our Gibbs sampling algorithm. Possible tag map sites along with their likelihood ratios are shown prior to stochastic mapping. Gray boxes represent incorrect sites, and the black box represents the correct site. An arrow between a tag and a site means the tag could possibly be mapped to that site. One iterative cycle of joint stochastic mapping and parameter updating is shown. The black arrows point to selected sites for each tag after stochastic mapping.
Fig. 2.
Fractions of correctly mapped ambiguous tags for each library. Library descriptions are given in Supplementary Table S1. Gray bars show results based on MAQ, and black bars show results based on our Gibbs sampling algorithm.
Fig. 3.
Comparison of algorithm performance. (A) Illustration of data used to test algorithm performance. (B) Variant tag count thresholds that could be used in the algorithm tests. (C) Recall and precision fractions for map sites are shown for the algorithms compared here (MAQ, blue; fraction method, dark blue; Gibbs sampling method, green) over eight tag libraries. (D) Recall and precision are shown for the larger tag library across three tag thresholds.
Fig. 4.
Examples of ambiguous tag mapping results. Tracks are shown in the UCSC Genome Browser. The real sites track shows the sites in the benchmark libraries. The Fraction method track shows the mapping result of the fraction method, and the Gibbs method track shows the mapping result of our Gibbs method. The heights of the data represent the number of tags mapped to those sites. The tracks of repetitive genomic regions (segmental duplications, interspersed repeats and simple repeats) are also shown.
Fig. 5.
(A) The number of correctly discovered sites in various genomic features by unique tags alone (white) and our Gibbs method (black) compared with the corresponding numbers in the benchmark library. (B) The fractions of correctly discovered sites in various genomic features by unique tags alone (white) and our Gibbs method (black). [TE, transposable element; s_r, simple repeats; microSat, microsatellites; seg_dup, segmental duplication; centro, peri-centromeric region].
