Genome Biol. 2010;11(6):R69.

doi: 10.1186/gb-2010-11-6-r69. Epub 2010 Jun 28.

Estimating enrichment of repetitive elements from high-throughput sequence data

Daniel S Day¹, Lovelace J Luquette, Peter J Park, Peter V Kharchenko

PMID: 20584328
PMCID: PMC2911117
DOI: 10.1186/gb-2010-11-6-r69

Daniel S Day et al. Genome Biol. 2010.

Affiliation

¹ Harvard-MIT Health Sciences and Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.

Abstract

We describe computational methods for analysis of repetitive elements from short-read sequencing data, and apply them to study histone modifications associated with the repetitive elements in human and mouse cells. Our results demonstrate that while accurate enrichment estimates can be obtained for individual repeat types and small sets of repeat instances, there are distinct combinatorial patterns of chromatin marks associated with major annotated repeat families, including H3K27me3/H3K9me3 differences among the endogenous retroviral element classes.

Figures

**Figure 1**
**Aggregating reads for repeat type enrichment estimation**. To increase accuracy, the enrichment estimate combines reads mapping to the canonical repeat sequence with reads mapping to the body or boundaries of the repeat instances incorporated into the genome assembly. The calculation uses reads that align to multiple positions in the genome, provided all such positions belong to instances of the same repeat type. Reads aligning to more than one repeat type are excluded.
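
The read-assignment rule described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `assign_reads`, the input layout (a mapping from read ID to the repeat type at each alignment position), and the toy reads are all assumptions made for the example.

```python
from collections import Counter

def assign_reads(read_hits):
    """Count reads per repeat type, following the rule in the caption.

    read_hits: dict mapping a read ID to a list with one repeat-type name
    per alignment position (hypothetical input layout, assumed non-empty).
    A read is counted when all of its positions fall in one repeat type,
    even if it aligns to many positions; otherwise it is discarded.
    """
    counts = Counter()
    discarded = 0
    for read_id, types in read_hits.items():
        distinct = set(types)
        if len(distinct) == 1:
            counts[next(iter(distinct))] += 1
        else:
            discarded += 1
    return counts, discarded

# Toy input: r2 aligns to three positions, all in repeat type A, so it is kept.
# r3 aligns to both C and D, so it is discarded.
example = {
    "r1": ["A"],
    "r2": ["A", "A", "A"],
    "r3": ["C", "D"],
    "r4": ["B"],
}
counts, discarded = assign_reads(example)
print(dict(counts), discarded)
```

Applying the same function to ChIP and control reads would give the per-type counts from which an enrichment ratio could then be formed; the normalization itself is not specified in the caption and is left out here.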

**Figure 2**
**Phylogenetic analysis of repeat enrichment patterns**. **(a)** To provide the most informative estimates of repeat enrichment that can be attained for a given dataset, repeats are organized into a phylogenetic tree on the basis of read set similarity, maximizing the number of uniquely assignable reads that can be used for enrichment estimation. The estimates are illustrated on the resulting tree branches using colors. The nodes of the tree represent sets of repetitive sequences. The gray labels show the fraction of the total number of ChIP reads mapping to a given set of sequences (node) that can be associated uniquely. The tree is constructed in a way that maximizes the number of additional uniquely associated reads gained at each step. For instance, considering repeats A and B together allows 1,000 uniquely associated ChIP reads to be utilized for enrichment estimation, even though the sum of the reads uniquely associated with repeat A and repeat B separately is 600. The 400 additional reads are those that map to both A and B repeats, but do not map to any other repeats (in the same way the discarded read in Figure 1 maps to both C and D). The length of each branch corresponds, on a log scale, to the number of unique reads gained when collapsing sequences of the descendant nodes into a single set. The statistical significance of the observed enrichment or depletion is shown as a Z-score (green numbers). Large positive Z-score values denote statistically significant enrichment (a Z-score of 3.1 corresponds to a P-value of 10^-3), and negative values correspond to significant depletion. The Z-score magnitude is capped at 10. **(b)** A fragment of the enrichment phylogeny of the Repbase repeat types for H3K9me3 enrichment in mES cells. The example illustrates grouping of repeats from the ERV-K class, all of which, with the exception of RLTR19-int, are highly enriched for the H3K9me3 modification. Additional examples are shown in Figure S4 of Additional file 1. **(c)** A small fragment of the H3K9me3 enrichment phylogeny for the individual instances of the intracisternal A particle (IAP) interspersed repeats (IAPEz-int). The fragment clusters instances located within a specific region on chromosome X due to a high degree of sequence identity between them. While the lack of discriminating sequences precludes evaluation of each instance individually, considering nearly identical instances together allows the demonstration of statistically significant enrichment of this localized group of instances for the H3K9me3 mark in mES cells. LTR, long terminal repeat.
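
The merging criterion in panel (a) can be illustrated with a small Python sketch. It assumes a greedy pairwise agglomeration in which, at each step, the two nodes whose union gains the most uniquely associated reads are merged; the caption states only that the gain is maximized at each step, so the exact procedure, the function names (`unique_reads`, `merge_gain`, `build_tree`, `z_to_p`) and the toy read counts are assumptions. The last function checks the stated Z-score/P-value correspondence under the further assumption of a one-sided standard normal tail.

```python
import math
from itertools import combinations

def unique_reads(node, reads):
    """Number of reads whose every hit lies inside `node` (a frozenset of repeats)."""
    return sum(1 for hits in reads if hits and hits <= node)

def merge_gain(x, y, reads):
    """Reads that become uniquely assignable only once x and y are considered together."""
    return unique_reads(x | y, reads) - unique_reads(x, reads) - unique_reads(y, reads)

def build_tree(repeats, reads):
    """Greedy agglomeration (assumed): repeatedly merge the pair with the largest gain."""
    nodes = [frozenset([r]) for r in repeats]
    merges = []
    while len(nodes) > 1:
        x, y = max(combinations(nodes, 2),
                   key=lambda pair: merge_gain(pair[0], pair[1], reads))
        merges.append((x, y, merge_gain(x, y, reads)))
        nodes = [n for n in nodes if n not in (x, y)] + [x | y]
    return merges

def z_to_p(z):
    """One-sided upper-tail P-value of a standard normal Z-score (assumed convention)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Toy data mirroring the caption: 300 reads unique to A, 300 unique to B,
# 400 mapping to both A and B. Repeat C and its reads are invented filler.
reads = (
    [frozenset("A")] * 300
    + [frozenset("B")] * 300
    + [frozenset("AB")] * 400
    + [frozenset("C")] * 50
    + [frozenset("AC")] * 10
)
merges = build_tree(["A", "B", "C"], reads)
for x, y, gain in merges:
    print(sorted(x), sorted(y), gain)
print(z_to_p(3.1))
```

With these toy counts, A and B are merged first because doing so recovers the 400 reads shared only between them, matching the 600 versus 1,000 example in the caption.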

**Figure 3**
**Repeat enrichment patterns in mouse cell lines**. **(a)** Combinatorial patterns of repeat enrichment in mES cells. The repeat types (rows) were clustered according to the MLE enrichment in different marks (disregarding depletion; see Materials and methods), with red colors corresponding to enrichment, and blue colors corresponding to depletion. Repeat types that do not show statistically significant enrichment or depletion are shown in white. Prominent sets of repeat types are highlighted on the left-hand side (1 to 4; see text). The bottom part of the plot is omitted as it contains repeats devoid of enrichment in any examined modifications. See Figure S5 in Additional file 1 for a complete, magnified view showing all the repeat type labels. **(b)** An enlarged view of a portion of set 1, illustrating ERV1/ERV-K repeats enriched for H3K9me3 and H4K20me3. **(c)** A portion of set 2 showing enrichment for H3K4me3 and H3K27me3 at tRNA repeats.

**Figure 4**
**Comparison of histone modification profiles in mES, neuronal progenitor (NP) and mouse embryonic fibroblast cells**. The repeat types were clustered based on their enrichment profiles across four different histone marks in the three cell types, so that the order of repeat types is the same for each histone methylation mark shown. Green bars mark major clusters of enrichment, with numbers corresponding to mES clusters from Figure 3a. The orange bar marks a set of repeats, composed predominantly of LTRs, that acquires H3K27me3 in NP cells. See Figure S7 in Additional file 1 for a complete plot.

**Figure 5**
**Repeat enrichment patterns in human CD4+ T cells**. **(a)** To normalize read counts in the absence of input sequencing data, the enrichment values were estimated relative to other marks using distributions of enrichment coefficients observed for each repeat type (see Materials and methods). The plot shows enrichment coefficients for various chromatin marks for the HSAT6 repeat. H3K9me3 and H4K20me3 deviate from the nominal levels exhibited by most of the marks in a statistically significant way. **(b)** Similar to Figure 3a, the repeat types were clustered according to their combined enrichment estimates across measured chromatin marks. Part of the plot containing no enrichment clusters is omitted (see Figure S9 in Additional file 1 for a complete plot). Repeat families over-abundant within major clusters are labeled on the plot. **(c)** Enlarged view of the tRNA-dominated cluster located at the top of the plot shown in (b).

Cited by

Transposable elements are regulated by context-specific patterns of chromatin marks in mouse embryonic stem cells.
He J, Fu X, Zhang M, He F, Li W, Abdul MM, Zhou J, Sun L, Chang C, Li Y, Liu H, Wu K, Babarinde IA, Zhuang Q, Loh YH, Chen J, Esteban MA, Hutchins AP. Nat Commun. 2019 Jan 3;10(1):34. doi: 10.1038/s41467-018-08006-y. PMID: 30604769. Free PMC article.
Differential enrichment of H3K9me3 at annotated satellite DNA repeats in human cell lines and during fetal development in mouse.
Vojvoda Zeljko T, Ugarković Đ, Pezer Ž. Epigenetics Chromatin. 2021 Oct 18;14(1):47. doi: 10.1186/s13072-021-00423-6. PMID: 34663449. Free PMC article.
New players in heterochromatin silencing: histone variant H3.3 and the ATRX/DAXX chaperone.
Voon HP, Wong LH. Nucleic Acids Res. 2016 Feb 29;44(4):1496-501. doi: 10.1093/nar/gkw012. Epub 2016 Jan 14. PMID: 26773061. Free PMC article. Review.
PML protein organizes heterochromatin domains where it regulates histone H3.3 deposition by ATRX/DAXX.
Delbarre E, Ivanauskiene K, Spirkoski J, Shah A, Vekterud K, Moskaug JØ, Bøe SO, Wong LH, Küntziger T, Collas P. Genome Res. 2017 Jun;27(6):913-921. doi: 10.1101/gr.215830.116. Epub 2017 Mar 24. PMID: 28341773. Free PMC article.
Identifying and mitigating bias in next-generation sequencing methods for chromatin biology.
Meyer CA, Liu XS. Nat Rev Genet. 2014 Nov;15(11):709-21. doi: 10.1038/nrg3788. Epub 2014 Sep 16. PMID: 25223782. Free PMC article. Review.
