[Preprint]. 2023 Apr 19:arXiv:2304.09729v1.

De novo reconstruction of satellite repeat units from sequence data

Yujie Zhang et al. ArXiv.

Abstract

Satellite DNAs are long tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge of repeat structures. Applying SRF to real sequence data, we showed that SRF could reconstruct known satellites in human and well-studied model organisms. We also found that satellite repeats are pervasive in various other species, accounting for up to 12% of their genome content, yet they are often underrepresented in assemblies. With the rapid progress in genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.

Conflict of interest statement

H.L. is a consultant for Integrated DNA Technologies, Inc.

Figures

Fig. 1.
Normalized counts of 179-mers in three A. thaliana read datasets. Raw 179-mer counts in reads are normalized by coverage. A 179-mer is selected in the plot if it matches the CEN180 satellite and if its normalized count is at least 50 in one of the datasets. (a) Counts between two different samples from the same strain. (b) Counts between two different strains.
Fig. 2.
Abundance of satellite DNA in 21 species.
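
The normalization and selection rule described in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the use of plain read strings, and the representation of "matches the CEN180 satellite" as membership in a precomputed set of satellite k-mers are all assumptions made for illustration. It also counts forward-strand k-mers only, which a real analysis would likely handle differently.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every forward-strand k-mer across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def normalize(counts, coverage):
    """Divide raw k-mer counts by the dataset's sequencing coverage."""
    return {kmer: c / coverage for kmer, c in counts.items()}


def select_kmers(norm_a, norm_b, satellite_kmers, min_count=50.0):
    """Keep satellite k-mers whose normalized count reaches min_count
    in at least one of the two datasets.

    satellite_kmers is a hypothetical stand-in for "k-mers that match
    the satellite"; how that match is computed is not specified here.
    Returns {kmer: (normalized count in A, normalized count in B)}.
    """
    selected = {}
    for kmer in satellite_kmers:
        a = norm_a.get(kmer, 0.0)
        b = norm_b.get(kmer, 0.0)
        if max(a, b) >= min_count:
            selected[kmer] = (a, b)
    return selected
```

For the figure itself, `k` would be 179 and `min_count` would be 50, and each selected pair would give one point in a scatter plot comparing two datasets.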
