Repetitive DNA and next-generation sequencing: computational challenges and solutions
- PMID: 22124482
- PMCID: PMC3324860
- DOI: 10.1038/nrg3117
Erratum in
- Nat Rev Genet. 2012 Feb;13(2):146
Abstract
Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.
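The alignment ambiguity described above can be illustrated with a toy example. The following is a minimal sketch, not a method from the article: the genome string, the repeat sequence, and the helper names `find_all` and `classify` are all hypothetical, and exact string matching stands in for a real aligner. It shows that a short read lying entirely inside a repeat matches more than one location, whereas a read that extends into unique flanking sequence matches only one.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        # Continue searching one base further so overlapping matches are found.
        start = genome.find(read, start + 1)
    return positions


# Hypothetical toy genome: two identical repeat copies separated and
# flanked by unique sequence.
REPEAT = "ACGTACGTTAGC"
GENOME = "TTGACCA" + REPEAT + "GGATCCTA" + REPEAT + "CATTGAG"


def classify(read: str) -> str:
    """Label a read by how many exact placements it has in the toy genome."""
    hits = find_all(GENOME, read)
    if not hits:
        return "unmapped"
    return "unique" if len(hits) == 1 else "multi-mapped"


# A read contained entirely within the repeat is ambiguous.
inside_repeat = "CGTTAGC"
# A longer read that runs from the repeat into the unique flank is not.
spans_flank = "TTAGCGGAT"

print(inside_repeat, find_all(GENOME, inside_repeat), classify(inside_repeat))
print(spans_flank, find_all(GENOME, spans_flank), classify(spans_flank))
```

In this sketch the ambiguity disappears once the read is long enough to reach unique sequence, which is one reason short read lengths make repeats harder to handle.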
Conflict of interest statement
The authors declare no competing financial interests.
Similar articles
- A sensitive repeat identification framework based on short and long reads. Nucleic Acids Res. 2021 Sep 27;49(17):e100. doi: 10.1093/nar/gkab563. PMID: 34214175. Free PMC article.
- Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs. Bioinformatics. 2002 Mar;18(3):379-88. doi: 10.1093/bioinformatics/18.3.379. PMID: 11934736.
- Computational and bioinformatics frameworks for next-generation whole exome and genome sequencing. ScientificWorldJournal. 2013;2013:730210. doi: 10.1155/2013/730210. Epub 2013 Jan 13. PMID: 23365548. Free PMC article. Review.
- Alignment of Next-Generation Sequencing Reads. Annu Rev Genomics Hum Genet. 2015;16:133-51. doi: 10.1146/annurev-genom-090413-025358. Epub 2015 May 4. PMID: 25939052. Review.
- ReRep: computational detection of repetitive sequences in genome survey sequences (GSS). BMC Bioinformatics. 2008 Sep 9;9:366. doi: 10.1186/1471-2105-9-366. PMID: 18782453. Free PMC article.
Cited by
- A practical method to detect SNVs and indels from whole genome and exome sequencing data. Sci Rep. 2013;3:2161. doi: 10.1038/srep02161. PMID: 23831772. Free PMC article.
- Histone H3.3 is required for endogenous retroviral element silencing in embryonic stem cells. Nature. 2015 Jun 11;522(7555):240-244. doi: 10.1038/nature14345. Epub 2015 May 4. PMID: 25938714. Free PMC article.
- Genome analysis and avirulence gene cloning using a high-density RADseq linkage map of the flax rust fungus, Melampsora lini. BMC Genomics. 2016 Aug 22;17(1):667. doi: 10.1186/s12864-016-3011-9. PMID: 27550217. Free PMC article.
- Hybrid assembly using long reads resolves repeats and completes the genome sequence of a laboratory strain of Staphylococcus aureus subsp. aureus RN4220. Heliyon. 2022 Nov 2;8(11):e11376. doi: 10.1016/j.heliyon.2022.e11376. eCollection 2022 Nov. PMID: 36387480. Free PMC article.
- Identification of microRNAs and gene regulatory networks in cleft lip common in humans and mice. Hum Mol Genet. 2021 Sep 15;30(19):1881-1893. doi: 10.1093/hmg/ddab151. PMID: 34104955. Free PMC article.