This is a preprint.
Benchmarking of small and large variants across tandem repeats
- PMID: 37961319
- PMCID: PMC10634962
- DOI: 10.1101/2023.10.29.564632
Update in
- Analysis and benchmarking of small and large genomic variants across tandem repeats. Nat Biotechnol. 2025 Mar;43(3):431-442. doi: 10.1038/s41587-024-02225-z. Epub 2024 Apr 26. PMID: 38671154. Free PMC article.
Abstract
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants from the GIAB HG002 individual to create a tandem repeat benchmark. We also present a variant comparison method that handles small and large alleles and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ∼24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 TR benchmark. We work with the GIAB community to demonstrate the utility of this benchmark across short and long read technologies.
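The figures quoted in the abstract can be put in proportion with some simple arithmetic. The sketch below is an illustration only, not part of the authors' analysis: it assumes that the ratio of the variant share to the genome share is a fair rough measure of enrichment, and the function and variable names are hypothetical.

```python
# Figures quoted in the abstract.
genome_fraction = 0.081    # share of the genome covered by the TR catalog
variant_fraction = 0.249   # approximate share of per-individual variants in the catalog
small_variants = 124_728   # small variants in the GIAB HG002 TR benchmark
large_variants = 17_988    # large variants in the GIAB HG002 TR benchmark


def fold_enrichment(variant_share: float, genome_share: float) -> float:
    """Ratio of variant share to genome share (hypothetical helper).

    A value above 1 means the regions hold more variants than their
    size alone would suggest under a uniform baseline.
    """
    return variant_share / genome_share


enrichment = fold_enrichment(variant_fraction, genome_fraction)
total_variants = small_variants + large_variants
large_share = large_variants / total_variants

print(f"fold enrichment of variants in TR regions: {enrichment:.2f}")
print(f"total benchmark variants: {total_variants}")
print(f"share of benchmark variants that are large: {large_share:.1%}")
```

Under that assumption, TR regions carry roughly three times as many variants as their share of the genome would predict, and large variants make up a minority of the benchmark calls.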