Analysis and benchmarking of small and large genomic variants across tandem repeats
- PMID: 38671154
- PMCID: PMC11952744
- DOI: 10.1038/s41587-024-02225-z
Abstract
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 'truth-set' TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies.
© 2024. The Author(s), under exclusive licence to Springer Nature America, Inc.
Conflict of interest statement
Competing interests: F.J.S. receives research support from Illumina, Genentech, PacBio and ONT. E.D. and M.A.E. are employees and shareholders of PacBio. S.K.M. is an employee and shareholder of ONT. W.D.C. has received free consumables from ONT. The other authors declare no competing interests.
Update of
- Benchmarking of small and large variants across tandem repeats. bioRxiv [Preprint]. 2023 Nov 1:2023.10.29.564632. doi: 10.1101/2023.10.29.564632. Update in: Nat Biotechnol. 2025 Mar;43(3):431-442. doi: 10.1038/s41587-024-02225-z. PMID: 37961319. Free PMC article.
Similar articles
- Benchmarking of small and large variants across tandem repeats. bioRxiv [Preprint]. 2023 Nov 1:2023.10.29.564632. doi: 10.1101/2023.10.29.564632. Update in: Nat Biotechnol. 2025 Mar;43(3):431-442. doi: 10.1038/s41587-024-02225-z. PMID: 37961319. Free PMC article.
- Antidepressants for pain management in adults with chronic pain: a network meta-analysis. Health Technol Assess. 2024 Oct;28(62):1-155. doi: 10.3310/MKRT2948. PMID: 39367772. Free PMC article.
- Expectations and blind spots for structural variation detection from long-read assemblies and short-read genome sequencing technologies. Am J Hum Genet. 2021 May 6;108(5):919-928. doi: 10.1016/j.ajhg.2021.03.014. Epub 2021 Mar 30. PMID: 33789087. Free PMC article.
- Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19. Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3. PMID: 35593186. Free PMC article.
- Behavioral interventions to reduce risk for sexual transmission of HIV among men who have sex with men. Cochrane Database Syst Rev. 2008 Jul 16;(3):CD001230. doi: 10.1002/14651858.CD001230.pub2. PMID: 18646068.
Cited by
- A Hitchhiker's Guide to long-read genomic analysis. Genome Res. 2025 Apr 14;35(4):545-558. doi: 10.1101/gr.279975.124. PMID: 40228901. Review.
- STRchive: a dynamic resource detailing population-level and locus-specific insights at tandem repeat disease loci. Genome Med. 2025 Mar 26;17(1):29. doi: 10.1186/s13073-025-01454-4. PMID: 40140942. Free PMC article.
- TRGT-denovo: accurate detection of de novo tandem repeat mutations. bioRxiv [Preprint]. 2024 Jul 19:2024.07.16.600745. doi: 10.1101/2024.07.16.600745. PMID: 39071386. Free PMC article.
- The Platinum Pedigree: a long-read benchmark for genetic variants. Nat Methods. 2025 Aug;22(8):1669-1676. doi: 10.1038/s41592-025-02750-y. Epub 2025 Aug 4. PMID: 40759746.
- Enhanced detection and genotyping of disease-associated tandem repeats using HMMSTR and targeted long-read sequencing. Nucleic Acids Res. 2025 Jan 11;53(2):gkae1202. doi: 10.1093/nar/gkae1202. PMID: 39676678. Free PMC article.