Eur J Hum Genet. 2023 Jul;31(7):738-743.
doi: 10.1038/s41431-023-01352-6. Epub 2023 Apr 13.

STRavinsky STR database and PGTailor PGT tool demonstrate superiority of CHM13-T2T over hg38 and hg19 for STR-based applications

Noam Hadar et al. Eur J Hum Genet. 2023 Jul.

Abstract

Short tandem repeats (STRs) have long been studied for possible roles in biological phenomena and are used in multiple applications, such as forensics, evolutionary studies and preimplantation genetic testing (PGT). The two reference genomes most used by clinicians and researchers are GRCh37/hg19 and GRCh38/hg38, both constructed mainly from short-read sequencing (SRS), in which reads consisting entirely of STR sequence cannot be assembled into the reference genome. With the introduction of long-read sequencing (LRS) methods and the generation of the CHM13 reference genome, also known as T2T, many previously unmapped STRs were finally localized within the human genome. We generated STRavinsky, a compact STR database for three reference genomes, including T2T. We then demonstrated the advantages of T2T over hg19 and hg38, identifying nearly double the number of STRs across all chromosomes. Using STRavinsky, which provides resolution down to a specific genomic coordinate, we demonstrated an extreme abundance of TGGAA repeats in the p arms of acrocentric chromosomes, substantially corroborating early molecular studies that suggested a possible role in the formation of Robertsonian translocations. Moreover, we delineated a unique abundance of TGGAA repeats specifically in chromosome 16q11.2 and in 9q12. Finally, we harnessed the superior capabilities of T2T and STRavinsky to generate PGTailor, a novel web application that enables the design of STR-based PGT tests within minutes.
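To make the notion of locating a repeat "down to a specific genomic coordinate" concrete, the following is a minimal sketch of how runs of a motif such as TGGAA could be found in a sequence. It is not the STRavinsky pipeline, whose method is not described in this abstract; the function name `find_motif_runs`, the `min_copies` threshold and the toy sequence are all assumptions made for illustration.

```python
import re


def find_motif_runs(sequence, motif="TGGAA", min_copies=3):
    """Return (start, end, copies) for each run of at least `min_copies`
    back-to-back copies of `motif`.

    Coordinates are 0-based and end-exclusive. Hypothetical helper,
    not part of STRavinsky.
    """
    # Non-capturing group repeated min_copies or more times, e.g. (?:TGGAA){3,}
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(motif))
        for m in pattern.finditer(sequence.upper())
    ]


# Toy sequence (invented): runs of 4, 2 and 3 copies separated by other bases.
toy = "ACGT" + "TGGAA" * 4 + "CCC" + "TGGAA" * 2 + "G" + "TGGAA" * 3
runs = find_motif_runs(toy)
print(runs)
```

With the default threshold of three copies, the two-copy run in the toy sequence is skipped. A real STR catalogue would also need to handle motif rotations, the reverse strand and imperfect repeats, which this sketch ignores.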

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Comparison of short-tandem-repeat amounts between hg19, hg38 and T2T reference genomes.
Counts of different short tandem repeats per chromosome, as calculated from STRavinsky, a short-tandem-repeat database for the three most used reference genomes, shown for chromosomes 1–22 and X (A) and for chromosome Y (B).
Fig. 2. TGGAA short-tandem-repeats in human acrocentric chromosomes are orders of magnitude more abundant in T2T than in hg38.
STRavinsky was used to find the most frequently repeated sequence patterns in each chromosome for human reference genomes hg38 (blue) and T2T (red). For each repeating sequence, the absolute value of the fold-change between the two reference genomes was calculated and used to sort the results from highest to lowest. The top 10 most variable STRs are shown.
Fig. 3. TGGAA short-tandem-repeats of acrocentric chromosomes in T2T.
Coordinates at which an STR with the base sequence TGGAA appears are marked by blue vertical lines on the idiograms. Idiograms were generated by the UCSC genome browser (https://genome.ucsc.edu).
Fig. 4. TGGAA short-tandem-repeats of chromosomes 9 and 16 in T2T and hg38.
Coordinates at which an STR with the base sequence TGGAA appears are marked by blue vertical lines on the idiograms. Idiograms were generated by the UCSC genome browser (https://genome.ucsc.edu).
Fig. 5. PGTailor user interface.
PGTailor is an online web application for designing STR-based PGT assays. Entering a genomic coordinate in the hg19, hg38 or T2T human reference genome brings up the screen shown, where the user can choose specific primers and then download a report with additional information, including specifications for PCR amplification of the desired STRs. https://fohs.bgu.ac.il/birklab/PGTailor.
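
The ranking described in the Fig. 2 legend (absolute fold-change between two reference genomes, sorted from highest to lowest, top 10 kept) can be sketched as follows. The legend does not specify how the fold-change was computed, so this sketch assumes a log2 ratio with a pseudocount of 1 to avoid division by zero; the function name `rank_by_abs_fold_change` and the counts are invented for illustration and are not values from the paper.

```python
import math


def rank_by_abs_fold_change(counts_a, counts_b, top_n=10, pseudocount=1):
    """Rank motifs by |log2 fold-change| of counts_b relative to counts_a.

    Assumption: log2 ratio with a pseudocount; the source does not state
    the exact formula. Ties are broken alphabetically by motif.
    """
    scored = []
    for motif in set(counts_a) | set(counts_b):
        a = counts_a.get(motif, 0) + pseudocount
        b = counts_b.get(motif, 0) + pseudocount
        scored.append((motif, abs(math.log2(b / a))))
    # Highest absolute fold-change first.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Invented toy counts for one chromosome (NOT data from the paper).
toy_hg38 = {"TGGAA": 7, "AC": 100, "AAT": 50}
toy_t2t = {"TGGAA": 1023, "AC": 201, "AAT": 50}

ranked = rank_by_abs_fold_change(toy_hg38, toy_t2t)
for motif, score in ranked:
    print(motif, score)
```

Taking the absolute value of a log ratio treats gains and losses symmetrically, so a motif that is much rarer in one assembly ranks as highly as one that is much more common.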
