Genome Res. 2024 Nov 20;34(11):1785-1797.
doi: 10.1101/gr.279346.124.

Leveraging the T2T assembly to resolve rare and pathogenic inversions in reference genome gaps

Kristine Bilgrav Saether et al. Genome Res.

Abstract

Chromosomal inversions (INVs) are particularly challenging to detect because they are copy-number neutral and associated with repetitive regions. Inversions represent about 1/20 of all balanced structural chromosome aberrations and can lead to disease by disrupting genes or altering regulatory regions of dosage-sensitive genes in cis. Short-read genome sequencing (srGS) can only resolve ∼70% of cytogenetically visible inversions referred to clinical diagnostic laboratories, likely due to breakpoints in repetitive regions. Here, we study 12 inversions by long-read genome sequencing (lrGS) (n = 9) or srGS (n = 3) and resolve nine of them. In four cases, the inversion breakpoint region was missing from at least one of the human reference genomes (GRCh37, GRCh38, T2T-CHM13) and a reference-agnostic analysis was needed. One of these cases, an INV9 mappable only in de novo assembled lrGS data using T2T-CHM13, disrupts EHMT1, consistent with a Mendelian diagnosis (Kleefstra syndrome 1; MIM#610253). Next, by pairwise comparison between T2T-CHM13, GRCh37, and GRCh38, as well as the chimpanzee and bonobo, we show that hundreds of megabases of sequence are missing from at least one human reference, highlighting that primate genomes contribute to genomic diversity. Aligning population genomic data to these regions indicated that they vary between individuals. Our analysis emphasizes that T2T-CHM13 is necessary to maximize the value of lrGS for optimal inversion detection in clinical diagnostics. These results highlight the importance of leveraging diverse and comprehensive reference genomes to resolve unsolved molecular cases in rare diseases.

Figures

Figure 1.
Reference genome-dependent detection of inversions analyzed by srGS and lrGS. (A) An inversion 6 (P4855_501) visible in srGS, linked read genome sequencing (lirGS), and lrGS using GRCh38. (B) An inversion 10 (P4855_106) visible in srGS and lirGS data using T2T-CHM13. (C) An inversion 9 (BH16643-1) only visible by lrGS de novo assembly using T2T-CHM13. (D) An inversion 12 (RD_P541) within an 8 kbp DRR.
Figure 2.
Comparison of the inversion breakpoint region on Chromosome 6p12.3, Chromosome 10q11, and Chromosome 9q12. Reference sequences were aligned with each other and shown as dot plots. The dashed line or dot represents the position of the inversion breakpoint. (A) The Chromosome 6p inversion breakpoint is located in a 127 kbp region in GRCh38 that is missing from GRCh37. (B) The Chromosome 6p inversion breakpoint in GRCh38 and T2T. (C) The Chromosome 10q breakpoint is located in a 69 kbp region missing in GRCh38, with a surrounding 4 kbp duplication that occurs only once in T2T. (D) The Chromosome 9q12 breakpoint is located in a 28 Mbp region missing in GRCh38, shaded in blue.
Figure 3.
Inversion affecting Chromosome 9 (BH16643-1). (A) Pedigree displaying inheritance pattern for inversion 9. (B) G-banded chromosome analysis showed a paracentric inversion in the long arm of one Chromosome 9 between bands 9q12 and 9q34.3 in the proband. The abnormal Chromosome 9 is to the right. Parental chromosome analysis revealed no evidence of this inversion in either parent, suggesting that this is a de novo event. (C) Chromosome 9 inversion disrupted intron 25 out of 26 of EHMT1 at 9q34.3. (D) Nucleotide sequence alignment of inversion breakpoint junctions 1 (top) and 2 (bottom).
Figure 4.
Complex inversion on Chromosome 19 (RD_P546). (Upper panel) Inversion structure with duplicated segments in color and nonduplicated segments in gray. Junction numbers are given below the resulting derivative. (Lower panel) Breakpoint junction sequences with number of base pairs inserted in parentheses.
Figure 5.
Shared DRR in T2T-CHM13 and GRCh38. (A) Bar plot of all T2T DRRs, (B) Venn diagram of Mbp overlap between all GRCh38 DRRs, and (C) Venn diagram of Mbp overlap between all T2T DRRs.
Figure 6.
Repeat characterization across DRRs. (A) Percentage of repeat elements (masked by RepeatMasker) in the DRR sequences from GRCh38–GRCh37 and T2T-GRCh38. (B) Distribution of DRR sequences and their repeat percentage in GRCh38–GRCh37 and T2T-GRCh38. (C) Pie chart displaying repeat content in the GRCh38–GRCh37 DRR sequences affected by the inversion 6 at the 6p12 junction in GRCh38.
