Am J Hum Genet. 2025 Feb 6;112(2):450-456.
doi: 10.1016/j.ajhg.2024.12.013. Epub 2025 Jan 13.

HiFi long-read genomes for difficult-to-detect, clinically relevant variants

Wolfram Höps et al. Am J Hum Genet.

Abstract

Clinical short-read exome and genome sequencing approaches have positively impacted diagnostic testing for rare diseases. Yet, technical limitations associated with short reads challenge their use for the detection of disease-associated variation in complex regions of the genome. Long-read sequencing (LRS) technologies may overcome these challenges, potentially qualifying as a first-tier test for all rare diseases. To test this hypothesis, we performed LRS (30× high-fidelity [HiFi] genomes) for 100 samples with 145 known clinically relevant germline variants that are challenging to detect using short-read sequencing and necessitate a broad range of complementary test modalities in diagnostic laboratories. We show that relevant variant callers readily re-identified the majority of variants (120/145, 83%), including ∼90% of structural variants, SNVs/insertions or deletions (indels) in homologous sequences, and expansions of short tandem repeats. Another 10% (n = 14) was visually apparent in the data but not automatically detected. Our analyses also identified systematic challenges for the remaining 7% (n = 11) of variants, such as the detection of AG-rich repeat expansions. Titration analysis showed that 90% of all automatically called variants could also be identified using 15-fold coverage. Long-read genomes thus identified 93% of challenging pathogenic variants from our dataset. Even with reduced coverage, the vast majority of variants remained detectable, possibly enhancing cost-effective diagnostic implementation. Most importantly, we show the potential to use a single technology to accurately identify all types of clinically relevant variants.
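
The detection rates quoted in the abstract can be re-derived from the reported counts. The following is a minimal sketch, not the authors' analysis code; the dictionary keys and the `percent` helper are illustrative names. It keeps one decimal place, which suggests that the abstract's whole-number percentages were rounded so that they sum to 100% (for example, 134/145 is about 92.4%, reported as 93%, the sum of the rounded 83% and 10%).

```python
# Counts as reported in the abstract (145 known variants in 100 samples).
counts = {
    "called_automatically": 120,   # re-identified by a variant caller
    "visual_inspection_only": 14,  # apparent in the reads, not auto-called
    "not_detected": 11,            # remaining variants
}

total = sum(counts.values())


def percent(n, denominator):
    """Share of the denominator as a percentage, rounded to one decimal."""
    return round(100 * n / denominator, 1)


# Per-category rates.
rates = {name: percent(n, total) for name, n in counts.items()}

# Variants recoverable from long-read data by either route.
detected = counts["called_automatically"] + counts["visual_inspection_only"]
combined_rate = percent(detected, total)

for name, rate in rates.items():
    print(f"{name}: {counts[name]}/{total} = {rate}%")
print(f"detected overall: {detected}/{total} = {combined_rate}%")
```

Rounding to whole percentages is a presentation choice; the underlying counts (120, 14, and 11 of 145) are the figures to rely on when comparing against other studies.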

Keywords: challenging variants; clinical utility; diagnostics; genomics; long-read sequencing; omics; rare disease; short-read sequencing.

Conflict of interest statement

Declaration of interests: T.M., E.D., X.C., and M.A.E. are employees and shareholders of Pacific Biosciences, a company commercializing DNA sequencing technologies. Pacific Biosciences also kindly provided part of the reagents required for this study.

Figures

Figure 1. Samples, variants, and LRS-based recovery. (A) Pie chart depicting the cohort composition by variant type for all 145 variants. The number of samples is indicated within parentheses. (B) Different test modalities (y axis) that were used in a diagnostic laboratory to identify all 145 clinically relevant variants in the 100 selected samples (x axis). The number of assays required per patient is shown in an inlay pie chart. (C) Sensitivity of LRS by automated variant detection and visual inspection for all 145 variants from 100 analyzed samples (x axis), stratified by a priori known disease-associated variant types (y axis). LRS-based detection rates are indicated in green (detected by a variant caller, 83% [n = 120]), orange (detection by visual read inspection, 10% [n = 14]), and red (undetected variant, 7% [n = 11]).
Figure 2. Examples of variants identified in an automated fashion or by visual inspection. (A and B) IGV screenshots of long-read sequencing data for specific variants from samples P50-A1 and P4-H11, respectively. Reads are colored by phase. (C) Visualization of the de novo assembly of a deletion of STRC, with the pseudogene STRCP1 intact for sample P3-E11. Using the mapping quality metric, the breakpoint can be narrowed down to a ∼30 kbp window (light blue squares). A schematic view of the genes and assembly mapping is indicated on top, and raw dot-plot mappings of GRCh38 (x axis) vs. the assembled region (y axis) are displayed on the bottom. (D) The same variant from (C) visualized with Paraphase. Reads are grouped by inferred (pseudo)gene identity. Only one haplotype of STRC is observed, thus indicating a deletion of the other allele. (E) An imprinting defect on the maternal chromosome 14 due to a uniparental heterodisomy for sample P50-G3. Reads are colored by methylation status, with blue indicating unmethylated CpGs and red methylated CpGs. (F) De novo assembly of a locus containing a ∼200 kbp complex genomic rearrangement for sample P50-E5.
Figure 3. Variant recall in titration experiments. Results of automated variant detection per variant and calling tool for different genome-wide coverage levels (10×, 15×, 20×, and 30×) based on the initial 177 calls (Table S4). Boxplots are based on 10 random selections of different reads from the original 30× coverage sample. For SNVs, we distinguished between all SNVs (in red; n = 43) and SNVs not overlapping a homologous region (in light orange with asterisk; n = 18).
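
The titration design described in the Figure 3 caption (10 random read selections per reduced coverage level, drawn from the 30× data) can be sketched as follows. This is only an illustration under stated assumptions: the source does not say which tool performed the subsampling, the read names are made up, and the sketch assumes reads of roughly equal length so that the fraction of reads kept approximates the fraction of coverage kept.

```python
import random

ORIGINAL_COVERAGE = 30          # full-depth genomes (from the source)
TARGET_COVERAGES = [10, 15, 20]  # reduced levels named in the caption
N_REPLICATES = 10                # random selections per level (from the source)


def keep_fraction(target, original=ORIGINAL_COVERAGE):
    """Fraction of reads to retain to approximate the target coverage."""
    if not 0 < target <= original:
        raise ValueError("target coverage must be in (0, original]")
    return target / original


def subsample(read_ids, target, seed):
    """Draw one random subset of reads without replacement (hypothetical helper)."""
    rng = random.Random(seed)  # a separate seed per replicate gives distinct draws
    k = round(len(read_ids) * keep_fraction(target))
    return rng.sample(read_ids, k)


# Hypothetical read identifiers standing in for a 30x sample.
reads = [f"read_{i}" for i in range(3000)]

# Ten independent selections for each reduced coverage level.
replicates = {
    target: [subsample(reads, target, seed) for seed in range(N_REPLICATES)]
    for target in TARGET_COVERAGES
}

for target, draws in replicates.items():
    print(f"{target}x: {len(draws)} replicates of {len(draws[0])} reads each")
```

In practice each subset would be passed through the same variant callers as the full-depth data, and recall per caller would be summarized across the 10 replicates, as in the boxplots.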
