Review
Genome Res. 2025 Apr 14;35(4):593-598.
doi: 10.1101/gr.280120.124.

The impact of long-read sequencing on human population-scale genomics


Tobias Rausch et al. Genome Res.

Abstract

Long-read sequencing technologies, particularly those from Pacific Biosciences and Oxford Nanopore Technologies, are revolutionizing genome research by providing high-resolution insights into complex and repetitive regions of the human genome that were previously inaccessible. These advances have been particularly enabling for the comprehensive detection of genomic structural variants (SVs), which is critical for linking genotype to phenotype in population-scale and rare disease studies, as well as in cancer. Recent developments in sequencing throughput and computational methods, such as pangenome graphs and haplotype-resolved assemblies, are paving the way for the future inclusion of long-read sequencing in clinical cohort studies and disease diagnostics. DNA methylation signals directly obtained from long reads enhance the utility of single-molecule long-read sequencing technologies by enabling molecular phenotypes to be interpreted, and by allowing the identification of the parent of origin of de novo mutations. Despite this recent progress, challenges remain in scaling long-read technologies to large populations due to cost, computational complexity, and the lack of tools to facilitate the efficient interpretation of SVs in graphs. This perspective provides a succinct review on the current state of long-read sequencing in genomics by highlighting its transformative potential and key hurdles, and emphasizing future opportunities for advancing the understanding of human genetic diversity and diseases through population-scale long-read analysis.

Figures

Figure 1.
Long-read sequencing facilitates accurate discovery of SVs and their breakpoints at single-nucleotide resolution, with PacBio HiFi sequencing showing very high accuracy in terms of mapping repeat polymorphisms and ONT ultra-long (ONT-UL) being an essential method for determining centromeric or telomeric variant repeat (TVR) structures as well as balanced inversions, which are often embedded within large DNA repeats. Short reads remain highly cost-efficient and thus scalable to tens of thousands of genomes. Notably, some applications such as telomere-to-telomere assembly projects presently require a combination of technologies, with current studies using PacBio HiFi sequencing, ONT-UL, and Strand-seq (short reads) (Logsdon et al. 2024a).
Figure 2.
Relationship between sequencing coverage, base accuracy, read length, DNA quality, and feasible cohort size, showing how different study designs affect outcomes such as variant discovery and genotyping accuracy. For rare disease studies, long-read sequencing offers unmatched power in uncovering novel variants, whereas in common diseases, the emphasis shifts toward larger sample sizes with lower coverage to increase statistical power in population-scale (PS) genetics and genome-wide association studies (GWAS).
Figure 3.
Pangenome graphs require methodological developments for scalable graph construction, accurate long-read alignments, graph-based variant discovery, and flexible graph augmentation. The upper graph shows a small portion of a pangenome graph with segments representing nucleotide sequences and links delineating possible paths through the graph (one possible haplotype path shown in black). Biallelic bubbles have two possible paths, while multiallelic bubbles with more than two paths pose significant graph construction challenges, especially for highly polymorphic and multiallelic variable number of tandem repeats (Li et al. 2020). Population-scale long-read sequencing efforts enable iterative cycles of alignments to a pangenome graph to facilitate genetic variant discovery (exemplified by a new Alu element insertion) with subsequent graph augmentation using new alleles.
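The segment, link, and bubble terminology in this caption can be made concrete with a toy example. The following is a minimal sketch, not the representation used by any pangenome toolkit: the segment IDs, sequences, and helper functions are all hypothetical, and the graph is assumed to be acyclic and forward-strand only, which real pangenome graphs generally are not.

```python
# Toy pangenome graph (hypothetical data): segments hold nucleotide
# sequences, links are directed edges between segments.
segments = {
    "s1": "ACGT",
    "s2": "G",       # first allele of a biallelic bubble
    "s3": "T",       # second allele of the same bubble
    "s4": "CCA",
    "s5": "AT",      # three repeat alleles forming a multiallelic bubble
    "s6": "ATAT",
    "s7": "ATATAT",
    "s8": "GG",
}
links = [
    ("s1", "s2"), ("s1", "s3"), ("s2", "s4"), ("s3", "s4"),
    ("s4", "s5"), ("s4", "s6"), ("s4", "s7"),
    ("s5", "s8"), ("s6", "s8"), ("s7", "s8"),
]


def build_adjacency(links):
    """Map each segment to the list of segments it links to."""
    adj = {}
    for src, dst in links:
        adj.setdefault(src, []).append(dst)
    return adj


def enumerate_paths(adj, source, sink):
    """Return every path from source to sink (assumes an acyclic graph)."""
    paths = []
    stack = [[source]]
    while stack:
        path = stack.pop()
        if path[-1] == sink:
            paths.append(path)
            continue
        for nxt in adj.get(path[-1], []):
            stack.append(path + [nxt])
    return paths


def spell(path, segments):
    """Concatenate segment sequences along a path to get a haplotype."""
    return "".join(segments[s] for s in path)


def bubble_kind(adj, source, sink):
    """Classify the region between two anchor segments by path count."""
    n = len(enumerate_paths(adj, source, sink))
    if n == 2:
        return "biallelic"
    return "multiallelic" if n > 2 else "not a bubble"


adj = build_adjacency(links)
haplotype = spell(["s1", "s2", "s4", "s5", "s8"], segments)
```

In this toy graph the region between `s1` and `s4` has two paths and the region between `s4` and `s8` has three, so the number of end-to-end haplotype paths is their product. That multiplicative growth is one simple way to see why multiallelic sites complicate graph construction.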
