Genome-wide profiling of highly similar paralogous genes using HiFi sequencing
- PMID: 40057485
- PMCID: PMC11890787
- DOI: 10.1038/s41467-025-57505-2
Abstract
Variant calling is hindered in segmental duplications by sequence homology. We developed Paraphase, a HiFi-based informatics method that resolves highly similar genes by phasing all haplotypes of paralogous genes together. We applied Paraphase to 160 long (>10 kb) segmental duplication regions across the human genome with high (>99%) sequence similarity, encoding 316 genes. Analysis across five ancestral populations revealed highly variable copy numbers of these regions. We identified 23 paralog groups with exceptionally low within-group diversity, where extensive gene conversion and unequal crossing over contribute to highly similar gene copies. Furthermore, our analysis of 36 trios identified 7 de novo SNVs and 4 de novo gene conversion events, 2 of which are non-allelic. Finally, we summarized extensive genetic diversity in 9 medically relevant genes previously considered challenging to genotype. Paraphase provides a framework for resolving gene paralogs, enabling accurate testing in medically relevant genes and population-wide studies of previously inaccessible genes.
© 2025. The Author(s).
Conflict of interest statement
Competing interests: X.C., D.B., Egor D., and M.A.E. are employees of PacBio. J.M.D., J.N., A.S.B., R.B., K.S.H., L.L., P.K. and S.N. are employees of GeneDx. The remaining authors declare no competing interests.
