Genome Biol. 2025 Jul 7;26(1):197.
doi: 10.1186/s13059-025-03663-x.

Conserved missense variant pathogenicity and correlated phenotypes across paralogous genes


Tobias Brünger et al. Genome Biol.

Abstract

Background: The majority of missense variants in clinical genetic tests are classified as variants of uncertain significance. Prior research shows that the deleterious effects and the subsequent molecular consequences of variants are often conserved among paralogous protein sequences within a gene family. Here, we systematically quantify on an exome-wide scale whether the existence of pathogenic variants in paralogous genes at a conserved position can serve as evidence for the pathogenicity of a new variant. For the gene family of voltage-gated sodium channels, where variants and expert-curated clinical phenotypes are available, we also assess whether phenotype patterns of multiple disorders for each gene are conserved across variant positions within the gene family.

Results: Mapping 590,000 pathogenic and 1.9 million population variants onto 9928 genes grouped into 2054 paralogous families increases the number of residues with classifiable evidence 5.1-fold compared with gene-specific data alone. The presence of a pathogenic variant in a paralogous gene is associated with a positive likelihood ratio of 13.0 for variant pathogenicity. Across ten genes encoding voltage-gated sodium channels and 22 expert-curated disorders, we identify cross-paralog correlated phenotypes based on 3D structure spatial position. For example, multiple established loss-of-function related disorders across SCN1A, SCN2A, SCN5A, and SCN8A show overlapping spatial variant clusters. Finally, we show that phenotype integration in paralog variant selection improves variant classification.
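The positive likelihood ratio reported above can be read as the fraction of pathogenic variants that carry a given piece of evidence divided by the fraction of control variants that carry the same evidence. The following is a minimal sketch of that standard definition, not the authors' pipeline. The function name, its parameters, and the counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def positive_likelihood_ratio(path_with, path_total, ctrl_with, ctrl_total):
    """Return LR+ = P(evidence | pathogenic) / P(evidence | control).

    path_with / path_total: pathogenic variants with the evidence / all pathogenic variants.
    ctrl_with / ctrl_total: control variants with the evidence / all control variants.
    Here "evidence" would mean a pathogenic variant at the aligned paralog position.
    """
    if path_total <= 0 or ctrl_total <= 0:
        raise ValueError("totals must be positive")
    sensitivity = path_with / path_total
    false_positive_rate = ctrl_with / ctrl_total
    if false_positive_rate == 0:
        raise ValueError("LR+ is undefined when no control variant carries the evidence")
    return sensitivity / false_positive_rate


# Hypothetical counts, invented for this example (not data from the paper).
lr = positive_likelihood_ratio(path_with=300, path_total=1000,
                               ctrl_with=25, ctrl_total=1000)
print(f"LR+ = {lr:.1f}")
```

An LR+ above 1 means the evidence is seen more often among pathogenic than among control variants; an LR+ of 13.0, as reported, would correspond to a 13-fold difference in those two fractions.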

Conclusion: Conserved pathogenic missense variants in paralogous genes provide robust, quantifiable support for clinical variant interpretation, and phenotype-informed mapping further improves predictions.

Keywords: ACMG; Genetics; missense; sodium channel; variant classification.


Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors report no competing interests.

Figures

Fig. 1
Graphical summary of the study
Fig. 2
Individual pathogenic paralogous variants can serve as a proxy for variant pathogenicity. A Number of amino acid residues in 414 gene families that have a pathogenic variant (ClinVar, at least one star) at the same protein position in the same gene or at a corresponding protein residue in a paralogous gene. B Amino acids with a paralogous pathogenic variant at a paralogous alignment position have an increased positive likelihood ratio (LR+ > 1). In contrast, amino acids with a paralogous control variant (Regeneron, not present in gnomAD) at a paralogous alignment position are not enriched for pathogenic variants. Each data point represents the gene-wise LR+. The gene-wise LR+ was calculated for genes where 10 or more pathogenic variants (ClinVar, at least one star) and control variants (Regeneron, not present in gnomAD) could be mapped.
Fig. 3
Comparison to established gene family-based methods. A The forest plot illustrates the enrichment of pathogenic versus control variants when applying the para-SAME criterion to residues with similar paralog conservation levels, as defined in Lal et al. (2020) [8]. B Similar to A, but for the para-DIFF criterion. C The bar plot shows the number (N) of amino acid residues across all genes where a previously established approach (Pathogenic Enriched Region, PER; Perez-Palma et al., 2019 [15]) and/or our para-SAME/para-DIFF ACMG criteria extension can be applied. D The forest plot compares the likelihood ratios (LR+) for amino acid residues within a PER and amino acid residues where the para-SAME/para-DIFF criteria can be applied (see the “Methods” section for details).
Fig. 4
Leveraging phenotype correlations to enhance the application of paralogous pathogenic variants. A Correlation matrix of the relationships between the 3D variant distributions of the phenotypes. Phenotypes that share significantly (after Bonferroni adjustment) similar 3D variant distributions are color-coded in purple, whereas those with significantly distinct distributions are in orange. Statistically significant correlations are marked with stars (* for Padj < 0.05 and ** for P < 0.001). B Forest plot of the positive likelihood ratio for four phenotypes, derived from a comparison of variants from affected individuals with control variants from gnomAD. These ratios were computed using (1) paralogous variants from affected individuals with a significantly positive correlation based on 3D position (purple), (2) paralogous variants from affected individuals with a 3D position-based negative correlation (orange), or (3) paralogous control variants (gray). Abbreviations: DEE, Developmental Epileptic Encephalopathy; Dravet, Dravet syndrome.
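
The Fig. 4 caption refers to Bonferroni-adjusted P values and star labels. As a rough illustration of that general procedure (not the authors' code), the sketch below multiplies each raw P value by the number of tests, caps the result at 1, and assigns the star labels. The function names and the example P values are hypothetical, and applying the 0.001 threshold to the adjusted value is an assumption, since the caption does not state whether that threshold refers to the raw or the adjusted P value.

```python
def bonferroni_adjust(p_values):
    """Bonferroni adjustment: multiply each raw p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def star_label(p_adj):
    """Map an adjusted p-value to a star label.

    Assumption: both thresholds are applied to the adjusted value.
    """
    if p_adj < 0.001:
        return "**"
    if p_adj < 0.05:
        return "*"
    return ""


# Hypothetical raw p-values for three phenotype-pair comparisons (not data from the paper).
raw = [0.01, 0.2, 0.00001]
adjusted = bonferroni_adjust(raw)
labels = [star_label(p) for p in adjusted]
print(list(zip(adjusted, labels)))
```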

References

    1. Lek M, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–91.
    2. Choi M, et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci U S A. 2009;106:19096–101.
    3. Milman A, et al. Genotype-phenotype correlation of SCN5A genotype in patients with brugada syndrome and arrhythmic events: insights from the SABRUS in 392 probands. Circ Genom Precis Med. 2021;14:e003222.
    4. Johannesen KM, et al. Genotype-phenotype correlations in SCN8A-related disorders reveal prognostic and therapeutic implications. Brain. 2022;145:2991–3009.
    5. Kamada F, et al. A novel KCNQ4 one-base deletion in a large pedigree with hearing loss: implication for the genotype-phenotype correlation. J Hum Genet. 2006;51:455–60.
