Front Mol Biosci. 2022 Sep 16:9:966927.
doi: 10.3389/fmolb.2022.966927. eCollection 2022.

Pathogenic variation types in human genes relate to diseases through Pfam and InterPro mapping

Giulia Babbi et al. Front Mol Biosci.

Abstract

Grouping residue variations in a protein according to their physicochemical properties reduces the dimensionality of all the possible substitutions in a variant with respect to the wild type. Using a large dataset of proteins with disease-related and benign variations, derived by merging Humsavar and ClinVar data, we investigate to what extent our physicochemical grouping procedure can help determine whether patterns of variation types are related to specific groups of diseases and whether they occur in Pfam and/or InterPro gene domains. We download 75,145 germline disease-related and benign variations of 3,605 genes, group them according to physicochemical categories, and map them onto Pfam and InterPro gene domains. Statistically validated analysis indicates that each cluster of genes associated with Mondo anatomical system categorizations is characterized by a specific variation pattern. Patterns identify specific Pfam and InterPro domain-Mondo category associations. Our data suggest that the association of variation patterns with Mondo categories is unique and may help in associating gene variants with genetic diseases. This work corroborates, in a much larger data set, previous observations from our group.

Keywords: InterPro domain; Pfam domain; disease associated variant; Mondo anatomical system categories; variation physicochemical type.
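
The physicochemical grouping described in the abstract can be illustrated with a minimal Python sketch. The labels a, r, p, and c come from the figure captions; the assignment of individual residues to each class, the treatment of a variation type as an ordered (wild type, variant) pair, and the names `RESIDUE_CLASS`, `variation_type`, and `count_types` are assumptions made for illustration and are not taken from the abstract.

```python
from collections import Counter

# Assumed partition of the 20 standard residues into the four classes
# named in the figure captions (a: nonpolar, r: aromatic, p: polar, c: charged).
# The abstract does not list class membership; this split is illustrative.
ASSUMED_CLASSES = {"a": "GAVPLIM", "r": "FWYH", "p": "STCNQ", "c": "DEKR"}

RESIDUE_CLASS = {
    residue: label
    for label, residues in ASSUMED_CLASSES.items()
    for residue in residues
}


def variation_type(wild_type: str, variant: str) -> str:
    """Return a two-letter type: wild-type class followed by variant class."""
    return RESIDUE_CLASS[wild_type] + RESIDUE_CLASS[variant]


def count_types(variations):
    """Count variation types over an iterable of (wild_type, variant) pairs."""
    return Counter(variation_type(w, v) for w, v in variations)
```

Under these assumptions, the 380 possible residue substitutions collapse into at most 16 ordered types, which is the kind of dimensionality reduction the abstract refers to. For instance, `variation_type("R", "W")` maps an arginine-to-tryptophan substitution to the charged-to-aromatic type.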

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Distribution of Union genes as a function of the number of associated diseases (5,223 diseases) (Table 1).
FIGURE 2
Distribution of the 43,917 LP/P variations in the Union data set as a function of the number of associated diseases (5,223).
FIGURE 3
Frequency of variation types of the Union variations. Blue bars: LP/P variations; Red bars: LB/B variations. Labels are as follows: a, nonpolar; r, aromatic; p, polar; and c, charged.
FIGURE 4
Log-odd scores of variation types associated with the different Mondo anatomical system categories. The heatmap shows the log-odd score of each variation type with respect to the corresponding LP/P background (shown in Supplementary Figure S1). For each Mondo category, we show the number of diseases, genes (italic) and disease-related variations. In variation types, labels are as follows: a, nonpolar; r, aromatic; p, polar; and c, charged. The log-odd values are affected by a relative error lower than 5%, as estimated with a bootstrapping procedure. Statistical validation and resulting FDR-corrected p-values are reported in Supplementary Table S2.
FIGURE 5
Log-odd scores of variation types in Pfam entries sorted by number of genes covered (the first 20, out of 1,940 Pfams, Supplementary Table S3). Log-odds are computed with respect to the whole dataset LP/P background (Figure 3). For each Pfam, the corresponding InterPro accession is also included. Numbers within parentheses report the number of genes, variations, and diseases, respectively. The log-odd values are affected by a relative error lower than 5%, as estimated with a bootstrapping procedure. Statistical validation and resulting FDR-corrected p-values for each Pfam entry are reported in Supplementary Table S3.
FIGURE 6
Log-odd scores of variation types for the first 20 InterPro entries (out of 5,357, Table 2), sorted by number of genes covered and not including Pfam signatures. Log-odds are computed with respect to the whole dataset LP/P background (Figure 3). Numbers in parentheses report, for each InterPro, the number of genes, of SRVs and of diseases, respectively. The log-odd values are affected by a relative error lower than 5%, as estimated with a bootstrapping procedure. Statistical validation and resulting FDR-corrected p-values for each InterPro entry are reported in Supplementary Table S3.
FIGURE 7
Log-odd scores for disease categories associated with different Pfam domains. Log-odds are calculated with respect to the whole-dataset background of disease categories (Supplementary Table S4). For each Pfam, the corresponding InterPro accession is indicated. Numbers in parentheses report the number of genes, of SRVs, the median number of SRVs per gene and the number of diseases (for statistical validation see Supplementary Table S4).
FIGURE 8
Log-odd scores for disease categories associated with different InterPro domains. Log-odds are calculated with respect to the whole-dataset background of disease categories (Supplementary Table S4). Numbers in parentheses report the number of genes, of SRVs, the median number of SRVs per gene and the number of diseases (for statistical validation see Supplementary Table S4).
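
The figure captions describe log-odd scores computed against a background distribution, with a relative error estimated by bootstrapping. The Python sketch below shows one way such quantities could be computed. It is not the authors' implementation: the natural logarithm, the definition of relative error as standard deviation divided by the absolute mean of the bootstrap replicates, and the names `log_odds` and `bootstrap_relative_error` are all assumptions.

```python
import math
import random
from collections import Counter


def log_odds(subset_counts, background_counts):
    """Log of (frequency in subset / frequency in background) per type.

    Assumes natural log; types with a zero count on either side are skipped
    because the score is undefined there.
    """
    n_sub = sum(subset_counts.values())
    n_bg = sum(background_counts.values())
    scores = {}
    for vtype, bg in background_counts.items():
        sub = subset_counts.get(vtype, 0)
        if sub == 0 or bg == 0:
            continue
        scores[vtype] = math.log((sub / n_sub) / (bg / n_bg))
    return scores


def bootstrap_relative_error(subset_types, background_counts, vtype,
                             n_rounds=1000, seed=0):
    """Resample the subset with replacement and return std / |mean| of the
    log-odds for one type (an assumed definition of relative error).

    Returns None when fewer than two replicates are defined or the mean is 0.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_rounds):
        sample = rng.choices(subset_types, k=len(subset_types))
        score = log_odds(Counter(sample), background_counts).get(vtype)
        if score is not None:
            values.append(score)
    if len(values) < 2:
        return None
    mean = sum(values) / len(values)
    if mean == 0:
        return None
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(variance) / abs(mean)
```

Here `subset_types` would be the list of variation-type labels for one Mondo category, Pfam entry, or InterPro entry, and `background_counts` the type counts over the whole LP/P set. A positive score indicates a type that is enriched in the subset relative to the background, and a negative score indicates depletion.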
