Spatial distribution of disease-associated variants in three-dimensional structures of protein complexes

A Gress¹,², V Ramensky³,⁴, O V Kalinina¹

Affiliations

¹ Department for Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany.
² Graduate School of Computer Science, Saarland University, Saarbrücken, Germany.
³ Center for Neurobehavioral Genetics, University of California, Los Angeles, CA, USA.
⁴ Moscow Institute of Physics and Technology, Moscow Region, Russian Federation.

PMID: 28945216
PMCID: PMC5623905
DOI: 10.1038/oncsis.2017.79

Oncogenesis. 2017 Sep 25;6(9):e380.

Abstract

Next-generation sequencing enables simultaneous analysis of hundreds of human genomes associated with a particular phenotype, for example, a disease. These genomes naturally contain a lot of sequence variation that ranges from single-nucleotide variants (SNVs) to large-scale structural rearrangements. In order to establish a functional connection between genotype and disease-associated phenotypes, one needs to distinguish disease drivers from neutral passenger variants. Functional annotation based on experimental assays is feasible only for a limited number of candidate mutations. Thus, alternative computational tools are needed. A possible approach to annotating mutations functionally is to consider their spatial location relative to functionally relevant sites in three-dimensional (3D) structures of the harboring proteins. This is impeded by the lack of available protein 3D structures. Complementing experimentally resolved structures with reliable computational models is an attractive alternative. We developed a structure-based approach to characterizing comprehensive sets of non-synonymous single-nucleotide variants (nsSNVs): those associated with cancer, those associated with non-cancer diseases and those that are putatively functionally neutral. We searched experimentally resolved protein 3D structures for potential homology-modeling templates for proteins harboring corresponding mutations. We found such templates for all proteins with disease-associated nsSNVs, and for 51% and 66% of proteins carrying common polymorphisms and annotated benign variants, respectively. Many mutations caused by nsSNVs can be found in protein-protein, protein-nucleic acid or protein-ligand complexes. Correction for the number of available templates per protein reveals that protein-protein interaction interfaces are not enriched in either cancer nsSNVs or nsSNVs associated with non-cancer diseases. Whereas cancer-associated mutations are enriched in DNA-binding proteins, they are rarely located directly in DNA-interacting interfaces. In contrast, mutations associated with non-cancer diseases are in general rare in DNA-binding proteins, but enriched in DNA-interacting interfaces in these proteins. All disease-associated nsSNVs are overrepresented in ligand-binding pockets, and nsSNVs associated with non-cancer diseases are additionally enriched in the protein core, where they probably affect overall protein stability.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Distance between residues corresponding to nsSNVs and the nearest interaction partner (log scale). Biological data sets are shown in a darker shade. The fraction of mapped nsSNVs for which a template with a co-resolved corresponding interaction partner is available is given below the boxes, which represent the distributions of distances to protein, ligand and DNA interaction partners for each biological data set. For randomized data sets, all 10 replicas are used to create the plots. (a) Distances to the nearest protein chain. (b) Distances to the nearest ligand. (c) Distances to the nearest DNA chain.
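
The distance measure plotted in Figure 1 can be illustrated with a minimal sketch. It assumes that the distance between a residue and an interaction partner is the smallest distance between any pair of their atoms; the caption does not state which atoms the authors used, and the function name and coordinates are hypothetical.

```python
import math

def nearest_partner_distance(residue_atoms, partner_atoms):
    """Smallest Euclidean distance between any residue atom and any partner atom.

    Both arguments are lists of (x, y, z) coordinates in angstroms.
    Assumption: an all-atom minimum; the paper may use a different definition.
    """
    return min(math.dist(a, b) for a in residue_atoms for b in partner_atoms)

# Hypothetical coordinates, for illustration only.
residue = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
partner = [(4.0, 0.0, 0.0), (1.0, 3.0, 4.0)]

distance = nearest_partner_distance(residue, partner)
log_distance = math.log10(distance)  # Figure 1 shows distances on a log scale
```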

**Figure 2**
Chemical difference between wild-type and mutated residues. Gray bars indicate biological data sets, light-gray bars indicate randomized data sets. Chemical distance is calculated as the Euclidean distance between the end points of the vectors representing the five most important numerical descriptors of physical and chemical properties of the wild-type and mutant amino acids.
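
The chemical distance described for Figure 2 can be sketched as follows, assuming each amino acid is represented by a five-component descriptor vector. The descriptor values below are hypothetical placeholders, not the descriptors used in the paper, and `chemical_distance` is an illustrative name.

```python
import math

# Hypothetical five-component descriptor vectors (placeholder values only;
# the actual descriptors are not given in this record).
DESCRIPTORS = {
    "A": (0.0, 1.0, 0.0, 0.0, 0.0),
    "D": (3.0, 1.0, 4.0, 0.0, 0.0),
}

def chemical_distance(wild_type, mutant, descriptors=DESCRIPTORS):
    """Euclidean distance between the descriptor vectors of two amino acids."""
    v = descriptors[wild_type]
    w = descriptors[mutant]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))
```

With these placeholder vectors, `chemical_distance("A", "D")` evaluates to 5.0, and the distance of any residue to itself is 0.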

**Figure 3**
Spatial distribution of nsSNVs in the analyzed data sets. For randomized data sets, mean values over 10 replicas are used. (a) For templates with ⩾35% sequence identity. (b) For templates with ⩾90% sequence identity.

**Figure 4**
Protein complexes with nsSNVs in multiple subunits. (a) Mitochondrial respiratory complex II (mapped onto a homologous complex from porcine heart, PDB id 1ZOY) and the corresponding sub-network (see text). FAD-binding protein is shown in green, mutations therein in pink; iron–sulfur protein is shown in cyan, mutations therein in orange; large cytochrome binding protein is shown in magenta, mutations therein in purple; small cytochrome binding protein is shown in yellow, mutation therein in limegreen. In the sub-network, nodes correspond to individual proteins, edges depict interactions between them. (b) Sub-network corresponding to complexes of CDK6 with its inhibitors CDKN2A and CDKN2C. Stoichiometry of the complexes is not accounted for, and nodes with a single loop edge correspond to associations of multiple identical subunits. (c) Sub-network corresponding to NRas, KRas and HRas and their downstream kinase RAF1 and activity factors SOS1 and PLCE1. (d) PIK3CA-PIK3R1 complex with mutations corresponding to cancer-associated somatic nsSNVs (top) and to nsSNVs associated with non-cancer diseases (bottom), PDB id 4L1B and the PIK3CA-PIK3R1 sub-network. PIK3CA subunit is shown in green, mutations therein in magenta and purple. PIK3R1 subunit is shown in cyan, mutations therein in orange and red.

**Figure 5**
Contacts and distance distributions for oncogenes and tumor-suppressor genes (TSG). (a) Distribution of nsSNVs into structural classes. (b–d) Distances to the nearest interaction partners: (b) protein chain, (c) ligand, (d) DNA chain.

Cited by

Alterations in SAMD9, AHSG, FRG2C, and FGFR4 Genes in a Case of Late-Onset Massive Tumoral Calcinosis.
Leow MKS, Ang J, Bi X, Koh ET, McFarlane C. AACE Clin Case Rep. 2023 May 11;9(5):153-157. doi: 10.1016/j.aace.2023.05.004. eCollection 2023 Sep-Oct. PMID: 37736313. Free PMC article.
StructMAn 2.0 Web: a web server for structural annotation of protein sequences and mutations.
Yurtseven A, Keller S, Hirsch P, Kalinina OV, Gress A. Nucleic Acids Res. 2025 Jul 7;53(W1):W528-W533. doi: 10.1093/nar/gkaf381. PMID: 40326516. Free PMC article.
Single Amino Acid Substitution the DNA Repairing Gene Radiation-Sensitive 4 Contributes to Ultraviolet Tolerance of a Plant Pathogen.
Wang YP, Yang LN, Feng YY, Liu S, Zhan J. Front Microbiol. 2022 Jul 14;13:927139. doi: 10.3389/fmicb.2022.927139. eCollection 2022. PMID: 35910660. Free PMC article.
d-StructMAn: Containerized structural annotation on the scale from genetic variants to whole proteomes.
Gress A, Srikakulam SK, Keller S, Ramensky V, Kalinina OV. Gigascience. 2022 Sep 20;11:giac086. doi: 10.1093/gigascience/giac086. PMID: 36130085. Free PMC article.
HawkDock: a web server to predict and analyze the protein-protein complex based on computational docking and MM/GBSA.
Weng G, Wang E, Wang Z, Liu H, Zhu F, Li D, Hou T. Nucleic Acids Res. 2019 Jul 2;47(W1):W322-W330. doi: 10.1093/nar/gkz397. PMID: 31106357. Free PMC article.
