Hum Mutat. 2022 Sep;43(9):1268-1285.
doi: 10.1002/humu.24392. Epub 2022 May 10.

Large scale genotype- and phenotype-driven machine learning in Von Hippel-Lindau disease

Andreea Chiorean et al. Hum Mutat. 2022 Sep.

Abstract

Von Hippel-Lindau (VHL) disease is a hereditary cancer syndrome in which individuals are predisposed to tumors of the brain, adrenal gland, kidney, and other organs. It is caused by pathogenic variants in the VHL tumor suppressor gene. Standardized disease information has been difficult to collect because VHL patients are rare and clinically diverse. Over 4100 unique articles published through October 2019 were screened for germline genotype-phenotype data. Patient data were translated into standardized descriptions using Human Genome Variation Society (HGVS) variant nomenclature and Human Phenotype Ontology (HPO) terms, and have been manually curated into an open-access knowledgebase called Clinical Interpretation of Variants in Cancer (CIViC). In total, 634 unique VHL variants, 2882 patients, and 1991 families from 427 papers were captured. We identified relationships between phenotype and genotype data using classic statistical methods and unsupervised learning by spectral clustering. Our analyses reveal earlier onset of pheochromocytoma/paraganglioma and retinal angiomas than of other VHL phenotypes, as well as phenotype co-occurrences and genotype-phenotype correlations, including hotspots. The analysis confirms existing VHL associations and can be used to identify new patterns and associations in VHL disease. Our database serves as an aggregate knowledge-translation tool that facilitates sharing information about the pathogenicity of VHL variants.
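The abstract names spectral clustering but not its configuration. As a minimal sketch, assuming each patient is represented as a set of phenotype labels and that pairwise Jaccard similarity serves as the affinity (both assumptions, not taken from the paper), the two-cluster case can be computed as a normalized spectral bipartition: build the affinity matrix W, find the second eigenvector of D^-1/2 W D^-1/2 by deflated power iteration, and split patients by its sign. The helper names `jaccard` and `spectral_bipartition` are hypothetical, and only the Python standard library is used.

```python
import math
import random


def jaccard(a, b):
    """Jaccard similarity between two sets of phenotype labels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def spectral_bipartition(profiles, iters=500, seed=0):
    """Split patients into two clusters by the sign of a spectral embedding.

    profiles: one set of phenotype labels per patient (hypothetical input).
    Returns a list of 0/1 cluster labels in the same order. Patients with no
    phenotype overlap with anyone (degree 0) are placed in cluster 0.
    """
    n = len(profiles)
    # Affinity matrix W: pairwise Jaccard similarity, zero on the diagonal.
    w = [[jaccard(profiles[i], profiles[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    # Normalized affinity M = D^-1/2 W D^-1/2 (I minus the normalized Laplacian).
    m = [[inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)]
         for i in range(n)]
    # The leading eigenvector of M is proportional to sqrt(degree); deflate it.
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(t * t for t in top)) or 1.0
    top = [t / top_norm for t in top]
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        # Project out the leading eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        # Power step on M + I, whose eigenvalues are all non-negative.
        v = [sum(m[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v)) or 1.0
        v = [a / norm for a in v]
    # Split by the sign of D^-1/2 v (same sign as v wherever degree > 0).
    return [0 if v[i] * inv_sqrt[i] >= 0 else 1 for i in range(n)]
```

Which group receives label 0 or 1 is arbitrary; only the partition matters. For more than two clusters, scikit-learn's `SpectralClustering(n_clusters=k, affinity="precomputed")` accepts the same affinity matrix.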

Keywords: CIViC; Von Hippel-Lindau; genotype-phenotype; machine learning; spectral clustering.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Number of VHL papers analyzed at each stage of the identification, screening, eligibility, and inclusion process. Four hundred and twenty‐seven papers were included in the final analysis assessing VHL genotype–phenotype correlations. VHL, Von Hippel‐Lindau.
Figure 2
Age‐related penetrance for patients who present with a single phenotype. PPGL and RA have an earlier age of onset than the other phenotypes (RCT, CHB, RCC, and PCT). CHB, CNS hemangioblastoma; CNS, central nervous system; PCT, pancreatic cysts or tumors; PPGL, pheochromocytoma/paraganglioma; RA, retinal angioma; RCC, renal cell carcinoma; RCT, renal cysts or tumors.
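An age-related penetrance curve plots the cumulative fraction of patients who have manifested a phenotype by each age. The sketch below is a naive version: it assumes every patient in the input developed the phenotype and ignores censoring, so it is not the estimator behind the figure; a survival-analysis method such as Kaplan-Meier would be needed to include unaffected carriers. The function names are hypothetical.

```python
def cumulative_penetrance(onset_ages, max_age):
    """Fraction of patients whose phenotype had appeared by each age 0..max_age.

    onset_ages: age at first manifestation for each affected patient
    (hypothetical input). Censoring of unaffected carriers is not modelled.
    """
    if not onset_ages:
        raise ValueError("need at least one onset age")
    n = len(onset_ages)
    return [sum(1 for a in onset_ages if a <= age) / n
            for age in range(max_age + 1)]


def age_at_fraction(curve, fraction=0.5):
    """First age at which the cumulative curve reaches `fraction`, or None."""
    for age, value in enumerate(curve):
        if value >= fraction:
            return age
    return None
```

Comparing `age_at_fraction` across phenotypes gives a rough analogue of the earlier-onset comparison described in the caption.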
Figure 3
Phenotype co‐occurrence ratios for (a) patient‐, (b) family‐, and (c) variant‐based data. For all data types, PPGL had a low co‐occurrence ratio with every phenotype except PNET. In patient‐ and family‐based data, CHB had a high co‐occurrence ratio with all phenotypes except PPGL and PNET: co‐occurrence was low between CHB and PPGL, moderate between CHB and PNET, and high between CHB and all other phenotypes. CHB, CNS hemangioblastoma; CNS, central nervous system; PNET, pancreatic neuroendocrine tumor; PPGL, pheochromocytoma/paraganglioma.
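The caption does not define the co‐occurrence ratio. One common choice, assumed here, is the observed number of units (patients, families, or variants) carrying both phenotypes divided by the number expected if the two occurred independently, so values below 1 read as low co‐occurrence and values above 1 as high. The function name and input shape are hypothetical.

```python
from itertools import combinations


def cooccurrence_ratios(units):
    """Observed/expected co-occurrence for every pair of phenotypes.

    units: list of phenotype sets, one per patient, family, or variant.
    Expected count for a pair = count(A) * count(B) / number of units,
    i.e. what independence between the two phenotypes would predict.
    """
    n = len(units)
    counts = {}
    for unit in units:
        for phenotype in unit:
            counts[phenotype] = counts.get(phenotype, 0) + 1
    ratios = {}
    for a, b in combinations(sorted(counts), 2):
        both = sum(1 for unit in units if a in unit and b in unit)
        expected = counts[a] * counts[b] / n
        ratios[(a, b)] = both / expected
    return ratios
```

Running the same function separately on patient‐, family‐, and variant‐level phenotype sets mirrors the three panels of the figure.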
Figure 4
Frequency of missense variants along the VHL gene for (a) patient‐, (b) family‐, and (c) variant‐based data, and BLOSUM90‐adjusted missense frequency for (d) patient‐ and (e) family‐based data. Codons identified as hotspots are labeled with their numbers, and asterisks (*) indicate highly significant hotspots. The α‐domain is indicated by the red background region and the β‐domain by the blue background region.
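The hotspot test is not described in the caption. One simple way to flag candidate hotspot codons, assumed here and not necessarily the authors' method, is a one‐sided binomial test of each codon's missense count against a uniform distribution over the 213 codons of full‐length pVHL, with a Bonferroni‐corrected threshold. The BLOSUM90 adjustment in panels (d) and (e) is not modelled, and the function names are hypothetical.

```python
import math

VHL_CODONS = 213  # full-length pVHL is 213 amino acids


def binom_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p), terms computed in log space."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total


def hotspot_codons(codon_counts, alpha=0.05, n_codons=VHL_CODONS):
    """Codons whose missense count exceeds a uniform expectation.

    codon_counts: {codon number: number of missense variants} (hypothetical input).
    Returns {codon: p-value} for codons below the Bonferroni-corrected alpha.
    """
    total = sum(codon_counts.values())
    p = 1.0 / n_codons
    threshold = alpha / n_codons  # one test per codon
    hits = {}
    for codon, k in sorted(codon_counts.items()):
        p_value = binom_sf(k, total, p)
        if p_value < threshold:
            hits[codon] = p_value
    return hits
```

Working in log space avoids the overflow that multiplying very large binomial coefficients by tiny probabilities would cause at realistic variant counts.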
Figure 5
Distribution of truncating and nontruncating variants by phenotype for (a) patient‐, (b) family‐, and (c) variant‐based data. The truncating/nontruncating distribution differed significantly between phenotypes. In particular, nontruncating variants were favored over truncating variants for PPGL; the PNET distribution also differed significantly from those of all other phenotypes except RA, favoring nontruncating variants. PNET, pancreatic neuroendocrine tumor; PPGL, pheochromocytoma/paraganglioma; RA, retinal angioma.
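A sketch of a truncating/nontruncating comparison between two phenotypes, under two assumptions not stated in the caption: variants are classified from HGVS protein notation (nonsense changes ending in `Ter` or `*`, and frameshifts containing `fs`, count as truncating), and the two phenotypes are compared with a Pearson chi‐square test on a 2×2 table. Splice‐site variants and large deletions, which need separate handling, are ignored, and the helper names are hypothetical.

```python
import math
import re

# Assumed rule: nonsense changes (ending in Ter or *) and frameshifts (fs) are
# truncating; missense and in-frame changes are nontruncating.
_TRUNCATING = re.compile(r"fs|(?:Ter|\*)$")


def is_truncating(hgvs_protein):
    """Classify an HGVS protein change such as 'p.Arg161Ter' (hypothetical helper)."""
    return bool(_TRUNCATING.search(hgvs_protein))


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p-value with 1 df.

    Rows are two phenotypes; columns are truncating / nontruncating counts:
        phenotype A: a truncating, b nontruncating
        phenotype B: c truncating, d nontruncating
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Stop‐loss notation such as `p.Ter214Glnext*?` ends in `?`, so the pattern does not misclassify it as nonsense.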
Figure 6
Frequency of coding variants in protein and functional domains for (a) patient‐, (b) family‐, and (c) variant‐based data. PPGL and PNET differed significantly (p < 0.00238) from all other phenotypes but not from each other, and both favored variants in the α‐domain over the β‐domain. PNET, pancreatic neuroendocrine tumor; PPGL, pheochromocytoma/paraganglioma.
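The cutoff p < 0.00238 equals 0.05/21, which is consistent with a Bonferroni correction across all 21 pairwise comparisons among the seven phenotypes named in the figures (CHB, RA, RCC, RCT, PCT, PNET, PPGL); this is an inference, not something the caption states. The sketch below computes that threshold and assigns codons to domains using assumed pVHL boundaries (α‐domain residues 155 to 192, β‐domain residues 63 to 154 and 193 to 204), which should be checked against a structural reference before reuse. Names are hypothetical.

```python
from math import comb

# Assumed pVHL domain boundaries (residue numbers); verify before reuse.
ALPHA_DOMAIN = (range(155, 193),)                 # residues 155-192
BETA_DOMAIN = (range(63, 155), range(193, 205))   # residues 63-154 and 193-204


def domain_of(codon):
    """Return 'alpha', 'beta', or 'other' for a codon number."""
    if any(codon in r for r in ALPHA_DOMAIN):
        return "alpha"
    if any(codon in r for r in BETA_DOMAIN):
        return "beta"
    return "other"


def bonferroni_alpha(n_groups, alpha=0.05):
    """Per-test significance level for all pairwise comparisons among n_groups."""
    return alpha / comb(n_groups, 2)


print(round(bonferroni_alpha(7), 5))  # 0.00238
```

Under these boundaries the hotspot codons 161 and 167 mentioned in the cluster figures fall in the α‐domain.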
Figure 7
Cluster phenotype, variant type, variant domain, and codon distribution for two patient clusters. Patient cluster 2 was PPGL‐dominant (a) and had slightly more variants in the α‐domain, whereas cluster 1 had more variants in the β‐domain (b). Cluster 2 heavily favored nontruncating over truncating variants, whereas cluster 1 only marginally favored nontruncating variants (c). Cluster 2 also had more variants around the 161 and 167 hotspots than cluster 1 (d). PPGL, pheochromocytoma/paraganglioma.
Figure 8
Cluster phenotype (a), variant type (b), variant domain (c), and codon statistics (d) for each of the four patient clusters. Patient cluster 1 was CHB‐, RCC‐, and PCT‐dominant, cluster 2 was RA‐ and CHB‐dominant, cluster 3 was CHB‐dominant, and cluster 4 was PPGL‐dominant (a). Clusters 1, 2, and 3 had most of their variants in the β‐domain rather than the α‐domain, whereas cluster 4 showed the opposite trend (b). Clusters 1 and 2 had more nontruncating than truncating variants, cluster 4 had far more nontruncating than truncating variants, and cluster 3 had a roughly equal distribution of truncating and nontruncating variants (c). Clusters 2 and 4 had more variants in α‐domain hotspots, especially codon 167, whereas the variants of clusters 1 and 3 were distributed more across the β‐domain (d). CHB, CNS hemangioblastoma; CNS, central nervous system; PCT, pancreatic cysts or tumors; PPGL, pheochromocytoma/paraganglioma; RA, retinal angioma; RCC, renal cell carcinoma.
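Labels such as "PPGL‐dominant" summarize which phenotype is most frequent among a cluster's patients. Assuming that reading, a hypothetical helper can tabulate per‐cluster phenotype frequencies from any cluster assignment and report the most frequent phenotype; ties are broken alphabetically, which is an arbitrary choice made here.

```python
from collections import Counter


def summarize_clusters(profiles, labels):
    """Per-cluster size, phenotype frequencies, and most frequent phenotype.

    profiles: one set of phenotype labels per patient (hypothetical input).
    labels: cluster label for each patient, in the same order.
    """
    by_cluster = {}
    for phenotypes, label in zip(profiles, labels):
        by_cluster.setdefault(label, []).append(phenotypes)
    summary = {}
    for label, members in sorted(by_cluster.items()):
        counts = Counter(p for phenotypes in members for p in phenotypes)
        # Fraction of the cluster's patients who have each phenotype.
        freq = {p: c / len(members) for p, c in counts.items()}
        # max() keeps the first maximum, so ties go to the alphabetically first name.
        dominant = max(sorted(freq), key=freq.get)
        summary[label] = {"size": len(members), "freq": freq, "dominant": dominant}
    return summary
```

The same tabulation applied to variant type, domain, or codon would give the other panels' per‐cluster statistics.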
