Identification of recurrent genetic patterns from targeted sequencing panels with advanced data science: a case-study on sporadic and genetic neurodegenerative diseases

doi:10.1186/s12920-022-01173-4

BMC Med Genomics. 2022 Feb 10;15(1):26.

M Tarozzi¹, A Bartoletti-Stella^{2,3}, D Dall'Olio⁴, T Matteuzzi⁴, S Baiardi^{2,3}, P Parchi^{2,3}, G Castellani^#⁵, S Capellari^#^{3,6}

Affiliations

¹ Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy.
² Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Bologna, Italy.
³ IRCCS Institute of Neurological Sciences of Bologna, Bologna, Italy.
⁴ Department of Physics and Astronomy, University of Bologna, Bologna, Italy.
⁵ Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Bologna, Italy. gastone.castellani@unibo.it.
⁶ Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, Italy.

^# Contributed equally.

PMID: 35144616
PMCID: PMC8830183
DOI: 10.1186/s12920-022-01173-4

M Tarozzi et al. BMC Med Genomics. 2022.

Abstract

Background: Targeted Next Generation Sequencing is a common and powerful approach used in both clinical and research settings. However, at present, a large fraction of the acquired genetic information is not used, since pathogenicity cannot be assessed for most variants. Further complicating this scenario is the increasingly frequent description of a poly/oligogenic pattern of inheritance, in which multiple variants contribute to increasing disease risk. We present an approach in which the entire genetic information provided by targeted sequencing is transformed into binary data, on which we performed statistical, machine learning, and network analyses to extract all valuable information from the entire genetic profile. To test this approach and explore the presence of recurrent genetic patterns without bias, we studied a cohort of 112 patients affected either by genetic Creutzfeldt-Jakob disease (CJD) caused by two mutations in the PRNP gene (p.E200K and p.V210I) with different penetrance, or by sporadic Alzheimer disease (sAD).
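The encoding step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the genotype strings, the 0/1/2 coding, and the names `GENOTYPE_CODES`, `build_matrix`, and `binarize` are assumptions introduced here, and the two toy patients are invented.

```python
from typing import Dict, List, Tuple

# Assumed coding (not stated in the abstract): 0 = variant absent,
# 1 = heterozygous, 2 = homozygous for the alternate allele.
GENOTYPE_CODES = {"0/0": 0, "0/1": 1, "1/0": 1, "1/1": 2}


def build_matrix(
    calls: Dict[str, Dict[str, str]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Turn per-patient variant calls into a variants x patients matrix.

    `calls` maps patient id -> {variant id: genotype string}.
    A variant not listed for a patient is treated as absent (0/0).
    """
    patients = sorted(calls)
    variants = sorted({v for per_patient in calls.values() for v in per_patient})
    matrix = [
        [GENOTYPE_CODES[calls[p].get(v, "0/0")] for p in patients]
        for v in variants
    ]
    return variants, patients, matrix


def binarize(matrix: List[List[int]]) -> List[List[int]]:
    """Collapse the 0/1/2 coding to presence/absence (1 if the variant is carried)."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


# Invented toy input: two hypothetical patients.
toy_calls = {
    "P1": {"PRNP:p.E200K": "0/1", "NOTCH3:rs11670823": "1/1"},
    "P2": {"PRNP:p.V210I": "0/1"},
}
variants, patients, ternary = build_matrix(toy_calls)
binary = binarize(ternary)
```

Rows are variants and columns are patients, which matches the 1046 × 112 shape quoted in the figure captions; whether the original matrix used this orientation or this exact coding is not stated in the abstract.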

Results: Unsupervised methods can identify functionally relevant sources of variation in the data, such as haplogroups and polymorphisms that do not follow Hardy-Weinberg equilibrium, for example NOTCH3 rs11670823 (c.3837+21T>A). Supervised classifiers can recognize clinical phenotypes with high accuracy based on the mutational profile of patients. In addition, we found a similar alteration of allele frequencies, compared to the European population, in sporadic patients (sAD) and in V210I-CJD, which is caused by a poorly penetrant PRNP mutation, suggesting shared oligogenic patterns in different types of dementia. Pathway enrichment and protein-protein interaction network analyses revealed different altered pathways between the two PRNP mutations.
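As an illustration of what "not following Hardy-Weinberg equilibrium" means in practice, the sketch below runs a standard one-degree-of-freedom chi-square test on genotype counts. The abstract does not say which test the authors used, so this is an assumed textbook choice; the function name `hwe_chi_square` and the counts in the usage lines are hypothetical.

```python
import math
from typing import Tuple


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> Tuple[float, float]:
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic variant.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("at least one genotyped sample is required")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternate allele frequency
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_hom_ref, n_het, n_hom_alt]
    # Skip genotype classes with zero expected count (monomorphic variants).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the first set sits exactly at equilibrium,
# the second has no heterozygotes at all.
chi2_balanced, p_balanced = hwe_chi_square(25, 50, 25)
chi2_skewed, p_skewed = hwe_chi_square(50, 0, 50)
```

With small cohorts such as the 112 patients studied here, an exact test is often preferred over the chi-square approximation; the version above is kept only because it is short and easy to follow.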

Conclusions: We propose this workflow as a possible approach to gain deeper insights into the genetic information derived from targeted sequencing, to identify recurrent genetic patterns, and to improve the understanding of complex diseases. This work could also represent a possible starting point for a predictive tool for personalized medicine and advanced diagnostic applications.

Keywords: Alzheimer’s Disease; CJD; Complex diseases; Gene panels; Genetic modifiers; Machine learning; NGS; Neurodegeneration; Polygenic score.

Conflict of interest statement

The authors have no competing interests to declare.

Figures

**Fig. 1**
2D plot of the Principal Component Analysis (PCA) computed on the 1046 × 112 ternary matrix. PCA is a dimensionality reduction technique that computes an orthogonal linear transformation of the data to a new 2D coordinate system so that the greatest variance lies on the x-axis (PC1) and the second greatest variance on the y-axis (PC2). Each dot represents a patient, plotted in the 2D space according to the genetic profile expressed in the ternary matrix. PC1 and PC2 show the main sources of variance in our data, accounting for 22% of overall variance, and are represented by variants in the *MAPT* and *NOTCH3* genes, respectively. The PCA plot and hierarchical clustering recognize clusters that correspond to the *MAPT* haplotypes on the x-axis, as shown by the coloured labels in the picture legend. Similarly, the distribution along the y-axis matches haplotypes in the *NOTCH3* gene (not shown)
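The projection described in this caption can be sketched with a small, dependency-free example. It assumes patients are rows and variants are columns (the transpose of a variants × patients matrix), and it extracts only the first principal component by power iteration on the covariance matrix; the function names and the toy data are illustrative and do not come from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def center_columns(rows: Matrix) -> Matrix:
    """Subtract each column mean so every variant has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix (d x d) of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def mat_vec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def first_component(rows: Matrix, iterations: int = 500) -> Tuple[float, List[float], List[float]]:
    """Return (explained variance ratio, PC1 loadings, PC1 score per patient)."""
    centered = center_columns(rows)
    cov = covariance(centered)
    d = len(cov)
    # Non-uniform start vector; power iteration fails only in the unlikely
    # case that it is exactly orthogonal to the leading eigenvector.
    v = [1.0 + i for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    total_variance = sum(cov[i][i] for i in range(d))
    ratio = eigenvalue / total_variance if total_variance > 0 else 0.0
    scores = [sum(a * b for a, b in zip(r, v)) for r in centered]
    return ratio, v, scores


# Invented toy data: 3 patients x 3 variants, the last variant is constant.
toy = [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0], [2.0, 2.0, 1.0]]
ratio, loadings, scores = first_component(toy)
```

A second component (the y-axis in the figure) would require removing the first component from the covariance matrix and repeating the iteration. For a matrix with about a thousand variants, a library routine based on singular value decomposition is the more practical choice.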

**Fig. 2**
Dataset classification by decision tree analysis: this supervised method computes a classification of the 1046 × 112 matrix based on the labels provided. The classifier correctly identifies the two disease groups based on the two disease-causing mutations

**Fig. 3**
Result of decision tree analysis on the dataset deprived of the information about gCJD-causing mutations. Classification is accomplished with 0.71 accuracy for sAD and 0.85 for gCJD. Classification is based on the reported eight variants harboured in six genes. Four of these are variants of uncertain significance not reported in the gnomAD database: *APP* c.*1A>C (rs748508166), *GRN* c.1179+100A>T, *DCTN1* p.Lys519Glu, and *PRKAR1B* c.595+369T>C (rs1342588350). Two are rare (Minor Allele Frequency < 0.05) variants in the European population: *APP* p.Phe435= (rs148180403, MAF = 0.001) and *DCTN1* p.Ala816= (rs1130484, MAF = 0.007). Two are common benign variants in *CHCHD10* (c.261+99A>G) and *GSN* (c.666+53T>C). “Value” indicates the number of samples at the given node that fall into each category. The “Gini” score quantifies the purity of the node/leaf: a score greater than zero implies that the samples within that node belong to different classes, while a score of zero means that only a single class of samples exists within that node
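The Gini score mentioned in this caption is the standard Gini impurity used by decision tree learners. A minimal sketch of how it is computed from the per-class sample counts of a node (the “Value” field) is given below; the function names and the counts in the usage lines are illustrative and are not read from the figure.

```python
from typing import List, Sequence


def gini_impurity(class_counts: Sequence[int]) -> float:
    """Gini impurity of one node: 1 minus the sum of squared class proportions."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def weighted_gini(children: List[Sequence[int]]) -> float:
    """Impurity of a split: child impurities weighted by child size."""
    total = sum(sum(child) for child in children)
    if total == 0:
        return 0.0
    return sum(sum(child) / total * gini_impurity(child) for child in children)


# Hypothetical nodes with counts given as [class A, class B].
pure_node = gini_impurity([10, 0])    # only one class present
mixed_node = gini_impurity([5, 5])    # two classes in equal numbers
perfect_split = weighted_gini([[10, 0], [0, 10]])
useless_split = weighted_gini([[5, 5], [5, 5]])
```

A tree learner picks, at each node, the variant whose split gives the lowest weighted impurity, which is why a leaf with a Gini score of zero contains a single class.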

**Fig. 4**
Result of functional enrichment analysis performed on genes harbouring variants with significantly altered allele frequency compared to the European population reported in the gnomAD database. Results of pathway analysis are reported as pathways significantly (p < 0.05) enriched in the first group but not in the second group of each paired comparison. Since some of the affected pathways are shared among the considered conditions, results are reported as differences between comparisons of two groups. Complete results of the functional analysis with Gene Ontology and of the Protein–Protein Interaction networks are reported in Supplementary materials

Cited by

Genomic, transcriptomic and RNA editing analysis of human MM1 and VV2 sporadic Creutzfeldt-Jakob disease.
Tarozzi M, Baiardi S, Sala C, Bartoletti-Stella A, Parchi P, Capellari S, Castellani G. Acta Neuropathol Commun. 2022 Dec 14;10(1):181. doi: 10.1186/s40478-022-01483-9. PMID: 36517866.
Database and AI Diagnostic Tools Improve Understanding of Lung Damage, Correlation of Pulmonary Disease and Brain Damage in COVID-19.
Karpiel I, Starcevic A, Urzeniczok M. Sensors (Basel). 2022 Aug 22;22(16):6312. doi: 10.3390/s22166312. PMID: 36016071.
Dementia-related genetic variants in an Italian population of early-onset Alzheimer's disease.
Bartoletti-Stella A, Tarozzi M, Mengozzi G, Asirelli F, Brancaleoni L, Mometto N, Stanzani-Maserati M, Baiardi S, Linarello S, Spallazzi M, Pantieri R, Ferriani E, Caffarra P, Liguori R, Parchi P, Capellari S. Front Aging Neurosci. 2022 Sep 5;14:969817. doi: 10.3389/fnagi.2022.969817. eCollection 2022. PMID: 36133075.
Syndrome Pattern Recognition Method Using Sensed Patient Data for Neurodegenerative Disease Progression Identification.
Anjum M, Shahab S, Yu Y. Diagnostics (Basel). 2023 Feb 26;13(5):887. doi: 10.3390/diagnostics13050887. PMID: 36900031.
