Genes (Basel). 2022 Apr 21;13(5):727.
doi: 10.3390/genes13050727.

A Machine Learning Approach to Parkinson's Disease Blood Transcriptomics

Ester Pantaleo et al. Genes (Basel).

Abstract

The increased incidence of Parkinson's disease (PD) and the significant health burden associated with it have stimulated substantial research efforts towards the identification of effective treatments and diagnostic procedures. Despite technological advances, a cure is still not available, and PD is often diagnosed long after onset, when irreversible damage has already occurred. Blood transcriptomics is a potentially disruptive technology for the early diagnosis of PD. We used transcriptome data from the PPMI (Parkinson's Progression Markers Initiative) study, a large cohort study of subjects with early PD and age-matched healthy controls (HC), to classify PD vs. HC in around 550 samples. Using a nested feature selection procedure based on Random Forests and XGBoost, we reached an AUC of 72% and identified 493 candidate genes. We further assessed the relevance of the selected genes through a functional enrichment analysis based on Gene Ontology (GO) terms and KEGG pathways.
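The abstract reports classification performance as an AUC. As a reminder of what that number measures, the sketch below computes the ROC AUC through its equivalence with the Mann-Whitney U statistic: the probability that a randomly chosen PD sample receives a higher score than a randomly chosen HC sample, with ties counted as one half. The function name `roc_auc`, the label encoding (1 = PD, 0 = HC) and the toy scores are illustrative assumptions; this is not the authors' code, and it says nothing about how their Random Forest or XGBoost models produced the scores.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic.

    labels: 1 for PD, 0 for healthy control (assumed encoding).
    scores: classifier scores, higher meaning "more likely PD".
    Tied scores get average ranks, so a PD/HC tie counts as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one PD and one HC sample")

    # Assign 1-based average ranks to the scores, grouping tied values.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the PD class, normalized by the number of PD/HC pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data, not from the study: 7.5 of the 9 PD/HC pairs are ordered correctly.
    print(round(roc_auc([1, 1, 0, 0, 1, 0], [0.9, 0.6, 0.6, 0.2, 0.4, 0.3]), 3))  # prints 0.833
```

Under this reading, an AUC of 72% means a PD sample outscores an HC sample in roughly 72% of PD/HC pairs.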

Keywords: Parkinson’s disease; blood transcriptomics; feature selection; inflammation; machine learning; mitochondrial dysfunction; oxidative stress; xgboost.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Schematic workflow of the performed analyses. The main phases are: (i) preprocessing, (ii) learning and (iii) performance evaluation.
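The learning and performance-evaluation phases are evaluated with repeated 10-fold cross-validation (20 runs, per Figure 3). A minimal sketch of stratified, repeated k-fold splitting is shown below, assuming that stratification by class and a fresh random shuffle per run are used; the paper's exact splitting scheme is not given here, and the function names are hypothetical. The key property it illustrates is that any feature selection must be fitted on the training indices only, so that the held-out fold gives an unbiased AUC.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_kfold(
    labels: Sequence[int], n_splits: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; each fold keeps PD/HC proportions roughly equal."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    offset = 0
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's shuffled indices round-robin, continuing where the
        # previous class stopped so that fold sizes differ by at most one.
        for pos, i in enumerate(idx):
            folds[(offset + pos) % n_splits].append(i)
        offset += len(idx)
    for k in range(n_splits):
        test = sorted(folds[k])
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        # Feature selection (e.g. importance thresholding) and model fitting
        # should see only `train`; `test` is used only to score the fitted model.
        yield train, test


def repeated_stratified_cv(
    labels: Sequence[int], n_repeats: int = 20, n_splits: int = 10
) -> List[List[Tuple[List[int], List[int]]]]:
    """One list of folds per repetition, each repetition with its own shuffling seed."""
    return [list(stratified_kfold(labels, n_splits, seed=r)) for r in range(n_repeats)]


if __name__ == "__main__":
    # Toy labels only (1 = PD, 0 = HC); not the PPMI class counts.
    toy_labels = [1] * 30 + [0] * 25
    runs = repeated_stratified_cv(toy_labels, n_repeats=2, n_splits=10)
    for train, test in runs[0][:3]:
        print(len(train), len(test), sum(toy_labels[i] for i in test))
```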
Figure 2
Figure 2
Samples were collected across 25 different sites, each labeled with an integer. Sites “14”, “26”, “55”, and “59” had only 0 or 1 control samples (horizontal dotted line) and were excluded from the classification analysis, because site-related batch effects could not be estimated, and therefore could not be corrected for, at these sites.
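The exclusion rule in Figure 2 amounts to requiring at least two control samples per site. A minimal sketch of that filter follows; the threshold `min_controls=2` is our reading of "0 or 1 control sample only", and the function names, the string site labels and the encoding 0 = HC are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Set


def sites_to_exclude(sites: Sequence[str], labels: Sequence[int], min_controls: int = 2) -> Set[str]:
    """Sites with fewer than `min_controls` healthy controls (label 0)."""
    controls = Counter(site for site, y in zip(sites, labels) if y == 0)
    # Counter returns 0 for sites that never appear among the controls.
    return {site for site in set(sites) if controls[site] < min_controls}


def kept_sample_indices(sites: Sequence[str], labels: Sequence[int], min_controls: int = 2) -> List[int]:
    """Indices of samples from sites with enough controls to estimate a site batch effect."""
    drop = sites_to_exclude(sites, labels, min_controls)
    return [i for i, site in enumerate(sites) if site not in drop]


if __name__ == "__main__":
    # Toy data: site "14" has one control, "26" has none, "10" has two.
    sites = ["14", "14", "26", "10", "10", "10"]
    labels = [1, 0, 1, 0, 0, 1]
    print(sorted(sites_to_exclude(sites, labels)), kept_sample_indices(sites, labels))
```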
Figure 3
Figure 3
In black, the median AUC over 20 runs of 10-fold cross-validation (CV); in red, the median AUC ± its mean absolute deviation; in blue, the number of features (genes) at which the maximum median AUC (72%) was reached. For each run, we collected the AUC values obtained at different thresholds C (or, equivalently, different numbers of genes) and interpolated these values to build a curve. The black curve is the median of these 20 curves, one per CV run.
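The curve construction in Figure 3 can be sketched as follows: each run gives AUC values at its own set of gene counts, so every run is first interpolated onto a common grid, and then the median and a spread are taken point by point. This sketch assumes piecewise-linear interpolation and computes the mean absolute deviation around the pointwise median; the caption specifies neither detail, and the function names and toy numbers are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

# One CV run: (number of genes, AUC), with gene counts strictly increasing.
Curve = Tuple[Sequence[float], Sequence[float]]


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation of (xs, ys) at x, clamped to the end values."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("xs must be strictly increasing")


def median_auc_curve(runs: Sequence[Curve], grid: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Pointwise median AUC over runs, and mean absolute deviation around that median."""
    medians, deviations = [], []
    for g in grid:
        values = [interp(g, xs, ys) for xs, ys in runs]
        m = statistics.median(values)
        medians.append(m)
        deviations.append(statistics.fmean([abs(v - m) for v in values]))
    return medians, deviations


def best_gene_count(grid: Sequence[float], medians: Sequence[float]) -> Tuple[float, float]:
    """Grid point (number of genes) where the median AUC peaks, and the peak value."""
    i = max(range(len(grid)), key=lambda j: medians[j])
    return grid[i], medians[i]


if __name__ == "__main__":
    # Three toy runs, not study data.
    runs = [([10, 200, 800], [0.60, 0.70, 0.69]),
            ([20, 300, 900], [0.62, 0.72, 0.70]),
            ([15, 250, 700], [0.58, 0.71, 0.68])]
    grid = list(range(20, 701, 20))
    med, dev = median_auc_curve(runs, grid)
    print(best_gene_count(grid, med))
```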
Figure 4
Figure 4
Histogram of the frequency of occurrence of the top 493 genes over 20 repetitions. At each repetition we collected the 493 most important genes; across the 20 repetitions we gathered around 800 distinct genes in total, 365 of which appeared in all 20 repetitions.
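The bookkeeping behind Figure 4 is a simple count: for each gene, in how many repetitions did it appear in that repetition's top set, and which genes appear every time. A minimal sketch under that reading follows; the function names and gene symbols are hypothetical, and the top sets would come from whatever importance ranking each repetition produced.

```python
from collections import Counter
from typing import Dict, Iterable, List


def selection_frequency(top_gene_sets: Iterable[Iterable[str]]) -> Counter:
    """Number of repetitions in which each gene was among the selected top genes."""
    counts: Counter = Counter()
    for genes in top_gene_sets:
        counts.update(set(genes))  # set(): count a gene at most once per repetition
    return counts


def frequency_histogram(counts: Counter) -> Dict[int, int]:
    """Map 'selected in r repetitions' -> number of genes, i.e. the histogram bars."""
    return dict(sorted(Counter(counts.values()).items()))


def always_selected(counts: Counter, n_repeats: int) -> List[str]:
    """Genes present in the top set of every repetition."""
    return sorted(g for g, c in counts.items() if c == n_repeats)


if __name__ == "__main__":
    # Hypothetical gene symbols and only 3 repetitions, for illustration.
    reps = [["GENE_A", "GENE_B"], ["GENE_A", "GENE_C"], ["GENE_A", "GENE_B"]]
    counts = selection_frequency(reps)
    print(len(counts), frequency_histogram(counts), always_selected(counts, len(reps)))
    # prints: 3 {1: 1, 2: 1, 3: 1} ['GENE_A']
```

In the study's terms, `len(counts)` corresponds to the roughly 800 distinct genes and `always_selected(counts, 20)` to the 365 genes selected in all 20 repetitions.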
Figure 5
List of all the GO Biological Processes that are enriched in the selected genes, with the respective number of genes belonging to each term. The analysis was performed with enrichR at an FDR < 0.05.
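Enrichment tools of this kind ask whether a term (a GO Biological Process here) contains more of the selected genes than expected by chance. Enrichr is documented as using Fisher's exact test, whose one-sided over-representation form equals the upper tail of the hypergeometric distribution; the sketch below computes that tail directly with exact integer arithmetic. It is an illustration, not enrichR's implementation, and the background size `N` and all counts in the example are assumptions.

```python
from math import comb


def enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: genes in the background universe (assumed known)
    K: background genes annotated to the GO term or KEGG pathway
    n: selected genes, all assumed to belong to the universe
    k: selected genes annotated to the term
    """
    if not (0 <= k <= min(n, K) and 0 <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of observing k or more annotated genes among the n selected.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)  # exact integers until this final division


if __name__ == "__main__":
    # Illustrative counts only: 493 selected genes, a 150-gene term,
    # an assumed 20,000-gene background, and 12 overlapping genes.
    print(enrichment_pvalue(k=12, n=493, K=150, N=20000))
```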
Figure 6
List of all the GO Cellular Components that are enriched in the selected genes with the respective number of genes belonging to each term. The analysis was performed with enrichR at an FDR < 0.05.
Figure 7
List of all the KEGG pathways that are enriched in the selected genes with the respective number of genes belonging to each term. The analysis was performed with enrichR at an FDR < 0.05.
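The "FDR < 0.05" cut-off used in Figures 5 to 7 refers to p-values adjusted for testing many terms at once. Assuming the adjustment is the Benjamini-Hochberg procedure (which we understand Enrichr to report as its adjusted p-value), it can be sketched as below; `significant_terms` is a hypothetical helper, and the p-values in the test are toy numbers.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * m / rank
    # so that adjusted values never decrease as raw p-values increase.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_terms(terms: Sequence[str], pvalues: Sequence[float], fdr: float = 0.05) -> List[str]:
    """Terms whose adjusted p-value is below the FDR threshold."""
    return [t for t, q in zip(terms, benjamini_hochberg(pvalues)) if q < fdr]


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]
    print(benjamini_hochberg(raw), significant_terms(["t1", "t2", "t3", "t4"], raw))
```

Note how "t2" and "t3" have raw p-values below 0.05 yet do not pass an FDR of 0.05 once the four tests are accounted for.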
