PLoS One. 2024 Jul 1;19(7):e0306028.
doi: 10.1371/journal.pone.0306028. eCollection 2024.

Probability density and information entropy of machine learning derived intracranial pressure predictions

Anmar Abdul-Rahman et al. PLoS One. 2024.

Abstract

Even with the powerful statistical parameters derived from the Extreme Gradient Boost (XGB) algorithm, it would be advantageous to define the predictive accuracy at the level of a specific case, particularly when the model output is used to guide clinical decision-making. The probability density function (PDF) of the derived intracranial pressure predictions enables the computation of a definite integral around a point estimate, representing the event's probability within a range of values. Seven hold-out test cases used for the external validation of an XGB model underwent retinal vascular pulse and intracranial pressure measurement using modified photoplethysmography and lumbar puncture, respectively. The definite integral ±1 cm water from the median (DIICP) demonstrated a negative and highly significant correlation (-0.5213±0.17, p<0.004) with the absolute difference between the measured and predicted median intracranial pressure (DiffICPmd). The concordance between the arterial and venous probability density functions was estimated using the two-sample Kolmogorov-Smirnov statistic, extending the distribution agreement across all data points. This parameter showed a statistically significant and positive correlation (0.4942±0.18, p<0.001) with DiffICPmd. Two cautionary subset cases (Case 8 and Case 9), where disagreement was observed between measured and predicted intracranial pressure, were compared to the seven hold-out test cases. Arterial predictions from both cautionary subset cases converged on a uniform distribution, in contrast to all other cases, where distributions converged on either log-normal or closely related skewed distributions (gamma, logistic, beta). The mean±standard error of the arterial DIICP from cases 8 and 9 (3.83±0.56%) was lower than that of the hold-out test cases (14.14±1.07%); the between-group difference was statistically significant (p<0.03).
Although the sample size in this analysis was limited, these results support a dual and complementary analysis approach from independently derived retinal arterial and venous non-invasive intracranial pressure predictions. Results suggest that plotting the PDF and calculating the lower-order moments, arterial DIICP, and the two-sample Kolmogorov-Smirnov statistic may provide individualized predictive accuracy parameters.
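To make the DIICP idea concrete, the following is a minimal sketch of a definite integral of an estimated density within ±1 cm water of the median. It rests on assumptions that are not in the abstract: the density is a Gaussian kernel density estimate, the bandwidth comes from the normal-reference rule of thumb, and the function names (`di_icp`, `rule_of_thumb_bandwidth`) and sample values are hypothetical. The authors' actual density estimator may differ.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rule_of_thumb_bandwidth(samples):
    """Normal-reference rule of thumb: h = 1.06 * sd * n^(-1/5).

    Assumption for illustration only; requires a non-zero standard deviation.
    """
    n = len(samples)
    return 1.06 * statistics.stdev(samples) * n ** (-1 / 5)


def di_icp(predictions, half_width=1.0, bandwidth=None):
    """Probability mass of a Gaussian KDE within +/- half_width of the median.

    For a Gaussian KDE the integral has a closed form: the average, over all
    samples, of the normal CDF difference between the two bounds.
    """
    h = bandwidth if bandwidth is not None else rule_of_thumb_bandwidth(predictions)
    med = statistics.median(predictions)
    lo, hi = med - half_width, med + half_width
    mass = sum(
        normal_cdf((hi - x) / h) - normal_cdf((lo - x) / h) for x in predictions
    )
    return mass / len(predictions)


# Hypothetical predictions in cm water, not data from the study.
tight = [20.0, 20.1, 19.9, 20.2, 19.8, 20.05, 19.95]
spread = [10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
print(di_icp(tight), di_icp(spread))
```

A tightly clustered set of predictions concentrates most of its mass near the median and yields a high value, while a flat, spread-out set yields a low one, which is the direction of the contrast reported between the hold-out and cautionary cases.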

Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests: We would like to declare that the authors Anmar Abdul-Rahman, William Morgan, and Dao-Yi Yu are the inventors of the Modified Photoplethysmography method. Furthermore, we have no financial interest in the results of this study. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Figures

Fig 1. Data workflow schematic for probability density function analysis of Extreme Gradient Boost derived intracranial pressure predictions.
Two cautionary subset cases (8 and 9) were subsetted from the seven hold-out test cases and demonstrated a wide and conflicting difference between measured and predicted intracranial pressure from the arterial and venous XGB models. All cases underwent data pre-processing, and descriptive statistics and hypothesis tests were computed from the retinal vascular pulse parameters. Intracranial pressure predictions were derived from the retinal arterial and venous parameters independently. Probability density functions were generated from intracranial pressure predictions from both XGB models, where the median was considered a more favorable point estimate than the mean or the mode. This was likely because the median represents the geometric mean of a log-normal distribution and is supported by findings from previous work [3]. Correlations were computed between the absolute difference between the median predicted and measured intracranial pressure (DiffICPmd) and imaging characteristics (n = number of vascular data points analyzed, Bilateral = both eyes tested, nIOPi = number of induced intraocular pressure levels applied during imaging) and distribution characteristics (DIICP = definite integral ±1 cm water of the median, tsKS = two-sample Kolmogorov-Smirnov statistic, ADS = Anderson-Darling statistic, KS = Kolmogorov-Smirnov statistic, sEnt = Shannon entropy).
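The remark that the median equals the geometric mean of a log-normal distribution can be written out with the standard log-normal identities. The symbols \mu and \sigma are introduced here for illustration and do not appear in the caption.

```latex
% If \ln X \sim \mathcal{N}(\mu, \sigma^2), then
\begin{aligned}
\text{median}(X) &= e^{\mu}, \\
\text{geometric mean}(X) &= \exp\!\big(\mathbb{E}[\ln X]\big) = e^{\mu}, \\
\text{mean}(X) &= e^{\mu + \sigma^{2}/2}, \\
\text{mode}(X) &= e^{\mu - \sigma^{2}},
\end{aligned}
\qquad\text{so}\qquad
\text{mode} < \text{median} < \text{mean} \quad (\sigma > 0).
```

For a right-skewed distribution of this kind the mean is pulled toward the tail and the mode sits below the bulk of the mass, so the median lies between the two.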
Fig 2. Violin plot.
Comparison of the distribution of the harmonic regression wave amplitude in the hold-out and cautionary subsets.
Fig 3. Ridgeline plot for the probability density function of intracranial pressure predictions derived from the Extreme Gradient Boost algorithm of the arterial and venous pulsation data.
In contrast to the dominant right-skewed distribution in most cases, in cases 8 and 9, the distribution of the arterial predictions converges on a uniform distribution.
Fig 4. Overlapping ridgeline plots of intracranial pressure predictions derived from the Extreme Gradient Boost algorithm of the arterial and venous pulsation data.
The two-sample Kolmogorov-Smirnov statistic (tsKS) provides a quantitative comparison between two distributions across the whole range rather than just at a point estimate. Within a single case, the more closely the distributions from the arterial and venous models agree, the lower the tsKS value. Case 4 demonstrates the lowest tsKS statistic (0.080897, p<0.003), and case 7 the highest (0.48101).
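As an illustration of how a tsKS value of this kind arises, the sketch below computes the two-sample Kolmogorov-Smirnov statistic as the largest vertical gap between two empirical cumulative distribution functions. The function name `ks_two_sample` and the sample values are hypothetical; this is the textbook definition of the statistic, not necessarily the authors' implementation.

```python
from bisect import bisect_right


def ks_two_sample(a, b):
    """Largest absolute difference between the ECDFs of samples a and b.

    Both ECDFs are step functions that only change at observed values, so it
    is enough to evaluate the gap at every observed value.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    for x in a_sorted + b_sorted:
        f_a = bisect_right(a_sorted, x) / n  # fraction of a that is <= x
        f_b = bisect_right(b_sorted, x) / m  # fraction of b that is <= x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical arterial and venous predictions in cm water.
arterial = [1.0, 2.0, 3.0, 4.0]
venous = [3.0, 4.0, 5.0, 6.0]
print(ks_two_sample(arterial, venous))  # 0.5
```

Identical samples give 0 and fully separated samples give 1, so lower values correspond to closer agreement between the arterial and venous prediction distributions.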
Fig 5. (A-D) A comparison of the empirical cumulative distribution functions (ECDF) of intracranial pressure predictions derived from the Extreme Gradient Boost algorithm in four cases with contrasting two-sample Kolmogorov-Smirnov statistics (tsKS). Cases 4 and 5 (A, B), the two hold-out test cases with the lowest tsKS, demonstrate favorable concordance between venous and arterial derived predictions, in contrast to the subset cases 8 and 9 (C, D), where the concordance is poor. The wider separation of the ECDFs between the two models can be observed in cases 8 and 9 (C, D). The tsKS statistic depends on a ratio parameter consisting of the product of the numbers of data points in the two distributions divided by their sum [25]. Red = arterial model, Blue = venous model.
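The ratio parameter mentioned in this caption can be written out as follows. This is the standard large-sample form of the two-sample Kolmogorov-Smirnov test, given here on the assumption that it matches the formulation in [25], which is not reproduced in this record. The symbols n and m (numbers of arterial and venous data points) and \alpha (significance level) are introduced for illustration.

```latex
\begin{aligned}
D_{n,m} &= \sup_{x}\,\bigl|F_{\mathrm{art},n}(x) - F_{\mathrm{ven},m}(x)\bigr|, \\
n_{\mathrm{eff}} &= \frac{n\,m}{n+m}, \\
\text{reject } H_0 \text{ at level } \alpha \text{ when}\quad
\sqrt{n_{\mathrm{eff}}}\;D_{n,m} &> \sqrt{-\tfrac{1}{2}\ln\!\left(\tfrac{\alpha}{2}\right)} .
\end{aligned}
```

Under this form, the same ECDF separation D is more likely to reach significance when both samples are large, which is one reason to read tsKS values alongside the number of data points in each case.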
Fig 6. Correlation matrix comparing distribution and imaging parameters from the hold-out and cautionary subsets.
Features from the top row are of interest. There was a significant negative correlation of DiffICPmd with laterality (-0.59; bilateral was numerically coded as 2 and unilateral as 1 for this analysis). There was a moderate to low correlation with parameters of the distribution of the XGB-derived prediction (ADS = Anderson-Darling statistic, KS = Kolmogorov-Smirnov statistic). However, the correlation with DIICP (the definite integral ±1 cm water of the median) was strongly negative (-0.52), indicating that the higher the weight of the area under the curve within these bounds, the closer the agreement between predicted and measured intracranial pressure. Similarly, the correlation with tsKS (the two-sample Kolmogorov-Smirnov statistic; 0.49) was significant, indicating that the higher the overlap between the vascular model distributions, the higher the agreement with measured intracranial pressure. Comparably, Shannon entropy (sEnt) showed a strong positive correlation (0.48), indicating convergence to a uniform distribution (increased randomness) with higher DiffICPmd values. nIOPi = the levels of induced intraocular pressure applied during the imaging, n = total number of tested data points.
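The link between Shannon entropy and convergence to a uniform distribution can be illustrated with a histogram-based estimate. This is a minimal sketch under assumptions not stated in the caption: entropy is computed from equal-width bins in bits, and the function name `shannon_entropy`, the bin count, and the sample values are hypothetical. The authors' estimator may differ.

```python
import math


def shannon_entropy(samples, bins=10, lo=None, hi=None):
    """Shannon entropy (bits) of a histogram with equal-width bins.

    Entropy is maximal, log2(bins), when every bin is equally populated,
    i.e. when the samples look uniform over [lo, hi].
    """
    lo = min(samples) if lo is None else lo
    hi = max(samples) if hi is None else hi
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        # Clamp so that values on or beyond the edges fall in the end bins.
        idx = max(0, min(int((x - lo) / width), bins - 1))
        counts[idx] += 1
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


# Hypothetical predictions: one per bin (flat) versus mostly one bin (peaked).
flat = [i + 0.5 for i in range(10)]
peaked = [5.5] * 9 + [0.5]
print(shannon_entropy(flat, lo=0.0, hi=10.0))
print(shannon_entropy(peaked, lo=0.0, hi=10.0))
```

The flat sample reaches the maximum of log2(10) bits, while the peaked sample scores much lower, matching the reading that higher sEnt reflects a flatter, less informative prediction distribution.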
Fig 7. Pearson correlation between the definite integral ±1 cm around the median of the probability density distribution (DIICP) and the absolute difference between predicted and measured intracranial pressure (DiffICPmd) for the A) arterial and B) venous models. Only A) the arterial model (Pearson r = -0.76, p = 0.02) achieved statistical significance, in contrast to B) the venous model (Pearson r = -0.10, p = 0.799). This indicated that the arterial model was a more discriminatory indicator of agreement between measured and predicted intracranial pressure.
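For readers who want to reproduce this kind of comparison, the sketch below computes a Pearson correlation coefficient and the t statistic used to test it. The function names and the sample values are hypothetical, and the case count of n = 9 (seven hold-out plus two cautionary cases) is an assumption rather than something the caption states.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, referred to a t distribution with n - 2 df.

    Undefined for |r| = 1 because the denominator becomes zero.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical DIICP (%) and DiffICPmd (cm water) values, not study data.
di_values = [1.0, 2.0, 3.0, 4.0, 5.0]
diff_values = [10.0, 8.0, 6.0, 4.0, 2.0]
print(pearson_r(di_values, diff_values))  # -1.0

# Reported arterial r with the assumed n = 9 cases.
print(round(t_statistic(-0.76, 9), 2))  # -3.09
```

With the assumed n = 9, the arterial coefficient of -0.76 corresponds to a t statistic of about -3.09 on 7 degrees of freedom, whereas a coefficient near -0.10 gives a t statistic close to zero.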
Fig 8. Distribution of intracranial pressure and the number of analyzed image data points from the Extreme Gradient Boost training data set.
There is a low contribution of data points to the model at intracranial pressure levels <17 and >43 cm water with two participants below (K, M) and above (C, F) these boundaries, respectively. ICP = intracranial pressure in cm water [3].

References

    1. Chicco D, Warrens MJ, Jurman G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput Sci. 2021;7:e623. doi: 10.7717/peerj-cs.623
    2. Barrett JP. The coefficient of determination—some limitations. Amer Statist. 1974;28(1):19–20. doi: 10.2307/2683523
    3. Abdul-Rahman A, Morgan W, Yu DY. A machine learning approach in the non-invasive prediction of intracranial pressure using Modified Photoplethysmography. PLoS One. 2022;17(9):e0275417. doi: 10.1371/journal.pone.0275417
    4. Severini TA. Joint distributions. In: Probability, statistics, and stochastic processes. John Wiley & Sons; 2012. p. 156–247.
    5. Mishra P, Pandey CM, Singh U, Gupta A, Sahu C, Keshri A. Descriptive statistics and normality tests for statistical data. Ann Card Anaesth. 2019;22(1):67. doi: 10.4103/aca.ACA_157_18