Proc AAAI Conf Artif Intell. 2021 May 18;35(12):10469-10477.

Variational Disentanglement for Rare Event Modeling

Zidi Xiu et al. Proc AAAI Conf Artif Intell. 2021.

Abstract

The increasing availability and abundance of healthcare data, combined with current advances in machine learning methods, have created renewed opportunities to improve clinical decision support systems. However, in healthcare risk prediction applications, the proportion of cases with the condition (label) of interest is often very low relative to the available sample size. Though very prevalent in healthcare, such imbalanced classification settings are also common and challenging in many other scenarios. Motivated by this, we propose a variational disentanglement approach to semi-parametrically learn from rare events in heavily imbalanced classification problems. Specifically, we impose extreme-distribution behavior on a latent space to extract information from low-prevalence events, and develop a robust prediction arm that combines the merits of the generalized additive model and isotonic neural nets. Results on synthetic studies and diverse real-world datasets, including mortality prediction on a COVID-19 cohort, demonstrate that the proposed approach outperforms existing alternatives.
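
The idea of extracting information from the tail of a distribution can be illustrated with a classical peaks-over-threshold estimate from extreme value theory. The sketch below is not the variational method proposed in the paper: it fits a generalized Pareto distribution (GPD) to one-dimensional exceedances with a simple method-of-moments estimator, which is only valid when the shape parameter is below 0.5. The function names and the 0.95 threshold quantile are illustrative assumptions.

```python
import math
import random


def exceedances_over(data, quantile=0.95):
    """Return the threshold u at the given quantile and the exceedances x - u."""
    ordered = sorted(data)
    u = ordered[int(quantile * len(ordered))]
    return u, [v - u for v in ordered if v > u]


def fit_gpd_moments(exceedances):
    """Method-of-moments GPD fit (assumes shape xi < 0.5 so the variance exists)."""
    n = len(exceedances)
    mean = sum(exceedances) / n
    var = sum((e - mean) ** 2 for e in exceedances) / (n - 1)
    ratio = mean * mean / var          # equals 1 - 2*xi for an exact GPD
    xi = 0.5 * (1.0 - ratio)           # shape
    sigma = 0.5 * mean * (1.0 + ratio)  # scale
    return xi, sigma


def tail_probability(x, data, quantile=0.95):
    """Estimate P(X > x) as P(X > u) times the fitted GPD survival function."""
    u, exc = exceedances_over(data, quantile)
    if x <= u:
        # Below the threshold the empirical estimate is used directly.
        return sum(v > x for v in data) / len(data)
    xi, sigma = fit_gpd_moments(exc)
    z = (x - u) / sigma
    if abs(xi) < 1e-9:
        survival = math.exp(-z)        # exponential limit as xi -> 0
    else:
        base = 1.0 + xi * z
        survival = base ** (-1.0 / xi) if base > 0 else 0.0
    return (len(exc) / len(data)) * survival


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.expovariate(1.0) for _ in range(20000)]
    print("estimated P(X > 6):", tail_probability(6.0, sample))
    print("true P(X > 6):     ", math.exp(-6.0))
```

For exponential data the exceedances are themselves exponential, so the fitted shape should be close to zero and the scale close to one; heavier-tailed data would give a positive shape.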

Figures

Figure 1:
Left: Distribution of a two-dimensional latent space z, where the long tail is associated with higher risk. Right: Tail estimates from different schemes for long-tailed data in a one-dimensional space. Extreme value theory (EVT) provides a more accurate characterization than the other mechanisms.
Figure 2:
First latent dimension from the inp dataset (1% event rate). Left: Learned prior and posterior distributions, with monotonic predicted risks (right axis). Right: Distribution of latent representation values, grouped by event type.
Figure 3:
Bootstrapped AUC (left) and AUPRC (right) distributions for the COVID mortality data (2.6% event rate).
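
Bootstrapped AUC and AUPRC distributions of the kind summarized in Figure 3 can be computed along the following lines. This is a minimal standard-library sketch, not the authors' evaluation code: AUC uses the rank-sum identity, AUPRC is approximated by average precision, and the function names, resample count, and seed are illustrative assumptions.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision: mean of the precision at each positive, ranked by score.

    With tied scores the result depends on the tie order, so treat it as approximate.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank
    return total / hits


def bootstrap(metric, labels, scores, n_boot=200, seed=0):
    """Resample (label, score) pairs with replacement and collect the metric values."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that lost one of the two classes
            values.append(metric(y, [scores[i] for i in idx]))
    return values


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(auc(y_true, y_score), average_precision(y_true, y_score))
```

At low event rates a resample may contain no positives at all, which is why the sketch redraws such samples; a stratified bootstrap is a common alternative.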

