On the Use of Optimal Transportation Theory to Recode Variables and Application to Database Merging
- PMID: 31527293
- DOI: 10.1515/ijb-2018-0106
Abstract
Merging databases is a strategy of paramount interest, especially in medical research. A common problem in this context arises when a variable is not coded on the same scale in the two databases to be merged. This paper considers the problem of finding a relevant way to recode such a variable so that the two databases can be merged. To address this issue, an algorithm based on optimal transportation theory is proposed. Optimal transportation theory provides a map that transports the measure associated with the variable in database A to the measure associated with the same variable in database B. To do so, a cost function has to be introduced and an allocation rule has to be defined. Such a function and such a rule are proposed, both using the information contained in the covariates. The method is compared to multiple imputation by chained equations and to a statistical learning method, and it demonstrates a better average accuracy in many situations. Applications to both simulated and real datasets show that the efficiency of the proposed merging algorithm depends on how the covariates are linked with the variable of interest.
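To make the idea of a cost function and an allocation rule concrete, the following is a minimal sketch in Python and not the algorithm of the paper. It assumes a toy setting in which the variable has three levels in database A and two levels in database B, takes the cost between two levels to be the Euclidean distance between the mean covariate vectors of the individuals carrying those levels, solves the resulting discrete transport problem as a linear program with `scipy.optimize.linprog`, and allocates each level of A to the level of B that receives most of its mass. The function names `covariate_cost` and `transport_plan`, the toy data, the cost, and the argmax allocation rule are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def covariate_cost(x_a, y_a, x_b, z_b):
    """Assumed cost: distance between mean covariates of each pair of levels."""
    levels_a, levels_b = np.unique(y_a), np.unique(z_b)
    mean_a = np.array([x_a[y_a == a].mean(axis=0) for a in levels_a])
    mean_b = np.array([x_b[z_b == b].mean(axis=0) for b in levels_b])
    cost = np.linalg.norm(mean_a[:, None, :] - mean_b[None, :, :], axis=2)
    return levels_a, levels_b, cost


def transport_plan(mu, nu, cost):
    """Solve the discrete optimal transport problem between mu and nu."""
    mu, nu = np.asarray(mu, float), np.asarray(nu, float)
    cost = np.asarray(cost, float)
    n, m = cost.shape
    # Equality constraints: row sums of the plan equal mu, column sums equal nu.
    a_eq = np.zeros((n + m, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        a_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([mu, nu])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise ValueError(res.message)
    return res.x.reshape(n, m)


# Hypothetical toy data: one covariate, variable coded 1/2/3 in A and 0/1 in B.
x_a = np.array([[0.0], [0.2], [1.0], [1.1], [2.0], [2.1]])
y_a = np.array([1, 1, 2, 2, 3, 3])
x_b = np.array([[0.0], [0.2], [1.0], [1.2], [2.0], [2.2]])
z_b = np.array([0, 0, 1, 1, 1, 1])

levels_a, levels_b, cost = covariate_cost(x_a, y_a, x_b, z_b)
mu = np.array([np.mean(y_a == a) for a in levels_a])  # distribution in A
nu = np.array([np.mean(z_b == b) for b in levels_b])  # distribution in B
plan = transport_plan(mu, nu, cost)

# Assumed allocation rule: send each level of A to the level of B
# that receives the largest share of its mass.
recode = {levels_a[i].item(): levels_b[j].item()
          for i, j in enumerate(plan.argmax(axis=1))}
print(plan)
print(recode)
```

In this sketch the covariates enter only through the cost matrix, which illustrates why the quality of the recoding depends on how strongly the covariates are linked with the variable of interest: if the level-wise covariate means were nearly identical, every transport plan would cost about the same and the allocation would be close to arbitrary.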