Classification of histogram-valued data with support histogram machines
- PMID: 36819077
- PMCID: PMC9930853
- DOI: 10.1080/02664763.2021.1947996
Abstract
The growth of data volumes and advances in technology have produced new types of complex data, such as histogram-valued data. This paper focuses on classification problems in which predictors are observed as, or aggregated into, histograms. Because conventional classification methods take vectors as input, a natural approach is to convert histograms into vector-valued data using summary values, such as the mean or median. However, this approach discards the distributional information available in histograms. To address this issue, we propose a margin-based classifier for histogram-valued data, called the support histogram machine (SHM). We adopt the support vector machine framework and use the Wasserstein-Kantorovich metric to measure distances between histograms. The resulting optimization problem is solved by a dual approach. We then test the proposed SHM on simulated and real examples and demonstrate that it outperforms summary-value-based methods.
Keywords: 62H30; Support vector machines; Wasserstein-Kantorovich metric; symbolic data.
© 2021 Informa UK Limited, trading as Taylor & Francis Group.
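The abstract's point that summary values discard distributional information can be illustrated with a small sketch. It is not the authors' implementation. It assumes two histograms on the same sorted bin centers, treats each bin's mass as concentrated at its center, and uses the order-1 Wasserstein distance (the paper may use a different order or a within-bin uniform assumption). The function names `histogram_mean` and `wasserstein1` are hypothetical.

```python
from itertools import accumulate


def histogram_mean(centers, probs):
    """Mean of a histogram given bin centers and bin probabilities."""
    return sum(c * p for c, p in zip(centers, probs))


def wasserstein1(centers, p, q):
    """Order-1 Wasserstein distance between two histograms.

    Assumption: both histograms share the same sorted bin centers and
    each bin's mass sits at its center. In one dimension the distance
    is the area between the two cumulative distribution functions.
    """
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    total = 0.0
    for i in range(len(centers) - 1):
        gap = abs(cdf_p[i] - cdf_q[i])                 # CDF difference on this interval
        total += gap * (centers[i + 1] - centers[i])   # times the interval width
    return total


centers = [0.0, 1.0, 2.0, 3.0, 4.0]
narrow = [0.0, 0.0, 1.0, 0.0, 0.0]   # all mass at the middle bin
wide = [0.5, 0.0, 0.0, 0.0, 0.5]     # mass split between the extremes

print(histogram_mean(centers, narrow), histogram_mean(centers, wide))
print(wasserstein1(centers, narrow, wide))
```

Both histograms have the same mean, so a mean-based vector representation cannot separate them, while the Wasserstein distance between them is strictly positive. A distance of this kind is what a margin-based classifier on histograms could build on.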
Conflict of interest statement
No potential conflict of interest was reported by the author(s).