Fuzzy Overclustering: Semi-Supervised Classification of Fuzzy Labels with Overclustering and Inverse Cross-Entropy

Lars Schmarje¹, Johannes Brünger¹, Monty Santarossa¹, Simon-Martin Schröder¹, Rainer Kiko², Reinhard Koch¹

Affiliations

¹ Multimedia Information Processing Group, Kiel University, 24118 Kiel, Germany.
² Laboratoire d'Océanographie de Villefranche, Sorbonne Université, 06230 Villefranche-sur-Mer, France.

PMID: 34640981
PMCID: PMC8512301
DOI: 10.3390/s21196661

Lars Schmarje et al. Sensors (Basel). 2021 Oct 7;21(19):6661.

Abstract

Deep learning has been successfully applied to many classification problems, including underwater challenges. However, a long-standing issue with deep learning is the need for large and consistently labeled datasets. Although current approaches in semi-supervised learning can decrease the required amount of annotated data by a factor of 10 or even more, this line of research still assumes distinct classes. For underwater classification, and uncurated real-world datasets in general, clean class boundaries often cannot be drawn because of the limited information content in the images and the transitional stages of the depicted objects. As a result, different experts reach different opinions and produce fuzzy labels, which could also be considered ambiguous or divergent. We propose a novel framework for semi-supervised classification of such fuzzy labels. It is based on the idea of overclustering to detect substructures in these fuzzy labels. We propose a novel loss to improve the overclustering capability of our framework and show the benefit of overclustering for fuzzy labels. We show that our framework is superior to previous state-of-the-art semi-supervised methods when applied to real-world plankton data with fuzzy labels. Moreover, we obtain 5 to 10% more consistent predictions of substructures.

Keywords: deep learning; fuzzy; marine; noisy; plankton; real-world; semi-supervised.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Illustration of fuzzy data and overclustering. The grey dots represent unlabeled data and the colored dots labeled data from different classes. The dashed lines represent decision boundaries. For certain (unambiguous) data, a clear separation of the different classes with one decision boundary is possible and both classes contain the same amount of data (**top**). For fuzzy data, determining a decision boundary is difficult because of intermediate datapoints between the classes (**middle**). Annotators often cannot agree on one consistent class for these fuzzy datapoints. Overclustering the data yields smaller but more consistent substructures in the fuzzy data (**bottom**). The images illustrate possible examples of certain data (cat and dog) and fuzzy plankton data (trichodesmium puff and tuft). The center plankton image was considered to be trichodesmium puff by around half of the annotators and tuft by the other half. The left and right plankton images were consistently annotated.

**Figure 2**
Overview of our framework FOC for semi-supervised classification. The input image is $x$ and the corresponding label is $y$. The arrows indicate the usage of image or label information. Parallel arrows represent an independent copy of the information. The usage of the label for the augmentations is described in Section 2.3. The red arrow stands for an inverse example image $x'$ with a different label than $y$. The outputs of the normal head and the overclustering head have different dimensionalities. The normal head has as many outputs as there are ground-truth classes ($k_{GT}$), while the overclustering head has $k$ outputs with $k > k_{GT}$. The dashed boxes on the right side show the loss functions used. The inverse cross-entropy and mutual information losses are described in Sections 2.1 and 2.2, respectively.

**Figure 3**
Qualitative results for unlabeled data. The results in each row are from the same predicted cluster. The three most important fuzzy labels based on the citizen scientists' annotations are given below each image. The last two items in each row, marked with a red box, show examples that do not match the majority of the cluster.
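
The two loss terms named in the Figure 2 caption can be illustrated with a small NumPy sketch. This is not the authors' implementation: the formulas live in Sections 2.1 and 2.2 of the paper, which are not reproduced on this page. The mutual information term below follows the commonly used estimate from a joint distribution over two soft cluster assignments, and the inverse cross-entropy term uses an assumed form, $-\sum_c p_c(x) \ln(1 - p_c(x'))$, which is small when an image and its inverse example are assigned to different clusters. The function names and the `eps` parameter are hypothetical.

```python
import numpy as np


def mutual_information(p_a, p_b, eps=1e-12):
    """Mutual information between two soft cluster assignments.

    p_a, p_b: arrays of shape (n, k) holding softmax outputs of the
    overclustering head for two views of the same n images.
    """
    n = p_a.shape[0]
    joint = p_a.T @ p_b / n              # (k, k) estimate of the joint distribution
    joint = (joint + joint.T) / 2.0      # symmetrise, as the view order is arbitrary
    joint = np.clip(joint, eps, None)    # avoid log(0)
    joint = joint / joint.sum()
    marg_a = joint.sum(axis=1, keepdims=True)
    marg_b = joint.sum(axis=0, keepdims=True)
    return float(np.sum(joint * (np.log(joint) - np.log(marg_a) - np.log(marg_b))))


def inverse_cross_entropy(p_x, p_inv, eps=1e-12):
    """ASSUMED form of an inverse cross-entropy, averaged over the batch.

    p_x:   (n, k) predictions for the input images x
    p_inv: (n, k) predictions for inverse examples x' (different label than y)
    The loss is low when x and x' land in different clusters.
    """
    log_not_inv = np.log(np.clip(1.0 - p_inv, eps, None))
    return float(np.mean(-np.sum(p_x * log_not_inv, axis=1)))


if __name__ == "__main__":
    k = 4  # overclustering head size; the caption requires k > k_GT
    one_hot = np.eye(k)
    uniform = np.full((k, k), 1.0 / k)
    print("MI, confident and balanced:", mutual_information(one_hot, one_hot))
    print("MI, uninformative:", mutual_information(uniform, uniform))
    print("inverse CE, same cluster:", inverse_cross_entropy(one_hot[:1], one_hot[:1]))
    print("inverse CE, different cluster:", inverse_cross_entropy(one_hot[:1], one_hot[1:2]))
```

In this sketch, a training objective would maximise the mutual information term and minimise the inverse cross-entropy term on the overclustering head, while the normal head with $k_{GT}$ outputs would be trained with a standard cross-entropy on the labeled data. How the terms are weighted is not stated on this page.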
