Sensors (Basel). 2021 Oct 7;21(19):6661.
doi: 10.3390/s21196661.

Fuzzy Overclustering: Semi-Supervised Classification of Fuzzy Labels with Overclustering and Inverse Cross-Entropy

Lars Schmarje et al. Sensors (Basel).

Abstract

Deep learning has been successfully applied to many classification problems, including underwater challenges. However, a long-standing issue with deep learning is the need for large and consistently labeled datasets. Although current approaches in semi-supervised learning can decrease the required amount of annotated data by a factor of 10 or even more, this line of research still assumes distinct classes. For underwater classification, and uncurated real-world datasets in general, clean class boundaries often cannot be drawn because of the limited information content of the images and transitional stages of the depicted objects. As a result, different experts hold different opinions and produce fuzzy labels, which could also be considered ambiguous or divergent. We propose a novel framework for semi-supervised classification of such fuzzy labels. It is based on the idea of overclustering to detect substructures in these fuzzy labels. We propose a novel loss to improve the overclustering capability of our framework and show the benefit of overclustering for fuzzy labels. We show that our framework is superior to previous state-of-the-art semi-supervised methods when applied to real-world plankton data with fuzzy labels. Moreover, we acquire 5 to 10% more consistent predictions of substructures.

Keywords: deep learning; fuzzy; marine; noisy; plankton; real-world; semi-supervised.
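
The overclustering idea from the abstract can be illustrated with a toy sketch. This is not the authors' method: the data, the one-dimensional feature axis, the function name `cluster_purities`, and the use of fixed-width bins in place of a learned clustering are all assumptions made for illustration. The sketch only shows how using more clusters than classes can separate ambiguous items from consistently labeled ones.

```python
from collections import Counter

# Hypothetical toy data: (position on a 1-D feature axis, one annotator's label).
# The four items near 0.5 stand in for fuzzy images that annotators disagree on.
ITEMS = [
    (0.10, "puff"), (0.15, "puff"), (0.20, "puff"),
    (0.42, "puff"), (0.47, "tuft"), (0.52, "puff"), (0.57, "tuft"),
    (0.80, "tuft"), (0.85, "tuft"), (0.90, "tuft"),
]

def cluster_purities(items, k):
    """Split [0, 1] into k equal-width bins (a stand-in for a learned
    clustering) and return the majority-label share of each non-empty bin."""
    clusters = {}
    for x, label in items:
        index = min(int(x * k), k - 1)  # clamp x == 1.0 into the last bin
        clusters.setdefault(index, []).append(label)
    purities = []
    for index in sorted(clusters):
        labels = clusters[index]
        majority_count = Counter(labels).most_common(1)[0][1]
        purities.append(majority_count / len(labels))
    return purities

two_clusters = cluster_purities(ITEMS, 2)    # as many clusters as classes
three_clusters = cluster_purities(ITEMS, 3)  # overclustering: k > number of classes
print(two_clusters, three_clusters)
```

With two clusters, each cluster absorbs some of the disputed items and is 80% consistent. With three clusters, the disputed items fall into their own cluster and the two outer clusters become fully consistent.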

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Illustration of fuzzy data and overclustering. The grey dots represent unlabeled data and the colored dots labeled data from different classes. The dashed lines represent decision boundaries. For certain data, a clear separation of the different classes with one decision boundary is possible and both classes contain the same amount of data (top). For fuzzy data, determining a decision boundary is difficult because of intermediate datapoints between the classes (middle). Different annotators often cannot assign these fuzzy datapoints consistently to one class. If you overcluster the data, you get smaller but more consistent substructures in the fuzzy data (bottom). The images illustrate possible examples for certain data (cat & dog) and fuzzy plankton data (trichodesmium puff and tuft). The center plankton image was considered to be trichodesmium puff or tuft by around half of the annotators each. The left and right plankton images were consistently annotated.
Figure 2
Overview of our framework FOC for semi-supervised classification. The input image is x and the corresponding label is y. The arrows indicate the usage of image or label information. Parallel arrows represent an independent copy of the information. The usage of the label for the augmentations is described in Section 2.3. The red arrow stands for an inverse example image x with a different label than y. The outputs of the normal head and the overclustering head have different dimensionalities. The normal head has as many outputs as there are ground-truth classes (k_GT), while the overclustering head has k outputs with k > k_GT. The dashed boxes on the right side show the loss functions used. More information about the inverse cross-entropy and mutual information losses can be found in Section 2.1 and Section 2.2, respectively.
Figure 3
Qualitative results for unlabeled data. The results in each row are from the same predicted cluster. The three most important fuzzy labels based on the citizen scientists' annotations are given below each image. The last two items in each row, marked with a red box, show examples that do not match the majority of the cluster.
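
The two-head layout described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the FOC implementation: the feature vector, the random linear heads, the constants `K_GT`, `K_OVER` and `FEATURE_DIM`, and all function names are hypothetical, and the loss functions are omitted. The only property taken from the caption is that the normal head has k_GT outputs and the overclustering head has k > k_GT outputs.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, features):
    """Apply a bias-free linear layer given as a list of weight rows."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def make_head(n_out, n_in, rng):
    """Create a randomly initialized head with n_out outputs (hypothetical)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def two_head_forward(features, normal_head, over_head):
    """Feed the same shared features to both heads and return both distributions."""
    return softmax(linear(normal_head, features)), softmax(linear(over_head, features))

rng = random.Random(0)
K_GT = 2         # assumed number of ground-truth classes
K_OVER = 6       # assumed number of overclusters, chosen so that K_OVER > K_GT
FEATURE_DIM = 8  # assumed size of the shared feature vector

normal_head = make_head(K_GT, FEATURE_DIM, rng)
over_head = make_head(K_OVER, FEATURE_DIM, rng)
features = [rng.random() for _ in range(FEATURE_DIM)]  # stand-in for backbone output

p_normal, p_over = two_head_forward(features, normal_head, over_head)
print(len(p_normal), len(p_over))
```

Both heads read the same features, so the overclustering head can only refine the structure that the shared representation already captures; in a real model the heads and backbone would be trained jointly.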

