Heliyon. 2024 Feb 13;10(4):e26153. doi: 10.1016/j.heliyon.2024.e26153. eCollection 2024 Feb 29.

Attacking the out-of-domain problem of a parasite egg detection in-the-wild

Nutsuda Penpong et al. Heliyon. 2024.

Abstract

The out-of-domain (OO-Do) problem has hindered machine learning models, especially when they are deployed in the real world. The OO-Do problem occurs during the testing phase, when a trained model must predict on data belonging to classes different from those used for training. We tackle the OO-Do problem in an object-detection task: a parasite-egg detection model used in real-world situations. First, we introduce the In-the-wild parasite-egg dataset to evaluate OO-Do-aware models. The dataset contains 1,552 images uploaded through a chatbot: 1,049 parasite-egg images and 503 OO-Do images. It was constructed by conducting a chatbot test session with 222 medical technology students. Thereafter, we propose a data-driven framework for constructing a parasite-egg recognition model for in-the-wild applications that addresses the OO-Do issue. In the framework, we use publicly available datasets to teach the parasite-egg recognition models both in-domain and out-of-domain concepts. Finally, we compare integration strategies for our proposed two-step parasite-egg detection approach on two test sets: a standard dataset and the In-the-wild dataset. We also investigate different thresholding strategies for model robustness to OO-Do data. Experiments on the two test datasets showed that concatenating an OO-Do-aware classification model after an object-detection model achieved outstanding performance in detecting parasite eggs. The framework gained 7.37% and 4.09% F1-score improvements over the baselines on the Chula-test + Wild-OO-Do dataset and the In-the-wild parasite-egg dataset, respectively.
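As a rough illustration of the two-step approach described above, the sketch below (a minimal sketch with an assumed interface, not the authors' actual implementation) shows candidate boxes from an object-detection model being re-scored by an OO-Do-aware classifier, with detections rejected when their maximum softmax probability falls below a threshold. The detector and classifier callables, the class names, and the 0.5 threshold are all illustrative placeholders.

# Hypothetical sketch of the "detect, then verify" pipeline described in the
# abstract: a detector proposes candidate parasite-egg boxes, and an
# OO-Do-aware classifier re-scores each crop, rejecting low-confidence
# (likely out-of-domain) detections via a softmax threshold.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple
import math

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

@dataclass
class Detection:
    box: Box
    label: str    # predicted parasite-egg class
    score: float  # confidence of the stage that produced this detection

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def two_step_detect(
    image,
    detector: Callable[[object], List[Detection]],          # stage 1 (assumed interface)
    classifier: Callable[[object, Box], Sequence[float]],   # stage 2: crop -> class logits
    class_names: Sequence[str],
    softmax_threshold: float = 0.5,                          # illustrative value, not from the paper
) -> List[Detection]:
    """Keep only detections that the OO-Do-aware classifier confirms."""
    accepted: List[Detection] = []
    for det in detector(image):
        probs = softmax(classifier(image, det.box))
        best = max(range(len(probs)), key=probs.__getitem__)
        # A low maximum softmax probability is treated as an OO-Do signal
        # and the detection is rejected.
        if probs[best] >= softmax_threshold:
            accepted.append(Detection(det.box, class_names[best], probs[best]))
    return accepted

In this arrangement the classifier acts as a gatekeeper after the detector, which appears to correspond to the "classification-later" integration named in the figure captions below; other integration strategies would place the OO-Do-aware classifier before the detector.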

Keywords: Chatbot; Computer vision in-the-wild; Data driven framework; Out-of-domain; Parasite egg detection.

Conflict of interest statement

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Thanapong Intharah reports financial support was provided by The National Science, Research and Innovation Fund (NSRF), Thailand.

Figures

Figure 1
Differences between out-of-domain test data and out-of-distribution test data, where the training data consist of four classes: English springer, car, parachute, and church. Out-of-distribution samples come from the same classes but differ from the training data in pose, texture, context, or weather. In contrast, out-of-domain (OO-Do) samples come from classes entirely absent from the training data, such as fish, cassette player, golf ball, and French horn.
Figure 2
Conceptual model of the relations between the four datasets in the data-driven framework for training an object-detection model that is aware of the out-of-domain problem. We consider data along two dimensions: the class of the data, i.e., the category it belongs to, and the observation of the data, i.e., what describes it; for computer vision tasks, the observation is the appearance. The hard-negative OO-Do dataset has observations similar to the data in the main task but shares no class with it. The main dataset contains only data from the task-focused classes. The in-the-wild dataset, used for testing, contains data from the task-focused classes with slightly different observations, as well as data from unseen classes (OO-Do samples). The OO-Do dataset contains a wider set of unseen classes with completely different observations.
Figure 3
(a) Ascaris lumbricoides, (b) Hookworms, (c) Opisthorchis viverrini, (d) Taenia spp., (e) Trichuris trichiura, (f) Adult parasite, (g) Artifact, (h) Unclear image, (i) Other parasite egg, (j) Arbitrary.
Figure 4
Overview of the testing process for the classification-first framework.
Figure 5
Overview of training and threshold selection for the classification-first framework.
Figure 6
Overview of the testing process for the classification-later framework.
Figure 7
Overview of training and threshold optimization for the classification-later framework.
Figure 8
Examples of misclassified and correctly classified cases. (a, b) Misclassified cases: Echinostoma spp. (a, above) and a minute intestinal fluke (b, above) in the "other parasite egg" class were incorrectly classified as Fasciolopsis buski (a, below) and OV (b, below), respectively, which could lower detection rates for OO-Do images in this class. (c) The classification-later framework correctly rejecting an OO-Do image: an image of cat eyes (c, above) was detected as Taenia spp. (c, below) by the object-detection model, but it was correctly rejected by the classification model using the SoftMax threshold.
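Figures 5, 7, and 8 refer to finding, optimizing, and applying a SoftMax threshold, but the abstract does not spell out how the threshold values are chosen. The following is only a hedged sketch of one common approach, assumed here for illustration: sweep candidate thresholds over validation-set maximum-softmax scores labelled in-domain vs. OO-Do and keep the value that maximizes F1 (the metric reported in the abstract). The function names and the candidate grid are illustrative, not taken from the paper.

# Illustrative only: pick a softmax rejection threshold by maximizing F1 on a
# validation set of (max-softmax score, is_in_domain) pairs.
from typing import List, Tuple

def f1_at_threshold(scores_labels: List[Tuple[float, bool]], thr: float) -> float:
    # A sample is "accepted" as in-domain when its score reaches the threshold.
    tp = sum(1 for s, in_domain in scores_labels if s >= thr and in_domain)
    fp = sum(1 for s, in_domain in scores_labels if s >= thr and not in_domain)
    fn = sum(1 for s, in_domain in scores_labels if s < thr and in_domain)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def pick_softmax_threshold(scores_labels: List[Tuple[float, bool]]) -> float:
    candidates = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99, illustrative grid
    return max(candidates, key=lambda thr: f1_at_threshold(scores_labels, thr))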
