Entropy (Basel). 2021 Feb 23;23(2):257.
doi: 10.3390/e23020257.

An Improvised Machine Learning Model Based on Mutual Information Feature Selection Approach for Microbes Classification

Anaahat Dhindsa et al. Entropy (Basel).

Abstract

The accurate classification of microbes is critical for monitoring the ecological balance of a habitat. This work implements a novel method to automate the identification of microorganisms. To extract the bodies of microorganisms accurately, a generalized segmentation mechanism is proposed that combines a convolution filter (Kirsch) with a variance-based pixel clustering algorithm (Otsu). After exhaustive corroboration, a set of twenty-five features was identified to map the characteristics and morphology of all kinds of microbes. Multiple feature selection techniques were tested, and mutual information (MI)-based models gave the best performance. Exhaustive hyperparameter tuning was carried out for multilayer perceptron (MLP), k-nearest neighbors (KNN), quadratic discriminant analysis (QDA), logistic regression (LR), and support vector machine (SVM) classifiers. The radial-kernel SVM required further improvisation to attain the maximum possible level of accuracy. A comparative analysis of SVM and improvised SVM (ISVM) using 10-fold cross validation showed that ISVM performed 2% better, reaching an accuracy of 98.2%, precision of 98.2%, recall of 98.1%, and F1 score of 98.1%.
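The segmentation step named in the abstract can be illustrated with a minimal sketch. It assumes one plausible pipeline: compute the Kirsch compass edge response, rescale it to 8-bit levels, and binarize it with an Otsu threshold. The abstract does not say how the authors combine the two operators, so this ordering, the function names, and the toy image are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: Kirsch edge response followed by an Otsu threshold.
# The way the two steps are chained here is an assumption, not the paper's method.

# The eight Kirsch kernels are rotations of one ring of border weights
# (listed clockwise from the top-left neighbour); the centre weight is 0.
RING_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
RING_WEIGHTS = [5, 5, 5, -3, -3, -3, -3, -3]


def kirsch_magnitude(image):
    """Maximum response over the 8 Kirsch kernels; border pixels stay 0."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            ring = [image[r + dr][c + dc] for dr, dc in RING_OFFSETS]
            out[r][c] = max(
                sum(RING_WEIGHTS[(i - shift) % 8] * ring[i] for i in range(8))
                for shift in range(8)
            )
    return out


def rescale(image, levels=256):
    """Map non-negative integer values linearly onto 0 .. levels - 1."""
    peak = max(max(row) for row in image)
    if peak == 0:
        return [[0] * len(row) for row in image]
    return [[v * (levels - 1) // peak for v in row] for row in image]


def otsu_threshold(values, levels=256):
    """Threshold t maximising between-class variance (foreground: value > t)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already background
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def segment(image):
    """Binary mask of strong Kirsch edges, selected by an Otsu threshold."""
    edges = rescale(kirsch_magnitude(image))
    t = otsu_threshold([v for row in edges for v in row])
    return [[1 if v > t else 0 for v in row] for row in edges]


if __name__ == "__main__":
    # Toy 7x7 grayscale image: a bright 3x3 blob on a dark background.
    toy = [[200 if 2 <= r <= 4 and 2 <= c <= 4 else 0 for c in range(7)]
           for r in range(7)]
    for row in segment(toy):
        print(row)
```

The sketch uses only the standard library so that each step stays visible. Real micrographs would more likely be processed with an image library, and the resulting edge mask would still need contour filling before shape features could be measured.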

Keywords: classification; image segmentation; k-fold cross validation; machine learning modeling; microorganisms; mutual information.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Research flow.
