Bioengineering (Basel). 2022 Mar 17;9(3):124.
doi: 10.3390/bioengineering9030124.

LASSO Regression Modeling on Prediction of Medical Terms among Seafarers' Health Documents Using Tidy Text Mining

Nalini Chintalapudi et al. Bioengineering (Basel).

Abstract

Seafarers generally face a higher risk of illness and accidents than workers on land. In most cases, there are no medical professionals on board seagoing vessels, which makes disease diagnosis even more difficult. In such cases, onshore doctors may be able to provide medical advice through telemedicine if they receive better symptomatic and clinical details in seafarers' health abstracts. Text mining techniques can assist in extracting diagnostic information from clinical texts. Because experimental evaluations using computational techniques are lacking, we applied lexicon-based sentiment analysis to explore the automatic labeling of positive and negative healthcare terms in seafarers' text healthcare documents. To classify diseases and their associated symptoms, we applied the LASSO regression algorithm to these text documents. Analyzing TF-IDF values makes it possible to visualize the frequency of symptomatic terms for each disease. The proposed approach, based on the LASSO regression machine learning model, classifies text documents with 93.8% accuracy. Text documents can be classified effectively with tidy text mining libraries. In addition to delivering health assistance, this method can be used to classify diseases and establish health observatories. Knowledge developed in the present work will be applied to establish an Epidemiological Observatory of Seafarers' Pathologies and Injuries. This Observatory will be a collaborative initiative of the Italian Ministry of Health, the University of Camerino, and the International Radio Medical Centre (C.I.R.M.), the Italian TMAS.
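As an illustration of the TF-IDF step mentioned in the abstract, the following is a minimal sketch in plain Python. It assumes the common definition tf × ln(N / df), computed per disease category; the abstract does not state the exact weighting the authors used, and the category names and tokens below are hypothetical toy data, not taken from the study.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF per category.

    docs: dict mapping a category name to its list of tokens.
    Returns a dict mapping category -> {term: tf-idf score}.
    Assumed weighting: tf = count / total tokens, idf = ln(N / df).
    """
    n_docs = len(docs)

    # Document frequency: in how many categories does each term occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            term: (n / total) * math.log(n_docs / df[term])
            for term, n in counts.items()
        }
    return scores


# Hypothetical toy corpus: one token list per disease category.
docs = {
    "eye": "eye pain red eye blurred vision".split(),
    "mental": "anxiety pain insomnia anxiety stress".split(),
}

scores = tf_idf(docs)
top_eye = max(scores["eye"], key=scores["eye"].get)
top_mental = max(scores["mental"], key=scores["mental"].get)
```

In this toy corpus, "pain" occurs in both categories, so its score is zero in each, while terms specific to one category rank highest. This is the property that makes TF-IDF useful for picking out category-specific symptomatic words.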

Keywords: correlations; disease mapping; lasso regression; seafarers; text mining.

Conflict of interest statement

The authors declare no conflict of interest in the preparation and publication of the manuscript.

Figures

Figure 1
Flowchart representation of typical text analysis using principles of tidy data.
Figure 2
Lexicon-based sentiment analysis architecture for text documents.
Figure 3
Lexicon-based sentiment scores of the ICD-10 disease types (for each disease, the plot shows how sentiment shifts toward more negative or more positive over its appearances in the dataset).
Figure 4
(a) Word cloud of positive (green) and negative (red) sentiment words (most of the word alignments are associated with the words pain, master, symptoms, hot, correct, lacking, etc.). (b) Counts of the words that contribute to negative and positive sentiment; the word ‘pain’ had the highest negative sentiment count (23,557) and the word ‘master’ had the highest positive sentiment count (8935).
Figure 5
TF-IDF word count for the mental health and eye disease categories; the highest-frequency symptomatic words calculated by TF-IDF are vital to disease diagnosis. This outcome shows that TF-IDF properly distinguishes the keywords that are important to a specific category of documents within the collection.
Figure 6
Data visualization networks (common bigrams that occurred in categorical disease documents).
Figure 7
Correlation table between the symptomatic words.
Figure 8
ROC curve for text classification using LASSO regularized regression.
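
The LASSO-regularized text classification referred to in Figure 8 can be illustrated with a small sketch. The source does not describe the implementation, so what follows is an assumption: an L1-penalized logistic regression fitted by proximal gradient descent (soft-thresholding), which is one common way to realize a LASSO classifier. The function names, hyperparameters, and the tiny word-count matrix are all hypothetical.

```python
import math


def soft_threshold(x, t):
    """Shrink x toward zero by t (proximal operator of the L1 penalty)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=500):
    """Fit logistic regression with an L1 penalty on the weights.

    Each iteration takes a gradient step on the mean log-loss, then
    soft-thresholds the weights. The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def predict(X, w, b):
    """Return hard 0/1 labels using a 0.5 probability cutoff."""
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5
            else 0 for xi in X]


# Hypothetical word counts per document; columns: "pain", "anxiety", "the".
# The third column is the same in every document, so it carries no signal.
X = [[2, 0, 1], [1, 0, 1], [0, 2, 1], [0, 1, 1]]
y = [1, 1, 0, 0]  # hypothetical labels: 1 = injury, 0 = mental health

w, b = fit_l1_logistic(X, y)
labels = predict(X, w, b)
```

In this sketch the L1 penalty keeps the weight of the uninformative third column at exactly zero, while the two informative words receive weights of opposite sign. That sparsity is why LASSO suits text data, where most terms are irrelevant to any given category.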
