BMC Med Inform Decis Mak. 2022 Jan 24;22(1):20.

doi: 10.1186/s12911-022-01755-3.

Semi-supervised incremental learning with few examples for discovering medical association rules

Ricardo Sánchez-de-Madariaga¹,², Juan Martinez-Romo³,⁴, José Miguel Cantero Escribano⁵, Lourdes Araujo³,⁴

Affiliations

¹ Telemedicine and e-Health Research Unit, Monforte de Lemos 5, Instituto de Salud Carlos III, 28029, Madrid, Spain. ricardo.sanchez@isciii.es.
² Instituto Mixto UNED-ISCIII, IMIENS, 28029, Madrid, Spain. ricardo.sanchez@isciii.es.
³ Natural Language Processing and Information Retrieval Group, Universidad Nacional de Educación a Distancia, 28040, Madrid, Spain.
⁴ Instituto Mixto UNED-ISCIII, IMIENS, 28029, Madrid, Spain.
⁵ Preventive Medicine Service, Hospital Universitario La Paz-Carlos III-Cantoblanco, 28046, Madrid, Spain.

PMID: 35073885
PMCID: PMC8785547
DOI: 10.1186/s12911-022-01755-3

Ricardo Sánchez-de-Madariaga et al. BMC Med Inform Decis Mak. 2022.

Abstract

Background: Association rules are one of the main ways to represent structural patterns underlying raw data. They represent dependencies between sets of observations contained in the data. The associations established by these rules are very useful in the medical domain, for example in predictive health. Classic algorithms for association rule mining generate huge numbers of candidate rules, which must be filtered to select those most likely to be true. Most techniques proposed for this task are unsupervised, but the accuracy of unsupervised systems is limited. Conversely, annotating data to train supervised systems is expensive and time-consuming. The purpose of this research is to design a new semi-supervised algorithm that performs like supervised algorithms but uses an affordable amount of training data.

Methods: In this work we propose a new semi-supervised data mining model that combines unsupervised techniques (Fisher's exact test) with limited supervision. Starting with a small seed of annotated data, the model improves on the results (F-measure) obtained using a fully supervised system (standard supervised ML algorithms). The idea is to exploit the agreement between the predictions of the supervised system and those of the unsupervised techniques over a series of iterative steps.
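The agreement idea described in the Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a 1-nearest-neighbour rule stands in for the "standard supervised ML algorithms", the features (support and confidence), the one-sided form of Fisher's exact test, the 0.05 threshold, and the stopping rule (stop when no new agreements appear) are all assumptions made for the example.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a = records with antecedent and consequent, b = antecedent only,
    c = consequent only, d = neither. Returns the hypergeometric probability
    of seeing at least `a` co-occurrences if the two sides were independent.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def rule_features(a, b, c, d):
    """Hypothetical feature vector for a rule: (support, confidence)."""
    n = a + b + c + d
    confidence = a / (a + b) if a + b else 0.0
    return (a / n, confidence)


def nearest_neighbour_predict(train, x):
    """1-NN stand-in for the supervised learner; train is [(features, label)]."""
    nearest = min(train, key=lambda ex: sum((p - q) ** 2 for p, q in zip(ex[0], x)))
    return nearest[1]


def incremental_learning(seed, unlabeled, alpha=0.05, max_iters=10):
    """Grow the training set with rules on which both views agree.

    seed: [(features, label)] annotated rules; unlabeled: [(a, b, c, d)] tables.
    Returns the enlarged training set and the tables still left unlabeled.
    """
    train, pool = list(seed), list(unlabeled)
    for _ in range(max_iters):
        agreed, remaining = [], []
        for table in pool:
            features = rule_features(*table)
            supervised = nearest_neighbour_predict(train, features)
            unsupervised = fisher_right_tail(*table) < alpha
            if supervised == unsupervised:
                agreed.append((features, supervised))  # trusted pseudo-label
            else:
                remaining.append(table)
        if not agreed:  # assumed stopping rule: no new agreements
            break
        train.extend(agreed)
        pool = remaining
    return train, pool


# Toy data (invented for illustration): two annotated rules, three unlabeled ones.
seed = [(rule_features(30, 5, 10, 55), True), (rule_features(5, 30, 30, 35), False)]
unlabeled = [(25, 5, 8, 62), (4, 20, 25, 51), (3, 1, 2, 2)]
train, pool = incremental_learning(seed, unlabeled)
```

In this toy run the first two unlabeled rules receive the same verdict from both views and join the training set, while the third (high confidence but only eight records, so not significant) stays in the pool because the two views disagree.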

Results: In the task of mining medical association rules, the new semi-supervised ML algorithm improves on the F-measure results of supervised algorithms while training with only an affordable amount of manually annotated data.
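For reference, and assuming the F-measure here is the usual balanced F1 score (the abstract does not state the weighting), it is the harmonic mean of precision $P$ and recall $R$, computed from the counts of true positives ($TP$), false positives ($FP$) and false negatives ($FN$) among the rules predicted to be true:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}
$$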

Conclusions: Using a small amount of annotated data (which is easily achievable) leads to results similar to those of a supervised system. The proposal may be an important step for the practical development of techniques for mining association rules and generating new valuable scientific medical knowledge.

Keywords: Association rules discovery; Machine learning; Medical records; Semi-supervised approach.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Interaction between the dataset and the unsupervised and supervised modules

**Fig. 2**
Evolution of the p value and the performance (F-measure) of the system depending on the threshold used

**Fig. 3**
Flow diagram of incremental learning. Rounded rectangles show the beginning and the end of the iterations, rectangles are the rule sets, the broken line rectangle represents the seed set performance, ovals are processes, and the diamond represents a condition

Cited by

Araujo L, Martinez-Romo J, Bisbal O, Sanchez-de-Madariaga R; Cohort of the National AIDS Network (CoRIS). Discovering HIV related information by means of association rules and machine learning. Sci Rep. 2022 Oct 28;12(1):18208. doi: 10.1038/s41598-022-22695-y. PMID: 36307506. Free PMC article.
Nakikj D, Kreda D, Luthria K, Gehlenborg N. Patient-Generated Collections for Organizing Electronic Health Record Data to Elevate Personal Meaning, Improve Actionability, and Support Patient-Health Care Provider Communication: Think-Aloud Evaluation Study. JMIR Hum Factors. 2025 Feb 3;12:e50331. doi: 10.2196/50331. PMID: 39899851. Free PMC article.
