SCARF: a biomedical association rule finding webserver

Balázs Szalkai et al. J Integr Bioinform. 2022 Feb 4;19(1):20210035.
doi: 10.1515/jib-2021-0035.

Abstract

The analysis of enormous datasets with missing data entries is a standard task in biological and medical data processing. Large-scale, multi-institution clinical studies are typical examples of such datasets. These sets make it possible to search for multi-parametric relations, since the sheer volume of data makes it likely that a sufficient number of subjects share the required combination of parameters. In particular, finding combinatorial biomarkers for a given condition also requires a very large dataset. For fast and automatic discovery of multi-parametric relations, association rule finding tools have been used in the data-mining community for more than two decades. Here we present the SCARF webserver for generalized association rule mining. Association rules have the form a AND b AND … AND x → y, meaning that the presence of properties a AND b AND … AND x implies property y. Our algorithm finds generalized association rules, since it also allows logical disjunctions (i.e., ORs) on the left-hand side, so that more complex rules in the database can be discovered in a more compressed form. This feature also helps reduce the typically very large result tables of such studies, since a single rule with ORs on its left-hand side can subsume dozens of classical rules. The capabilities of the SCARF algorithm were demonstrated by mining the Alzheimer's database of the Coalition Against Major Diseases (CAMD) in our recent publication (Archives of Gerontology and Geriatrics Vol. 73, pp. 300-307, 2017). Here we describe the webserver implementation of the algorithm.
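The abstract does not spell out how SCARF scores rules, so the following is only a minimal sketch of what a generalized rule means on data with missing entries. It assumes the standard support and confidence measures, a left-hand side written as an AND of OR-clauses, and that records whose truth value cannot be determined because of missing entries are skipped. The function names (`clause_value`, `lhs_value`, `rule_stats`) and the toy records are hypothetical illustrations, not SCARF's algorithm or API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data model: each record maps attribute names to True, False,
# or None (missing entry); absent keys are also treated as missing.
Record = Dict[str, Optional[bool]]
# Hypothetical rule model: the left-hand side is an AND of clauses, and each
# clause is an OR of attribute names, e.g. [["a", "b"], ["c"]] = (a OR b) AND c.
LHS = List[List[str]]


def clause_value(record: Record, clause: List[str]) -> Optional[bool]:
    """Three-valued OR: True if any attribute is True, False if all are False."""
    values = [record.get(attr) for attr in clause]
    if any(v is True for v in values):
        return True
    if all(v is False for v in values):
        return False
    return None  # undetermined because of missing entries


def lhs_value(record: Record, lhs: LHS) -> Optional[bool]:
    """Three-valued AND over the OR-clauses of the left-hand side."""
    values = [clause_value(record, clause) for clause in lhs]
    if any(v is False for v in values):
        return False
    if all(v is True for v in values):
        return True
    return None


def rule_stats(records: List[Record], lhs: LHS, rhs: str) -> Tuple[float, float]:
    """Return (support, confidence) of the rule lhs -> rhs.

    Assumption: records whose left-hand side or right-hand side cannot be
    determined because of missing entries are skipped.
    """
    usable = [r for r in records
              if lhs_value(r, lhs) is not None and r.get(rhs) is not None]
    if not usable:
        return 0.0, 0.0
    lhs_true = [r for r in usable if lhs_value(r, lhs)]
    both = [r for r in lhs_true if r[rhs]]
    support = len(both) / len(usable)
    confidence = len(both) / len(lhs_true) if lhs_true else 0.0
    return support, confidence


# Toy dataset (illustrative only, not CAMD data); None marks a missing entry.
records: List[Record] = [
    {"a": True, "b": False, "c": True, "y": True},
    {"a": False, "b": True, "c": True, "y": True},
    {"a": False, "b": False, "c": True, "y": False},
    {"a": True, "b": None, "c": True, "y": True},
    {"a": True, "b": True, "c": False, "y": False},
]

print(rule_stats(records, [["a"], ["c"]], "y"))       # a AND c -> y: (0.4, 1.0)
print(rule_stats(records, [["b"], ["c"]], "y"))       # b AND c -> y: (0.25, 1.0)
print(rule_stats(records, [["a", "b"], ["c"]], "y"))  # (a OR b) AND c -> y: (0.6, 1.0)
```

In this toy example the single generalized rule (a OR b) AND c → y matches every record matched by either classical rule, including the record where b is missing but a is true, which illustrates the compression the abstract describes. Whether SCARF uses these exact measures or this treatment of missing data is not stated here.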

Keywords: association rules; big data; data mining.


Figures

Figure 1: The input screen of the SCARF webserver.
Figure 2: The parameter screen of the SCARF webserver.
Figure 3: The output screen of the webserver (panel A) and a partial output with five discovered rules found in the example dataset (panel B).
