Front Bioinform. 2022 Jan 10:1:794547.
doi: 10.3389/fbinf.2021.794547. eCollection 2021.

Extending Association Rule Mining to Microbiome Pattern Analysis: Tools and Guidelines to Support Real Applications

Agostinetto Giulia et al. Front Bioinform. 2022.

Abstract

Boosted by the exponential growth of microbiome-based studies, the analysis of microbiome patterns is now a hot topic with applications in many fields. In particular, the use of machine learning techniques is increasing in microbiome studies, providing deep insights into microbial community composition. In this context, to investigate microbial patterns from 16S rRNA metabarcoding data, we explored the effectiveness of Association Rule Mining (ARM), a supervised machine learning procedure, for extracting patterns (in this work, groups of species or taxa) from microbiome data. ARM can generate huge amounts of output, which makes removing spurious information and visualizing results challenging. Our work sheds light on the strengths and weaknesses of pattern mining in the study of microbial patterns, in particular from 16S rRNA microbiome datasets, by applying ARM to real case studies and providing guidelines for future usage. Our results highlighted issues related to the type of input and the use of metadata in microbial pattern extraction, and identified the key steps that must be considered to apply ARM knowingly to 16S rRNA microbiome data. To promote the use of ARM and the visualization of microbiome patterns, we developed microFIM (microbial Frequent Itemset Mining), a versatile Python tool that facilitates the use of ARM by integrating common microbiome outputs, such as taxa tables. microFIM implements interest measures to remove spurious information and merges the results of ARM analysis with common microbiome outputs, providing strategies, similar to those already common in microbiome analysis, that help scientists integrate ARM into microbiome applications. With this work, we aimed to build a bridge between microbial ecology researchers and the ARM technique, making researchers aware of the strengths and weaknesses of the association rule mining approach.

Keywords: DNA metabarcoding; association rule mining; machine learning; microbiome data; microbiome patterns; pattern mining.
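
As a rough illustration of the ARM idea summarized in the abstract, the sketch below computes the two standard measures of an association rule, support and confidence, on presence-absence data. It is not taken from the paper or from microFIM: the sample names, taxon choices and helper functions (`support`, `confidence`) are assumptions made up for this example.

```python
# Toy presence/absence data: each sample maps to the set of taxa detected in it.
# Sample IDs and taxa are invented for illustration only.
samples = {
    "S1": {"Bacteroides", "Prevotella", "Faecalibacterium"},
    "S2": {"Bacteroides", "Faecalibacterium"},
    "S3": {"Bacteroides", "Prevotella"},
    "S4": {"Faecalibacterium"},
}


def support(itemset, transactions):
    """Fraction of samples that contain every taxon in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for taxa in transactions.values() if itemset <= taxa)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent and consequent together) / support(antecedent)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return joint / base if base else 0.0


# Rule under test: {Bacteroides} -> {Faecalibacterium}
rule_support = support({"Bacteroides", "Faecalibacterium"}, samples)
rule_confidence = confidence({"Bacteroides"}, {"Faecalibacterium"}, samples)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

With these toy samples, the pair occurs together in two of four samples, and in two of the three samples that contain Bacteroides.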

Conflict of interest statement

Author SA was employed by the company Quantia Consulting Srl. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Graphical overview of the Frequent Itemset Mining (A) and Association Rule Mining (B) approaches, integrated with elements related to microbiome analysis.
FIGURE 2
Scheme of the microFIM framework. 1) Filtering the taxa table; 2) converting the taxa table into a transactional file; 3) extracting patterns using a template file filled in with the minimum support threshold and the minimum and maximum pattern length; 4) adding interest measures such as support, pattern length and all-confidence (Omiecinski, 2003; Xiong et al., 2006); 5) generating the pattern table, composed of the presence-absence of patterns within samples and the interest measures; 6) generating visualizations.
FIGURE 3
(A) Graphical representation of Table 1; (B) Graphical representation of Table 2; (C) Pattern table generated from Table 1; (D) Pattern table generated from Table 2; (E) Jaccard heatmap plot of Table 1; (F) Jaccard heatmap plot of Table 2.
FIGURE 4
For Inputs 1, 2 and 3, the number of patterns obtained (1a, 2a, 3a), the distribution of support values (1b, 2b, 3b) and the distribution of pattern lengths (1c, 2c, 3c) are shown. Three levels of analysis are shown: no filters applied to patterns, a minimum all-confidence of 0.5 and a minimum all-confidence of 0.8.
FIGURE 5
Overview of the main strengths, weaknesses, opportunities and threats (SWOT analysis) related to the use of frequent itemset mining as a tool for microbiome pattern analysis.
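
The workflow described in the Figure 2 caption, together with the all-confidence thresholds of Figure 4 and the Jaccard comparison of Figure 3, can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not microFIM's implementation or API: the taxa table, the function names (`to_transactions`, `mine_patterns`, `pattern_table`, `jaccard`) and the brute-force enumeration are all hypothetical, and what exactly the Jaccard heatmaps compare is assumed here to be the sets of patterns present in each sample. All-confidence is computed as the support of a pattern divided by the largest support of any single taxon in it.

```python
from itertools import combinations

# Toy taxa table (read counts per sample); names and numbers are invented.
taxa_table = {
    "S1": {"TaxonA": 12, "TaxonB": 5, "TaxonC": 0, "TaxonD": 3},
    "S2": {"TaxonA": 7, "TaxonB": 9, "TaxonC": 1, "TaxonD": 0},
    "S3": {"TaxonA": 4, "TaxonB": 0, "TaxonC": 6, "TaxonD": 0},
    "S4": {"TaxonA": 0, "TaxonB": 8, "TaxonC": 2, "TaxonD": 0},
}


def to_transactions(table, min_count=1):
    """Steps 1-2: keep taxa with at least `min_count` reads, one itemset per sample."""
    return {
        sample: frozenset(t for t, c in counts.items() if c >= min_count)
        for sample, counts in table.items()
    }


def support(itemset, transactions):
    """Fraction of samples containing every taxon in `itemset`."""
    hits = sum(1 for taxa in transactions.values() if itemset <= taxa)
    return hits / len(transactions)


def mine_patterns(transactions, min_support, min_len, max_len):
    """Steps 3-4: brute-force enumeration (fine for toy data only),
    annotating each frequent pattern with its interest measures."""
    items = sorted(set().union(*transactions.values()))
    patterns = []
    for k in range(min_len, max_len + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            sup = support(itemset, transactions)
            if sup < min_support:
                continue
            max_single = max(support(frozenset([i]), transactions) for i in combo)
            patterns.append({
                "pattern": combo,
                "support": sup,
                "length": k,
                "all_confidence": sup / max_single,
            })
    return patterns


def pattern_table(patterns, transactions):
    """Step 5: presence (1) or absence (0) of each pattern in each sample."""
    return {
        p["pattern"]: {s: int(set(p["pattern"]) <= taxa) for s, taxa in transactions.items()}
        for p in patterns
    }


def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


transactions = to_transactions(taxa_table)
patterns = mine_patterns(transactions, min_support=0.5, min_len=2, max_len=3)
relaxed = [p for p in patterns if p["all_confidence"] >= 0.5]
strict = [p for p in patterns if p["all_confidence"] >= 0.8]
table = pattern_table(relaxed, transactions)

# Patterns present in each sample, then pairwise similarity between two samples.
present = {s: {pat for pat, row in table.items() if row[s]} for s in transactions}
print(len(patterns), len(relaxed), len(strict), jaccard(present["S1"], present["S2"]))
```

Real datasets need an efficient miner such as Apriori or FP-growth rather than exhaustive enumeration; the brute-force loop is used here only to keep the example self-contained.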

References

    1. Agapito G., Guzzi P. H., Cannataro M. (2015). DMET-miner: Efficient Discovery of Association Rules from Pharmacogenomic Data. J. Biomed. Inform. 56, 273–283. doi: 10.1016/j.jbi.2015.06.005
    2. Agrawal R., Imieliński T., Swami A. (1993). Mining Association Rules between Sets of Items in Large Databases. SIGMOD Rec. 22, 207–216. doi: 10.1145/170036.170072
    3. Agrawal R., Mannila H., Srikant R., Toivonen H., Verkamo A. I. (1996). Fast Discovery of Association Rules. Data Min. Knowl. Discov. 12 (1), 307–328.
    4. Alves R., Rodriguez-Baena D. S., Aguilar-Ruiz J. S. (2010). Gene Association Analysis: a Survey of Frequent Pattern Mining from Gene Expression Data. Brief. Bioinform. 11 (2), 210–224. doi: 10.1093/bib/bbp042
    5. Anaconda Software Distribution (2020). Anaconda Documentation. Austin, TX, USA: Anaconda Inc. Available at: https://docs.anaconda.com/
