BMC Bioinformatics. 2008 Feb 19;9:107.
doi: 10.1186/1471-2105-9-107.

Fuzzy association rules for biological data analysis: a case study on yeast

Francisco J Lopez et al. BMC Bioinformatics. 2008.

Abstract

Background: In recent years, the mapping of diverse genomes has generated huge amounts of biological data, which are currently dispersed across many databases. Integrating the information available in these databases is required to unveil possible associations among already known data. Biological data are often imprecise and noisy. Fuzzy set theory is especially suitable for modeling imprecise data, while association rules are well suited to integrating heterogeneous data.

Results: In this work we propose a novel fuzzy methodology, based on a fuzzy association rule mining method, for biological knowledge extraction. We apply this methodology to a yeast genome dataset containing heterogeneous information on structural and functional genome features. A number of association rules have been found, many of them agreeing with previous research in the area. In addition, a comparison between crisp and fuzzy results shows the fuzzy associations to be more reliable than the crisp ones.
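To make the notion of a fuzzy association rule concrete, the following is a minimal sketch of how support and confidence could be computed for a rule A -> B when each gene has a membership degree in A and in B. The abstract does not state which measures the authors use; the minimum t-norm and the function names `fuzzy_support` and `fuzzy_confidence` are assumptions chosen for illustration, and the membership values are toy data.

```python
from typing import Sequence


def fuzzy_support(antecedent: Sequence[float], consequent: Sequence[float]) -> float:
    """Mean over all records of min(mu_A, mu_B) (assumed: minimum t-norm)."""
    if not antecedent:
        return 0.0
    joint = sum(min(a, c) for a, c in zip(antecedent, consequent))
    return joint / len(antecedent)


def fuzzy_confidence(antecedent: Sequence[float], consequent: Sequence[float]) -> float:
    """Joint membership mass divided by the antecedent's membership mass."""
    total = sum(antecedent)
    if total == 0:
        return 0.0
    joint = sum(min(a, c) for a, c in zip(antecedent, consequent))
    return joint / total


# Toy membership degrees for four genes (illustrative values only).
mu_a = [1.0, 0.5, 0.0, 0.5]  # degree to which each gene satisfies A
mu_b = [1.0, 1.0, 1.0, 0.0]  # degree to which each gene satisfies B

print(fuzzy_support(mu_a, mu_b))
print(fuzzy_confidence(mu_a, mu_b))
```

With crisp (0 or 1) memberships these formulas reduce to the usual support and confidence of classical association rules, which is why the two kinds of result can be compared directly.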

Conclusion: An integrative approach such as the one carried out in this work can unveil significant knowledge that is currently hidden and dispersed across the existing biological databases. It is shown that fuzzy association rules can model this knowledge in an intuitive way by using linguistic labels and a few easily understandable parameters.

Figures

Figure 1
Linguistic labels defined for continuous features. This figure describes how the membership functions are defined for each fuzzy set in the corresponding continuous domain.
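As an illustration of how linguistic labels can be attached to a continuous feature, here is a minimal sketch using trapezoidal membership functions. The shapes, the labels LOW, MEDIUM and HIGH, and the breakpoints on a normalized [0, 1] domain are all assumptions; the caption does not give the actual definitions used in the paper.

```python
import math


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership that rises on [a, b], equals 1 on [b, c], and falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Hypothetical breakpoints on a normalized domain; infinite ends give open shoulders.
LABELS = {
    "LOW": (-math.inf, -math.inf, 0.25, 0.5),
    "MEDIUM": (0.25, 0.5, 0.5, 0.75),
    "HIGH": (0.5, 0.75, math.inf, math.inf),
}


def fuzzify(x: float) -> dict:
    """Return the membership degree of x in every linguistic label."""
    return {name: trapezoid(x, *params) for name, params in LABELS.items()}


print(fuzzify(0.375))  # a value halfway between the LOW plateau and the MEDIUM peak
print(fuzzify(0.9))
```

A value lying between two labels belongs partly to both, rather than being forced into exactly one of them.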
Figure 2
Biclusters 1 & 2. This figure shows the gene expression pattern represented by biclusters 1 (A) and 2 (B).
Figure 3
Biclusters 3 & 4. This figure shows the gene expression pattern represented by biclusters 3 (A) and 4 (B).
Figure 4
Biclusters 5 & 6. This figure shows the gene expression pattern represented by biclusters 5 (A) and 6 (B).
Figure 5
Comparison between fuzzy and crisp results 1. A) The histogram shows the distribution of the genes annotated with the term electron transport over the protein abundance domain. The graph below describes how the fuzzy sets are defined in this domain. The red dashed lines show the percentiles p33 and p66, i.e. the borders of the crisp sets. B) The same, but for the genes annotated with the term snoRNA binding. Only the percentile p66 is shown in this case.
Figure 6
Comparison between fuzzy and crisp results 2. A) The histogram shows the distribution of the genes that belong to bicluster 5 over the responsiveness domain. The graph below describes how the fuzzy sets are defined in this domain. The red dashed lines show the percentiles p33 and p66, i.e. the borders of the crisp sets. B) The same, but for the genes located on chromosome 16 and the intergenic length domain.
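The difference between crisp percentile borders and fuzzy sets can be sketched as follows. Crisp sets assign a gene to exactly one interval delimited by p33 and p66, so two nearly identical values on either side of a border receive different labels, whereas a fuzzy set changes its membership gradually. The function names, the linear ramp, and the numeric values of `p33`, `p66` and `width` are hypothetical and are not taken from the paper.

```python
def crisp_label(x: float, p33: float, p66: float) -> str:
    """Assign exactly one label using the percentile borders."""
    if x < p33:
        return "LOW"
    if x < p66:
        return "MEDIUM"
    return "HIGH"


def fuzzy_high(x: float, p66: float, width: float) -> float:
    """Membership in HIGH, ramping linearly from 0 to 1 across p66 +/- width."""
    degree = (x - (p66 - width)) / (2 * width)
    return max(0.0, min(1.0, degree))


# Hypothetical borders and overlap width, for illustration only.
p33, p66, width = 10.0, 20.0, 4.0

for value in (19.5, 20.5):
    print(value, crisp_label(value, p33, p66), fuzzy_high(value, p66, width))
```

The two values differ by only one unit, yet the crisp label switches from MEDIUM to HIGH, while the fuzzy membership in HIGH moves only slightly. This is one way to see why genes clustered near a percentile border can distort crisp rules more than fuzzy ones.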
Figure 7
Complete Fuzzy-FP Tree. This figure shows an example of a complete Fuzzy-FP tree. Each node contains two membership degree lists; only one is shown in the figure for clarity, since initially both contain the same values.
Figure 8
Procedure for Fuzzy-FP Tree construction. This figure shows the pseudocode of the algorithm used to build the Fuzzy-FP tree.
Figure 9
Frequent itemset generation. This figure shows the pseudocode of the algorithm used to traverse the Fuzzy-FP tree and obtain the frequent itemsets.
