BMC Bioinformatics. 2010 Oct 7;11 Suppl 6(Suppl 6):S14.

doi: 10.1186/1471-2105-11-S6-S14.

Discovering gene functional relationships using FAUN (Feature Annotation Using Nonnegative matrix factorization)

Elina Tjioe¹, Michael W Berry, Ramin Homayouni

Affiliation

¹ Department of Electrical Engineering and Computer Science and Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37996, USA.

PMID: 20946597
PMCID: PMC3026361
DOI: 10.1186/1471-2105-11-S6-S14


Elina Tjioe et al. BMC Bioinformatics. 2010.


Abstract

Background: Searching the enormous biomedical literature to extract novel functional relationships among genes remains a challenge in bioinformatics. While numerous software tools have been developed to extract and identify gene relationships from biological databases, few deal effectively with extracting new (or implied) gene relationships, a process that is useful in interpreting discovery-oriented genome-wide experiments.

Results: In this study, we develop a Web-based bioinformatics software environment called FAUN (Feature Annotation Using Nonnegative matrix factorization, or NMF) to facilitate both the discovery and classification of functional relationships among genes. Both the computational complexity and the parameterization of NMF for processing gene sets are discussed. FAUN is tested on three manually constructed gene document collections. Its utility and performance as a knowledge discovery tool are demonstrated using a set of genes associated with Autism.

Conclusions: FAUN not only helps researchers use the biomedical literature efficiently, but also provides utilities for knowledge discovery. This Web-based software environment may be useful for the validation and analysis of functional associations in gene subsets identified by high-throughput experiments.


Figures

**Figure 1**
**Five FAUN-generated features for the NatRev collection (110 genes) along with their top (highest intensity) terms.** Available options such as show-terms and term and entropy filtering are shown as pull-down menus or slider bars.

**Figure 2**
**Display of the dominant terms of Feature 6 (DNA damage/ATM) from the higher resolution (rank-30) NMF model of the NatRev collection as they occur in genes highly associated with that feature.** The gene filtering option uses thresholding on components of the H matrix factor in the NMF to vary the number of genes displayed. The display sentences option allows the user to view a ranked list of sentences (based on term frequency) from any particular gene (document).
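The gene filtering option described in this caption can be illustrated with a minimal sketch. It assumes that one row of H holds the nonnegative weights of every gene for a single feature and that a gene is displayed when its weight meets a user-chosen threshold. The gene identifiers, the weights, and the function name `genes_above_threshold` are hypothetical; FAUN's actual scaling of H and its comparison rule are not stated here.

```python
def genes_above_threshold(h_row, gene_ids, threshold):
    """Return (gene, weight) pairs whose weight meets the threshold, strongest first."""
    selected = [(gene, weight) for gene, weight in zip(gene_ids, h_row) if weight >= threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights for one feature (one row of H); not data from the paper.
gene_ids = ["gene_a", "gene_b", "gene_c", "gene_d"]
feature_row = [0.92, 0.40, 0.03, 0.10]

# A high threshold keeps only the most strongly associated genes.
strict = genes_above_threshold(feature_row, gene_ids, 0.5)

# Lowering the threshold expands the displayed gene set.
relaxed = genes_above_threshold(feature_row, gene_ids, 0.1)

print(strict)
print(relaxed)
```

Lowering the threshold trades specificity for coverage, which is the same lever the Figure 9 caption describes when it refers to "a lower threshold for H-matrix".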

**Figure 3**
**An illustration of the gene-to-gene correlation FAUN option based on the Pearson correlation of gene features.** The rightmost window shows the correlation between genes highly associated with the user-selected features 10, 20, and 27; the leftmost window shows the feature strength (and manually annotated labels) for the genes from the user-selected correlation cell.
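A gene-to-gene correlation of this kind can be sketched as follows, assuming each gene is represented by its vector of feature strengths (one column of H) and that pairs of genes are compared with the ordinary Pearson coefficient. The profiles and gene names below are hypothetical, and the handling of constant vectors (returning 0.0) is a choice made for this sketch, not something taken from FAUN.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [value - mean_x for value in x]
    dy = [value - mean_y for value in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    scale = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A constant profile has no defined correlation; this sketch returns 0.0.
    return covariance / scale if scale else 0.0


def correlation_matrix(profiles):
    """Map every ordered pair of genes to the Pearson correlation of their profiles."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


# Hypothetical feature-strength profiles (columns of H) for three genes.
profiles = {
    "gene_a": [0.9, 0.1, 0.0],
    "gene_b": [0.8, 0.2, 0.1],
    "gene_c": [0.0, 0.1, 0.9],
}
corr = correlation_matrix(profiles)
print(corr["gene_a"]["gene_b"], corr["gene_a"]["gene_c"])
```

Genes that load on the same features (here `gene_a` and `gene_b`) correlate strongly, while genes dominated by different features correlate negatively.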

**Figure 4**
**FAUN workflow.** All genes in the gene list are used to construct a gene document collection from which a term-by-gene document matrix is constructed using GTP [46]. The matrix is then factored using rank k to produce a k-feature-NMF model. The resulting W and H matrix factors are used to extract dominant/significant terms and dominant genes for all k features. Dominant genes are then correlated for each feature. The FAUN user can annotate any feature and the resulting annotated NMF model can be used by the FAUN classifier to classify new gene documents.
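The factorization step of this workflow can be sketched in a few lines. The example below assumes a small term-by-gene-document matrix A (rows are terms, columns are gene documents) and factors it as A ≈ WH using the standard Lee-Seung multiplicative updates. That update rule is a stand-in chosen for illustration and may differ from the NMF variant FAUN uses. The toy matrix, the term and gene names, and all function names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(a, w, h):
    """Squared Frobenius norm of A - WH."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for ra, rb in zip(a, approx) for x, y in zip(ra, rb))


def nmf(a, k, iterations=200, seed=0, eps=1e-9):
    """Rank-k NMF by multiplicative updates; returns W (terms x k) and H (k x genes)."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (A H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h


def top_indices(values, count):
    """Indices of the largest entries, strongest first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:count]


# Hypothetical 4-term x 4-gene matrix of term weights.
term_names = ["term_a", "term_b", "term_c", "term_d"]
gene_names = ["gene_1", "gene_2", "gene_3", "gene_4"]
a = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
]

w, h = nmf(a, k=2)
for feature in range(2):
    dominant_terms = [term_names[i] for i in top_indices([row[feature] for row in w], 2)]
    dominant_genes = [gene_names[j] for j in top_indices(h[feature], 2)]
    print(feature, dominant_terms, dominant_genes)
```

Column f of W ranks the terms that characterize feature f, and row f of H ranks the genes associated with it, which is how dominant terms and dominant genes are read off the two factors.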

**Figure 5**
FAUN classification accuracy based on the strongest feature associated with each gene. [51]
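Classification by strongest feature can be sketched as follows, assuming each gene document has a vector of feature weights and that each feature carries a user-supplied annotation. The gene receives the label of its highest-weight feature, and accuracy is the fraction of genes whose assigned label matches a known one. The labels, weights, and function names are hypothetical and only illustrate the idea.

```python
def classify_by_strongest_feature(gene_weights, feature_labels):
    """Return the label of the feature with the largest weight for this gene."""
    strongest = max(range(len(gene_weights)), key=gene_weights.__getitem__)
    return feature_labels[strongest]


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Hypothetical feature annotations and per-gene feature weights.
feature_labels = ["label_x", "label_y"]
gene_feature_weights = {
    "gene_a": [0.8, 0.1],
    "gene_b": [0.2, 0.7],
    "gene_c": [0.6, 0.5],
}
known_labels = {"gene_a": "label_x", "gene_b": "label_y", "gene_c": "label_y"}

predictions = {
    gene: classify_by_strongest_feature(weights, feature_labels)
    for gene, weights in gene_feature_weights.items()
}
score = accuracy(
    [predictions[gene] for gene in known_labels],
    [known_labels[gene] for gene in known_labels],
)
print(predictions, score)
```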

**Figure 6**
Venn diagram of genes from different NMF (rank-k) models generated from Autism gene documents in the NatRev collection.

**Figure 7**
Gene distributions across different features from NMF (rank-k) models.

**Figure 8**
Matrix of genes by feature terms for the rank-20 NMF model of the NatRev collection.

**Figure 9**
Venn diagram of genes associated with Autism (RELN-related, blue) and methylation (MECP2-related, yellow) features using a lower threshold for the H matrix in the rank-30 NMF model of the NatRev collection. The resulting gene sets were compared to the 26 autism-associated genes reported by Abrams and Geschwind (green). Both the expanded RELN and expanded MECP2 gene sets achieved an F1 score of 0.52 (69% precision, 42% recall), whereas the union of the two gene sets achieved an F1 score of 0.63 (62% precision, 58% recall). Red highlighted genes were new discoveries identified by adjusting the rank-k on the same dataset (Figure 7).
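The F1 score quoted in this caption is the harmonic mean of precision and recall. A minimal sketch, assuming the predicted and reference gene lists are compared as plain sets, is given below. The small example sets are hypothetical. Because the caption reports percentages rounded to whole numbers, values recomputed from them may differ slightly from the reported scores.

```python
def precision_recall(predicted, reference):
    """Precision and recall of a predicted gene set against a reference gene set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall quoted in the caption for each expanded gene set.
expanded_set_f1 = f1_score(0.69, 0.42)
print(round(expanded_set_f1, 2))

# Hypothetical gene sets showing how precision and recall are obtained.
predicted = {"g1", "g2", "g3", "g4"}
reference = {"g1", "g2", "g5"}
precision, recall = precision_recall(predicted, reference)
print(precision, recall, f1_score(precision, recall))
```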


References

1. http://www.ncbi.nlm.nih.gov/pubmed
2. Weeber M, Kors J, Mons B. Online tools to support literature-based discovery in the life sciences. Brief Bioinform. 2005;6(3):277–286. doi: 10.1093/bib/6.3.277.
3. Bremer E, Hakenberg J, Han EH, Berrar D, Dubitzky W, editors. Knowledge Discovery in Life Science Literature. Vol. 3886. Lecture Notes in Computer Science. Berlin: Springer; 2006. http://www.springerlink.com/content/th9635n15671
4. Roos M, Marshall M, Gibson A, Schuemie M, Meij E, Katrenko S, Hage W, Krommydas K, Adriaans P. Structuring and extracting knowledge for the support of hypothesis generation in molecular biology. BMC Bioinformatics. 2009;10:S9. doi: 10.1186/1471-2105-10-S10-S9.
5. Ananiadou S, Kell DB, Tsujii J. Text mining and its potential applications in systems biology. Trends Biotechnol. 2006;24(12):571–579. doi: 10.1016/j.tibtech.2006.10.002.


Grants and funding

HD052472/HD/NICHD NIH HHS/United States
