BMC Bioinformatics. 2010 Oct 7;11(Suppl 6):S14.
doi: 10.1186/1471-2105-11-S6-S14.

Discovering gene functional relationships using FAUN (Feature Annotation Using Nonnegative matrix factorization)

Elina Tjioe et al. BMC Bioinformatics.

Abstract

Background: Searching the enormous body of biomedical literature to extract novel functional relationships among genes remains a challenge in bioinformatics. Although numerous software tools have been developed to extract and identify gene relationships from biological databases, few effectively extract new (or implied) gene relationships, a capability that is useful for interpreting discovery-oriented, genome-wide experiments.

Results: In this study, we develop a Web-based bioinformatics software environment called FAUN, or Feature Annotation Using Nonnegative matrix factorization (NMF), to facilitate both the discovery and the classification of functional relationships among genes. Both the computational complexity and the parameterization of NMF for processing gene sets are discussed. FAUN is tested on three manually constructed gene document collections. Its utility and performance as a knowledge discovery tool are demonstrated using a set of genes associated with Autism.

Conclusions: FAUN not only helps researchers use the biomedical literature efficiently, but also provides utilities for knowledge discovery. This Web-based software environment may be useful for validating and analyzing functional associations in gene subsets identified by high-throughput experiments.
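The Results describe factoring gene documents with NMF. As a rough illustration, the sketch below approximates a nonnegative matrix A (m terms by n genes) as the product WH of an m-by-k term factor W and a k-by-n gene factor H. The abstract does not say which NMF algorithm FAUN uses, so this assumes the standard Lee-Seung multiplicative updates for the Frobenius-norm objective, written in plain Python; the helper names (`nmf`, `matmul`, `frobenius_error`) are hypothetical.

```python
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def frobenius_error(a: Matrix, w: Matrix, h: Matrix) -> float:
    """||A - WH||_F, the quantity the updates below do not increase."""
    wh = matmul(w, h)
    return sum((a[i][j] - wh[i][j]) ** 2
               for i in range(len(a)) for j in range(len(a[0]))) ** 0.5


def nmf(a: Matrix, k: int, iterations: int = 200, seed: int = 0,
        eps: float = 1e-9) -> tuple[Matrix, Matrix]:
    """Factor nonnegative A (m x n) into W (m x k) and H (k x n).

    Assumption: Lee-Seung multiplicative updates; FAUN's solver may differ.
    """
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() for _ in range(k)] for _ in range(m)]
    h = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H); eps avoids division by zero.
        wt = transpose(w)
        num = matmul(wt, a)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (A H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(a, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h
```

Each update multiplies entries by a nonnegative ratio, so W and H stay nonnegative without an explicit projection step. With dense matrices, one iteration of this rule costs on the order of m·n·k operations, and the rank k is the main parameter the user chooses.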

Figures

Figure 1
Five FAUN-generated features for the NatRev collection (110 genes), along with their top (highest-intensity) terms. Available options, such as show-terms and term and entropy filtering, appear as pull-down menus or slider bars.
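Figure 1 mentions entropy filtering of terms. The caption does not define the entropy used, so the sketch below assumes a common choice: the normalized Shannon entropy of a term's counts across the gene documents, where 0.0 means the term occurs in only one document and 1.0 means it is spread evenly over all of them. Whether FAUN keeps low- or high-entropy terms is also not stated here; this sketch keeps terms at or below a cutoff. Function names are hypothetical.

```python
from math import log


def term_entropy(counts: list[float]) -> float:
    """Normalized Shannon entropy of one term's counts across n gene documents.

    Assumed definition: 0.0 = concentrated in one document, 1.0 = perfectly even.
    """
    total = sum(counts)
    n = len(counts)
    if total == 0 or n < 2:
        return 0.0
    h = -sum((c / total) * log(c / total) for c in counts if c > 0)
    return h / log(n)


def filter_terms_by_entropy(terms: list[str], matrix: list[list[float]],
                            max_entropy: float) -> list[str]:
    """Keep terms (rows of a term-by-gene matrix) whose entropy <= max_entropy."""
    return [t for t, row in zip(terms, matrix) if term_entropy(row) <= max_entropy]
```

Moving a slider then corresponds to changing `max_entropy`: lower values retain only terms concentrated in a few genes.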
Figure 2
Display of the dominant terms of Feature 6 (DNA damage/ATM) from the higher-resolution (rank-30) NMF model of the NatRev collection, as they occur in the genes most strongly associated with that feature. The gene filtering option thresholds the components of the H matrix factor of the NMF to vary the number of genes displayed. The display sentences option lets the user view a ranked list of sentences (ranked by term frequency) from any particular gene (document).
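Figure 2 notes that gene filtering thresholds the components of the H factor. A minimal sketch of that idea follows. It assumes each row of H is scaled by its maximum so that a single threshold between 0 and 1 applies to every feature, which is an illustrative choice rather than FAUN's documented behavior; the function name and the GENE_* identifiers are hypothetical.

```python
def genes_for_feature(h: list[list[float]], gene_names: list[str],
                      feature: int, threshold: float) -> list[tuple[str, float]]:
    """Return (gene, scaled weight) pairs with H[feature][j] / max(row) >= threshold.

    Results are sorted strongest first. Row-max scaling is an assumption.
    """
    row = h[feature]
    top = max(row)
    if top == 0:
        return []
    scored = [(g, v / top) for g, v in zip(gene_names, row) if v / top >= threshold]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-feature x 4-gene H factor.
H = [[0.9, 0.1, 0.4, 0.0],
     [0.0, 0.8, 0.2, 0.6]]
GENES = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
```

Lowering the threshold admits more genes for the same feature, which is the effect of the gene filtering control.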
Figure 3
Illustration of FAUN's gene-to-gene correlation option, which is based on the Pearson correlation of gene features. The rightmost window shows the correlations among genes strongly associated with the user-selected features 10, 20, and 27; the leftmost window shows the feature strengths (and manually annotated labels) for the genes in the user-selected correlation cell.
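The gene-to-gene correlation option in Figure 3 is based on the Pearson correlation of gene features. One way to sketch it, assuming each gene is represented by its column of the H factor (its strength in each of the k features), is shown below. `pearson` and `gene_correlation_matrix` are hypothetical helper names, and returning 0.0 for a constant vector is an arbitrary convention, since the correlation is undefined there.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant vectors; 0.0 is a convention here
    return cov / (sx * sy)


def gene_correlation_matrix(h: list[list[float]],
                            genes: list[str]) -> dict[tuple[str, str], float]:
    """Pairwise Pearson correlation of the gene columns of H (k x n)."""
    cols = {g: [row[j] for row in h] for j, g in enumerate(genes)}
    return {(a, b): pearson(cols[a], cols[b]) for a in genes for b in genes}
```

Genes with proportional feature profiles correlate at 1.0 even when one is much more strongly expressed in the literature than the other, which is why correlation is useful for grouping genes by feature pattern rather than by magnitude.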
Figure 4
FAUN workflow. All genes in the gene list are used to construct a gene document collection, from which a term-by-gene-document matrix is constructed using GTP [46]. The matrix is then factored with rank k to produce a k-feature NMF model. The resulting W and H matrix factors are used to extract the dominant (significant) terms and the dominant genes for all k features. The dominant genes are then correlated within each feature. The FAUN user can annotate any feature, and the resulting annotated NMF model can be used by the FAUN classifier to classify new gene documents.
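Two steps of the Figure 4 workflow, building the term-by-gene-document matrix and then reading dominant terms from W and dominant genes from H, can be sketched as follows. This is a simplification under stated assumptions: it uses raw term counts and a naive tokenizer instead of GTP's parsing and term weighting, the factoring step itself is left out, and W and H are assumed to come from a rank-k NMF. All function names are hypothetical.

```python
import re
from collections import Counter


def term_by_gene_matrix(gene_docs: dict[str, str]):
    """Build a term-by-gene-document matrix of raw term counts.

    Raw counts are a simplification; GTP applies its own parsing and weighting.
    """
    genes = sorted(gene_docs)
    counts = {g: Counter(re.findall(r"[a-z0-9]+", gene_docs[g].lower())) for g in genes}
    terms = sorted(set().union(*counts.values()))
    matrix = [[float(counts[g][t]) for g in genes] for t in terms]
    return terms, genes, matrix


def top_indices(values: list[float], n: int) -> list[int]:
    """Indices of the n largest values (ties keep their original order)."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:n]


def summarize_features(w, h, terms, genes, top_n=3):
    """For each of the k features, list dominant terms (column of W) and genes (row of H)."""
    summary = []
    for f in range(len(h)):
        term_weights = [row[f] for row in w]  # column f of W (m x k)
        gene_weights = h[f]                   # row f of H (k x n)
        summary.append({
            "terms": [terms[i] for i in top_indices(term_weights, top_n)],
            "genes": [genes[j] for j in top_indices(gene_weights, top_n)],
        })
    return summary
```

Annotating a feature then amounts to a user reading its dominant terms and genes and attaching a label, such as "DNA damage/ATM" for Feature 6 in Figure 2.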
Figure 5
FAUN classification accuracy based on the strongest feature associated with each gene [51].
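Figure 5 evaluates classification by the strongest feature associated with each gene. A minimal sketch, assuming each gene takes the user-supplied annotation of the feature with the largest entry in its column of H, and that accuracy is the fraction of genes whose predicted label matches a manual label. The labels and gene identifiers in the test are hypothetical, and classifying genuinely new documents (which would require projecting them onto W) is not covered.

```python
def strongest_feature(h: list[list[float]], j: int) -> int:
    """Index of the feature with the largest weight in column j of H (first on ties)."""
    return max(range(len(h)), key=lambda f: h[f][j])


def classify_genes(h: list[list[float]], genes: list[str],
                   feature_labels: list[str]) -> dict[str, str]:
    """Label each gene with the annotation of its strongest feature."""
    return {g: feature_labels[strongest_feature(h, j)] for j, g in enumerate(genes)}


def accuracy(predicted: dict[str, str], truth: dict[str, str]) -> float:
    """Fraction of manually labeled genes whose predicted label matches."""
    hits = sum(1 for g, label in truth.items() if predicted.get(g) == label)
    return hits / len(truth)
```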
Figure 6
Venn diagram of genes from different NMF (rank-k) models generated from Autism gene documents in the NatRev collection.
Figure 7
Gene distributions across different features from NMF (rank-k) models.
Figure 8
Matrix of genes by feature terms for the rank-20 NMF model of the NatRev collection.
Figure 9
Venn diagram of genes associated with the Autism (RELN-related, blue) and methylation (MECP2-related, yellow) features, obtained using a lower threshold on the H matrix in the rank-30 NMF model of the NatRev collection. The resulting gene sets were compared with the 26 autism-associated genes reported by Abrams and Geschwind (green). The expanded RELN and expanded MECP2 gene sets each achieved an F1 score of 0.52 (69% precision, 42% recall), whereas the union of the two gene sets achieved an F1 score of 0.63 (62% precision, 58% recall). Genes highlighted in red are new discoveries identified by adjusting the rank k on the same dataset (Figure 7).
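The scores in Figure 9 compare a predicted gene set with a reference set. A minimal sketch of that comparison, using the standard definitions (precision = overlap / predicted size, recall = overlap / reference size, F1 = their harmonic mean); the function names and the gene identifiers in the test are placeholders, not the paper's data.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted: set[str], reference: set[str]) -> tuple[float, float, float]:
    """Score a predicted gene set against a reference gene set."""
    overlap = len(predicted & reference)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall, f1_score(precision, recall)
```

The union of two expanded sets can be scored with `precision_recall_f1(set_a | set_b, reference)`. As a check against the caption, `f1_score(0.69, 0.42)` evaluates to about 0.522, consistent with the reported F1 of 0.52.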

References

    1. http://www.ncbi.nlm.nih.gov/pubmed
    2. Weeber M, Kors J, Mons B. Online tools to support literature-based discovery in the life sciences. Brief Bioinform. 2005;6(3):277–286. doi: 10.1093/bib/6.3.277.
    3. Bremer E, Hakenberg J, Han EH, Berrar D, Dubitzky W, editors. Knowledge Discovery in Life Science Literature. Vol. 3886. Lecture Notes in Computer Science. Berlin: Springer; 2006. http://www.springerlink.com/content/th9635n15671
    4. Roos M, Marshall M, Gibson A, Schuemie M, Meij E, Katrenko S, Hage W, Krommydas K, Adriaans P. Structuring and extracting knowledge for the support of hypothesis generation in molecular biology. BMC Bioinformatics. 2009;10(Suppl 10):S9. doi: 10.1186/1471-2105-10-S10-S9.
    5. Ananiadou S, Kell DB, Tsujii J. Text mining and its potential applications in systems biology. Trends Biotechnol. 2006;24(12):571–579. doi: 10.1016/j.tibtech.2006.10.002.
