Bioinformatics. 2014 Jun 15;30(12):i52-59.
doi: 10.1093/bioinformatics/btu260.

Using association rule mining to determine promising secondary phenotyping hypotheses

Anika Oellrich et al. Bioinformatics.

Abstract

Motivation: Large-scale phenotyping projects such as the Sanger Mouse Genetics project are ongoing efforts to help identify the influences of genes and their modification on phenotypes. Gene-phenotype relations are crucial to improving our understanding of human heritable diseases and to the development of drugs. However, given that there are ∼20 000 genes in higher vertebrate genomes and that experimentally verifying gene-phenotype relations is resource-intensive, methods are needed to identify good candidates for testing.

Results: In this study, we applied association rule mining to identify promising secondary phenotype candidates. The predictions rely on a large gene-phenotype annotation set that is used to find occurrence patterns of phenotypes. With this approach, we identified 1967 secondary phenotype hypotheses that cover 244 genes and 136 phenotypes. Using two automated evaluation strategies and one manual evaluation strategy, we demonstrate that the secondary phenotype candidates are biologically relevant to the genes they are predicted for. From these results, we conclude that the predicted secondary phenotypes are good candidates for experimental testing and confirmation.
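
As a rough illustration of the general idea behind association rule mining on gene-phenotype annotations, the following minimal sketch mines pairwise rules of the form "phenotype X implies phenotype Y" and uses them to suggest untested phenotypes for a gene. It is not the authors' pipeline: the gene names, phenotype labels, thresholds, and function names are all hypothetical, and only rules with a single phenotype on each side are considered.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy annotations (gene -> observed phenotypes); not real data.
annotations = {
    "geneA": {"decreased body weight", "abnormal bone density", "decreased lean mass"},
    "geneB": {"decreased body weight", "decreased lean mass"},
    "geneC": {"decreased body weight", "decreased lean mass", "abnormal bone density"},
    "geneD": {"abnormal bone density"},
    "geneE": {"decreased body weight"},
}


def mine_pair_rules(annotations, min_support=0.5, min_confidence=0.7):
    """Return rules (lhs, rhs, support, confidence) between single phenotypes.

    support    = fraction of genes annotated with both phenotypes
    confidence = genes with both phenotypes / genes with the lhs phenotype
    The thresholds are illustrative assumptions, not values from the paper.
    """
    n_genes = len(annotations)
    single_counts = Counter()
    pair_counts = Counter()
    for phenotypes in annotations.values():
        single_counts.update(phenotypes)
        # Sort so each unordered pair is always counted under the same key.
        pair_counts.update(combinations(sorted(phenotypes), 2))

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n_genes
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / single_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


def suggest_secondary(annotations, rules):
    """For each gene, list rule consequents it is not yet annotated with."""
    suggestions = {}
    for gene, phenotypes in annotations.items():
        predicted = {
            rhs for lhs, rhs, _, _ in rules
            if lhs in phenotypes and rhs not in phenotypes
        }
        if predicted:
            suggestions[gene] = predicted
    return suggestions


rules = mine_pair_rules(annotations)
suggestions = suggest_secondary(annotations, rules)
print(suggestions)
```

In this toy set, only the pair "decreased body weight" and "decreased lean mass" is frequent enough to produce rules, so the single gene annotated with the first but not the second receives the second as a candidate for follow-up testing. A production analysis would typically use an Apriori- or Eclat-style implementation that also handles larger item sets.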

Availability: The secondary phenotype candidates can be browsed at http://www.sanger.ac.uk/resources/databases/phenodigm/gene/secondaryphenotype/list.

Contact: ao5@sanger.ac.uk or ds5@sanger.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. Overall workflow of the study. After determining related phenotypes, the primary phenotype annotations assigned to genes in Sanger-MGP are enriched with potentially related phenotypes. The additional, predicted secondary phenotypes are evaluated in several steps.
Fig. 2. Illustration of the calculation of biological coherence scores to evaluate secondary phenotype predictions. Boxes that possess the same background colour are based on the same analysis scripts; only the input data differ (either randomized or original data). Black boxes symbolize the ratio of the biological coherence of original versus randomized data, which is used as input for the box plots depicted in Figure 3.
Fig. 3. Adding the predicted secondary phenotype annotations to the Sanger-MGP genes with reference range annotations, and using these to create gene clusters based on phenotype similarity, improves the biological coherence of the obtained gene clusters.
Fig. 4. Accumulating the predicted secondary phenotypes together with reference range annotations for Sanger-MGP genes improves the predictability of causative disease genes using PhenoDigm.
Fig. 5. An extension of PhenoDigm’s web interface holds the secondary phenotype predictions.
