Database (Oxford). 2013 Apr 12:2013:bat018.
doi: 10.1093/database/bat018. Print 2013.

MalaCards: an integrated compendium for diseases and their annotation

Noa Rappaport et al. Database (Oxford).

Abstract

Comprehensive disease classification, integration and annotation are crucial for biomedical discovery. At present, disease compilation is incomplete, heterogeneous and often lacking systematic inquiry mechanisms. We introduce MalaCards, an integrated database of human maladies and their annotations, modeled on the architecture and strategy of the GeneCards database of human genes. MalaCards mines and merges 44 data sources to generate a computerized card for each of 16 919 human diseases. Each MalaCard contains disease-specific prioritized annotations, as well as inter-disease connections, empowered by the GeneCards relational database, its searches and GeneDecks set analyses. First, we generate a disease list from 15 ranked sources, using disease-name unification heuristics. Next, we use four schemes to populate MalaCards sections: (i) directly interrogating disease resources, to establish integrated disease names, synonyms, summaries, drugs/therapeutics, clinical features, genetic tests and anatomical context; (ii) searching GeneCards for related publications, and for associated genes with corresponding relevance scores; (iii) analyzing disease-associated gene sets in GeneDecks to yield affiliated pathways, phenotypes, compounds and GO terms, sorted by a composite relevance score and presented with GeneCards links; and (iv) searching within MalaCards itself, e.g. for additional related diseases and anatomical context. The latter forms the basis for the construction of a disease network, based on shared MalaCards annotations, embodying associations based on etiology, clinical features and clinical conditions. This broadly disposed network has a power-law degree distribution, suggesting that this might be an inherent property of such networks. Work in progress includes hierarchical malady classification, ontological mapping and disease set analyses, striving to make MalaCards an even more effective tool for biomedical research. 
Database URL: http://www.malacards.org/
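
Scheme (iv) and the resulting disease network can be illustrated with a minimal sketch. It assumes, purely for illustration, that each card is reduced to a block of free text and that a directed edge runs from a query disease to every other card whose text mentions the query's name. The function names and the toy cards are hypothetical and are not taken from MalaCards; the actual search uses the GeneCards engine and relevance scoring, which this sketch does not reproduce.

```python
from collections import defaultdict


def build_disease_network(cards):
    """Return directed edges (query, hit).

    An edge is added when the query disease name occurs in the text of
    another card. Plain substring matching is an assumption standing in
    for the real search engine.
    """
    edges = set()
    for query in cards:
        needle = query.lower()
        for other, text in cards.items():
            if other != query and needle in text.lower():
                edges.add((query, other))
    return edges


def degree_counts(edges):
    """Count outgoing and incoming edges per disease."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return dict(out_deg), dict(in_deg)


# Toy cards (hypothetical text, not MalaCards content).
toy_cards = {
    "sickle cell anemia": "Hemoglobin disorder; related to thalassemia and malaria resistance.",
    "thalassemia": "Inherited blood disorder; may co-occur with sickle cell anemia.",
    "malaria": "Parasitic infection; sickle cell anemia trait confers partial protection.",
}

edges = build_disease_network(toy_cards)
out_deg, in_deg = degree_counts(edges)
```

The per-disease in- and out-degrees computed this way are the quantities whose distribution is examined in Figure 5A.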

Figures

Figure 1
GeneCards-based annotation pipeline. Each unified disease name is fed into the GeneCards search engine to find its associated gene set, as well as publications, disease–gene associations and the corresponding contexts wherein the match occurred. The set is then forwarded to GeneDecks, which distills statistically significant descriptors (e.g. ‘cardiovascular system phenotype’, ‘apoptosis’) for the genes in the set. These shared descriptors, sorted by relevance, are featured in various MalaCards sections.
Figure 2
MalaCards sections. Subset of the MalaCard for sickle cell anemia. The left-hand side of each section lists its contributing sources. The right-hand side contains nuggets of section-related information, with deep links to the original sources for comprehensive scrutiny. A ‘stats bar’ containing the statistics of a selected set of populated sections is displayed in the card header.
Figure 3
MalaCards home page and search results table. (A) MalaCards 1.03 home page, including search, a sample disease, logos and links to GeneCards and associated suite members, and a random disease generator. (B) Example of a table of search results for the ‘pemphigus’ query. Columns include disease name, MIFTS and relevance score.
Figure 4
MalaCards disease network. (A) MalaCards disease network created by random sampling of 12% of the nodes, conserving the degree distribution. The network is clustered; nodes and edges are colored according to their cluster association and sized by their authority parameter (22). This figure was produced using Gephi (21). (B) A subset of the directly connected nodes for ‘Sickle Cell Anemia’.
Figure 5
Disease network properties. (A) Disease degree distribution for incoming and outgoing MalaCards search-results edges. The continuous line represents the fit to the log–log binned data, following the function f(x) = ax + b with a = −1.7 and b = 10.8, obtained from a least-squares fit with an adjusted R² of 0.9. Outliers are noted in red. (B) Distribution of the number of sources associated with each disease, supplying either or both names and annotations.
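
The straight-line fit f(x) = ax + b on log–log binned data, described in panel A, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the caption does not give the binning scheme or the logarithm base, so the sketch assumes power-of-two bins, densities normalised by bin width, and natural logarithms. All function names are hypothetical.

```python
import math
from collections import Counter


def log_binned_density(degrees):
    """Bin positive integer degrees into [2**k, 2**(k+1)).

    Returns (geometric bin centre, count / bin width) pairs.
    Power-of-two bins are an assumption made for this sketch.
    """
    bins = Counter()
    for d in degrees:
        if d > 0:
            bins[d.bit_length() - 1] += 1  # exact floor(log2(d)) for ints
    points = []
    for k, count in sorted(bins.items()):
        lo, hi = 2 ** k, 2 ** (k + 1)
        points.append((math.sqrt(lo * hi), count / (hi - lo)))
    return points


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_degree_distribution(degrees):
    """Fit a line to the log-log binned degree density (needs >= 2 bins)."""
    points = log_binned_density(degrees)
    xs = [math.log(centre) for centre, _ in points]
    ys = [math.log(density) for _, density in points]
    return fit_line(xs, ys)
```

On a log–log scale a power law appears as a straight line, so the fitted slope a estimates the exponent of the degree distribution. With the caption's values, the slope of −1.7 would be that exponent.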
Figure 6
Database schema. A subset of MalaCards disease-centric relational database entities and their relationships, with associated web-card sections shown outlined in bold black.
