Review
BMC Bioinformatics. 2012;13 Suppl 16(Suppl 16):S13.
doi: 10.1186/1471-2105-13-S16-S13. Epub 2012 Nov 5.

Knowledge-based analysis of proteomics data

Marina Bessarabova et al. BMC Bioinformatics. 2012.

Abstract

As is the case with any OMICs technology, the value of proteomics data is defined by the degree of its functional interpretation in the context of phenotype. Functional analysis of proteomics profiles is inherently complex, as each of the hundreds of detected proteins can belong to dozens of pathways, be connected in different context-specific groups by protein interactions, and be regulated by a variety of one-step and remote regulators. The knowledge-based approach deals with this complexity by creating a structured database of protein interactions, pathways and protein-disease associations from experimental literature, together with a set of statistical tools to compare proteomics profiles with this rich source of accumulated knowledge. Here we describe the main methods of ontology enrichment, interactome topology and network analysis applied to a comprehensive, manually curated and semantically consistent knowledge source, MetaBase, and demonstrate several case studies in different disease areas.

Figures

Figure 1
P-value based ranking in ontology enrichment analysis. Subset N represents the complete human proteome (all proteins and complexes in the MetaCore database). The subset n of these nodes corresponds to the experiment. The light green ellipse depicts the set of ontology categories, which can be hierarchically ordered (as is the case for GO processes or the MeSH disease classification). The R-set is the union of all the proteins tied to a particular characteristic or category (e.g., proteins associated with at least one GO process).
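The ranking described in Figure 1 can be sketched in a few lines. The caption does not name the statistical test, so this sketch assumes a hypergeometric null, a common choice for enrichment analysis. Here N is the size of the background proteome, n the size of the experimental subset, R the number of background proteins in a category, and r (a symbol introduced here, not in the caption) the number of experimental proteins falling in that category. Function names, category names and the toy numbers are all illustrative.

```python
from math import comb


def enrichment_p_value(N, R, n, r):
    """Upper-tail hypergeometric probability P(X >= r).

    Assumed model (not stated in the caption): n proteins are drawn at
    random from a background of N, of which R belong to the category.
    """
    total = comb(N, n)
    upper = min(n, R)  # the overlap cannot exceed either set size
    tail = sum(comb(R, k) * comb(N - R, n - k) for k in range(r, upper + 1))
    return tail / total


def rank_categories(N, n, categories):
    """Sort categories by ascending p-value (most significant first).

    `categories` maps a hypothetical category name to the pair (R, r).
    """
    scored = [(name, enrichment_p_value(N, R, n, r))
              for name, (R, r) in categories.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy numbers chosen for illustration only.
toy = {"category_a": (4, 2), "category_b": (5, 1)}
ranking = rank_categories(N=10, n=3, categories=toy)
```

Exact integer arithmetic via `math.comb` avoids a dependency on a statistics library; for proteome-scale inputs a library implementation of the hypergeometric survival function would likely be faster.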
Figure 2
Enrichment analysis of the plasma proteome from an ovarian cancer mouse model. A. Top 10 significant biological processes from the GO ontology. Bar length reflects the significance and equals the negative logarithm of the enrichment p-value. B. Top 10 significant maps from the MetaCore canonical pathway map ontology.
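The bar lengths in Figure 2 follow directly from the enrichment p-values. A minimal sketch, assuming a base-10 logarithm (the caption says only "negative logarithm") and using made-up term names and p-values:

```python
import math


def bar_length(p_value):
    """Bar length as the negative logarithm of the p-value (base 10 assumed)."""
    return -math.log10(p_value)


def top_terms(p_values, k=10):
    """Return the k most significant terms with their bar lengths."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    return [(term, bar_length(p)) for term, p in ordered[:k]]


# Hypothetical terms and p-values, for illustration only.
example = {"term_a": 1e-5, "term_b": 1e-2, "term_c": 1e-8}
bars = top_terms(example, k=2)
```

Smaller p-values give longer bars, so the most significant term appears first in the returned list.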
Figure 3
Interactome analysis of proteomics datasets and gene lists. A. The general schema of interactions inside the set, between the sets, and between the set and the "global interactome". B. The "over"- and "under"-connectivity phenomenon. The hub (P21 protein from the MetaCore database, marked pink) is expected to be linked with five other proteins in the hypothetical dataset of 320 genes (purple circles), but in reality it can be linked with nine genes (purple and green circles) or three genes (purple circles). In these cases, it is considered "over"-connected or "under"-connected, respectively.
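The over- and under-connectivity idea in panel B can be illustrated with a short calculation. This is a sketch under stated assumptions: the caption gives the dataset size (320) and the expected link count (five) but not the hub degree or the size of the global network, so the values 100 and 6400 below are hypothetical, chosen only so that the expectation comes out at five. The one-sided tail probability again assumes a hypergeometric null, which the caption does not specify.

```python
from math import comb


def hypergeom_pmf(k, network_size, hub_degree, dataset_size):
    """Probability that exactly k of the hub's neighbours fall in the dataset."""
    return (comb(hub_degree, k)
            * comb(network_size - hub_degree, dataset_size - k)
            / comb(network_size, dataset_size))


def connectivity(observed, hub_degree, dataset_size, network_size):
    """Label a hub as over- or under-connected relative to expectation.

    Returns (label, expected_links, one_sided_tail_probability).
    """
    expected = hub_degree * dataset_size / network_size
    k_max = min(hub_degree, dataset_size)
    if observed >= expected:
        label = "over"
        ks = range(observed, k_max + 1)   # upper tail: P(X >= observed)
    else:
        label = "under"
        ks = range(0, observed + 1)       # lower tail: P(X <= observed)
    p = sum(hypergeom_pmf(k, network_size, hub_degree, dataset_size) for k in ks)
    return label, expected, p


# Hypothetical hub degree (100) and network size (6400); dataset size from the caption.
over = connectivity(observed=9, hub_degree=100, dataset_size=320, network_size=6400)
under = connectivity(observed=3, hub_degree=100, dataset_size=320, network_size=6400)
```

Whether nine or three links counts as statistically significant depends on the true hub degree and network size, so the sketch reports the tail probability rather than applying a fixed threshold.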
Figure 4
Complement pathway activation in glaucomatous optic nerve astrocytes revealed by proteomics. The relevant proteomic data (solid red indicator #1) and the differential gene expression data (indicators #2 and #3), mapped on the canonical pathway originally characterized in macrophages, show cross-verification of the complement pathway activation in glaucomatous ONHAs. Pathway steps confirmed by both data types are highlighted with ovals.
Figure 5
A diagram of the hidden nodes algorithm. The set of experimentally derived nodes K is colored red. These nodes are connected by the shortest path network S (blue nodes). The rest of the global network is represented by black nodes.
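The construction in Figure 5 can be sketched on a toy graph. This is a simplified illustration, not the published algorithm: it treats the network as undirected, keeps a single shortest path per pair of seed nodes, and omits any statistical scoring of the intermediate nodes. The graph and node names are made up.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:   # walk back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def shortest_path_network(graph, seeds):
    """Union of nodes on one shortest path between every pair of seeds (set S)."""
    nodes = set(seeds)
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(graph, a, b)
        if path:
            nodes.update(path)
    return nodes


# Toy undirected network given as adjacency lists (hypothetical).
graph = {
    "A": ["X"], "X": ["A", "B"], "B": ["X", "Y"], "Y": ["B", "C"],
    "C": ["Y", "Z"], "Z": ["C", "W"], "W": ["Z"],
}
K = {"A", "B", "C"}                  # experimentally derived nodes (red)
S = shortest_path_network(graph, K)  # seeds plus connecting nodes
hidden = S - K                       # connecting nodes absent from the data (blue)
```

In this toy case X and Y are the connecting nodes, while Z and W stay outside S, playing the role of the black nodes in the diagram.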
Figure 6
"Growth factor regulation of the G1-S transition in cell cycle" network in LNCaP prostate cancer cells. Red dots indicate proteins identified as topologically significant using the gene expression profile. Blue dots indicate proteins identified as topologically significant using the proteomics profile. Red boxes indicate proteins identified as topologically significant from both sets of data.
Figure 7
Diagrams of network algorithms in MetaCore. (A) Direct interactions algorithm; (B) Shortest path algorithm; (C) Analyze network; (D) Analyze network (Transcription Factors); (E) Transcription regulation; (F) Analyze network (Receptors).
Figure 8
Direct interaction network of a plasma proteome from a mouse ovarian cancer model. 58 proteins were used to determine the direct relationships within the input list. A total of 19 proteins were found to be directly interacting within 2 separate clusters. Directional edges are marked by colored arrows (green = activation, red = inhibition).
Figure 9
The top-scoring network of the AN algorithm applied to a plasma proteome from a mouse ovarian cancer model. Directional edges are marked by colored arrows (green = activation, red = inhibition) and root/input nodes are marked with red circles. Teal blue arrows mark interactions that represent canonical, well-established mechanisms.
