Bioinformatics. 2008 Dec 15;24(24):2894-900.
doi: 10.1093/bioinformatics/btn553. Epub 2008 Oct 30.

Investigating the correspondence between transcriptomic and proteomic expression profiles using coupled cluster models

Simon Rogers et al. Bioinformatics. 2008.

Abstract

Motivation: Modern transcriptomics and proteomics enable us to survey the expression of RNAs and proteins at large scale. While these data are usually generated and analyzed separately, there is increasing interest in comparing and co-analyzing transcriptome and proteome expression data. A major open question is whether transcriptome and proteome expression are linked and, if so, how they are coordinated.

Results: Here we have developed a probabilistic clustering model that permits analysis of the links between transcriptomic and proteomic profiles in a sensible and flexible manner. Our coupled mixture model defines a prior probability distribution over the component to which a protein profile should be assigned conditioned on which component the associated mRNA profile belongs to. We apply this approach to a large dataset of quantitative transcriptomic and proteomic expression data obtained from a human breast epithelial cell line (HMEC). The results reveal a complex relationship between transcriptome and proteome with most mRNA clusters linked to at least two protein clusters, and vice versa. A more detailed analysis incorporating information on gene function from the Gene Ontology database shows that a high correlation of mRNA and protein expression is limited to the components of some molecular machines, such as the ribosome, cell adhesion complexes and the TCP-1 chaperonin involved in protein folding.
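The coupling described above can be illustrated with a small sketch. It is not the authors' Matlab code. It assumes hard mRNA cluster labels, a row-stochastic table `cond_prior[k, j]` standing in for the conditional prior p(j|k), and precomputed per-component log-likelihoods for each protein profile. The function and parameter names are hypothetical.

```python
import numpy as np

def protein_responsibilities(log_lik, mrna_cluster, cond_prior):
    """Posterior over protein components for each gene (illustrative only).

    log_lik      : (n_genes, J) log-likelihood of each protein profile
                   under each protein component (assumed precomputed)
    mrna_cluster : (n_genes,) integer index k of each gene's mRNA component
    cond_prior   : (K, J) table with rows summing to 1,
                   cond_prior[k, j] = p(j | k)
    """
    log_lik = np.asarray(log_lik, dtype=float)
    cond_prior = np.asarray(cond_prior, dtype=float)
    # Pick the prior row that matches each gene's mRNA component.
    with np.errstate(divide="ignore"):  # log(0) = -inf is acceptable here
        log_prior = np.log(cond_prior[np.asarray(mrna_cluster)])
    log_post = log_lik + log_prior
    # Subtract the row maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)
```

With uninformative likelihoods the posterior reduces to the prior row selected by the mRNA component, which is the sense in which the mRNA assignment conditions the protein assignment.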

Availability: Matlab code is available from the authors on request.


Figures

Fig. 1.
Distribution of mean entropy values of p(j|k). The left curve gives the true entropy, the right gives the entropy obtained when the proteins are permuted.
Fig. 2.
Protein cluster j=4 containing ribosomal proteins. Right-hand heat map shows protein profiles in j=4. Left-hand heat map shows associated mRNA profiles (each row corresponds to the same gene on each side) ordered by the mRNA cluster in which they are placed (i.e. top gene is in k=2, next group are in k=3, etc.). Red corresponds to high, green to low expression. The lower chart shows the probabilities p(k|j=4) calculated from the conditional prior via Bayes' law. Each colored segment corresponds to one mRNA cluster and segment size is proportional to probability.
Fig. 3.
mRNA cluster k=3, containing a large proportion of ribosomal proteins (those in j=4). Left-hand heat map shows mRNA profiles (each row corresponds to the same gene on each side) for genes in k=3; right-hand heat map shows their associated proteins. mRNA and protein profiles in the two maps are in the same order and are ordered by their membership of the protein clusters (right map). Red corresponds to high and green to low expression. The lower chart shows the conditional prior probabilities p(j|k=3). Each colored segment corresponds to one protein cluster and segment size is proportional to probability.
Fig. 4.
Genes from k=6 and/or j=10 involved in cell adhesion. The top two genes are involved in both clusters and are both tagged with GO:0005198 and GO:0007155. The lower plot shows these two genes and a third (GSTP1) that is present in both k=6 and j=10 but does not have these labels. Red corresponds to high and green to low expression.
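
The captions for Figs 1 to 3 refer to two quantities derived from the conditional prior: the entropy of p(j|k) and the reverse probabilities p(k|j) obtained via Bayes' law. The sketch below shows one way to compute both. It assumes a (K, J) table `cond_prior` with rows summing to 1 and a marginal `mrna_marginal` over mRNA clusters (for example, cluster proportions). These names, and the use of natural-log entropy, are assumptions rather than details taken from the paper.

```python
import numpy as np

def row_entropy(cond_prior):
    """Entropy in nats of each row p(. | k), treating 0 * log(0) as 0."""
    p = np.asarray(cond_prior, dtype=float)
    # Only take the log where p > 0; other entries stay at 0.
    logp = np.log(p, where=p > 0, out=np.zeros_like(p))
    return -(p * logp).sum(axis=1)

def invert_conditional(cond_prior, mrna_marginal):
    """Bayes' law: p(k | j) = p(j | k) p(k) / sum_k' p(j | k') p(k').

    Returns a (K, J) array whose columns sum to 1. Assumes every protein
    cluster j has non-zero total probability.
    """
    p = np.asarray(cond_prior, dtype=float)
    pk = np.asarray(mrna_marginal, dtype=float)
    joint = p * pk[:, None]                      # p(j, k)
    return joint / joint.sum(axis=0, keepdims=True)
```

A low row entropy means an mRNA cluster maps almost entirely onto a single protein cluster, while a high value means its genes are spread across many protein clusters. Averaging `row_entropy(cond_prior)` gives a single summary value of the kind plotted in Fig. 1.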
