Nat Methods. 2022 Jul;19(7):803-811.
doi: 10.1038/s41592-022-01526-y. Epub 2022 Jun 16.

Mass spectrometry-based draft of the mouse proteome

Piero Giansanti et al. Nat Methods. 2022 Jul.

Abstract

The laboratory mouse ranks among the most important experimental systems for biomedical research, and molecular reference maps of such models are essential informational tools. Here, we present a quantitative draft of the mouse proteome and phosphoproteome constructed from 41 healthy tissues, and several lines of analysis exemplify the insights that can be gleaned from the data. For instance, tissue- and cell-type-resolved profiles provide protein evidence for the expression of 17,000 genes, thousands of isoforms and 50,000 phosphorylation sites in vivo. Proteogenomic comparison of mouse, human and Arabidopsis reveals common and distinct mechanisms of gene expression regulation and, despite many similarities, numerous differentially abundant orthologs that likely serve species-specific functions. We leverage the mouse proteome by integrating phenotypic drug (n > 400) and radiation response data with the proteomes of 66 pancreatic ductal adenocarcinoma (PDAC) cell lines to reveal molecular markers for sensitivity and resistance. This unique atlas complements other molecular resources for the mouse and can be explored online via ProteomicsDB and PACiFIC.

Conflict of interest statement

Competing interests

M.W. and B.K. are founders and shareholders of OmicScouts GmbH and MSAID GmbH. They have no operational role in either company. M.F. is founder, shareholder and CEO of MSAID GmbH. T.S. is founder and shareholder of MSAID GmbH. The contents of this study are unrelated to any commercial activities. The remaining authors declare no competing interests.

Figures

Figure 1. Proteomic map of mouse tissues.
a, Illustration of the 41 tissues (covering 15 systems) and 66 PDAC cell lines subjected to proteome analysis. Each organ system is represented by a unique color code and each tissue by a unique abbreviation; both are kept consistent throughout the figures. b, Number and overlap of identified protein-coding genes in the proteome and phosphoproteome datasets compared with the UniProt database. c, d, The numbers of protein and class I p-site (localization probability > 0.75) identifications for each tissue and cell line are displayed as heatmap bars. The color gradient within each bar reflects the number of samples in which each protein or p-site was identified; the darkest regions represent the ubiquitous proteome and phosphoproteome. Dashed lines indicate proteins and p-sites identified and quantified in all tissues or cell lines. e, Schematic representation of the data and analysis workflows available in ProteomicsDB and PACiFIC.
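As an illustration of panels c and d, the class I filter and the per-site identification frequency can be computed roughly as in the Python sketch below. It assumes a hypothetical input of (sample, site, localization probability) tuples; the function names are illustrative and are not taken from the study's pipeline.

```python
from collections import defaultdict

CLASS_I_CUTOFF = 0.75  # localization probability threshold given in the caption

def class_i_sites(observations):
    """Keep (sample, site) pairs whose localization probability exceeds the cutoff.

    observations: iterable of (sample, site_id, localization_probability) tuples,
    a hypothetical layout chosen for this sketch.
    """
    return [(sample, site) for sample, site, prob in observations if prob > CLASS_I_CUTOFF]

def identification_frequency(pairs):
    """Map each site to the number of distinct samples it was identified in."""
    samples_per_site = defaultdict(set)
    for sample, site in pairs:
        samples_per_site[site].add(sample)
    return {site: len(samples) for site, samples in samples_per_site.items()}

def ubiquitous(frequency, n_samples):
    """Sites identified in every sample, i.e. the darkest part of a heatmap bar."""
    return {site for site, count in frequency.items() if count == n_samples}
```

The same counting applies to proteins if the probability filter is dropped.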
Figure 2. Consolidation of the mouse proteome.
a, Pie charts showing the percentage of proteins identified by one or multiple peptides, grouped by UniProt protein evidence annotation (PE 1-5). Numbers in brackets refer to the number of identified proteins, along with the number of unique genes they represent. b, Spectrum validation of four protein products of the gene Ahcyl2. In the left panel, the amino acid sequence of the canonical protein (Q68FL4) is shown along with the three alternative products. Portions of the sequences that were identified in our dataset and that discriminate between the four isoforms are highlighted. In the right panel, a mirror plot of the experimental (E, top) and predicted (P, bottom) tandem mass spectra is shown for a representative peptide. Red and blue signals indicate y- and b-type fragment ions, respectively. The calculated spectral angle (SA) of 0.9 indicates near-identical spectra. c, Number of observed sORF-encoded peptides (SEPs) as a function of the SA between measured and predicted reference spectra. SA values > 0.7 (dotted line) indicate near-perfect agreement. At this cutoff, our dataset retains 719 SEPs, mapping to 712 unique sORFs (blue area). The inset pie chart shows the proportion of sORFs with or without MS-based supporting evidence in the sORFs.org database. d, Classification and characterization of the validated (SA > 0.7) sORFs in terms of genetic coordinates (top), initiation codon usage (bottom left) and intensity distribution (bottom right). The boxes indicate the IQR, the black vertical lines indicate median values, and whiskers extend to the maximum and minimum values. e, Identification frequency of the validated SEPs across all tissues and cell lines. Bottom panel, mirror plot of the experimental (E, top) and predicted (P, bottom) tandem mass spectra of an identified SEP (EDNPFAGSR) without previous MS-based supporting evidence, representing the Rbakdn gene.
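The spectral angle in panels b and c scores how closely an experimental spectrum matches a predicted one. The caption does not give the exact formula, so the Python sketch below assumes a common definition from peptide spectrum prediction work, the normalized spectral contrast angle SA = 1 − 2·arccos(cos θ)/π computed on L2-normalized fragment intensities. The function names and the aligned-vector input are hypothetical.

```python
import math

def spectral_angle(experimental, predicted):
    """Normalized spectral contrast angle; 1 means identical relative intensities.

    Both inputs are fragment-ion intensity vectors aligned on the same ions
    (e.g. b and y ions in a fixed order). For non-negative intensities the
    result lies in [0, 1]. Assumed definition: SA = 1 - 2 * arccos(cos_sim) / pi.
    """
    if len(experimental) != len(predicted):
        raise ValueError("intensity vectors must be aligned on the same ions")
    norm_e = math.sqrt(sum(x * x for x in experimental))
    norm_p = math.sqrt(sum(x * x for x in predicted))
    if norm_e == 0 or norm_p == 0:
        return 0.0
    cos_sim = sum(e * p for e, p in zip(experimental, predicted)) / (norm_e * norm_p)
    cos_sim = max(-1.0, min(1.0, cos_sim))  # guard against rounding drift outside [-1, 1]
    return 1 - 2 * math.acos(cos_sim) / math.pi

def validated(peptides, cutoff=0.7):
    """Keep peptide sequences whose SA exceeds the cutoff used for SEP validation."""
    return [peptide for peptide, sa in peptides if sa > cutoff]
```

Because the vectors are normalized, a predicted spectrum that differs from the measured one only by overall scale still scores close to 1.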
Figure 3. Proteomic expression landscapes in the mouse.
a, Dynamic range of protein (blue) and p-site (red) abundance. Protein abundance spans ~7 orders of magnitude (OM), whereas p-site abundance spans only ~5. In both cases, ~90% of the proteome or phosphoproteome is confined to within ~3 OM around the median value. b, Cumulative protein (top) and p-site (bottom) intensities (ranked by abundance; x axis) and their respective contributions to total proteome and phosphoproteome mass (y axis) across all tissues or PDAC cell lines. The black solid line indicates the median; the filled area spans the minimum and maximum across tissues or cell lines. c, Unsupervised clustering of mouse tissue and mPDAC proteomes, showing strong qualitative and quantitative expression differences between the proteomes. The clustering separates tissues from mPDACs and also distinguishes the nervous system tissues, the female reproductive system tissues, the immune system tissues and, to a lesser extent, the digestive system tissues. d, Dynamic range of the intensity-ranked proteomes of three representative tissues. Five of the most abundant genes related to the functional specialization of each tissue are listed in descending order.
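Panels a and b summarize an abundance distribution by its span in orders of magnitude and by the share of total signal carried by the top-ranked proteins. The Python sketch below shows both calculations; it assumes summed intensities stand in for proteome mass, and the function names are hypothetical.

```python
import math

def orders_of_magnitude(intensities):
    """Dynamic range as log10(max / min), ignoring zero or missing (<= 0) values."""
    positive = [x for x in intensities if x > 0]
    return math.log10(max(positive) / min(positive))

def cumulative_contribution(intensities):
    """Share of total intensity covered by the top-k entries, for k = 1..n."""
    ranked = sorted(intensities, reverse=True)  # rank by abundance, highest first
    total = sum(ranked)
    running, curve = 0.0, []
    for value in ranked:
        running += value
        curve.append(running / total)
    return curve

def n_for_fraction(intensities, fraction):
    """Smallest number of top-ranked entries whose summed intensity reaches `fraction`."""
    for k, share in enumerate(cumulative_contribution(intensities), start=1):
        if share >= fraction:
            return k
    return len(intensities)
```

Applied per tissue, the minimum and maximum of the cumulative curves across tissues would give the filled band drawn in panel b.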
Figure 4. Proteome comparative analysis across tissues and species.
a, Violin plots (n = 29 tissues) depicting the spread in the relative contribution of selected molecular features that predict gene-level protein abundance using our model across tissues and species. The white dot denotes the median, and box borders indicate the first and third quartiles. Whiskers extend to the maximum and minimum values. b, Venn diagram of the relationship between orthologs and identified genes in the two species. c, Scatter plot of Pearson correlation coefficients as a measure of co-expression conservation. Each dot represents a gene annotation category (molecular function, biological process or cellular component). Across tissue pairs, when restricted to the members of a given category, proteome expression is highly correlated between mouse and human for the majority of the tested ontologies. However, the members of a small fraction of functional categories are far less well conserved (higher variability of the Pearson correlation across tissues, x axis), suggesting different functional remodeling of the mouse and human proteomes during evolution. The dashed line marks the diagonal. d, Principal component analysis (PCA) of the 21 matching mouse and human tissues, showing predominant clustering of the proteomes by species. Each tissue is colored according to its anatomical system, matching the colors used in Figure 1. e, Proportion of gene expression variance explained by tissue (x axis) and by species (y axis) for each orthologous mouse-human gene pair (n = 7,459). Proteome abundance variation between mouse and human can be modeled with two contributing factors: the species of origin and the tissue type. Variance decomposition identified a large set of species-variable orthologs (SVOs) and tissue-variable orthologs (TVOs). The density estimate is calculated independently for each of the three sections of the plot, delimited by the dashed lines.
f, Neighbourhood analysis of conserved co-expression (NACC) between matching mouse and human tissues at the proteome and transcriptome levels. The distribution of NACC distances for each gene is shown; this distance represents the tendency of a gene to be co-expressed with the same set of orthologs in both species. The boxes indicate the interquartile range (IQR), the black horizontal lines indicate median values, and whiskers extend to ±1.5×IQR; outliers are not shown. g, Percentage of orthologs having a given fold change when comparing each tissue pair. Between the two species, orthologs can differ by as much as 100-fold. The colored lines indicate the different tissues. h, i, Scatter plots of proteome-based expression levels of mouse and human genes with 1:1 orthologs, highlighting differentially expressed genes in heart (h) and liver (i). The solid black line indicates the linear model estimated by reduced major-axis regression; the other lines mark absolute deviations of log2(10) and log2(100) from the regression line, that is, 10-fold and 100-fold changes.
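For panels h and i, a reduced major-axis (RMA) fit and the 10-fold and 100-fold bands around it could be computed as sketched below in Python. The sketch assumes both axes are log2-transformed abundances, so a 10-fold change is a vertical offset of log2(10), and it measures deviation vertically from the fitted line; the input layout and function names are hypothetical.

```python
import math
import statistics

LOG2_10 = math.log2(10)    # offset of the 10-fold lines from the fit
LOG2_100 = math.log2(100)  # offset of the 100-fold lines from the fit

def rma_fit(x, y):
    """Reduced major-axis fit of y on x: slope = sign(r) * sd(y) / sd(x)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    # sample covariance, using n - 1 to match statistics.stdev
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    r = cov / (sx * sy)
    slope = math.copysign(sy / sx, r)  # only the sign of r enters the RMA slope
    return slope, my - slope * mx

def fold_change_class(x, y, slope, intercept):
    """Bin each ortholog pair by its absolute log2 deviation from the RMA line."""
    labels = []
    for a, b in zip(x, y):
        deviation = abs(b - (intercept + slope * a))
        if deviation >= LOG2_100:
            labels.append(">=100-fold")
        elif deviation >= LOG2_10:
            labels.append(">=10-fold")
        else:
            labels.append("<10-fold")
    return labels
```

Unlike ordinary least squares, RMA treats both species' measurements as noisy, which is why it suits a comparison where neither axis is the independent variable.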
Figure 5. Linking large proteomic data collection with phenotypic drug and radiation response data.
a, Schematic representation of the multilevel integrative analysis workflow used in this study to identify protein or p-site signatures associated with sensitivity or resistance. b, Protein-level selection by the partitioning tree method across the mPDAC panel in the radiation response dataset. The inset shows the prediction accuracy (Pearson correlation, n = 100 predictive models) between the predicted and measured radiation activity of random forest models combining the 20 selected proteins (see Methods). The median value and the IQR are indicated in purple. T, V and H indicate the training, validation and hold-out data, respectively. Markers for resistance and sensitivity are colored orange and blue, respectively; this color scheme is used consistently throughout the other panels of the figure. c, Lrrfip1 is a sensitivity marker for radiation response (n = 66 cell lines, two-sided Pearson correlation test, P < 0.05). The filled area indicates the 95% confidence interval, and the blue line is the regression line. d, Same as b, but for p-sites. e, STRING-based interaction networks of the p-sites selected in d. DNA damage and chromatin-modifying enzyme networks are highly enriched in p-sites positively correlated with radiation activity. f, Scatter plot from elastic net regression analysis showing that Sirt6 is a sensitivity marker for multiple inhibitors targeting Mek1/2. g, Scatter plot showing that Shroom2 is a sensitivity marker for five drugs targeting tubulin. ΔAUC indicates the difference between the maximum and minimum values of the standardized area under the dose-response curve (AUC) across the tested cell lines and is plotted against the P values of the Pearson correlation between Shroom2 abundance and drug sensitivity. h, Scatter plot showing that Mical2 Ser515 is a resistance marker for multiple inhibitors targeting CDK, CHK1 or ATR.
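Panel c reports a two-sided Pearson correlation test between protein abundance and radiation response across 66 cell lines. To stay within the Python standard library, the sketch below substitutes a permutation test for the parametric t-based test, so its P values approximate, rather than reproduce, those in the figure. The function names and the 0.05 default are illustrative. Whether a significant correlation marks sensitivity or resistance depends on how the response metric is signed, which the sketch does not assume.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P value for H0: abundance and response are uncorrelated.

    Shuffling y breaks any pairing with x; the P value is the share of shuffles
    whose |r| is at least the observed |r|, with the usual +1 correction.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance so floating-point ties with the observed value count as hits
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def assess_marker(abundance, response, alpha=0.05, n_perm=2000, seed=0):
    """Return (r, P, significant) for one protein or p-site across cell lines."""
    r = pearson_r(abundance, response)
    p = permutation_p_value(abundance, response, n_perm=n_perm, seed=seed)
    return r, p, p < alpha
```

In a screen like this one, assess_marker would be called once per protein or p-site; the sign of r then indicates the direction of the association.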

Comment in

  • The adult mouse proteome.
    Foster LJ. Nat Methods. 2022 Jul;19(7):792-793. doi: 10.1038/s41592-022-01546-8. PMID: 35739311.
