Cell Genom. 2023 Mar 23;3(4):100283.
doi: 10.1016/j.xgen.2023.100283. eCollection 2023 Apr 12.

Genetic dissection of the pluripotent proteome through multi-omics data integration

Selcan Aydin et al. Cell Genom. 2023.

Abstract

Genetic background drives phenotypic variability in pluripotent stem cells (PSCs). Most studies to date have used transcript abundance as the primary molecular readout of cell state in PSCs. We performed a comprehensive proteogenomics analysis of 190 genetically diverse mouse embryonic stem cell (mESC) lines. The quantitative proteome is highly variable across lines, and we identified pluripotency-associated pathways that were differentially activated in the proteomics data but were not evident in transcriptome data from the same lines. Integration of protein abundance with transcript levels and chromatin accessibility revealed broad co-variation across molecular layers as well as shared and unique drivers of quantitative variation in pluripotency-associated pathways. Quantitative trait locus (QTL) mapping localized the drivers of these multi-omic signatures to genomic hotspots. This study reveals post-transcriptional mechanisms and genetic interactions that underlie quantitative variability in the pluripotent proteome and provides a regulatory map for mESCs that can serve as a basis for future mechanistic studies.

Keywords: chromatin accessibility; diversity outbred mice; eQTL; embryonic stem cells; ground state metastability; multi-omics factor analysis; pQTL; pluripotency; proteomics; transcriptomics.

Conflict of interest statement

T.C. has an equity interest in Predictive Biology, Inc.

Figures

Graphical abstract
Figure 1
Overview of the quantitative proteome in Diversity Outbred mESC lines (A) The proteomes of 190 mESCs were quantified and compared with published ATAC-seq and RNA-seq data. (B) The probability of detecting a protein by MS (plotted on y axis) is linked to the protein-encoding gene’s average transcript abundance (x axis). (C) Principal component analysis points to sex as the major source of proteome variation among mESCs. PC1 and PC2 are plotted and colored by sex. (D) GO:BP categories including DNA methylation, chromatin remodeling, and ribosome biogenesis show significantly higher activity by GSVA in XY compared with XX lines. Protein ADP-ribosylation shows higher activity in mESCs having at least one copy of the reference Lifr allele (two-way ANOVA followed by Tukey’s HSD, ∗p < 0.05, ∗∗∗∗p < 5 × 10^−5). See also Figure S1 and Tables S1, S2, and S3.
Figure 2
Subunit cohesiveness varies considerably among 164 protein complexes For each complex, pairwise correlations between all subunits were calculated and summarized as a boxplot. Boxplots are ordered and colored based on their median pairwise correlation, with more cohesive complexes on the left. Specific examples are highlighted. See also Figure S2.
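The cohesiveness summary described in this caption can be sketched in a few lines. This is only an illustration, assuming Pearson correlation as the pairwise measure and the median as the per-complex summary; the protein names, the abundance values, and the helper `complex_cohesiveness` are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def complex_cohesiveness(abundance, subunits):
    """Median of all pairwise subunit correlations for one complex."""
    pairs = combinations(subunits, 2)
    return median(pearson(abundance[a], abundance[b]) for a, b in pairs)


# Hypothetical toy data: protein -> abundance across four cell lines.
abundance = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],
    "P3": [4.0, 3.0, 2.0, 1.0],
}

# A complex whose subunits move together scores near +1.
cohesive = complex_cohesiveness(abundance, ["P1", "P2"])
# Adding a subunit that moves in the opposite direction pulls the median down.
mixed = complex_cohesiveness(abundance, ["P1", "P2", "P3"])
print(cohesive, mixed)
```

Sorting complexes by this median would give the left-to-right ordering the caption describes, with the most cohesive complexes first.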
Figure 3
The quantitative proteome co-varies with chromatin accessibility and the transcriptome (A) ID1 protein abundance is highly correlated with many regions of open chromatin genome wide. Circos plot showing ATAC-seq peaks where chromatin accessibility is positively (red) and negatively (blue) correlated with ID1 abundance; n = 112, abs(correlation) > 0.5. (B) Overall agreement between the transcriptome and proteome within an mESC line is widely variable across lines, as shown by a histogram of sample-level Pearson correlations (n = 174). (C) Agreement in transcript and protein abundance for a given gene also varies widely across lines. Histogram depicting the distribution of pairwise correlation coefficients between transcript and protein abundance of genes, with overrepresented GO terms annotated below in matching colors (green, positively correlated; orange, negatively correlated; purple, genes with little or no correlation). See also Figure S3 and Table S4.
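Panels (B) and (C) summarize two different correlations computed from the same pair of matrices: one per cell line (across genes) and one per gene (across cell lines). The sketch below illustrates the distinction on toy data, assuming Pearson correlation for both; the gene names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Hypothetical toy matrices: gene -> abundance in three cell lines.
rna = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [5.0, 4.0, 6.0],
    "geneC": [9.0, 7.0, 8.0],
}
protein = {
    "geneA": [2.0, 4.0, 6.0],
    "geneB": [6.0, 8.0, 4.0],
    "geneC": [3.0, 1.0, 2.0],
}

genes = sorted(rna.keys() & protein.keys())
n_lines = len(rna[genes[0]])

# Gene-level agreement, as in panel (C): one value per gene, across cell lines.
gene_level = {g: pearson(rna[g], protein[g]) for g in genes}

# Sample-level agreement, as in panel (B): one value per cell line, across genes.
sample_level = [
    pearson([rna[g][i] for g in genes], [protein[g][i] for g in genes])
    for i in range(n_lines)
]
print(gene_level, sample_level)
```

In this toy set, geneA and geneC track their transcripts while geneB is anti-correlated, which is the kind of spread the gene-level histogram displays.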
Figure 4
Genetic characterization of the pluripotent proteome (A) Genetic mapping identifies 1,677 significant pQTLs including 1,056 local (diagonal line) and 621 distant loci. The location of the pQTL is plotted on the x axis against the midpoint of the protein-coding gene on the y axis. (B) Most co-mapping eQTLs and pQTLs show high agreement in their haplotype effects. Histogram of pairwise correlation coefficients between inferred allele effects from eQTL and pQTL scans for all genes with co-mapping QTLs. Bars are colored by significance of the correlation. (C) Examples of local pQTLs where the influence of genetic variation is seen at all three molecular layers. Left: LOD scores obtained from caQTL, eQTL, and pQTL scans for the target gene are plotted for the peak chromosome, with the target gene’s location annotated on the x axis. Right: haplotype effects inferred at the caQTL, eQTL, and pQTL peaks are shown. (D) Histogram showing that many distant pQTLs localize to specific genomic hotspots. (E) The effect of one local pQTL is propagated across all protein subunits of the replication complex. Left: RPA1, RPA2, and RPA3 LOD scores are plotted for Chr 6 (x axis) and show a shared pQTL peak at the location of the Rpa3 gene. Right: the inferred allele effects at the peak for all three proteins show high concordance. (F) Graphical overview of the different classes of pQTLs based on their effects on one or more molecular layers. Layers lacking impact (no QTLs with LOD > 5 and matching allele effects) are depicted in gray. See also Figure S4 and Tables S5 and S6.
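The local versus distant split in panel (A) compares each pQTL position with the midpoint of the protein-coding gene. A minimal sketch of such a classification follows. The 10 Mb window is an assumption chosen for illustration, since the caption does not state the cutoff used, and the records, the `Pqtl` type, and the `classify` function are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Pqtl:
    protein: str
    qtl_chr: str
    qtl_mb: float       # pQTL peak position in Mb
    gene_chr: str
    gene_mid_mb: float  # midpoint of the protein-coding gene in Mb


def classify(q, window_mb=10.0):
    """Label a pQTL 'local' if it lies near its own gene, else 'distant'.

    window_mb is an assumed cutoff, not a value taken from the study.
    """
    same_chr = q.qtl_chr == q.gene_chr
    if same_chr and abs(q.qtl_mb - q.gene_mid_mb) <= window_mb:
        return "local"
    return "distant"


# Hypothetical records for illustration only.
peaks = [
    Pqtl("PROT_A", "6", 12.1, "6", 12.4),    # peak sits on its own gene
    Pqtl("PROT_B", "6", 12.1, "11", 75.0),   # peak on another chromosome
    Pqtl("PROT_C", "15", 30.0, "15", 95.0),  # same chromosome, far away
]

counts = Counter(classify(q) for q in peaks)
print(counts)
```

Local peaks are the ones that fall on the diagonal when pQTL position is plotted against gene midpoint, as in panel (A).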
Figure 5
MOFA reveals broad regulatory signatures that encompass multiple layers of data (A) MOFA yielded 23 latent factors that capture variation in one or more layers of genomic data. For each factor, percentage of variation explained in chromatin accessibility, transcript abundance, and protein abundance is displayed as a heatmap, as is the correlation of each factor to sample covariates including sex and Lifr genotype. On the right, a heatmap indicates overrepresentation of pluripotency regulator binding sites (NANOG, OCT4 [Pou5f1], and SOX2) among the top chromatin drivers of each factor. (B) Above: Depiction of QTL mapping with MOFA factors to identify genetic modifiers of shared molecular variation. Below: Table of QTL peaks for MOFA factors. Loci previously identified as QTL hotspots are denoted in the “Type” column. (C) For all proteins, the LOD score calculated at the Chr 15 pQTL peak is plotted (y axis) relative to the protein’s contribution (factor weight) to MOFA factor 3 (x axis). Proteins with absolute factor weights <0.1 were filtered. For each protein, color corresponds to the correlation between allele effects at the Chr 15 pQTL and the factor 3 QTL. Individual proteins that mapped with a significant pQTL are colored gray, and highlight that many proteins contribute substantially to factor 3 and show high agreement in allele effects at the Chr 15 peak (dark red and blue), despite individually not mapping with a significant pQTL. (D) For each expressed transcript, LOD score at the Chr 10 eQTL peak is plotted (y axis) relative to that transcript’s contribution to factor 4 (x axis). Transcripts with absolute factor weights <0.1 were filtered, and points are colored as described in (C). Many transcripts contribute to factor 4 and have correlated allele effects at the Chr 10 QTL, despite failing to map individually with a significant Chr 10 eQTL. (E) Genome-wide LOD scores obtained from the factor 4 QTL scan are plotted with mediation results overlaid. Duxf3 expression was previously identified as a strong candidate mediator for the eQTL hotspot in this region but performs poorly as a mediator of the factor 4 QTL compared with Gm20625. Both genes are highlighted in green next to their corresponding LOD score drop. See also Figure S5 and Table S7.

