Mol Syst Biol. 2019 Sep;15(9):e8871.
doi: 10.15252/msb.20198871.

Paralog dependency indirectly affects the robustness of human cells

Rohan Dandage et al. Mol Syst Biol. 2019 Sep.

Abstract

The protective redundancy of paralogous genes partly relies on the fact that they carry out their functions independently. However, a significant fraction of paralogous proteins may form functionally dependent pairs, for instance, through heteromerization. As a consequence, one could expect these heteromeric paralogs to be less protective against deleterious mutations. To test this hypothesis, we examined the robustness landscape of gene loss-of-function by CRISPR-Cas9 in more than 450 human cell lines. This landscape shows regions of greater deleteriousness to gene inactivation as a function of key paralog properties. Heteromeric paralogs are more likely to occupy such regions owing to their high expression and large number of protein-protein interaction partners. Further investigation revealed that heteromers may also be under stricter dosage balance, which may also contribute to the higher deleteriousness upon gene inactivation. Finally, we suggest that physical dependency may contribute to the deleteriousness upon loss-of-function as revealed by the correlation between the strength of interactions between paralogs and their higher deleteriousness upon loss of function.

Keywords: CRISPR; gene dosage; gene duplication; paralogs; protein-protein interactions.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure EV1. Distribution of CS values in the 4 CS datasets
  1. A–D

    The locations of essential and non‐essential genes [taken as a union set of genes reported by DepMap, 2018 and BAGEL (Hart & Moffat, 2016)] are denoted on the distributions. The locations of the cancer drivers, oncogenes, and tumor suppressors are also denoted on the distribution (derived from Lever et al, 2019).

Source data are available online for this figure.
Figure 1. The LOF of paralogs is less deleterious than that of singletons in human cell lines
  1. A

    LOF data derived from genome‐wide CRISPR‐Cas9 screening experiments. The deleteriousness of LOF of a gene on cell proliferation is estimated from the depletion of gRNAs in the experiment. The extent of depletion is measured as a CRISPR score (CS, see Materials and Methods). CS values across cell lines from three biologically independent datasets, CS1 (Wang et al, 2015), CS2/CS2.1 (Meyers et al, 2017; DepMap, 2018), and CS3 (Shifrut et al, 2018), are shown. Genes that are not in the paralog datasets but that were not identified as singletons in the stringent identification of singletons are denoted as “unclassified”. The relatively higher CS of paralogs compared to singletons indicates that their LOF is relatively less deleterious. P‐values from two‐sided Mann–Whitney U tests are shown. On the violin plots, the medians of the distributions are shown by a horizontal black line and quartiles by a vertical thick black line. For clarity, the upper and lower tails of the distributions are not shown.

  2. B, C

    (B) Comparisons of CS values between paralogs and singletons and (C) between paralogs and unclassified genes (neither clearly a paralog nor a singleton). CS data for 4 (CS1) + 450 (CS2.1) + 1 (CS3) cell lines is shown. Each point represents the mean CS for a class (singleton, paralog, or unclassified) in an individual cell line. All points are below the diagonal (dashed gray line), showing that the effect is systematic and largely cell‐line independent. Similar plots are shown for CS2 dataset in Appendix Fig S2.

  3. D

    Older paralogs tend to be more essential than younger ones and are therefore less protective (i.e., more deleterious upon LOF). On the y‐axis, the age groups are ordered by increasing distance of the phylogenetic node of duplication from the common ancestor, i.e., Opisthokonta. Sets of essential and non‐essential genes were derived from the union of gene sets reported by DepMap (2018) and BAGEL (Hart & Moffat, 2016; see Materials and Methods). P‐value from a two‐sided Mann–Whitney U test is shown. The boxes represent the first and third quartiles (Q1 and Q3) of the distribution, and the upper and lower whiskers extend up to Q3 + 1.5*interquartile range and Q1 − 1.5*interquartile range, respectively. The central horizontal line represents the median of the distributions containing 65 data points in the case of essential paralogs and 235 data points in the case of non‐essential paralogs.

Source data are available online for this figure.
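The whisker limits described for panel D (Q1 − 1.5*interquartile range and Q3 + 1.5*interquartile range) can be illustrated with a minimal sketch. The function name tukey_fences and the input values are hypothetical, and the quartile convention used here (the "inclusive" method of Python's statistics module) is an assumption; the plotting software behind the figure may use a different convention.

```python
from statistics import quantiles


def tukey_fences(values):
    """Return (q1, q3, lower_limit, upper_limit) for a list of numbers."""
    # Assumption: "inclusive" quartiles; other conventions shift Q1/Q3 slightly.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    return q1, q3, q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical values for illustration only, not data from the study.
example = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q3, lower_limit, upper_limit = tukey_fences(example)
print(q1, q3, lower_limit, upper_limit)
```

In most plotting libraries the drawn whisker stops at the most extreme data point that still lies within these limits, so the limits are bounds rather than the whisker ends themselves.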
Figure 2. The LOF of paralogs that form heteromers is more deleterious than the LOF of non‐heteromers
  1. The effect of LOF on cell proliferation (CS values) is relatively more deleterious in the case of heteromeric paralogs than non‐heteromers, across all 4 CS datasets. P‐values from two‐sided Mann–Whitney U tests are shown. Similar plot for heteromers defined with direct PPI only is shown in Appendix Fig S3.

  2. Mean CS values of heteromeric paralogs and non‐heteromers (defined by “all PPI”s from BioGRID source) are shown across cell lines. Each point represents the mean CS value for a class in an individual cell line. All the points are above the diagonal (dashed gray line), showing that the effect is systematic and largely independent of cell line. Similar plots for both PPI sources and CS2 dataset are shown in Appendix Fig S4.

  3. Similar to panel (B), but comparing paralogs that form heteromers and homomers to those that form homomers only (defined by “all PPI”s from BioGRID source). This result shows that the difference between heteromers and non‐heteromers is not caused by the fact that heteromers are also enriched for homomers. Similar plots for both PPI sources and CS2 dataset are shown in Appendix Fig S4.

  4. Paralogs that form heteromers tend to have been duplicated earlier in evolution. The age of the paralog pairs is shown in terms of synonymous substitutions per site (dS) (see Materials and Methods), a proxy for age. Data are shown for interactions derived from “all PPI”, and those that are more likely to detect “direct PPI”. P‐values from two‐sided Mann‐Whitney U tests are shown.

  5. Paralogs that form heteromers tend to be more deleterious upon LOF than other paralogs. Data from CS2.1 are shown, largely independent of the age of the paralog. In the legends, paralogs are ordered by their age. The CS values per class of paralogs (heteromer or not) and their age group are aggregated by taking median across cell lines. Note that while heteromers are more deleterious in most of the age groups, in the case of 2 out of 10 age groups a reverse trend is observed. Distributions of the CS values per class of paralogs (heteromer or not) and their age group for this analysis are shown in Appendix Fig S5A. Similar analysis with dataset CS2 and for heteromers detected with “direct PPI”s only is shown in Appendix Fig S5 B–D. P‐values from two‐sided Mann‐Whitney U tests are shown.

Data information: On the violin plots (panels A and D), the medians of the distributions are denoted by a horizontal black line, while the quartiles of the distributions from the median value are indicated by a vertical thick black line. For clarity, the upper and lower tails of the distributions are not shown in panel (A). Source data are available online for this figure.
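The group comparisons in these panels rely on the Mann–Whitney U test. As a rough illustration of what the test statistic measures, the sketch below computes U from rank sums in plain Python. The helper names and the input values are hypothetical, and the P‐value step is omitted; in practice a statistics library would normally be used, and the authors' actual tooling is not stated in the captions.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


# Hypothetical CS-like values for two classes, for illustration only.
u_a, u_b = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u_a, u_b)
```

U_a counts the pairs in which a value from the first sample exceeds a value from the second (ties count one half), so a U close to 0 or to n_a * n_b indicates strongly separated distributions.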
Figure 3. Association between the molecular functions of paralogs, their probability of heteromerization, and the effect of gene LOF on cell proliferation
Average CS values of paralogs (heteromer or not heteromer) belonging to a gene set were used in the analysis. On the y‐axis, GO terms for molecular functions are sorted according to their proportion of heteromeric paralogs (i.e., # of heteromers/# of paralogs, heteromers defined by “all PPI”). The size of the circles represents the number of paralog pairs in a category, and the colors represent the proportion of heteromers in that category. In the left panel, average CS values of heteromers per category are shown on the x‐axis. In the right panel, the difference between the average CS value of the heteromers and average CS values of the non‐heteromers are shown on the x‐axis. The terms with significant difference between the average CS value of the heteromers and average CS value of the non‐heteromers (estimated by two‐sided t‐test) are annotated with the blue edges. The descriptions of the representative significant GO terms with the highest difference are shown in the right‐side panel. Spearman rank correlation between the proportion of the heteromers in the GO terms and the average CS value of paralogs in the term [r s(# of heteromers/# of paralogs per term, CS mean of paralogs per term)] is shown in the lower left corner. Only GO molecular functions with more than 10% of the number of paralogs in all the gene sets are shown. Similar analysis for the GO biological process and GO cellular component aspect, for the “all PPI” based data, is shown in Fig EV2. Similar analysis with the “direct PPI” data is shown in Appendix Fig S6. See Dataset EV5 for GO terms and annotations shown on this figure. Note that not all gene sets are independent because some genes are in several categories. Source data are available online for this figure.
Figure EV2. Association between the biological processes and cellular components of paralogs, their probability of heteromerization, and the effect of gene LOF on cell proliferation, in the case of the heteromers defined by the “all PPI” only
  1. A, B

    Gene set analysis for Biological Processes and Cellular Components is shown in panels (A) and (B), respectively. Average CS values (x‐axis) of paralogs (heteromer or not heteromer) belonging to a gene set were used in the analysis. In each panel, GO terms are sorted according to their proportion of heteromeric paralogs (i.e., # of heteromers/# of paralogs). The size of the circles represents the number of paralog pairs in a category, and the colors represent the proportion of heteromers in the category. In the left panel, average CS value of heteromers per category is shown on the x‐axis. In the right panel, the difference between the average CS value of the heteromers and average CS value of the non‐heteromers is shown on the x‐axis. The terms with significant difference between the average CS value of the heteromers and average CS value of the non‐heteromers (estimated by two‐sided t‐test) are annotated with the blue edges. Descriptions of the representative significant GO terms with the highest difference are shown in the right‐side panel. Spearman rank correlation between the proportion of heteromers in the GO terms and the average CS value of paralogs in the term [r s(# of heteromers/# of paralogs per term, CS mean of paralogs per term)] is shown in the lower left corner. Only GO molecular functions with more than 10% of the number of paralogs in all the gene sets are shown. See Dataset EV4 for GO term annotations shown on this figure. Note that not all gene sets are independent because some genes are in several categories.

Source data are available online for this figure.
Figure 4. Relationship between the effect of LOF of a gene on cell proliferation, mRNA expression, and number of protein–protein interaction partners
  1. The effect of gene LOF on cell proliferation as measured in terms of CS values is correlated with mRNA expression and number of PPI partners. Considering the interdependence between the three related factors, partial correlations were estimated using Spearman correlation coefficients (ρ) between each pair of factors while controlling for the third factor (covariate, indicated in the curly brackets). The P‐values associated with the correlations are denoted on the heatmap. Average CS values across CS datasets were used. See Appendix Fig S7 for correlations in case of individual CS datasets and direct PPI.

  2. Paralogs that form heteromers have more interacting partners compared to non‐heteromers. Number of interactions is in log2 scale. Similar plot with heteromeric paralogs detected with only direct PPI is shown in Appendix Fig S8A.

  3. Paralogs that form heteromers show higher expression than non‐heteromers. Similar plot with heteromers of paralogs detected with only direct PPI is shown in Appendix Fig S8B. Cell‐line‐wise comparisons with heteromers defined by “all PPI” and “direct PPI” are shown in Appendix Fig S8C and D, respectively. Contribution of the interacting factors in determining the paralog status is determined by jointly modeling through two approaches: partial correlations (panel D) and classification models (panel E).

  4. Partial Spearman correlation coefficients (r, shown on the y‐axis), between CS values and a paralog status (heteromer or not, binary variable, 1: heteromer, 0: not heteromer). The correlations were determined while controlling for none of mRNA expression and number of interactions (“none”), only mRNA expression (“expression”), only number of interactions (“interaction”), or both (“both”) (as shown on the x‐axis). Controlling for the number of interactions leads to the greater loss of negative correlation, indicating that it contributes to the correlation more than mRNA expression. Similar analysis with heteromers defined by “direct PPI” is shown in Appendix Fig S8E.

  5. Feature importance (shown on the y‐axis) of the three factors as determined through four different classification models (shown on the x‐axis). Means and standard deviations of the ROC AUC values across all cross validations and bootstrapping runs (see Materials and Methods) are plotted for each of the four classifiers. The CS values used for this analysis are mean of the CS values across all the CS datasets. For similar analysis with the four individual CS datasets, see Appendix Fig S9 A–D.

Data information: In panels (B and C), P‐values from two‐sided Mann–Whitney U tests are shown. On the violin plots, the medians of the distributions are shown by a horizontal black line and quartiles by a vertical thick black line. For clarity, the upper and lower tails of the distributions are not shown. Source data are available online for this figure.
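Panels A and D report Spearman correlations between two factors while controlling for a third. One common way to obtain such a value, shown here as a sketch under stated assumptions rather than as the authors' procedure, is to compute the three pairwise Spearman coefficients and combine them with the first‐order partial correlation formula. All function names and input values below are hypothetical, and the sketch handles a single covariate only (the “both” case in panel D would need a higher‐order extension).

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def partial_spearman(x, y, covariate):
    """Spearman correlation of x and y controlling for one covariate."""
    r_xy = spearman(x, y)
    r_xz = spearman(x, covariate)
    r_yz = spearman(y, covariate)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical per-gene values, for illustration only.
cs = [1, 2, 3, 4, 5]
expression = [2, 1, 4, 3, 5]
n_partners = [1, 3, 2, 5, 4]
print(partial_spearman(cs, expression, n_partners))
```

Comparing the partial coefficient with the plain Spearman coefficient shows how much of the association is attributable to the covariate, which is the comparison made across the “none”, “expression”, and “interaction” conditions.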
Figure EV3. Paralogs have fewer interaction partners and lower mRNA expression compared to singletons
  1. Paralogs have fewer interaction partners than singletons. The number of interactions is log2‐transformed.

  2. Paralogs have lower mRNA expression than singletons. mRNA expression of genes is shown in terms of log2 of FPKM.

  3. Across the majority of cell lines, the average mRNA expression of paralogs is lower than that of singletons. Each point represents the average mRNA expression (FPKM in log2 scale) for a class (paralog or singleton) in an individual cell line. All points are above the diagonal (dashed gray line), indicating that the effect is systematic and largely cell‐line independent.

  4. Partial Spearman correlation coefficients (r, shown on the y‐axis) between the CS value and the paralog status of a gene (paralog or singleton, binary variable, 1: paralog, 0: singleton). The correlations were calculated while controlling for none of mRNA expression and number of interactions (“none”), only mRNA expression (“expression”), only number of interactions (“interaction”), or both (“both”) (as shown on the x‐axis). Controlling for mRNA expression leads to a greater loss of correlation than controlling for the number of interactions, indicating that mRNA expression contributes more than the number of interaction partners to the correlation between the CS values and the status of the gene as paralog or singleton (binary variable).

  5. Dependence of the robustness of paralogs (shown in terms of CS, y‐axis) on mRNA expression (x‐axis). Gene subsets, i.e., paralog or singleton, and CS datasets are shown in rows. In the columns, mRNA expression of the genes binned into five equal‐sized bins is shown. The median of the CS values of the genes in each subset is shown on the heatmap. The P‐values from two‐sided Mann–Whitney U tests for the comparison of distributions of the CS values of the paralogs versus singletons, in each CS dataset and each bin of mRNA expression, are denoted on the heatmap. Distributions of CS values in each case are shown in Appendix Fig S10.

Data information: In panels (A) and (B), P‐values from two‐sided Mann–Whitney U tests are shown. On the violin plots, the medians of the distributions are denoted by a horizontal black line, whereas the quartiles from the median value are indicated by a vertical thick black line. For clarity, the upper and lower tails of the distributions are not shown. Source data are available online for this figure.
Figure 5. Robustness landscape visualization showing regions of deleteriousness to LOF as a function of mRNA expression and number of interaction partners
mRNA expression (x‐axis) and number of PPI partners (y‐axis) are strong determinants of the deleteriousness of gene LOF (measured in terms of average CS across CS datasets, shown on z‐axis).
  1. The landscape shows the effect of LOF of genes on cell proliferation (CS) as a function of the two parameters. The region with high gene expression levels and large number of interactions clearly shows relatively lower CS values, indicating greater deleteriousness upon LOF.

  2. Kernel density estimates for paralogs and singletons are overlaid on the landscape to indicate their level of occupancy. The density of paralogs is located toward lower expression levels and small numbers of protein interaction partners, compared to singletons.

  3. Similar to (B), kernel densities of heteromeric paralogs and non‐heteromeric ones are overlaid on the landscape. The location of heteromers is biased toward higher expression levels and larger number of protein interaction partners, compared to non‐heteromers. Also, locations of representative heteromeric (UBQLN1 and UBQLN4) and non‐heteromeric pairs (COL5A1 and COL11A2) are annotated on the landscape.

Data information: Similar plots with direct PPIs only are shown in Fig EV4. Source data are available online for this figure.
Figure EV4. Landscape of the robustness of human cell lines to LOF, considering direct physical interactions only
  1. RNA expression level (log2 scale FPKM scores) and number of direct protein–protein interaction partners (log2 scale) are strong determinants of the deleteriousness of gene LOF. The landscape shows CS values as a function of these two parameters. Regions of the landscape with high mRNA expression and large number of interactions clearly show lower CS values.

  2. Kernel density estimates for paralogs and singletons are overlaid on the landscape to indicate their level of occupancy. The density of paralogs is biased strongly toward lower expression levels compared to singletons.

  3. Similar to (B), heteromeric paralogs and non‐heteromeric ones are overlaid on the landscape to indicate their level of occupancy. The density of heteromers is biased strongly toward higher number of protein interaction partners, compared to non‐heteromers. The positions of representative heteromeric and non‐heteromeric pairs of paralogous genes are shown on the robustness landscape.

Source data are available online for this figure.
Figure 6. Asymmetric expression of paralogs and mechanistic insights into the relatively greater deleteriousness of heteromeric paralogs
  1. Schematic representing likely scenarios pertaining to the relationship between the asymmetry in mRNA expression of a pair of paralogs (P1 and P2) and their relative deleteriousness upon LOF, as discussed in the text.

  2. The most expressed paralog (P1) of a pair is more likely to be deleterious than the least expressed (P2). mRNA expression data is composed of 374 cell lines. Each point represents CS value of an individual cell line. P‐value is from two‐sided Mann–Whitney U test. On the violin plots, the medians of the distributions are denoted by a horizontal black line, and quartiles of the distributions from the medians are indicated by a vertical thick black line. For clarity, the upper and lower tails of the distributions are not shown. Heteromers in this analysis are defined from the “all PPI”s. For similar analysis with “direct PPI”s, see Appendix Fig S11.

  3. Relationship between the difference in CS of the paralog pair (P1 − P2) and the asymmetry of mRNA expression levels, i.e., (P1 − P2)/(P1 + P2), where mRNA expression of P1 is higher than P2. Asymmetry values close to 0 indicate symmetrical mRNA expression, and values near 1 indicate asymmetrical expression. Error bars represent the 95% confidence interval of the difference in CS (y‐axis) of the paralog pairs within equal‐sized bins of the asymmetry of mRNA expression (x‐axis). The heteromers are defined by “all PPI”. Similar analysis with heteromers defined by “direct PPI” is shown in Appendix Fig S13A. The relationship between the two factors in case of representative pairs of heteromeric and non‐heteromeric paralogs is shown in Appendix Fig S14. Comparison of distributions of the correlation scores between heteromers and non‐heteromers is shown in Fig EV5B.

  4. Heteromeric paralogs tend to have more symmetric mRNA expression than non‐heteromers. The distribution of the asymmetry in mRNA expression, i.e., (P1 − P2)/(P1 + P2), where mRNA expression of P1 is higher than P2, is shown. Values near 0 indicate symmetrical mRNA expression, and values near 1 indicate asymmetrical expression.

  5. The deleteriousness of the heteromers upon LOF (y‐axis) is negatively correlated with the number of residues at the interaction interface (x‐axis). ρ is Spearman's correlation coefficient. P‐values associated with the Spearman's correlation coefficient are shown in the legend. Structures of representative heteromers are shown in Appendix Fig S15.

Source data are available online for this figure.
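The asymmetry measure used in panels C and D, (P1 − P2)/(P1 + P2) with P1 the more highly expressed paralog, can be written as a short function. This is a minimal sketch: the function name and the input values are hypothetical, and it assumes non‐negative expression values on a linear scale (the captions do not state which scale was used for this calculation).

```python
def expression_asymmetry(expr_a, expr_b):
    """Return (P1 - P2) / (P1 + P2), where P1 is the larger of the two values.

    Assumes non-negative expression values; the result lies in [0, 1].
    """
    p1, p2 = max(expr_a, expr_b), min(expr_a, expr_b)
    if p1 + p2 == 0:
        raise ValueError("asymmetry is undefined when both values are zero")
    return (p1 - p2) / (p1 + p2)


# Hypothetical expression values for three paralog pairs, illustration only.
print(expression_asymmetry(10, 10))  # equal expression
print(expression_asymmetry(30, 10))  # one paralog three times higher
print(expression_asymmetry(8, 0))    # only one paralog expressed
```

Equal expression gives 0, and a pair in which only one paralog is expressed gives 1, matching the reading of the axis described in the caption.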
Figure EV5. Relationship between the asymmetry of expression and the relative deleteriousness of paralog
  1. The probability that a highly expressed paralog P1 has higher CS than the lowly expressed paralog P2, as a function of its normalized relative mRNA expression to P2. Probability of 0.5, shown by dotted line, indicates that it is equally likely that paralog P1 would have greater CS than P2 and paralog P2 would have greater CS than P1. Probability of less than 0.5 indicates paralog P2 would have greater CS than P1. The scaled asymmetry of expression is shown on the x‐axis. On the left, P1 is more likely to have higher CS value (less deleterious) and expression is symmetric. On the right, P1 is more likely to have relatively lower CS value (more deleterious) and expression is asymmetric. Asymmetry in mRNA expression (x‐axis) was binned into 10 equal size bins. The color of the points represents the average difference of CS value in the bin. Similar analysis with the CS2 dataset is shown in Appendix Fig S11A.

  2. The average difference of CS values between P1 and P2 (P1(CS) − P2(CS)) is correlated with the asymmetry of mRNA expression [i.e., (P1 − P2)/(P1 + P2), where mRNA expression of P1 is greater than that of the P2], across cell lines. Each point in the distribution corresponds to the correlation for a single pair of paralogs. r s: Spearman correlation coefficient. Similar analysis with CS2 dataset is shown in Appendix Fig S11B. See Appendix Fig S12 for relationships between asymmetry of the mRNA expression and difference in CS values for representative pairs of heteromers and non‐heteromeric paralogs.

  3. Extent of the transcriptional coregulation in heteromers versus non‐heteromers. mRNA expression of the paralogs was correlated across 374 cell lines. r p: Pearson's correlation coefficient. mRNA expression values were z‐score normalized before estimating correlations.

  4. Extent of the post‐transcriptional coregulation in heteromers versus non‐heteromers. Protein expression of the paralogs was correlated across 49 cell lines. While estimating the partial correlation, the protein expression of the paralogs was controlled for mRNA expression. r p: Pearson's correlation coefficient. Protein and mRNA expression values were z‐score normalized before estimating the correlations.

Data information: In panels (B–D), P‐values from two‐sided Mann–Whitney U tests are shown. On the violin plots, the medians of the distributions are shown by a horizontal black line and quartiles by a vertical thick black line. For clarity, the upper and lower tails of the distributions are not shown. Source data are available online for this figure.
