Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2018 Aug;560(7718):325-330.
doi: 10.1038/s41586-018-0409-3. Epub 2018 Aug 8.

Genetic and transcriptional evolution alters cancer cell line drug response

Affiliations

Genetic and transcriptional evolution alters cancer cell line drug response

Uri Ben-David et al. Nature. 2018 Aug.

Abstract

Human cancer cell lines are the workhorse of cancer research. Although cell lines are known to evolve in culture, the extent of the resultant genetic and transcriptional heterogeneity and its functional consequences remain understudied. Here we use genomic analyses of 106 human cell lines grown in two laboratories to show extensive clonal diversity. Further comprehensive genomic characterization of 27 strains of the common breast cancer cell line MCF7 uncovered rapid genetic diversification. Similar results were obtained with multiple strains of 13 additional cell lines. Notably, genetic changes were associated with differential activation of gene expression programs and marked differences in cell morphology and proliferation. Barcoding experiments showed that cell line evolution occurs as a result of positive clonal selection that is highly sensitive to culture conditions. Analyses of single-cell-derived clones demonstrated that continuous instability quickly translates into heterogeneity of the cell line. When the 27 MCF7 strains were tested against 321 anti-cancer compounds, we uncovered considerably different drug responses: at least 75% of compounds that strongly inhibited some strains were completely inactive in others. This study documents the extent, origins and consequences of genetic variation within cell lines, and provides a framework for researchers to measure such variation in efforts to support maximally reproducible cancer research.

PubMed Disclaimer

Conflict of interest statement

Competing financial interests

The authors declare no competing financial interests.

Figures

Extended Data Figure 1:
Extended Data Figure 1:. Comparison of Broad and Sanger genomic features across 106 cell lines
(a) Comparison of the Pearson correlations of germline vs. somatic SNVs across 106 paired cell lines. (b) A histogram presenting the distribution of mutation discordance fractions across cell lines. The distribution of all non-silent SNVs is shown in black, and that of the 447 genes included in the Oncopanel is shown in gray. (c) Comparison of the fraction of discordant gene-level CNAs between the Broad and the Sanger (n=106 cell lineS), using three different threshold for CNA calling. Bar, median; box, 25th and 75th percentiles; whiskers, 1.5*IQR of lower and upper quartile; circles: data points. (d) A histogram presenting the distribution of mutation discordance fractions across cell lines. Mutations are colored as in (c). (e) CNA landscapes of 11 paired cell lines. For each cell line, the upper row presents the CNA landscape of the Broad strain, and the lower row presents that of the Sanger strain, Copy number gains are shown in red, and copy number losses are shown in blue. CNAs <10Mb in size are not presented. (f) A histogram presenting the fraction of the genome affected by subclonal events across 916 cell lines from the Cancer Cell Line Encyclopedia. MCF7 and A549 are denoted by arrows. (g) All CCLE cell lines ranked by their aneuploidy scores. (h) All CCLE cell lines ranked by the number of gene-level CNAs that they harbor. (i) All CCLE cell lines ranked by the number of gene-level SNVs that they harbor. (j) All CCLE cell lines ranked by their chromosomal instability (CIN70) signature scores. (k) All CCLE cell lines ranked by their DNA repair signature scores80. (l) All CCLE cell lines ranked by their genomic instability scores79. (m) All CCLE cell lines ranked by their subclonal genome fraction78. A vertical black line represents the rank of MCF7 in each comparison. (n) Comparison of gene expression variation across multiple strains of nine cell lines, including MCF7. Box plots present the standard deviations of the expression levels for the 978 landmark genes directly measured in L1000. Bar, median; box, 25th and 75th percentiles; whiskers, data within 1.5*IQR of lower or upper quartile; circles: all data points.
Extended Data Figure 2:
Extended Data Figure 2:. Schematic representation of the MCF7 and A549 strains included in the current study
(a) MCF7 strains included in this study, presenting their origins (columns), years of acquisition (rows), manipulations (color) and progeny relationships (arrows). (b) A table of the MCF7 strains included in this study, presenting their origins, years of acquisition, passage numbers, and genetic manipulations. (c) A549 strains included in this study, presenting their origins (columns), years of acquisition (rows), manipulations (color) and progeny relationships (arrows). (d) A table of the A549 strains included in this study, presenting their origins, years of acquisition, passage numbers, and genetic manipulations.
Extended Data Figure 3:
Extended Data Figure 3:. Genetic variation across 27 MCF7 strains.
(a) Variation in the copy number status of nine selected genes across 27 MCF7 strains. Copy number gains are shown in red, and copy number losses in blue. Thresholds for relative gains/losses were set at ±0.1. (b) Western blots presenting the relative protein expression levels of ERα across strains. The expression of β-actin was used for normalization. For gel source data, see Supplementary Fig. 1. The experiment was repeated twice with similar results. (c) Quantification of the relative expression of ERα. Strains Q and M were excluded from the comparison. Bar, median; box, 25th and 75th percentiles; whiskers, data within 1.5*IQR of lower or upper quartile; circles: all data points. One-tailed t-test. (d) The allelic fractions of non-silent mutations in seven selected genes across 27 MCF7 strains. (e) The number of non-silent point mutations (SNVs) across the 27 MCF7 strains. (f) The number of COSMIC non-silent point-mutations shared by each number of MCF7 strains. (g) Top: unsupervised hierarchical clustering of 27 MCF7 strains, based on all their SNVs. Groups of strains expected to cluster together based on their evolutionary history are highlighted, as in Fig. 1. Bottom: a corresponding heatmap, showing the mutation status of all mutations across the 27 MCF7 strains. Shown are mutations identified only in a subset of the strains, which were detected in above 5% of the reads (allelic fraction>0.05). The presence of a mutation is shown in yellow, and its absence in gray. (h) The number of large (>15bp) indels and rearrangements across the 27 MCF7 strains. Indels are shown in gray, and rearrangements in black. (i) The recurrence of SVs in each of the 42 (out of 60) genes for which at least one event was detected. (j) The number of SVs shared by each number of MCF7 strains. Note that this analysis is limited to the 60 genes listed in Supplementary Table 2.
Extended Data Figure 4:
Extended Data Figure 4:. Comparison of CNA landscapes between MCF7 strains.
(a) CNA landscapes of a pair of MCF7 strains separated from each other by extensive passaging. (b) CNA landscapes of three pairs of MCF7 strains separated from each other by a genetic manipulation (introduction of a GFP reporter). (c) CNA landscapes of 10 MCF7 strains separated by multiple freeze-thaw cycles, with little passaging in between. (d) CNA landscapes of a pair of MCF7 strains that were either cultured in vitro (top) or cultured in vivo and treated with tamoxifen (bottom). (e) CNA landscapes of a pair of MCF7 strains separated by merely seven passages from each other. (f) CNA landscapes of a pair of MCF7 strains before (top) or after (bottom) introduction of Cas9. (g) CNA landscapes of a pair of MCF7 strains obtained from four different sources. (h) CNA landscapes of a pair of MCF7 strains separated from each other by extensive passaging. Data points represent 1Mb bins throughout the genome. Gains are shown in red, losses in blue. Yellow backgrounds highlight differential CNAs between the compared strains.
Extended Data Figure 5:
Extended Data Figure 5:. Characterization of the variation in allelic fraction and cellular prevalence of SNVs across 27 MCF7 strains and their single cell-derived clones.
(a) Top: unsupervised hierarchical clustering of 27 MCF7 strains, based on the allelic fractions of all their SNVs. Groups of strains expected to cluster together based on their evolutionary history are highlighted, as in Fig. 1. Bottom: a corresponding heatmap, showing the allelic fractions of all mutations across the 27 MCF7 strains. Shown are mutations identified only in a subset of the strains. The presence of a mutation is shown in color according to its allelic fraction. (b) The AF of an activating PIK3CA mutation (top) and an inactivating TP53 mutation (bottom) across strains. (c) Top: unsupervised hierarchical clustering of 27 MCF7 strains, based on their SNV cellular prevalence. Groups of strains expected to cluster together based on their evolutionary history are highlighted, as in Fig. 1. Bottom: a corresponding heatmap, showing the cellular prevalence of all mutations across the 27 MCF7 strains. Shown are mutations identified only in a subset of the strains. The presence of a mutation is shown in color according to its cellular prevalence. (d) The distribution of the maximal differences in cellular prevalence (CP) of non-silent mutations, across 27 MCF7 strains. The peak at maxΔCP=1 represents SNVs that are clonal in at least one strain but are nearly or completely absent in at least one other strain; the peak at maxΔCP=0 represents SNVs that are detected at similar prevalence across all 27 strains; and the peak at maxΔCP=~0.1 represents a group of SNVs present at CP=~0.1 only in strain M. (e) A table of the MCF7 single cell-derived clones included in this study, presenting their parental cell line, genetic manipulations and relationship to one another. (f) A heatmap presenting the allelic fractions of non-silent mutations in three WT single cell-derived MCF7 clones and its parental population. The presence of a mutation is shown in color according to its allelic fraction. (g) A heatmap presenting the allelic fractions of non-silent mutations in five genetically-manipulated single cell-derived MCF7 clones. For two of the clones, samples were passaged for a prolonged time and sequenced at multiple time points. The presence of a mutation is shown in color according to its allelic fraction. (h) Comparison of the karyotypic variation between parental and single cell-derived cell populations. Histograms present the distribution of chromosome numbers from the parental (light gray) and single cell-derived (dark gray) populations. P-values indicate the significance of the difference between the variations (rather than the means) of the populations from a one-tailed Levene’s test (n=50 metaphases per group). (i) Two representative karyotypes from each sample. Note that all single cell-derived clones are karyotipically heterogeneous. Marker chromosomes are not shown. Arrows point to partially aberrant chromosomes. Images are representative of 50 metaphases counted per sample. (j) Two representative karyotypes from two cell populations of the same single cell-derived clone, separated by 6 months of culture propagation. Marker chromosomes are not shown. Arrows point to partially aberrant chromosomes. Images are representative of 50 metaphases counted per sample. (k) Comparison of the karyotypic variation between two cell populations of the same single cell-derived clone, separated by 6 months of culture propagation. Histograms present the distribution of chromosome numbers from the early (light gray) and late (dark gray) populations. 50 metaphases were counted per sample. P-value indicates the significance of the difference between the means of the populations from a two-tailed Wilcoxon rank-sum test.
Extended Data Figure 6:
Extended Data Figure 6:. Transcriptomic variation across 27 MCF7 strains and their single cell-derived clones.
(a) Comparison of the L1000-based MCF7 expression profiles to microarray-based expression profiles from CCLE. Histograms present the distributions of the Spearman correlations between the 27 MCF7 strains and either MCF7 (light purple), two MCF7 derivatives (dark purple and blue), other breast cancer cell lines (green) or non-breast cancer cell lines (gray). The comparison is based on the 978 “landmark” genes directly measured in L1000. (b) The number of differentially expressed genes (DEGs) identified in all possible pair-wise comparisons of MCF7 strains, using a two-fold change cutoff. LFC, log fold change; DEGs, differentially expressed genes. (c) The 10 top “hallmark” gene sets identified by GSEA to be significantly enriched among the 100 genes that are most differentially expressed across the MCF7 strains. The two gene sets related to estrogen response are highlighted in red. (d) Comparison of gene expression variation within and between strains. Histograms present the distributions of gene expression variation within replicates of the same strain (gray), between closely related strains (purple), and between all strains (green). The comparison is based on the 978 “landmark” genes directly measured in L1000. (e) Heatmap presenting the arm-level CNA profiles of 27 MCF7 strains. Gains are shown in red, losses in blue. (f) GSEA reveals down-regulation of the genes on chromosomes 10q, 17q and 21q in strains that have lost copies of these arms, and up-regulation of the genes on chromosomes 5q, 6p, 14q and 16p in strains that have gained copies of these arms. (g) GSEA reveals up-regulation of mTOR signaling (gene set: hallmark_MTORC1_Signaling) and of genes that are up-regulated when PTEN is knocked-down (gene set: PTEN_DN.v2_UP; bottom) in strains that have gained PIK3CA, down-regulation of the estrogen response signature (gene set: hallmark_Estrogen_Response_Late) in strains that have lost ESR1, cell cycle signature (gene set: KEGG_cell_cycle) in strains that have lost CDKN2A, and down-regulation of KRAS signaling (gene set: hallmark_KRAS_Signaling_DN) in strains that have lost MAP2K4. (h) GSEA reveals up-regulation of mTOR signaling (gene set: hallmark_MTORC1_Signaling) in strains with high prevalence of an activating PIK3CA mutation, up-regulation of genes that are up-regulated when PTEN is knocked-down (gene set: PTEN_DN.v1_UP) in strains that harbor an inactivating PTEN mutation, and down-regulation of genes that are down-regulated when TP53 is knocked-down (gene set: P53_DN.v1_DOWN) in strains with high cellular prevalence of an inactivating TP53 mutation. (i) GSEA reveals up-regulation of mTOR signaling (gene sets: MTOR_UP.N4.V1_UP and hallmark_MTORC1_Signaling) in strains that have both PTEN copy number loss and an inactivating PTEN mutation. (j) A tSNE plot of single-cell RNA sequencing (scRNA-seq) data from MCF7-AA cells treated with bortezomib (500nM) at different time points. Each dot represents a single cell, and cells are colored by time point. (k) Comparison of the proteasome gene expression signature across time points. (l) Comparison of the unfolded protein response gene expression signature across time points. (m) Comparison of two proliferation gene expression signatures, S (top) and G2M (bottom), across time points. (n) Comparison of the early (top) and late (bottom) response to estrogen gene expression signatures across time points. Red lines denote mean values. P-values indicate significance from a one-way ANOVA followed by a Games-Howell post hoc test. n= 1,726, 2,743, 1,851 and 1,235 cells for t0, t12, t48 and t96, respectively. (o) A tSNE plot of scRNA-seq data from a parental population and its single cell-derived clone at two time points. Each dot represents a single cell, and cells are colored by sample. (p) Comparison of the transcriptional heterogeneity between a parental MCF7 population and its single cell-derived clones. n=2,904, 2,990, 3,896 and 4,583 cells for parental, WT3, WT4 and WT5, respectively. (q) Comparison of the transcriptional heterogeneity between two cultures of the same single-cell clone, separated by 6 months of continuous passaging. n=4,295 and 4,116 cells, for clone9-May17 and clone9-Nov17, respectively. Box plots present the Euclidean distance between the cells in each cell population. Bar, median; box, 25th and 75th percentiles; whiskers, data within 1.5*IQR of lower or upper quartile. P-values indicate significance from a one-way ANOVA followed by a Games-Howell post hoc test. (r) The 10 top “hallmark” gene sets identified by GSEA to be significantly enriched among the top differentially expressed genes (DEGs) between the two cultures of clone MCF7_GREB1_9 (May ‘17 vs. Nov ‘17). The gene sets related to estrogen response are highlighted in red, and those related to proliferation are highlighted in green.
Extended Data Figure 7:
Extended Data Figure 7:. Extensive genetic and transcriptional variation across 23 strains of A549.
(a) Top: unsupervised hierarchical clustering of 23 A549 strains, based on their non-silent single nucleotide variant (SNV) profiles derived from deep targeted sequencing. Strains expected to cluster together based on their evolutionary history are highlighted in blue. Bottom: a corresponding heatmap, showing the mutation status of non-silent mutations across the 23 A549 strains. Shown are mutations identified only in a subset of the strains, which were detected in above 5% of the reads (allelic fraction>0.05). The presence of a mutation is shown in yellow, and its absence in gray. (b) The number of non-silent point-mutations shared by each number of A549 strains. (c) Top: unsupervised hierarchical clustering of 23 A549 strains, based on the allelic fractions of their non-silent SNVs. Bottom: a corresponding heatmap, showing the allelic fractions of non-silent mutations across the 23 A549 strains. Shown are mutations identified only in a subset of the strains. The presence of a mutation is shown in color according to its allelic fraction. (d) The allelic fractions of non-silent mutations in six selected genes across 23 A549 strains. Note the inactivating frameshift mutation in SMARCA4, one of the most frequently mutated genes in lung adenocarcinoma,64,65, which was detected at an allelic fraction of ~1 in 9 of the strains, but was not detected at all in the other 14 strains. (e) The number of gene-level copy number alterations (CNAs) shared by each number of MCF7 strains. Copy number gains are shown in red, and copy number losses in blue. (f) CNA variation in the copy number of CDKN2A. Copy number gains are shown in red, and copy number losses in blue. Thresholds for relative gains/losses were set at +/−0.1, respectively. (g) Unsupervised hierarchical clustering of 23 A549 strains, based on their global gene expression profiles. Strains expected to cluster together based on their evolutionary history are highlighted in blue. (h) A tSNE plot of L1000-based gene expression profiles from multiple samples of nine cancer cell lines. The asterisk denotes the 23 A549 strains profiled in the current study. (i) Comparison of the L1000-based A549 expression profiles to microarray-based expression profiles from CCLE. Histograms present the distributions of the Spearman correlations between the 23 A549 strains and either A549 (light blue), other non-small cell lung cancer cell lines (purple), other lung cancer cell lines (green) or non-lung cancer cell lines (gray). The comparison is based on the 978 “landmark” genes directly measured in L1000. (j) The number of differentially expressed genes (DEGs) identified in all possible pairwise comparisons of A549 strains, using a two-fold change cutoff. LFC, log fold change; DEGs, differentially expressed genes. (k) Arm-level gains are associated with significant up-regulation, and arm-level losses are associated with significant down-regulation, of genes transcribed from the aberrant arms. For example, GSEA reveals up-regulation of the genes on chromosome 2q in strains that have gained a copy of that arm (left), and down-regulation of the genes on chromosome 9q in strains that have lost a copy of that arm (right). (l) Gene-level CNAs are associated with significant dysregulation of the perturbed pathways. For example, GSEA reveals up-regulation of the genes that are up-regulated, and down-regulation of the genes that are down-regulated, when TP53 is knocked-down in strains with MDM2 high-level copy number gain; and up-regulation or down-regulation of the G2/M cell cycle checkpoint signature in strains with CDKN2A copy number loss or CCND1 copy number gain, respectively. (m) Point mutations are associated with significant dysregulation of the perturbed pathways. For example, GSEA reveals down-regulation of two PRC2-related expression signatures in strains with an inactivating SMARCA4. (n) The 10 top gene sets identified by GSEA to be significantly enriched among the 100 genes that are most differentially expressed across the A549 strains. The six gene sets related to KRAS signaling are highlighted in red.
Extended Data Figure 8:
Extended Data Figure 8:. Genetic variation across multiple strains of additional cancer and non-cancer cell lines
(a) A bar plot presenting the fraction of non-silent SNVs that are discordant between pairs of strains of the same cell line. Bars represent mean ± s.e.m. n, number of strain pairs compared. (b) Arm-level CNAs arise in RPE1 samples. Plots present CNAs detected by an e-karyotyping analysis of 26 RPE1 samples. Gains are shown in red, losses in blue. (c) Comparison of variability in non-silent SNVs between non-transformed, partially-transformed and fully-transformed MCF10A samples. Box plots present the fraction of discordant non-silent SNVs between pairs of samples within each category. Bar, median; box, 25th and 75th percentiles; whiskers, data within 1.5*IQR of lower or upper quartile; circles: all data points. one-tailed Wilcoxon rank-sum test, n=28, 112 and 14 strain pairs, for the non-transformed, partially-transformed and the fully-transformed groups, respectively. (d) Comparison of the Broad-Sanger allelic fraction correlations of cell lines derived from primary tumors and those derived from metastases. Bar, median; colored rectangle, 25th and 75th percentiles; width of the violin indicates frequency at that value. One-tailed Wilcoxon rank-sum test. (e) Top: comparison of the chromosomal instability (CIN70) gene expression signature score between CCLE lines derived from primary tumors and those derived from metastases. Botton: comparison of the weighted-genomic integrity index (wGII) between CCLE lines derived from primary tumors and those derived from metastases. Bar, median; colored rectangle, 25th and 75th percentiles; width of the violin indicates frequency at that value. One-tailed Wilcoxon rank-sum test. (f) Comparison of the Broad-Sanger allelic fraction correlations of microsatellite-stable cell lines (MSS) and microsatellite-unstable cell lines (MSI). Bar, median; box, 25th and 75th percentiles; whiskers, data within 1.5*IQR of lower or upper quartile; circles: all data points. One-tailed Wilcoxon rank-sum test. (g) Heatmaps presenting the allelic fractions of non-silent mutations in multiple strains of cancer cell lines. The presence of a mutation is shown in color according to its allelic fraction. (h) Heatmaps presenting the allelic fractions of non-silent mutations in multiple strains of the non-cancer cell lines HA1E and MCF10A. The presence of a mutation is shown in color according to its allelic fraction. Also shown is an unsupervised hierarchical clustering of the 15 MCF10A strains, which represent different degrees of cellular transformation, based on their non-silent mutation profiles.
Extended Data Figure 9:
Extended Data Figure 9:. Characterization of cell proliferation and morphology across 27 MCF7 strains.
(a) Growth response curves of 27 MCF7 strains, based on microscopy imaging. Mean ± s.d., n= 3 replicate wells per data point. (b) Mean ± s.d., n= 3 replicate wells per data point. (c) Variation in cellular radius across the 27 MCF7 strains. (d) Variation in form factor, a measure of circularity, across the 27 MCF7 strains. (e) Variation in nuclear radius across the 27 MCF7 strains. Data points represent mean values, and error bars represent standard deviations. (f) Microscopy imaging of the 27 MCF7 strains, showing the morphological differences between them. Scale, 300μM. Images are representative of 5 replicate wells per strain. (g) Unsupervised hierarchical clustering of 27 MCF7 strains, based on 1,784 morphological features. (h) The correlation between proliferation rate (shown as doubling time) and the number of non-silent protein coding mutations, across 18 naturally-occurring MCF7 strains (i.e., strains that have not undergone drug selection or genetic manipulation). Spearman’s rho value and p-value indicate the strength and significance of the correlation, respectively. (i) The correlation between proliferation rate (shown as doubling time) and the fraction of subclonal mutations, across 18 naturally-occurring MCF7 strains. Spearman’s rho value and p-value indicate the strength and significance of the correlation, respectively.
Extended Data Figure 10:
Extended Data Figure 10:. Characterization of drug response variation across 27 MCF7 strains.
(a) Unsupervised hierarchical clustering of 27 MCF7 strains, based on their response to all 321 compounds in the primary screen. Groups of strains expected to cluster together based on their evolutionary history are highlighted, as in Fig. 1. (b) Pie chart presenting the classification of the screened compounds based on their differential activity. The response to each active compound was defined as “consistent” if viability change was <−50% for all strains, “variable” if viability change was <−50% for some strains and >−20% for other strains, or “intermediate” if viability change was in between these values. Classification was performed using a two strain threshold. (c) Pie charts Pie charts as in (b), only excluding strains Q and M that were generally more drug resistant. Classification was performed using a one-strain or a two strain threshold (left and right charts, respectively). (d) Pie charts as in (b), only using an activity threshold of viability change <−80%. Classification was performed using a one-strain threshold, either including all strains (left) or excluding strains Q and M (right). (e) The number of gene-level copy number alterations (CNAs) shared by each number of MCF7 strains. Copy number gains are shown in red, and copy number losses in blue. (f) The number of non-silent point-mutations shared by each number of MCF7 strains. The 10 naturally-occurring CMap strains were averaged and considered as a single sample. (g) The correlation between proliferation rate (shown as doubling time) and the number of non-silent protein coding mutations, across naturally-occurring MCF7 strains (n=10). Spearman’s rho value and p-value indicate the strength and significance of the correlation, respectively. The 10 naturally-occurring CMap strains were averaged and considered as a single sample. (h) The correlation between proliferation rate (shown as doubling time) and the fraction of subclonal mutations, across naturally-occurring MCF7 strains (n=10). Spearman’s rho value and p-value indicate the strength and significance of the correlation, respectively. The 10 naturally-occurring CMap strains were averaged and considered as a single sample. (i) The number of differentially expressed genes (DEGs) identified in all possible pair-wise comparisons of MCF7 strains, using a two-fold change cutoff. LFC, log fold change; DEGs, differentially expressed genes. The 10 naturally-occurring CMap strains were averaged and considered as a single sample. (j) Pie charts presenting the classification of the screened compounds based on their differential activity. The response to each active compound was defined as “consistent” if viability change was <−50% for all strains, “variable” if viability change was <−50% for some strains and >−20% for other strains, or “intermediate” if viability change was in between these values. Classification was performed using a one-strain or a two strain resistance threshold (left and right charts, respectively). The 10 naturally-occurring CMap strains were averaged and considered as a single sample. (k) Shown are the dose response curves for ten compounds. For each compound, eight concentrations were tested in each strain. Two sensitive strains and two insensitive strains are plotted. Each data point represents the mean of two replicates. Nutlin-3, a compound that had no toxicity against any of the strains in the primary screen, was included as negative control. Romidepsin, a compound that killed all strains very efficiently in the primary screen was included as positive control and turned out to be differentially active at lower concentrations. (l) The Pearson’s correlation of the two compound screen replicates across the MCF7 strains. (m) Strains more sensitive to proteasome inhibitors exhibit higher proteasome activity. The chymotrypsin-like activity of the proteasome was measured in three sensitive and three insensitive strains. Mean ± s.d., one-tailed t-test, n=4 replicate wells. (n) Western blots presenting the relative protein expression levels of the proteasome 19S complex members PSMC2 and PSMD1 in three sensitive and three insensitive strains. The expression of α-tubulin was used for normalization. The experiment was repeated once, with n=3 strains per group. For gel source data, see Supplementary Fig. 1. (o) Quantification of the relative expression of PSMC2 and PSMD1. Bars represent mean values, and error bars represent standard deviations. Mean ± s.d., one-tailed t-test, n=3 strains per group. (p) Up-regulation of the KEGG cell cycle signature in strains sensitive to the cell cycle inhibitor CDK/CRK inhibitor (n= 3), compared to insensitive strains (n=12). (q) Up-regulation of mTOR signaling in strains sensitive to the PI3K inhibitor PP-121 (n=11), compared to insensitive strains (n=5). (r) Down-regulation of the genes that are down-regulated when ALK is knocked-down in strains sensitive to the ALK inhibitor TAE-684 (n=4), compared to insensitive strains (n=15). (s) Up-regulation of IL6-JAK-STAT3 signaling in strains sensitive to the STAT inhibitor Nifuroxazide (n=9), compared to insensitive strains (n=6). (t) Up-regulation of the genes that are upregulated when AKT is over-expressed in strains sensitive to the AKT inhibitor Triciribine (n=2), compared to insensitive strains (n=8). (u) Up-regulation of hypoxia signaling in strains sensitive to the HSP inhibitor 17-AAG (n=3), compared to insensitive strains (n=15). (v) Down-regulation of xenobiotic metabolism signatures in strains M and Q (n=2), which exhibited an increased resistance to most compounds, compared to the other strains (n=25). (w) Up-regulation of the early and late estrogen response signatures, in strains most sensitive to the ER inhibitor tamoxifen (n=5), compared to the least sensitive strains (n=5). (x) Sensitivity to estrogen depletion and to tamoxifen is associated with the copy number status of ESR1. Heatmaps represent the relative viability in estrogen-depleted medium (top) and to tamoxifen (at 16.6μM; bottom).
Extended Data Figure 11:
Extended Data Figure 11:. Comparison of genetic-, transcriptomic- and drug response-based clustering trees, genomic distances and CRISPR dependencies.
(a) Comparison of clustering trees using the Fowlkes-Mallows approach. The dendrograms based on SNVs, gene-level CNAs, arm-level CNAs, gene expression profiles and drug response patterns were all compared to each other. The Fowlkes-Mallows index (Bk) was computed for all the potential numbers of clusters (k values) ranging from 5 to 26. The red line represents the observed Bk values, whereas the dashed gray line represents the 95% upper quantile of the randomized distribution. The maximum Bk value represents the degree of similarity between the compared pair of dendrograms. (b) Force-directed layout of screened lines using a similarity matrix determined by the probability of cell lines co-clustering in dependency space. Cell lines (nodes) are colored by lineage. (c) Top: the overlap of dependencies in KPL1 and MCF7 using corrected CERES scores, with genes showing depletion effects in all cell lines (i.e., pan-essential genes) excluded. The threshold for dependency was set as a CERES score <−0.5. Bottom: overlap in dependency with genes of indeterminate dependency status (CERES scores between −0.4 and −0.6) in either cell line excluded. (d) A two-sample Gene Set enrichment analysis (GSEA) of MCF7 and KPL1 against the estrogen response gene sets (n=1 sample per group). Expression of the estrogen signaling pathway is strongly enriched in MCF7. (e) The correlation between ESR1 dependency values and the single-sample GSEA enrichment scores of the estrogen response hallmark gene sets (n=27 cell lines). The difference in estrogen response signaling between MCF7 and KPL1 predicts their differing levels of dependency on ESR1. (f) The correlation between GATA3 dependency and GATA3 protein levels (z-scored RPPA values; n=27 cell lines). The difference in GATA3 protein levels between MCF7 and KPL1 predicts their differing levels of dependency on GATA3. Spearman’s rho values and p-values indicate the strength and significance of the correlations, respectively. (g) Top: comparison of proliferation rates between a parental MCF7 population and its single cell-derived clones. Bottom: comparison of proliferation rates between two cultures of the same single-cell clone, separated by 6 months of continuous passaging. Box plots present the population doubling time of each sample. Bar, median; box, 25th and 75th percentiles; whiskers, data within 1.5*IQR of lower or upper quartile; circles: all data points. Two-tailed t-test; n, replicate wells. (h) Top: comparison of the sensitivity to estrogen depletion between a parental MCF7 population and its single cell-derived clones. Bottom: comparison of the sensitivity to estrogen depletion between two cultures of the same single-cell clone, separated by 6 months of continuous passaging. Box plots present the relative growth rate in estrogen-depleted medium. Bar, median; box, 25th and 75th percentiles; whiskers, data within 1.5*IQR of lower or upper quartile; circles: all data points. Two-tailed t-test; n, replicate wells. (i) The correlation between sensitivity to tamoxifen (relative viability at 20μM) and the sensitivity to estrogen depletion (relative growth rate), across the parental MCF7 populations and their single-cell clones (n=7). Spearman’s rho value and p-value indicate the strength and significance of the correlation, respectively. (j) Correlation plots between various measures to estimate cell line strains (n=351 strain pairs). CNA distances (based on low-pass whole-genome sequencing or targeted sequencing), SNV distances, gene expression distances and drug response distances were compared to each other. CNV distance based on LP-WGS was determined by the fraction of the genome affected by discordant CNV calls. CNV and SNV distances based on targeted sequencing were determined by Jaccard indices. Gene expression and drug response distances were determined by Euclidean distances. Spearman’s rho value and p-value indicate the strength and significance of the correlation, respectively.
Figure 1:
Figure 1:. Extensive genetic variation across 27 strains of the cancer cell line MCF7.
(a) The distribution of pairwise allelic fraction (AF) correlations between the Broad and the Sanger cell lines (n=106), for germline (black) and somatic (gray) SNVs. One-tailed paired Wilcoxon rank-sum test. (b) The number of gene-level copy number alterations (CNAs) shared by each number of MCF7 strains. Gains, red; losses, blue. (c) CNAs of two genes, PTEN and ESR1. (d) The number of non-silent point-mutations shared by each number of MCF7 strains. (e) The AFs of inactivating mutations in the tumor suppressor PTEN. (f) Top: unsupervised hierarchical clustering of 27 MCF7 strains, based on CNA profiles derived from low-pass whole-genome sequencing. Orange, strain M subjected to in vivo passaging and drug treatment; blue, 11 Connectivity Map strains cultured in the same lab without extensive passaging; green, strains D and E cultured in the same lab and separated by few passages; purple, strains I and K separated by Cas9 introduction. Bottom: a corresponding heatmap, showing the CNA landscapes of the strains, relative to the median CNA landscape. Gains, red; losses, blue. (g) Top: unsupervised hierarchical clustering of 27 MCF7 strains, based on their non-silent SNV profiles derived from deep targeted sequencing. Colors, as in (f). Bottom: a corresponding heatmap, showing the mutation status of non-silent mutations across strains. Shown are mutations identified in a subset of the strains at AF>0.05. Mutation present, yellow; mutation absent, gray. (h) Comparison of the magnitude of CNAs observed following multiple freeze-thaw cycles (n=9; R/A/S vs. W/X/Y), extensive passaging (n=5; D vs. L vs. AA, B vs. I/P), and genetic manipulations (n=4; AA vs. O, B vs. C, I vs. J/K). Bar, median; box, 25th and 75th percentiles; whiskers, 1.5*IQR of lower and upper quartile; circles: data points. Two-tailed Wilcoxon rank-sum test.
Figure 2:
Figure 2:. Genetic heterogeneity and clonal dynamics underlying genetic variation.
(a) Top: unsupervised hierarchical clustering of 27 MCF7 strains, based on the allelic fractions of their non-silent SNVs. Colors, as in Fig. 1. Bottom: a corresponding heatmap, showing the AFs of non-silent mutations present in a subset of the strains. (b) The distribution of AFs of non-silent mutations across strains. (c) The cellular prevalence of mutation clusters across MCF7 strains, as identified by a PyClone analysis. Shown are mutation clusters with differential abundance (ΔCP>0.15), the clonal cluster (cluster #6; CP≈1 in all clones) and a cluster unique to MCF7-M (cluster #12). n = mutations per cluster, mean ± s.e.m. (d) Top: unsupervised hierarchical clustering of 27 samples of DNA-barcoded MCF7-D, based on barcode representation. Dendrogram branches are colored by culture condition. Bottom: a corresponding heatmap of barcode representation. ETP, early time point; RPMI, RPMI-1640 medium; DMEM, DMEM medium; DMSO, RPMI-1640 with 0.05% DMSO; ESTDEP, estrogen-depleted RPMI-1640 medium; BORT, bortezomib (500nM; 48hr exposure) followed by RPMI-1640.
Figure 3:
Figure 3:. Extensive transcriptomic variation associated with genetic variation.
(a) A tSNE plot of gene expression profiles from multiple samples of nine cancer cell lines. *, the 27 MCF7 strains profiled in the current study (also encircled). (b) Unsupervised hierarchical clustering of the strains, based on their global gene expression profiles. Colors, as in Fig. 1. (c) Schematics presenting the analysis performed to evaluate the association between genetic variation and transcriptional programs. (d) Arm-level gains and losses are associated with significant up- and down-regulation of genes transcribed from the aberrant arms. (e) Gene-level CNAs are associated with significant dysregulation of the perturbed pathways. For example, up-regulation of mTOR signaling in strains that have lost a copy of PTEN. (f) Point mutations are associated with significant dysregulation of the perturbed pathways. For example, up-regulation of mTOR signaling in strains with an inactivating PTEN mutation. (g) Copy number loss of ESR1 is associated with significant down-regulation of the estrogen response. (h) A tSNE plot of single-cell RNA sequencing data from a parental population and three of its single cell-derived clones.
Figure 4:
Figure 4:. Drug response consequences of genetic and transcriptomic variation.
(a) Top: unsupervised hierarchical clustering of 27 MCF7 strains, based on their response to the 55 active compounds in the primary screen. Colors, as in Fig. 1. Bottom: a corresponding heatmap, showing the % of viability change for each compound across strains. Compounds colored based on their mechanism-of-action. (b) Classification of the screened compounds based on their differential activity. Consistent, viability change <−50% for all strains; variable, viability change <−50% for some strains and >−20% for other strains; intermediate, viability change in between these values. (c) Comparison of the similarity in drug response patterns between compounds that share the same mechanism-of-action (n=39) and compounds that work through different mechanisms (n=1,439). One-tailed Wilcoxon rank-sum test. (d) Highly similar differential drug response patterns for three proteasome inhibitors: bortezomib, MG-132 and carfilzomib. Each data point represents the mean of two replicates. The number of data points per strain is mentioned within parentheses. The response pattern with no drug (DMSO control) is presented for comparison. (e) Schematics presenting the analysis performed to evaluate the association between drug response and transcriptional variation. (f) Up-regulation of the KEGG cell cycle signature in strains sensitive to the cell cycle inhibitor alsterpaullone (8 sensitive vs. 15 resistant strains). (g) Up-regulation of mTOR signaling in strains sensitive to the PI3K inhibitor BKM-120 (8 sensitive vs. 5 resistant strains). (h) Up-regulation of the genes that are up-regulated when PTEN is knocked-down in strains sensitive to AKT inhibitor IV (6 sensitive vs. 9 resistant strains). (i) Strains with PTEN mutation (n=12) respond more strongly to AKT inhibitor IV than strains without the mutation (n=14). (j) Strains with ESR1 copy number loss (n=5) grow better in estrogen-depleted medium than strains without ESR1 loss (n=21). (k) Comparison of GSEA-based MoA identification between the MCF7 cohort and the CTD2 (n=15) and GDSC (n=19) cohorts, across matched drugs. Two-tailed Fisher’s exact test. For all box plots: bar, median; box, 25th and 75th percentiles; whiskers, 1.5*IQR of lower and upper quartile; circles: data points.

Comment in

References

    1. Sharma SV, Haber DA & Settleman J Cell line-based platforms to evaluate the therapeutic efficacy of candidate anticancer agents. Nat Rev Cancer 10, 241–53 (2010). - PubMed
    1. Freedman LP, Cockburn IM & Simcoe TS The Economics of Reproducibility in Preclinical Research. PLoS Biol 13, e1002165 (2015). - PMC - PubMed
    1. Prinz F, Schlange T & Asadullah K Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov 10, 712 (2011). - PubMed
    1. Barretina J et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–7 (2012). - PMC - PubMed
    1. Garnett MJ et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–5 (2012). - PMC - PubMed

Additional Methods References

    1. Bray MA, Fraser AN, Hasaka TP & Carpenter AE Workflow and metrics for image quality control in large-scale high-content screens. J Biomol Screen 17, 266–74 (2012). - PMC - PubMed
    1. Dao D et al. CellProfiler Analyst: interactive data exploration, analysis and classification of large biological image sets. Bioinformatics 32, 3210–3212 (2016). - PMC - PubMed
    1. Adalsteinsson VA et al. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nat Commun 8, 1324 (2017). - PMC - PubMed
    1. Ha G et al. Integrative analysis of genome-wide loss of heterozygosity and monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer. Genome Res 22, 1995–2007 (2012). - PMC - PubMed
    1. Sholl LM et al. Institutional implementation of clinical tumor profiling on an unselected cancer population. JCI Insight 1, e87062 (2016). - PMC - PubMed

Publication types