Cell Rep Methods. 2026 Jan 26;6(1):101276.
doi: 10.1016/j.crmeth.2025.101276. Epub 2026 Jan 14.

Epigenomic, transcriptomic, and proteomic characterization of breast cancer cell line reference samples

Chirag Nepal et al. Cell Rep Methods.

Abstract

Next-generation sequencing requires accuracy, reproducibility, and standardized reference materials. The Sequencing Quality Control (SEQC-2) multicenter studies on paired breast cancer and B cell lines generated extensive genomic datasets, but integrated epigenomic and proteomic references remain limited. Here, we performed Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq), Methyl-seq, RNA sequencing (RNA-seq), and proteomic profiling to establish comprehensive multi-omics reference materials. We identified >7,700 protein groups, with 95% of genes encoding a single peptide isoform. Protein expression from CpG island (CGI)-overlapping transcripts was higher than non-CGI transcripts in both cell lines. Certain SNVs were incorporated into mutated peptides. Chromatin accessibility was regulated by CG density: CG-rich regions showed lower methylation, greater accessibility, and higher gene/protein expression, whereas CG-poor regions exhibited higher methylation, reduced accessibility, and cell line-specific expression patterns. These datasets provide well-defined genomic, epigenomic, transcriptomic, and proteomic characterizations that can serve as benchmarks for validating omics assays and bioinformatics methods, offering a valuable community resource.

Keywords: CP: genetics; epigenomics; proteomics; transcriptomics.

Conflict of interest statement

Declaration of interests A.F. is an employee of Takara Bio USA, Inc., and W.J. is an employee of IQVIA Laboratories Genomics. The views presented in this article do not necessarily reflect the current or future opinion or policy of the US Food and Drug Administration. Any mention of commercial products is for clarification and not intended as an endorsement.

Figures

Graphical abstract
Figure 1
Multi-omics experimental design and bioinformatics workflow for profiling epigenetic, transcriptional, and proteomic features of HCC1395 and HCC1395BL cell lines
HCC1395 and HCC1395BL represent a breast cancer cell line and “normal” (i.e., immortalized) B lymphocyte cell line, respectively. ATAC-seq, RRBS, RNA-seq, and proteomic analyses were performed on three replicates of both cell lines. Each data type was analyzed using standard analytical pipelines. ATAC-seq and DNA methylation were analyzed to understand the correlation between the two layers of epigenetics in relation to genomic CG density. Gene expression was correlated with epigenetic marks and protein expression. See also Table S1.
Figure 2
Open chromatin landscape of HCC1395 and HCC1395BL cell lines reveals distinct patterns of ATAC-seq accessibility
(A) A UCSC browser view showing ATAC-seq coverage for three replicates of the HCC1395 (red) and HCC1395BL (blue) cell lines. ATAC peak regions are shown as red and blue rectangular bars. (B) Bar plot showing the number of ATAC peaks in each of the three replicates. Peaks identified in all three replicates are defined as consensus peaks. (C) Venn diagram showing the overlap of ATAC peaks from the HCC1395 and HCC1395BL cell lines. (D) Distribution of ATAC peaks with respect to promoter, intragenic, and intergenic regions. (E) Distribution of ATAC peaks with respect to CGIs. (F) Mean counts of ATAC peaks overlapping promoter, intragenic, and intergenic regions. Peaks from each region are further classified based on overlap with CGIs. p values were calculated using t tests comparing the CGI and non-CGI groups within promoter, intragenic, and intergenic regions. (G) Boxplots showing the distribution of ATAC peak width across promoter, intragenic, and intergenic regions, with peaks classified into two groups based on overlap with CGIs. See also Tables S2 and S3.
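The consensus-peak definition in (B) can be illustrated with a minimal sketch. The caption does not state how overlap between replicates was scored, so the rule used here (a peak from the first replicate is kept if it overlaps at least one peak in every other replicate) is an assumption, as are the function names and the toy coordinates.

```python
from typing import List, Tuple

# A peak is (chromosome, start, end) with a half-open interval [start, end).
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """Return True if two peaks share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def consensus_peaks(replicates: List[List[Peak]]) -> List[Peak]:
    """Keep peaks from the first replicate that overlap a peak in every other replicate.

    Assumed rule for illustration only; the study's actual criterion may differ.
    """
    first, others = replicates[0], replicates[1:]
    return [
        peak
        for peak in first
        if all(any(overlaps(peak, other) for other in rep) for rep in others)
    ]


# Hypothetical toy data, not taken from the study.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
rep2 = [("chr1", 150, 250), ("chr2", 100, 180)]
rep3 = [("chr1", 180, 300), ("chr1", 520, 580), ("chr2", 140, 200)]

print(consensus_peaks([rep1, rep2, rep3]))
```

In this toy input the peak at chr1:500-600 is dropped because the second replicate has no overlapping peak.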
Figure 3
DNA methylation profiling reveals global and locus-specific differences between HCC1395 and HCC1395BL cell lines across CGIs and gene bodies
(A) Distribution of beta values of CG methylation. Beta values ranged between 0 and 1 and were divided into 10 bins. (B) Mean methylation levels of all expressed genes. Gene length is scaled between TSSs and TESs. (C) A UCSC browser view showing the gene SKI and CGIs overlapping promoter and intragenic regions. Mean methylation levels of HCC1395 and HCC1395BL are represented as beta values in the range of 0–1. CGIs overlapping promoters had low methylation levels, while four intragenic CGIs had high methylation levels. (D) Mean methylation levels of promoter CGIs (left) and intragenic CGIs (right), each with 2-kb flanking regions. The y axis represents the mean methylation level (beta value). (E) ATAC signals in promoter CGIs (left) and intragenic CGIs (right), each with 2-kb flanking regions. The y axis represents the mean ATAC signal measured in RPKM. See also Tables S4 and S5.
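The binning described in (A) can be sketched as follows. The definition of a beta value as the fraction of methylated reads at a CG site is the usual convention for bisulfite sequencing and is assumed here rather than taken from the caption; the function names and read counts are hypothetical.

```python
from typing import List


def beta_value(methylated: int, unmethylated: int) -> float:
    """Fraction of reads supporting methylation at one CG site (assumed definition)."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("site has no read coverage")
    return methylated / total


def bin_index(beta: float, n_bins: int = 10) -> int:
    """Map a beta value in [0, 1] to one of n_bins equal-width bins.

    A beta of exactly 1.0 is placed in the last bin.
    """
    return min(int(beta * n_bins), n_bins - 1)


def beta_histogram(betas: List[float], n_bins: int = 10) -> List[int]:
    """Count how many sites fall into each bin."""
    counts = [0] * n_bins
    for beta in betas:
        counts[bin_index(beta, n_bins)] += 1
    return counts


# Hypothetical (methylated, unmethylated) read counts for four CG sites.
sites = [(0, 10), (1, 3), (5, 5), (10, 0)]
betas = [beta_value(m, u) for m, u in sites]
print(beta_histogram(betas))
```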
Figure 4
Association of CG density, open chromatin, and CG methylation uncovers coordinated epigenetic regulation
(A) ATAC peaks overlapping CGIs (left) and not overlapping CGIs (right) in the HCC1395 cell line. ATAC peaks are grouped into 4 bins based on decreasing CG density. (B) Same as in (A), but for the HCC1395BL cell line. (C and D) CG methylation levels of ATAC peaks overlapping CGIs (left) and not overlapping CGIs (right) in HCC1395 (C) and HCC1395BL (D). ATAC peaks are grouped into 4 bins based on decreasing CG density. The y axis indicates methylation level (beta values), which ranged between 0 and 1. (E) A UCSC genome browser view of the EGFR promoter region. CGIs and ATAC peaks are shown as horizontal bars. ATAC-seq coverage and DNA methylation beta values across three replicates of HCC1395 and HCC1395BL are shown as coverage tracks. The HCC1395 cell line had high ATAC-seq signals in the ATAC peak region and low methylation levels. The HCC1395BL cell line had low ATAC-seq signals in the ATAC peak region and high methylation levels. See also Tables S2, S3, S4, and S5.
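Grouping peaks into 4 bins by decreasing CG density can be sketched as below. The caption does not define CG density, so this example assumes it is the number of CG dinucleotides divided by the peak length; the equal-size split, the function names, and the toy sequences are likewise illustrative assumptions.

```python
from typing import Dict, List


def cg_density(sequence: str) -> float:
    """CG dinucleotides per base (assumed definition of CG density)."""
    sequence = sequence.upper()
    if len(sequence) < 2:
        return 0.0
    return sequence.count("CG") / len(sequence)


def bin_by_cg_density(peaks: Dict[str, str], n_bins: int = 4) -> List[List[str]]:
    """Rank peak names by decreasing CG density and split them into n_bins groups.

    Groups are of equal size except possibly the last ones.
    """
    ranked = sorted(peaks, key=lambda name: cg_density(peaks[name]), reverse=True)
    size = -(-len(ranked) // n_bins)  # ceiling division
    return [ranked[i * size:(i + 1) * size] for i in range(n_bins)]


# Hypothetical peak sequences, not taken from the study.
peaks = {
    "peak_a": "ATATATAT",  # 0 CG
    "peak_b": "CGCGCGCG",  # 4 CG
    "peak_c": "ATATCGAT",  # 1 CG
    "peak_d": "ACGTACGT",  # 2 CG
}
print(bin_by_cg_density(peaks))
```

The first bin then holds the most CG-dense peaks and the last bin the least, matching the left-to-right ordering described in the caption.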
Figure 5
Transcriptomic analysis of the HCC1395 and HCC1395BL cell lines reveals differences in gene expression and RNA editing
(A) Number of reference and alternative transcripts expressed in HCC1395 (left) and HCC1395BL (right). Alternative promoters were depleted for CGIs. (B) Violin plots showing mean expression levels across three replicates for HCC1395 (left) and HCC1395BL (right). p values were computed using two-sided t tests. (C) Coverage of ATAC-seq reads around gene TSSs in the HCC1395 cell line. Genes were classified into four bins based on expression levels, separately for CGI promoters and non-CGI promoters. (D) Same as in (C), but for the HCC1395BL cell line. (E and F) UMAP clusters of HCC1395 and HCC1395BL. (G and H) Average single-cell expression levels of genes with CGI promoters and genes with non-CGI promoters, excluding genes with zero counts. The mean expression level of CGI promoters was higher than for non-CGI promoters. (I and J) Frequency of A-to-I edits across three replicates in the HCC1395 and HCC1395BL cell lines. See also Tables S6, S7, and S8.
Figure 6
Projection of HCC1395 and HCC1395BL ATAC peaks across 23 tumor tissues from TCGA reveals ubiquitous and tissue-specific accessible regions
(A) Overlap of HCC1395 cell line ATAC peaks with the ATAC peaks from 23 different tumor tissues across TCGA. Most CGI-containing ATAC peaks found in HCC1395 were also detected in other tissues (left); for ATAC peaks without CGIs, this was not the case (right). The x axis indicates the percentage of overlap of HCC1395 ATAC peaks with ATAC peaks from different tumor tissues. (B) Same as in (A), but for the HCC1395BL cell line. (C) Schematic representation of the relationship between genomic CG density, epigenetic features (ATAC peaks and DNA methylation), and gene expression.
Figure 7
Proteomic profiling links alternative splicing to isoform-specific protein expression and integrates protein abundance with genomic and epigenomic features
(A) A UCSC genome browser view (reverse strand) showing peptides identified by MS/MS spectra that mapped to two UniProt isoforms of PFN2 (P35080-1 and P35080-2). These isoforms share exons 1 and 2 but have different third exons resulting from alternative splicing. Peptides common to both isoforms are colored magenta. Peptides unique to P35080-1 and P35080-2 are colored blue and red, respectively. Peptides overlapping each transcript are summed to quantify protein expression levels. (B) Bar plots showing protein expression levels of each spliced isoform across three replicates for both HCC1395 and HCC1395BL cell lines. (C) Number of protein groups detected per gene. The majority (98.5%) of genes have only one detected protein group. (D and E) Correlation of gene expression and protein expression levels. Gene expression and protein levels are positively correlated in HCC1395 (R = 0.477) and HCC1395BL (R = 0.474). (F and G) Average protein expression levels of genes grouped based on overlap with CGIs in their promoters. Genes overlapping CGIs have significantly higher protein expression levels in HCC1395 (F) and HCC1395BL (G). p values were calculated using a t test. See also Tables S9 and S10.
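The correlation reported in (D and E) can be illustrated with a short sketch. The caption gives R values but does not say which coefficient was used or how the data were transformed, so the Pearson coefficient on untransformed values shown here is an assumption, and the input numbers are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical per-gene values (e.g., RNA expression and protein abundance).
rna = [1.0, 2.5, 3.0, 4.5, 6.0]
protein = [0.8, 2.0, 3.5, 3.9, 5.5]
print(round(pearson_r(rna, protein), 3))
```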
