J Proteome Res. 2021 May 7;20(5):2983-3001.

doi: 10.1021/acs.jproteome.1c00243. Epub 2021 Apr 15.

Proteogenomic Workflow Reveals Molecular Phenotypes Related to Breast Cancer Mammographic Appearance

Tommaso De Marchi¹, Paul Theodor Pyl¹, Martin Sjöström¹, Stina Klasson², Hanna Sartor³, Lena Tran¹, Gyula Pekar⁴, Johan Malmström⁵, Lars Malmström⁶,⁷, Emma Niméus¹,⁸

Affiliations

¹ Division of Surgery, Oncology, and Pathology, Department of Clinical Sciences, Lund University, Solvegatan 19, Lund SE-223 62, Sweden.
² Department of Plastic and Reconstructive Surgery, Skåne University Hospital, Inga Marie Nilssons gata 47, Malmö SE-20502, Sweden.
³ Division of Diagnostic Radiology, Department of Translational Medicine, Skåne University Hospital, Entrégatan 7, Lund SE-22185, Sweden.
⁴ Division of Oncology and Pathology, Department of Clinical Sciences, Lund University, Skåne University Hospital, Lund SE-22185, Sweden.
⁵ Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Klinikgatan 32, Lund SE-22184, Sweden.
⁶ S3IT, University of Zurich, Winterthurerstrasse 190, Zurich CH-8057, Switzerland.
⁷ Institute for Computational Science, University of Zurich, Winterthurerstrasse 190, Zurich CH-8057, Switzerland.
⁸ Department of Surgery, Skåne University Hospital, Lund 222 42, Sweden.

PMID: 33855848
PMCID: PMC8155562
DOI: 10.1021/acs.jproteome.1c00243

Tommaso De Marchi et al. J Proteome Res. 2021.

Abstract

Proteogenomic approaches have enabled the generation of novel information levels when compared to single omics studies although burdened by extensive experimental efforts. Here, we improved a data-independent acquisition mass spectrometry proteogenomic workflow to reveal distinct molecular features related to mammographic appearances in breast cancer. Our results reveal splicing processes detectable at the protein level and highlight quantitation and pathway complementarity between RNA and protein data. Furthermore, we confirm previously detected enrichments of molecular pathways associated with estrogen receptor-dependent activity and provide novel evidence of epithelial-to-mesenchymal activity in mammography-detected spiculated tumors. Several transcript-protein pairs displayed radically different abundances depending on the overall clinical properties of the tumor. These results demonstrate that there are differentially regulated protein networks in clinically relevant tumor subgroups, which in turn alter both cancer biology and the abundance of biomarker candidates and drug targets.

Keywords: breast cancer; data-independent acquisition; proteogenomics; proteomics; transcriptomics.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

**Figure 1**
Experimental workflow of this study. A total of 21 samples derived from a larger cohort (set 1, N = 172, see Experimental Procedures) and a second set of 24 tumors from a larger study (set 2, N = 109, see Experimental Procedures) were employed (A). Panel (B) shows examples of nonspiculated and spiculated tumor masses. Panel (C) displays the overlap between the molecular (ER status) and appearance features evaluated in this study, for which no association was found (set 1: Fisher exact p-value = 0.665, set 2: Fisher exact p-value = 0.283, (D)). Tumor specimens were processed as whole tissue lysates (WTL, MS-only analysis) and ALLPREP flow-throughs (FT, RNA-seq and MS analyses). Panel (E) displays the experimental workflow of our RNA and MS (DDA and DIA) analyses: tumor tissues were cut into slices and processed by ALLPREP. RNA and protein fractions were extracted and processed from ALLPREP sample preparation for downstream RNA-sequencing and DDA/DIA MS, respectively. Tissue slices were prepared only for downstream MS (DDA/DIA). Samples for DDA were fractionated using strong anion exchange columns (SAX, six fractions) to enable higher proteome coverage. DDA data was submitted (i) to MaxQuant processing to derive protein abundances and (ii) to the MakeGTL workflow to generate a spectral library for downstream DIA search. RNA-seq data was processed using the standard DESeq2 workflow (see Experimental Procedures). Abbreviations: DDA: data-dependent acquisition, DIA: data-independent acquisition, ER: estrogen receptor, FDR: false discovery rate, FT: flow-through, MS: mass spectrometry, RT: retention time, WTL: whole tissue lysate.
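The Fisher exact test mentioned for panels (C) and (D) can be illustrated with a minimal sketch. The 2x2 counts below are hypothetical and are not the study's data (the caption reports only the p-values), and the function name `fisher_exact_two_sided` is an illustrative helper, not the authors' code. It sums the hypergeometric probabilities of every table that is at most as likely as the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is at most as likely as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts (NOT from the study):
# rows = ER-positive / ER-negative, columns = spiculated / nonspiculated.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # prints 0.4857
```

A large p-value, as in this toy table, gives no evidence of an association between the two features. This matches how the caption interprets the reported values for sets 1 and 2.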

**Figure 2**
Overall comparison between transcriptomic and proteomic data layers. Panel (A) displays the dynamic range (presented as relative abundance over total signal) of transcript and protein intensities of matching identifications in our RNA (green), DDA (red), and DIA (blue) MS data (examples of transcript–protein pairs displaying similar abundances across data layers are labeled). Distributions of Spearman correlations between matching transcript and protein (DDA: top, DIA: bottom) abundances are displayed in panel (B) (gray: nonsignificant, light blue: significant), while examples of consistent positive and negative correlation between protein levels (DDA and DIA) and RNA abundance are depicted in panel (C). Panels (D) and (E) display the distribution of transcript–protein correlations for significant (q-value < 0.15, see Experimental Procedures for details) GOBP pathways out of our DDA and DIA MS analyses, respectively. Color gradient is representative of the low (pink) and high (dark red) median transcript–protein correlation for each GOBP term. Acronyms: DDA: data-dependent acquisition, DIA: data-independent acquisition, ER: estrogen receptor, GOBP: gene ontology biological process.
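The per-gene transcript–protein correlation summarized in panel (B) can be sketched as follows. This is a minimal illustration with made-up abundance values. The gene labels, the helper names `average_ranks` and `spearman_rho`, and the data layout are assumptions; the significance testing and q-value correction described in the Experimental Procedures are not reproduced here.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up abundances across five samples (NOT from the study).
rna = {"GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0], "GENE_B": [5.0, 3.0, 4.0, 1.0, 2.0]}
protein = {"GENE_A": [2.0, 1.0, 4.0, 3.0, 5.0], "GENE_B": [1.0, 2.0, 3.0, 4.0, 5.0]}

correlations = {gene: spearman_rho(rna[gene], protein[gene]) for gene in rna}
for gene, rho in correlations.items():
    print(gene, round(rho, 2))
```

Rank-based correlation suits this comparison because transcript counts and MS intensities sit on different scales, so only the ordering of samples is compared.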

**Figure 3**
Comparison between transcriptomic and proteomic data in the context of the estrogen receptor and appearance statuses. Panels (A) and (B) display the scaled Log2Ratios of all transcript–protein pairs for the ER status (A) and appearance (B) (DDA: left, DIA: right). Significant differential expression at the RNA level is marked by larger, filled dots; concordance and discordance between RNA and protein layers are shown in green and purple, respectively. The most significant genes (top 5% quantile) are labeled. GSEA analyses were performed on all data layers (RNA, DDA, and DIA) for ER and spiculation statuses using the Hallmark database. Pathways are ranked based on the RNA-level enrichment score. Panel (C) displays the overlap of GSEA analyses for the ER status, while panel (D) shows the results of analysis of appearance features (i.e., spiculation vs no spiculation). Significant pathways in each data layer (RNA: green, DDA: red, DIA: blue) are marked in full color, while transparent ones did not pass the false discovery rate (FDR < 0.25) cutoff. Positive scores mark enrichment in ER-positive and spiculated tumors, respectively, while negative scores define enrichments in ER-negative and nonspiculated samples. Acronyms: DDA: data-dependent acquisition, DIA: data-independent acquisition, ER: estrogen receptor, FDR: false discovery rate, GSEA: gene set enrichment analysis.

**Figure 4**
Pathway-level comparison of transcript–protein pairs. The figure displays transcript–protein-wise comparison within significant pathways out of GSEA analyses for the ER status (estrogen response early, (A)) and appearance (epithelial mesenchymal transition, (B)). Left panels display Log2Ratios of each transcript/protein (ranked by RNA expression) between ER-positive/negative and spiculated/nonspiculated tumors, while center panels display the corresponding enrichment scores in each data layer (RNA: green, DDA: red, DIA: blue). Right panels show distribution of enrichment scores for core-enriched (red) and noncore-enriched (gray) transcript/proteins. Left and center plots background color denotes enrichment in ER-positive (blue) and ER-negative (red) groups and spiculated (orange) and nonspiculated (purple) tumor groups. Abbreviations: DDA: data-dependent acquisition, DIA: data-independent acquisition, ER: estrogen receptor, FDR: false discovery rate, GSEA: gene set enrichment analysis.

**Figure 5**
Evaluation of differential transcript usage and single amino acid variant detection at the proteomic level. We employed transcriptomic data information to search our DIA data for DTU (A–C) and SAAVs (D, E). For DTU analysis, we employed the BANDITs workflow to define transcript differential expression and then generated an isoform-aware spectral library with which to search our DIA MS data. Panel (A) displays detected DTU at the protein (DIA MS) level and their expression compared to transcript levels. Examples of transcript (left) and (when detected) their specific peptide (right) expression are shown in panels (B) (ER status) and (C) (appearance). The t-test p-value is shown for box plots (peptide level). For SAAV detection, nonsynonymous SNVs detected at the RNA level in breast tumors and healthy breast tissues derived from reconstruction surgery were employed to define a variant-specific library against which the DIA data was searched. Panel (D) shows in which samples (healthy breast tissue and cancer) each variant was detected (numbers in brackets represent peptide charge). Abbreviations: DIA: data-independent acquisition, DTU: differential transcript usage, MS: mass spectrometry, SAAV: single amino acid variant, SNV: single nucleotide variant.

**Figure 6**
Protein cluster regulation dependent on the estrogen receptor status. Co-regulated protein clusters in ER-positive (left) and ER-negative (right) tumors (see Figure S15) were extracted from the DIA data, annotated with GOBP terms, condensed, and visualized in Cytoscape (A). Edge thickness and length relate to the cluster distance (Euclidean), the node color relates to the scaled mean intensity of all proteins in each cluster, and the node size depends on the number of proteins in each cluster. Panel (B) shows the correlation to mRNA of each protein per cluster for ER-positive and ER-negative tumors. Panel (C) displays differences in correlation to RNA between ER-positive and ER-negative (i.e., ER positive–ER negative) tumor groups within showcased co-regulation clusters for FDA drug targets. Abbreviations: DIA: data-independent acquisition, ER: estrogen receptor, FDA: Food and Drug Administration, GOBP: gene ontology biological process, MS: mass spectrometry.

Cited by

Interpreting biologically informed neural networks for enhanced proteomic biomarker discovery and pathway analysis.
Hartman E, Scott AM, Karlsson C, Mohanty T, Vaara ST, Linder A, Malmström L, Malmström J. Nat Commun. 2023 Sep 2;14(1):5359. doi: 10.1038/s41467-023-41146-4. PMID: 37660105. Free PMC article.
Generalized precursor prediction boosts identification rates and accuracy in mass spectrometry based proteomics.
Scott AM, Karlsson C, Mohanty T, Hartman E, Vaara ST, Linder A, Malmström J, Malmström L. Commun Biol. 2023 Jun 10;6(1):628. doi: 10.1038/s42003-023-04977-x. PMID: 37301900. Free PMC article.
Integrated View of Baseline Protein Expression in Human Tissues Using Public Data Independent Acquisition Data Sets.
Prakash A, Collins A, Vilmovsky L, Fexova S, Jones AR, Vizcaino JA. J Proteome Res. 2025 Feb 7;24(2):685-695. doi: 10.1021/acs.jproteome.4c00788. Epub 2025 Jan 7. PMID: 39764611. Free PMC article.
MammOnc-DB, an integrative breast cancer data analysis platform for target discovery.
Karthikeyan SK, Chandrashekar DS, Sahai S, Shrestha S, Aneja R, Singh R, Kleer CG, Kumar S, Qin ZS, Nakshatri H, Manne U, Creighton CJ, Varambally S. NPJ Breast Cancer. 2025 Apr 18;11(1):35. doi: 10.1038/s41523-025-00750-x. PMID: 40251157. Free PMC article.
Identification of Novel Genes and Proteoforms in Angiostrongylus costaricensis through a Proteogenomic Approach.
da Silva EMG, Rebello KM, Choi YJ, Gregorio V, Paschoal AR, Mitreva M, McKerrow JH, Neves-Ferreira AGDC, Passetti F. Pathogens. 2022 Oct 31;11(11):1273. doi: 10.3390/pathogens11111273. PMID: 36365024. Free PMC article.
