Review
Cancer Cell. 2023 Aug 14;41(8):1397-1406.
doi: 10.1016/j.ccell.2023.06.009.

Proteogenomic data and resources for pan-cancer analysis

Yize Li, Yongchao Dou, Felipe Da Veiga Leprevost, Yifat Geffen, Anna P Calinawan, François Aguet, Yo Akiyama, Shankara Anand, Chet Birger, Song Cao, Rekha Chaudhary, Padmini Chilappagari, Marcin Cieslik, Antonio Colaprico, Daniel Cui Zhou, Corbin Day, Marcin J Domagalski, Myvizhi Esai Selvan, David Fenyö, Steven M Foltz, Alicia Francis, Tania Gonzalez-Robles, Zeynep H Gümüş, David Heiman, Michael Holck, Runyu Hong, Yingwei Hu, Eric J Jaehnig, Jiayi Ji, Wen Jiang, Lizabeth Katsnelson, Karen A Ketchum, Robert J Klein, Jonathan T Lei, Wen-Wei Liang, Yuxing Liao, Caleb M Lindgren, Weiping Ma, Lei Ma, Michael J MacCoss, Fernanda Martins Rodrigues, Wilson McKerrow, Ngoc Nguyen, Robert Oldroyd, Alexander Pilozzi, Pietro Pugliese, Boris Reva, Paul Rudnick, Kelly V Ruggles, Dmitry Rykunov, Sara R Savage, Michael Schnaubelt, Tobias Schraink, Zhiao Shi, Deepak Singhal, Xiaoyu Song, Erik Storrs, Nadezhda V Terekhanova, Ratna R Thangudu, Mathangi Thiagarajan, Liang-Bo Wang, Joshua M Wang, Ying Wang, Bo Wen, Yige Wu, Matthew A Wyczalkowski, Yi Xin, Lijun Yao, Xinpei Yi, Hui Zhang, Qing Zhang, Maya Zuhl, Gad Getz, Li Ding, Alexey I Nesvizhskii, Pei Wang, Ana I Robles, Bing Zhang, Samuel H Payne; Clinical Proteomic Tumor Analysis Consortium

Yize Li et al. Cancer Cell. 2023.

Abstract

The National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC) investigates tumors from a proteogenomic perspective, creating rich multi-omics datasets connecting genomic aberrations to cancer phenotypes. To facilitate pan-cancer investigations, we have generated harmonized genomic, transcriptomic, proteomic, and clinical data for >1000 tumors in 10 cohorts to create a cohesive and powerful dataset for scientific discovery. We outline efforts by the CPTAC pan-cancer working group in data harmonization, data dissemination, and computational resources for aiding biological discoveries. We also discuss challenges for multi-omics data integration and analysis, specifically the unique challenges of working with both nucleotide sequencing and mass spectrometry proteomics data.

Keywords: CPTAC; data harmonization; multi-omics; open data; pan-cancer; proteogenomics.

Conflict of interest statement

Declaration of interests: F.A. is an inventor on a patent application related to SignatureAnalyzer-GPU filed by the Broad Institute and has been an employee and shareholder of Illumina Inc. since 8 November 2021.

Figures

Fig. 1. Tumor types and data types of the CPTAC pan-cancer dataset.
Overview of the available molecular data types for the CPTAC pan-cancer cohort (n=1072; see Table S1 for the list of excluded cases and the reasons for their exclusion from the original datasets). Whole exome, whole genome, transcriptome, proteome, and phosphoproteome data are available for all ten cancer types. Normal samples are available for a subset of tumor types (see Tables S1 and S2).
Fig. 2. Demographics of the CPTAC dataset.
Distributions of selected clinical features among the pan-cancer cohort illustrated in Fig. 1. Age is stratified by quartiles. Grade information is not available for the BRCA and COAD cohorts. Stage information is not available for the GBM cohort. BMI, tobacco use, and alcohol use data are not available for the BRCA, COAD, and HGSC cohorts. For survival plots, time starts at diagnosis. Additional clinical features, such as race and ethnicity, are available for exploration on the ProTrack pan-cancer sample dashboard.
Fig. 3. Streaming data with APIs.
Programmatic access to CPTAC proteogenomic data across all cohorts is provided by both a Python API and an R API.
Fig. 4. Web portals to CPTAC data.
Multiple websites present CPTAC's proteogenomic data for visual exploration.
