Cell Rep Med. 2025 May 20;6(5):102090.
doi: 10.1016/j.xcrm.2025.102090. Epub 2025 Apr 30.

Integrative proteomic profiling of tumor and plasma extracellular vesicles identifies a diagnostic biomarker panel for colorectal cancer

Jun Wang et al. Cell Rep Med.

Abstract

The lack of reliable non-invasive biomarkers for early colorectal cancer (CRC) diagnosis underscores the need for improved diagnostic tools. Extracellular vesicles (EVs) have emerged as promising candidates for liquid-biopsy-based cancer monitoring. Here, we propose a comprehensive workflow that integrates staged mass spectrometry (MS)-based discovery and verification with ELISA-based validation to identify EV protein biomarkers for CRC. Our approach, applied to 1,272 individuals, yields a machine learning model, ColonTrack, incorporating EV proteins HNRNPK, CTTN, and PSMC6. ColonTrack effectively distinguishes CRC from non-CRC cases and identifies early-stage CRC with high accuracy (combined area under the curve [AUC] >0.97, sensitivity ∼0.94, specificity ∼0.93). Our analysis of EV protein profiles from tissue and plasma demonstrates ColonTrack's potential as a robust non-invasive biomarker panel for CRC diagnosis and early detection.

Keywords: diagnostic biomarker panel; early diagnosis of colorectal cancer; machine learning; proteomics; tissue extracellular vesicles.
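
The abstract summarizes performance with AUC, sensitivity, and specificity. As a rough illustration of how these three metrics relate to a classifier's output scores, here is a minimal sketch in plain Python. The labels, scores, threshold, and function names are hypothetical and are not taken from the study or from the ColonTrack model.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)  # cases correctly flagged
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)   # cases missed
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)   # controls correctly cleared
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)  # controls wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs in which the case scores higher.

    Ties count as half a win (the Mann-Whitney formulation of the ROC AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = CRC, 0 = non-CRC; scores are made-up model outputs.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
print(sens, spec, auc)
```

Sensitivity and specificity depend on the chosen cutoff, whereas the AUC summarizes ranking quality across all cutoffs, which is why the abstract reports the three values together.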

Conflict of interest statement

Declaration of interests The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Comprehensive analysis of tissue- and plasma-derived EVs for discovery of CRC biomarker (A) Workflow for EV protein-based biomarker discovery in colorectal cancer. (B) Clinical indicators of subjects used in discovery cohort, including 40 patients with CRC and 20 healthy individuals. (C–E) TEM of primary tumor (PT) tissue (scale bar, 2 μm; in the magnified view, scale bar, 500 nm), PT tissue EV (scale bar, 200 nm), and CRC plasma EV (scale bar, 200 nm). Characterization of tissue EVs and plasma EVs by (D) NTA and (E) immunoblotting. (F) Overlapping of identified proteins between two spectral libraries (left) and of quantified proteins between tissue EV samples and plasma EV samples (right). (G) Proteins quantified in tissue EV samples (left) and plasma EV samples (right) were ranked according to their median intensity. The top ten most abundant proteins in tissue EVs are labeled on the left, and their relative contribution to the total protein intensity is indicated. The top 12 most abundant proteins in plasma are labeled on the right, and their relative contribution to the total protein intensity is indicated.
Figure 2
Integration of tissue EV and plasma EV proteomics revealed candidate biomarkers for CRC diagnosis (A) WGCNA identified 11 functional protein modules based on tissue EV proteome. Dendrogram of proteins based on the measurement of dissimilarity and identification of the 11 modules. (B) Network plot of 11 functional protein modules (ME00–10), each network node represents one protein, color-coded by the different functional modules. (C) GSEA of CRC-associated pathway comparing PT and AM. Pathways highly upregulated in the PT group (normalized enrichment score [NES] > 0) were visualized (a total of nine relevant pathways were evaluated, with the other four pathways shown in Figure S2E). (D) Volcano plot displaying differentially abundant proteins between PT tissue EVs and AM tissue EVs. Each dot represents a protein, with red dots for proteins significantly upregulated in PT and blue dots for proteins significantly downregulated in PT. p value was calculated using Wilcoxon rank-sum test. (E) Dot plot illustrating enriched pathways identified based on significantly upregulated/downregulated proteins. Red dots represent pathways enriched (adjusted p < 0.05) for proteins upregulated in PT compared to AM, whereas blue dots represent pathways enriched (adjusted p < 0.05) for proteins downregulated in PT. p values were calculated using R package clusterProfiler (version 4.8.3). (F) The clustering heatmap showed differential EV proteins in plasma EVs between patients with CRC and healthy individuals. (G) Principal coordinates analysis (PCoA) based on Bray-Curtis distance of plasma EV samples. p value was calculated using PERMANOVA test. (H) Integration of tissue EV and plasma EV proteomics to reveal candidate biomarkers for CRC diagnosis. (I) Tissue-plasma EV proteome abundance map showing median protein intensity (assessed by MS intensity) in the tissue EV as a function of that in the plasma. Bottom panel highlighted the selected 21 candidate biomarkers. (J) Heatmap showed median intensity and detection ratio of 21 selected candidate biomarkers in tissue EV samples (left) and plasma EV samples (right). The color intensity of the grid represents the intensity of detected proteins, and the number represents detection ratio. Bar plots represent fold change of proteins in tissue EV samples (PT vs. AM).
Figure 3
Verification of plasma EV candidate biomarkers using PRM-MS-based targeted MS (A) The flowchart for screening 21 candidate proteins to select those suitable for developing the diagnostic model. (B) Boxplot of protein abundance for 21 candidate biomarkers in 20 patients with CRC and 20 healthy individuals. The median values in each group are shown as black dotted lines. The differences between groups for each candidate were analyzed by Wilcoxon rank-sum test (∗ for p < 0.05, ∗∗ for p < 0.01, ∗∗∗ for p < 0.001). (C) The diagnostic capacity of single candidate biomarker and the highlighted proteins were selected as P6 biomarker panel for ELISA validation. AUC, area under curve; SEN, sensitivity; SPE, specificity. (D) Correlation of protein abundance for 21 candidate biomarkers. Asterisks represent significance (∗ for p < 0.05, ∗∗ for p < 0.01, ∗∗∗ for p < 0.001) and the highlighted proteins were selected as P6 biomarker panel for ELISA validation. p value was calculated using Delong test. (E) Variable importance plots produced by the random forest algorithm measured as each variable’s mean decrease in accuracy. p value was calculated using Delong test. (F) Performance benchmark of 10 state-of-the-art machine learning classifiers based on three evaluation metrics: ROC AUC, accuracy, and F1 score. Data are represented as mean ± SD. (G) The ROC curve of the P6 panel demonstrated superior performance compared to CEA or CA19-9 in diagnosing CRC.
Figure 4
ColonTrack model development for CRC diagnosis based on ELISA validation (A) Workflow for diagnostic model construction and assessment based on ELISA quantification result. p value was calculated using Delong test. (B) Comparison of ELISA quantification of candidate biomarkers in P6 panel between HC (n = 77), BD (n = 91), and CRC (n = 176) groups on modeling set. (C) Variable importance plots produced by the random forest algorithm measured as each variable’s mean decrease in accuracy for CRC versus HC (top) and CRC versus BD (bottom). Data are represented as mean ± SD. (D) The top panel displays the ROC curves for the ColonTrack model, CEA, and CA19-9 in distinguishing patients with CRC (n = 53) from HC (n = 23), while the bottom panel presents the confusion matrix for classifying CRC versus HC in the testing set. (E) The top panel displays the ROC curves for the ColonTrack model, CEA, and CA19-9 in distinguishing patients with CRC (n = 53) from BD (n = 27), while the bottom panel presents the confusion matrix for classifying CRC versus BD in the testing set.
Figure 5
Diagnosis performance of the ColonTrack model on a large cohort (A) The ROC curves for the ColonTrack model, CEA, and CA19-9 in distinguishing patients with CRC (n = 75) from HC (n = 32) or from BD (n = 39), and the confusion matrix for classifying CRC versus HC or CRC versus BD in the internal validation set. (B) The ROC curves for the ColonTrack model, CEA, and CA19-9 in distinguishing patients with CRC (n = 142) from HC (n = 66) or from BD (n = 66), and the confusion matrix for classifying CRC versus HC or CRC versus BD in the external validation set. (C) The ColonTrack score of CRC, BD, and HC in the internal validation set. (D) As in (C), but in the external validation set. Significance was determined by two-sided Wilcoxon rank-sum test (∗ for p < 0.05, ∗∗ for p < 0.01, ∗∗∗ for p < 0.001). (E) The detection sensitivity of the ColonTrack model, CEA, and CA19-9 for different TNM stages of patients with CRC in the internal and external validation sets. (F) The ROC curves for the ColonTrack model, CEA, and CA19-9 in distinguishing patients with early-stage (TNM stage I) CRC (n = 7) from patients with CRA (n = 27), and the confusion matrix for classifying CRC versus CRA in the internal validation set. (G) The ROC curves for the ColonTrack model, CEA, and CA19-9 in distinguishing patients with early-stage (TNM stage I) CRC (n = 29) from patients with CRA (n = 46), and the confusion matrix for classifying CRC versus CRA in the external validation set. (H) Comparison of pre- and post-surgery ELISA tests showed that the expression levels of CTTN, HNRNPK, and PSMC6 in plasma EVs of patients with CRC (n = 38) were significantly downregulated after surgery. The ColonTrack model returned a negative result for postoperative patients. Significance was determined by two-sided Wilcoxon rank-sum test (∗ for p < 0.05, ∗∗ for p < 0.01, ∗∗∗ for p < 0.001). (I) Clinical indicators of postoperative patients (n = 38).
Figure 6
Benchmarking ColonTrack performance with an additional cohort (A) The heatmap showed expression levels of CTTN, HNRNPK, and PSMC6 and the diagnostic result of ColonTrack. The expression levels were standardized using Z score transformation. (B) The ROC curves for the ColonTrack model, CEA, and CA19-9 in distinguishing patients with CRC (n = 126) from non-CRC (n = 136), and the confusion matrix for classifying CRC versus non-CRC in the additional cohort. p value was calculated using Delong test. (C) The ROC curves for the ColonTrack model, CEA, and CA19-9 in distinguishing patients with CRC (n = 53) from CRA (n = 68), and the confusion matrix for classifying CRC versus CRA in the additional cohort. p value was calculated using Delong test. (D) CRC positivity rates for different indicators (top) and the detection rates of ColonTrack and mSeptin-9 across different stages of CRC in the additional cohort (bottom). (E) CRC positivity rates of ColonTrack and mSeptin-9 across different stages of CRC in the retrospective cohort.
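
The Figure 6A caption notes that expression levels were standardized by Z score transformation. A minimal sketch of that transformation follows, assuming the sample standard deviation is used (the caption does not say which estimator was applied). The input values and the function name are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardize values to mean 0 and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]


# Hypothetical measurements of one protein across three samples.
example = [2.0, 4.0, 6.0]
standardized = z_scores(example)
print(standardized)
```

Standardizing each protein separately puts proteins with very different absolute concentrations on a common scale, which is what makes a shared heatmap color scale readable.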
