Endocrinol Metab (Seoul). 2025 Aug;40(4):623-636.
doi: 10.3803/EnM.2024.2208. Epub 2025 Apr 10.

Comprehensive Proteomics and Machine Learning Analysis to Distinguish Follicular Adenoma and Follicular Thyroid Carcinoma from Indeterminate Thyroid Nodules

Hee-Sung Ahn et al. Endocrinol Metab (Seoul). 2025 Aug.

Abstract

Background: The preoperative diagnosis of follicular thyroid carcinoma (FTC) is challenging because it cannot be readily distinguished from follicular adenoma (FA) or benign follicular nodular disease (FND) using the sonographic and cytological features typically employed in clinical practice.

Methods: We employed comprehensive proteomics and machine learning (ML) models to identify novel diagnostic biomarkers capable of classifying three subtypes: FTC, FA, and FND. Bottom-up proteomics techniques were applied to quantify proteins in formalin-fixed, paraffin-embedded (FFPE) thyroid tissues. In total, 202 FFPE tissue samples, comprising 62 FNDs, 72 FAs, and 68 FTCs, were analyzed.

Results: Closed spectrum-spectrum matching quantified 6,332 proteins, with approximately 9% (780 proteins) differentially expressed among the groups. When applying an ML model to the proteomics data from samples with preoperative indeterminate cytopathology (n=183), we identified distinct protein panels: five proteins (CNDP2, DNAAF5, DYNC1H1, FARSB, and PDCD4) for the FND prediction model, six proteins (DNAAF5, FAM149B1, RPS9, TAGLN2, UPF1, and UQCRC1) for the FA model, and seven proteins (ACTN4, DSTN, MACROH2A1, NUCB1, SPTAN1, TAGLN, and XRCC5) for the FTC model. The classifiers' performance, evaluated by the median area under the curve values of the random forest models, was 0.832 (95% confidence interval [CI], 0.824 to 0.839) for FND, 0.826 (95% CI, 0.817 to 0.835) for FA, and 0.870 (95% CI, 0.863 to 0.877) for FTC.
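
The AUC values above measure how well each one-vs-rest classifier ranks samples of its target subtype above all other samples. As a minimal sketch of the metric only (it does not reproduce the authors' random forest pipeline), the Python below computes an AUC from its rank-based definition and takes the median across repeats. The function names and the toy scores are hypothetical and are not taken from the study.

```python
from statistics import median


def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def median_auroc(repeats):
    """Median AUC over (labels, scores) pairs, one pair per cross-validation repeat."""
    return median(auroc(y, s) for y, s in repeats)


# Toy data (invented): 1 = target subtype (e.g. FTC), 0 = all other samples.
toy_repeats = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
]
print(median_auroc(toy_repeats))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of a few hundred samples; a sort-based version would be preferable for much larger data.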

Conclusion: Quantitative proteome analysis combined with an ML model yielded an optimized multi-protein panel that can distinguish FTC from benign subtypes. Our findings indicate that a proteomic approach holds promise for the differential diagnosis of FTC.

Keywords: Follicular thyroid carcinoma; Formalin fixed paraffin embedded tissue; Liquid chromatography–tandem mass spectrometry; Machine learning; Protein biomarker; Thyroid nodule.

Conflict of interest statement

No potential conflict of interest relevant to this article was reported.

Figures

Fig. 1.
Liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based proteomic results of 202 thyroid formalin-fixed, paraffin-embedded (FFPE) samples. (A) The number of identified MS spectra in the 202 thyroid FFPE tissues: MS/MS (MS2) spectra (magenta), open search spectrum-spectrum matches (open SSMs; green), open search peptide-spectrum matches (open PSMs; orange), closed search SSMs (red), and closed search PSMs (blue). (B) Venn diagram showing the number of proteins identified by open SSM and PSM searches compared with closed SSM and PSM searches. (C) Distribution of normalized protein abundances based on label-free quantification from the open SSM search, with thyroid gland-elevated proteins highlighted in red. (D) Scatter plot of total protein count versus the percentage of thyroglobulin (TG) spectra matched; open search results from MSFragger (black circles) and closed search results from Approximate Nearest Neighbor Spectral Library (ANN-SoLo) (blue circles). The inset shows the number of proteins identified by PSMs and SSMs. (E) Boxplots displaying the number of identified proteins and the percentages of three post-translational modifications (PTMs) (methylation, oxidation/hydroxylation, and iodination) in samples stored short-term (1–5 years; n=38) versus long-term (6–11 years; n=164). (F) Scatter plot comparing protein quantification between the two storage time groups, with a Pearson’s correlation coefficient of 0.93.
Fig. 2.
Partial least squares–discriminant analysis (PLS-DA) and functional pathway analysis of 202 thyroid formalin-fixed, paraffin-embedded (FFPE) tissue proteome. (A) Two-dimensional score plot of the three thyroid subtypes: follicular nodular disease (FND; n=62, cyan), follicular adenoma (FA; n=72, green), and follicular thyroid carcinoma (FTC; n=68, red) based on PLS-DA. (B) Heatmap showing the relative quantitative values of proteins mapped to the glycolysis/gluconeogenesis Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway for the three subtypes; color scale: –1 (green), 0 (gray), 1 (red). (C) Hierarchical Spearman’s rank correlation clustering of 22 hallmark gene sets (P<0.05 by one-way analysis of variance [ANOVA]) out of 50 hallmark gene sets. (D) Boxplots of four representative hallmark signal pathways using single-sample scores generated by the singscore algorithm. P values calculated using the Mann-Whitney U test. UV, ultraviolet; IL2, interleukin 2; STAT5, signal transducer and activator of transcription 5; PI3K, phosphoinositide 3-kinase; AKT, protein kinase B; mTOR, mammalian target of rapamycin; NS, not significant. aP<0.05; bP<0.01; cP<0.001.
Fig. 3.
Volcano plots and functional annotation of differentially expressed proteins (DEPs) among three thyroid subtypes: benign follicular nodular disease (FND; n=62, cyan), follicular adenoma (FA; n=72, green), and follicular thyroid carcinoma (FTC; n=68, red). (A) Volcano plot displaying log2 fold changes versus log10-adjusted P values for proteins comparing FND and FA; proteins upregulated by more than 2-fold (adjusted P<0.05) are highlighted. (B) Volcano plot comparing FND and FTC. (C) Volcano plot comparing FA and FTC. Cyan, green, and red circles represent upregulated proteins in FND, FA, and FTC, respectively, while gray circles indicate non-significant differences. (D) Venn diagram showing the overlap of DEPs among the three subtypes; the left side represents DEPs in FND, while the right side represents DEPs in FA and FTC. (E) Bar graphs visualizing significant Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO): biological process (BP) functional terms (with false discovery rate [FDR] q values) for FA and FTC compared to FND; functional terms in the Venn diagram (D) are color-coded accordingly (green, orange, and red). TCA, tricarboxylic acid.
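
The highlighting rule in panels A to C (more than 2-fold change, adjusted P<0.05) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the vertical axis uses the conventional negative log10 of the adjusted P value, that the fold change is a positive ratio of group means, and that the rule applies in both directions. The names `volcano_point` and `is_highlighted` are hypothetical.

```python
import math


def volcano_point(fold_change, adj_p):
    """Return (log2 fold change, -log10 adjusted P) for one protein."""
    return math.log2(fold_change), -math.log10(adj_p)


def is_highlighted(fold_change, adj_p, min_fold=2.0, alpha=0.05):
    """True when the change exceeds min_fold in either direction and adjusted P < alpha."""
    return abs(math.log2(fold_change)) > math.log2(min_fold) and adj_p < alpha


# Invented values: a 4-fold increase with adjusted P = 0.01 passes both thresholds.
print(volcano_point(4.0, 0.01), is_highlighted(4.0, 0.01))
```

Working on the log2 scale makes the rule symmetric: a 4-fold decrease (ratio 0.25) is treated the same as a 4-fold increase.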
Fig. 4.
Receiver operating characteristic (ROC) curves of three subtype-specific random forest classifiers, follicular nodular disease (FND)-machine learning (ML), follicular adenoma (FA)-ML, and follicular thyroid carcinoma (FTC)-ML, and the classifiers' prediction scores for the entire sample set. ROC curves were generated using 100 repeats of threefold cross-validation, plotting the 25th, 50th, and 75th percentiles of sensitivities for each 1-specificity value. (A) ROC curve for the FND-ML model based on five proteins, evaluated in a set of 183 samples (55 FND and 128 others; area under the ROC curve [AUROC], 0.832; 95% confidence interval [CI], 0.824 to 0.839). (B) ROC curve for the FA-ML model based on six proteins, evaluated in 183 samples (66 FA and 117 others; AUROC, 0.826; 95% CI, 0.817 to 0.835). (C) ROC curve for the FTC-ML model based on seven proteins, evaluated in 183 samples (62 FTC and 121 others; AUROC, 0.870; 95% CI, 0.863 to 0.877). (D) Disease prediction scores for the entire sample set, with each sample assigned to the class with the highest of the three classifiers' probability scores.
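
Panel D combines the three one-vs-rest classifiers into a single call per sample. Assuming this means taking the subtype whose classifier returns the highest probability score, the assignment can be sketched in Python as below. The function name and the example scores are hypothetical, and how the authors handle tied scores is not stated.

```python
def assign_subtype(scores):
    """Return the subtype whose classifier gave the highest probability score."""
    if not scores:
        raise ValueError("scores must not be empty")
    # max() returns the first key encountered when scores are tied.
    return max(scores, key=scores.get)


# Invented scores for one sample from the FND-ML, FA-ML, and FTC-ML classifiers.
sample = {"FND": 0.21, "FA": 0.34, "FTC": 0.62}
print(assign_subtype(sample))
```

Because each classifier is trained separately, the three scores need not sum to 1; the sketch compares them directly without renormalizing.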
Fig. 5.
Qualitative and quantitative differences in peptides of proteins with iodine modifications among three thyroid tumors: benign follicular nodular disease (FND; n=62, cyan), follicular adenoma (FA; n=72, green), and follicular thyroid carcinoma (FTC; n=68, red). (A) Protein-protein interaction network of 13 proteins with iodine modifications. The number of iodinated sites per protein for each tissue type is displayed in a bar plot or pie chart; P values were calculated using a categorical chi-square test. (B) Normalized spectral counts for nine significant iodinated sites on thyroglobulin (Y234, Y382, H1384, Y2194, Y2478, H2487, Y2540, Y2573, and Y2587), with P values calculated using the Mann-Whitney U test. An asterisk (*) indicates modifications that showed statistically significant differences among the three groups. aP<0.05; bP<0.01.