Automatic detection of circulating tumor cells and cancer associated fibroblasts using deep learning

Cheng Shen et al. Sci Rep. 2023 Apr 7;13(1):5708. doi: 10.1038/s41598-023-32955-0.

Abstract

Circulating tumor cells (CTCs) and cancer-associated fibroblasts (CAFs) from whole blood are emerging as important biomarkers that potentially aid in cancer diagnosis and prognosis. Microfilter technology provides an efficient capture platform for them but is confounded by two challenges. First, uneven microfilter surfaces make it hard for commercial scanners to obtain images with all cells in focus. Second, current analysis is labor-intensive, with long turnaround times and user-to-user variability. Here we addressed the first challenge by developing a customized imaging system and data pre-processing algorithms. Using cultured cancer and CAF cells captured by microfilters, we showed that images from our custom system are 99.3% in-focus, compared to 89.9% from a top-of-the-line commercial scanner. To address the second challenge, we developed a deep-learning-based method to automatically identify tumor cells serving to mimic CTCs (mCTCs) and CAFs. Our deep learning method achieved precision and recall of 94% (± 0.2%) and 96% (± 0.2%) for mCTC detection, and 93% (± 1.7%) and 84% (± 3.1%) for CAF detection, significantly better than a conventional computer vision method, which achieved 92% (± 0.2%) and 78% (± 0.3%) for mCTCs and 58% (± 3.9%) and 56% (± 3.5%) for CAFs. Our custom imaging system, combined with the deep learning cell identification method, represents an important advance in CTC and CAF analysis.


Conflict of interest statement

R.J.C. and S.R. are co-founders and principals at Circulogix Inc. The other authors declare that there are no competing interests.

Figures

Figure 1
Schematic of overall design. (a) Multi-channel epifluorescence microscope imaging system. Because the target cells are distributed on the microfilter at varied heights, the sample is three-dimensional in nature; it is scanned axially in four channels to fully capture cell-specific biomarker expression. (b) Data preprocessing pipeline. The raw image data are synthesized into a single multi-color all-in-focus whole slide image for further analysis. (c) Data analysis. The classical way to detect CTCs and CAFs relies on human experts. ① First, experienced pathologists review the whole slide, annotate cells of interest, and count them. ② This annotation, paired with the fluorescence images, is then used to train a deep learning model. Because of inherent human observer bias in calling or ignoring positive cells, predictions from the pre-trained deep learning model are used to cross-validate the human expert annotation. ③ Finally, the well-trained deep learning model can independently conduct the cell detection and analysis task.
Figure 2
Auto-focusing principle during scanning. First, a coarse scan with a large step size over a wide z range is performed. The image at each z position is used to calculate a focus measure (F-metric), and the best-focus z position is estimated as the peak location obtained by fitting a Gaussian function to the discrete F-metrics. Centered on this estimated best-focus z position, a fine axial scan with a small step size is performed to capture the full 3D information. Autofocusing is repeated for every lateral xy scanning position and executed only in the DAPI channel; the estimated best-focus z position is then used across all channels, and chromatic aberration is compensated by the axial scanning.
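For illustration, the coarse-scan step can be sketched as follows: compute a focus measure at each z, fit a Gaussian, and take its peak as the best-focus estimate. This is a minimal sketch assuming a variance-of-Laplacian F-metric and a Gaussian-plus-offset model; the paper's exact focus measure and fitting details may differ.

import numpy as np
from scipy.ndimage import laplace
from scipy.optimize import curve_fit

def focus_measure(img):
    # Variance of the Laplacian as a simple sharpness proxy (assumed F-metric).
    return laplace(img.astype(float)).var()

def gaussian(z, a, mu, sigma, c):
    return a * np.exp(-((z - mu) ** 2) / (2.0 * sigma ** 2)) + c

def estimate_best_focus(z_positions, images):
    # Fit a Gaussian to the discrete F-metrics and return the peak location mu.
    f = np.array([focus_measure(im) for im in images])
    p0 = [f.max() - f.min(), z_positions[np.argmax(f)], np.ptp(z_positions) / 4.0, f.min()]
    popt, _ = curve_fit(gaussian, z_positions, f, p0=p0)
    return popt[1]

# Usage (hypothetical acquire() function): coarse scan in the DAPI channel,
# then a fine scan centered on the estimated best-focus z.
# z_coarse = np.arange(z_min, z_max, coarse_step)
# z_best = estimate_best_focus(z_coarse, [acquire(z) for z in z_coarse])
# z_fine = np.arange(z_best - half_span, z_best + half_span, fine_step)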
Figure 3
Data preprocessing pipeline. (a) Data flow starting from the raw measurement and ending with a multi-channel all-in-focus whole slide image. Preprocessing consists of three algorithms, two of which were developed by the authors while the third is adapted from existing work. (b) Principle of all-in-focus compression. The z-stack at each xy location is split into smaller patches, and the best-focused z-patch is selected with the focus measure. Finally, the z-patches are fused into an all-in-focus xy tile. (c) Principle of registration and stitching. There is overlap between adjacent xy tiles due to the tilt between the scanner's lateral movement coordinates and the camera frame coordinates. A subpixel image registration algorithm relies on the overlapping region to find the subpixel shift between two adjacent xy tiles. Taking the upper-left corner tile (x1, y1) as the anchor for the final mosaic, all other xy tiles are translated and stitched to it by blending based on a distance transform.
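As an illustrative sketch of (b) and (c), the fragment below fuses a z-stack patch-wise by keeping the sharpest patch at each location and estimates the subpixel shift between the overlapping strips of adjacent tiles with phase correlation. The patch size, the focus measure, and the use of skimage's phase_cross_correlation are assumptions; the adapted registration algorithm in the paper may differ.

import numpy as np
from scipy.ndimage import laplace
from skimage.registration import phase_cross_correlation

def all_in_focus(z_stack, patch=64):
    # Fuse a (Z, H, W) stack: for each patch, keep the z slice with the
    # highest sharpness (variance of the Laplacian, assumed focus measure).
    Z, H, W = z_stack.shape
    out = np.zeros((H, W), dtype=z_stack.dtype)
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            candidates = z_stack[:, y:y + patch, x:x + patch]
            sharpness = [laplace(c.astype(float)).var() for c in candidates]
            out[y:y + patch, x:x + patch] = candidates[int(np.argmax(sharpness))]
    return out

def tile_shift(ref_overlap, mov_overlap):
    # Subpixel shift (dy, dx) between the overlapping regions of two adjacent
    # tiles; applied before distance-transform-based blending into the mosaic.
    shift, _, _ = phase_cross_correlation(ref_overlap, mov_overlap, upsample_factor=10)
    return shift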
Figure 4
Comparison of whole slide image focus quality between our developed scanner and an Olympus VS120 scanner. (a1) Whole slide image (WSI) of a model sample under a 20X objective from our developed scanner. (a2) WSI of the same model sample under a 20X objective from the Olympus VS120 scanner. (b1,b2) and (c1,c2) are zoomed-in views of the same regions from the two WSIs; each covers the same area as an image tile from the VS120 scanner, 366 μm × 287 μm. (d) Quantitative analysis of the WSI focus quality from both scanners in the blue, green and red channels.
Figure 5
Cell detection via deep learning. (a) Training pipeline. An experienced pathologist annotates the cells of interest in training images with dots, while the same images are processed by a conventional computer vision (CV) method to segment cell regions. Results from both methods are cross-validated by matching annotation dots with segmentation regions. Any region containing an annotation dot is used to generate a bounding box and paired with the annotation label. For dots that do not lie in any region, a bounding box centered at each dot is generated with a size equal to the empirical cell diameter. Training images and their corresponding bounding boxes with class labels are then used to train a generic object detection deep learning model; transfer learning is adopted by using weights pretrained on the COCO benchmark dataset. (b) Testing pipeline. The unseen testing images are analyzed in three ways. First, the same experienced pathologist screens the testing images by annotating the cells of interest with bounding boxes, which are subsequently double-checked by another computational pathology researcher to rule out oversights or mislabeling; this result is taken as ground truth. In parallel, the testing images are segmented by the conventional CV method and prediction boxes with labels are generated from the segmented regions. Finally, the testing images are sent to our well-trained cell detection model, which directly generates predicted bounding boxes. Comparing the results of the latter two methods with the ground truth, we find that our trained deep learning model outperforms the conventional CV method.
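As a rough illustration of the training pipeline in (a), the sketch below converts dot annotations plus a CV segmentation mask into labeled bounding boxes and loads a COCO-pretrained detector for fine-tuning. The fixed cell diameter, the use of skimage for region extraction, and the choice of torchvision's Faster R-CNN as the generic object detection model are assumptions, not necessarily the paper's exact implementation.

import numpy as np
from skimage.measure import label, regionprops

def dots_to_boxes(dots, seg_mask, cell_diameter=30):
    # dots: list of (row, col) annotations; seg_mask: binary CV segmentation.
    labeled = label(seg_mask)
    region_boxes = {r.label: r.bbox for r in regionprops(labeled)}  # (r0, c0, r1, c1)
    boxes = []
    half = cell_diameter // 2
    for r, c in dots:
        lab = labeled[r, c]
        if lab:   # dot lies inside a segmented region: adopt the region's box
            boxes.append(region_boxes[lab])
        else:     # orphan dot: box of empirical cell diameter centered on it
            boxes.append((r - half, c - half, r + half, c + half))
    return boxes

# Transfer learning: start from COCO-pretrained weights and replace the head
# for the classes of interest (background + mCTC + CAF, an assumed labeling).
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=3)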
Figure 6
Evaluation of mCTC detection. (a) Class distribution and the number of patch images in the training/testing/whole dataset. (b) Precision-recall curve of the ensemble deep learning (DL) model for detecting mCTCs in testing patch images. The red dot represents the final chosen operating point. (c) Example of mCTC detection by the conventional computer vision (CV) method and the ensemble DL model, shown horizontally with the ground truth from human annotation. (d) Performance comparison between the conventional CV method and the ensemble DL model for detecting mCTCs at the whole slide image level. Both precision and recall of the ensemble DL model are significantly higher than those of the conventional CV method. The statistical analysis uses the ensemble DL model result as the reference to test the significance of the difference; error bars show the standard deviation of precision and recall obtained by randomly resampling the testing dataset 1000 times, and p-values are specified in the figure for *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001, NS not significant, two-sided z-test.
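The error bars and significance tests described in (d) can be sketched as follows: precision and recall are bootstrapped by resampling the test set 1000 times, and the DL-versus-CV difference is assessed with a two-sided z-test. The per-image (TP, FP, FN) bookkeeping and the two-proportion form of the z-test are assumptions about the exact statistical procedure.

import numpy as np
from scipy.stats import norm

def bootstrap_pr_std(per_image_counts, n_boot=1000, seed=0):
    # per_image_counts: array of (TP, FP, FN) per test image (or slide).
    rng = np.random.default_rng(seed)
    counts = np.asarray(per_image_counts)
    precisions, recalls = [], []
    for _ in range(n_boot):
        sample = counts[rng.integers(0, len(counts), len(counts))]
        tp, fp, fn = sample.sum(axis=0)
        precisions.append(tp / (tp + fp))
        recalls.append(tp / (tp + fn))
    return np.std(precisions), np.std(recalls)

def two_proportion_z_test(k1, n1, k2, n2):
    # Two-sided z-test for the difference between two proportions
    # (e.g. DL vs. CV recall: k detected out of n ground-truth cells).
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    z = (p1 - p2) / np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 2 * (1 - norm.cdf(abs(z)))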
Figure 7
Evaluation of CAF detection. (a) Precision-recall curve of the ensemble deep learning (DL) model for detecting CAFs in testing patch images. The red dot represents the final chosen operating point; the red star represents an alternative operating point with higher recall but lower precision, at which any possible CAF event is caught but further human analysis is required to exclude false alarms. (b) CAF detection by the conventional computer vision (CV) method. (c) Ground truth from human expert annotation. (d) Performance comparison between the conventional CV method and the ensemble DL model for detecting CAFs at the patch image level. Both precision and recall of the ensemble DL model are significantly higher than those of the conventional CV method. The statistical analysis uses the ensemble DL model result as the reference to test the significance of the difference; error bars show the standard deviation of precision and recall obtained by randomly resampling the testing dataset 1000 times, and p-values are specified in the figure for *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001, NS not significant, two-sided z-test.
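Choosing the high-recall operating point (the red star) can be illustrated as below: among confidence thresholds whose recall meets a target, pick the one with the best precision. The target recall value and the variable names are illustrative assumptions only.

import numpy as np

def pick_high_recall_point(thresholds, precisions, recalls, target_recall=0.99):
    # Operating points on the precision-recall curve that reach the target recall.
    precisions = np.asarray(precisions)
    recalls = np.asarray(recalls)
    ok = np.where(recalls >= target_recall)[0]
    if len(ok) == 0:
        return None  # no threshold reaches the target recall
    best = ok[np.argmax(precisions[ok])]  # best precision among qualifying points
    return thresholds[best], precisions[best], recalls[best]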
