Reducing demographic bias in biomedical machine learning for cancer detection using cfDNA methylation
- PMID: 41736096
- DOI: 10.1186/s13059-026-04006-0
Abstract
Background: Machine learning models in biomedical research are often hindered by demographic imbalances in clinical datasets, leading to biased predictions that disadvantage minority populations. Existing bias-correction methods face limitations in handling the heterogeneity of biomedical data and the complexity of demographic influences.
Results: We present DeBias, a computational framework for mitigating demographic biases in high-dimensional biomedical datasets. DeBias identifies and removes bias-associated subspaces from the feature space using control samples, enabling global correction of demographic distortions while preserving disease-specific signals. To evaluate its effectiveness, we apply DeBias to cell-free DNA methylation data for cancer detection. DeBias achieves a significant reduction in the number of features exhibiting demographic bias and outperforms existing methods in improving cancer detection performance for minority populations. Performance gains are validated in independent cohorts, highlighting the robustness of the approach.
Conclusions: DeBias offers an effective and generalizable strategy for correcting demographic biases in biomedical machine learning. It represents a step toward more equitable machine learning models that can deliver reliable and unbiased predictions across diverse patient populations.
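The abstract describes the core idea only at a high level: estimate a bias-associated subspace from control samples, then remove it from every sample. The following is a minimal sketch of that general idea, not the DeBias algorithm itself, whose details are not given here. It assumes the subspace can be approximated by the directions along which demographic group means differ among controls, and that removal is an orthogonal projection. The function names `fit_bias_subspace` and `remove_bias`, the `n_components` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_bias_subspace(X_ctrl, groups, n_components=1):
    """Estimate a demographic-bias subspace from control samples only.

    Assumption for illustration: the subspace is spanned by the offsets of
    each group's mean from the overall control mean.
    """
    X_ctrl = np.asarray(X_ctrl, dtype=float)
    groups = np.asarray(groups)
    center = X_ctrl.mean(axis=0)
    # One row per demographic group: group mean minus overall control mean.
    offsets = np.stack(
        [X_ctrl[groups == g].mean(axis=0) - center for g in np.unique(groups)]
    )
    # Right singular vectors give an orthonormal basis for the offset span.
    _, _, vt = np.linalg.svd(offsets, full_matrices=False)
    basis = vt[:n_components].T  # shape: (n_features, n_components)
    return center, basis


def remove_bias(X, center, basis):
    """Project out the bias subspace from any samples (controls or cases)."""
    Xc = np.asarray(X, dtype=float) - center
    return Xc - (Xc @ basis) @ basis.T + center


# Synthetic controls: group "B" is under-represented and shifted in 5 features.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 50
groups = np.repeat(["A", "B"], [150, 50])
shift = np.zeros(n_features)
shift[:5] = 2.0
X_ctrl = rng.normal(size=(n_samples, n_features)) + (groups == "B")[:, None] * shift

center, basis = fit_bias_subspace(X_ctrl, groups)
X_fixed = remove_bias(X_ctrl, center, basis)

# Largest remaining per-feature gap between group means among controls.
gap = X_fixed[groups == "A"].mean(axis=0) - X_fixed[groups == "B"].mean(axis=0)
print(np.abs(gap).max())
```

With two groups and one component, the group-mean gap among controls vanishes up to floating-point error by construction. Whether disease-specific signal survives the projection depends on how much it overlaps with the removed directions, which this sketch does not check.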
Keywords: Bias correction; Cancer detection; cfDNA methylation; Demographic bias; Machine learning.
© 2026. The Author(s).
Conflict of interest statement
Declarations. Ethics approval and consent to participate: The institutional review board (IRB) of the University of California at Los Angeles approved this study (IRB#19-000618, IRB#19-000230, IRB#19-001488, IRB#16-000659), and our research complies with all relevant ethical regulations. All participants gave their written informed consent. Consent for publication: Not applicable. Competing interests: X.J.Z., W.H.W., and W.L. are co-founders and board members of EarlyDiagnostics, Inc. X.J.Z. has an executive leadership position at EarlyDiagnostics, Inc. M.L.S., X.N., and C.-C.L. are employees of EarlyDiagnostics, Inc., and S.M.D. was a scientific advisor to EarlyDiagnostics, Inc. X.J.Z., W.L., and W.H.W. are stockholders of EarlyDiagnostics, Inc. M.L.S., W.Z., S.L., C.-C.L., Y.Z., and X.N. have stock options with EarlyDiagnostics, Inc. S.L., W.L., W.Z., and Y.Z. are consultants for EarlyDiagnostics, Inc. X.J.Z., C.-C.L., X.N., M.L.S., and W.Z. are inventors on a patent application submitted by the Regents of the University of California and EarlyDiagnostics, Inc. (Patent No. WO2023283591A2). The other authors have no competing interests to declare.