Genome Biol. 2026 Feb 25.
doi: 10.1186/s13059-026-04006-0. Online ahead of print.

Reducing demographic bias in biomedical machine learning for cancer detection using cfDNA methylation

Shuo Li et al. Genome Biol.

Abstract

Background: Machine learning models in biomedical research are often hindered by demographic imbalances in clinical datasets, leading to biased predictions that disadvantage minority populations. Existing bias-correction methods face limitations in handling the heterogeneity of biomedical data and the complexity of demographic influences.

Results: We present DeBias, a computational framework for mitigating demographic biases in high-dimensional biomedical datasets. DeBias identifies and removes bias-associated subspaces from the feature space using control samples, enabling global correction of demographic distortions while preserving disease-specific signals. To evaluate its effectiveness, we apply DeBias to cell-free DNA methylation data for cancer detection. DeBias achieves a significant reduction in the number of features exhibiting demographic bias and outperforms existing methods in improving cancer detection performance for minority populations. Performance gains are validated in independent cohorts, highlighting the robustness of the approach.
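
The abstract does not spell out how DeBias computes its bias-associated subspaces, so the following is only a minimal sketch of the general idea (estimate demographic directions from control samples, then project them out of every sample), not the authors' algorithm. It assumes the bias shows up as shifts in group means among controls; the function names `estimate_bias_subspace` and `remove_subspace`, the tolerance, and the synthetic data are all hypothetical and chosen for illustration.

```python
import numpy as np


def estimate_bias_subspace(X_controls, groups, tol=1e-10):
    """Orthonormal basis (features x k) spanning group-mean shifts in controls.

    Assumption for this sketch: demographic bias appears as differences
    between the mean feature vectors of demographic groups among controls.
    """
    X_controls = np.asarray(X_controls, dtype=float)
    groups = np.asarray(groups)
    overall_mean = X_controls.mean(axis=0)
    # One row per group: offset of that group's mean from the control mean.
    shifts = np.stack([
        X_controls[groups == g].mean(axis=0) - overall_mean
        for g in np.unique(groups)
    ])
    # Right singular vectors with non-negligible singular values span
    # the between-group directions.
    _, s, vt = np.linalg.svd(shifts, full_matrices=False)
    keep = s > tol * max(s.max(), 1.0)
    return vt[keep].T


def remove_subspace(X, basis, center):
    """Project the bias subspace out of X, working relative to `center`."""
    centered = np.asarray(X, dtype=float) - center
    return centered - centered @ basis @ basis.T + center


# Synthetic demo: every number below is made up for illustration.
rng = np.random.default_rng(0)
n_features = 20
groups = np.array(["A"] * 150 + ["B"] * 50)   # imbalanced control cohort
X_controls = rng.normal(size=(200, n_features))
X_controls[groups == "B", 0] += 3.0           # demographic shift on feature 0
X_cases = rng.normal(size=(40, n_features))
X_cases[:, 1] += 2.0                          # disease signal on feature 1

center = X_controls.mean(axis=0)
basis = estimate_bias_subspace(X_controls, groups)
controls_corrected = remove_subspace(X_controls, basis, center)
cases_corrected = remove_subspace(X_cases, basis, center)


def group_gap(X):
    """Largest per-feature gap between the two control group means."""
    return np.abs(X[groups == "A"].mean(axis=0) - X[groups == "B"].mean(axis=0)).max()


gap_before = group_gap(X_controls)
gap_after = group_gap(controls_corrected)
signal_after = cases_corrected[:, 1].mean() - controls_corrected[:, 1].mean()
print(f"group gap before: {gap_before:.3f}, after: {gap_after:.2e}")
print(f"case-control difference on feature 1 after correction: {signal_after:.3f}")
```

Because the subspace is estimated from controls only, disease status cannot leak into the directions being removed. A disease signal that happens to lie along a demographic direction would still be attenuated, which is one reason a real method needs more care than this sketch.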

Conclusions: DeBias offers an effective and generalizable strategy for correcting demographic biases in biomedical machine learning. It represents a step toward more equitable machine learning models that can deliver reliable and unbiased predictions across diverse patient populations.

Keywords: Bias correction; Cancer detection; cfDNA methylation; Demographic bias; Machine learning.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: The institutional review board (IRB) of the University of California at Los Angeles approved this study (IRB#19-000618, IRB#19-000230, IRB#19-001488, IRB#16-000659), and our research complies with all relevant ethical regulations. All participants gave their written informed consent. Consent for publication: Not applicable. Competing interests: X.J.Z., W.H.W., and W.L. are co-founders and board members of EarlyDiagnostics, Inc. X.J.Z. has an executive leadership position at EarlyDiagnostics, Inc. M.L.S., X.N., and C.-C.L. are employees of EarlyDiagnostics, Inc., and S.M.D. was a scientific advisor to EarlyDiagnostics, Inc. X.J.Z., W.L., and W.H.W. are stockholders of EarlyDiagnostics, Inc. M.L.S., W.Z., S.L., C.-C.L., Y.Z., and X.N. have stock options with EarlyDiagnostics, Inc. S.L., W.L., W.Z., and Y.Z. are consultants for EarlyDiagnostics, Inc. X.J.Z., C.-C.L., X.N., M.L.S., and W.Z. are inventors on a patent application submitted by the Regents of the University of California and EarlyDiagnostics, Inc. (Patent No. WO2023283591A2). The other authors have no competing interests to declare.
