JCO Clin Cancer Inform. 2025 Jul:9:e2400263.

doi: 10.1200/CCI-24-00263. Epub 2025 Jul 22.

Using Open-Source Large Language Models to Identify Access to Germline Genetic Testing in Veterans With Breast Cancer From Unstructured Text

Chunyang Li¹,², Michael Stringer², Vikas Patil¹,², Richard Mcshinsky¹,², Deborah Morreall¹,², Christina Yong¹,², Kelli M Rasmussen¹,², Zachary Burningham¹,², Suzanne Tamang³,⁴, Carolyn S Menendez⁵,⁶,⁷, Akiko Chiba⁵,⁶,⁷, Haley A Moss⁵,⁶,⁷, Sarah Colonna¹,²,⁷,⁸, Kerry Rowe⁷, Daphne Friedman⁵,⁶,⁷, Michael J Kelley⁵,⁶,⁷, Ahmad Halwani¹,²,⁷,⁸

Affiliations

¹ George E. Wahlen Veterans Affairs Medical Center, Salt Lake City, UT.
² University of Utah School of Medicine, Salt Lake City, UT.
³ School of Medicine, Stanford University, Stanford, CA.
⁴ Department of Veterans Affairs, Menlo Park, CA.
⁵ Duke University School of Medicine, Durham, NC.
⁶ Durham Veterans Affairs Health Care System, Durham, NC.
⁷ Department of Veterans Affairs (VA), National Oncology Program, Washington, DC.
⁸ Huntsman Cancer Institute, Salt Lake City, UT.

PMID: 40694781
PMCID: PMC12303249
DOI: 10.1200/CCI-24-00263


Chunyang Li et al. JCO Clin Cancer Inform. 2025 Jul.


Abstract

Purpose: The ability of large language models (LLMs) to identify access to germline genetic testing from unstructured text remains unknown. The Department of Veterans Affairs (VA) assessed access in Veterans with breast cancer by implementing and evaluating the performance of open-source, locally deployable LLMs (Llama 3 70B, Llama 3 8B, and Llama 2 70B) in identifying access from clinical/consult notes.

Methods: We identified a cohort of 1,201 Veterans diagnosed with breast cancer between January 1, 2021, and December 31, 2022, who received cancer care within the nationwide VA system and had clinical and/or consult notes available. Notes from a subset of 200 randomly selected patients, reviewed by subject-matter experts to identify access to testing, were split into development and testing sets, and various hyperparameters and prompting approaches were applied. We evaluated LLM performance using accuracy, precision, recall, and F1, with expert consensus on the labeled subset serving as ground truth. We compared LLM-identified access distribution in the entire cohort with expert-identified access in the labeled subset using the chi-squared test.
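The evaluation described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the example counts, and the reduction of the access distribution to a 2x2 table are all assumptions made here for illustration. The first function derives accuracy, precision, recall, and F1 from confusion-matrix counts; the second computes a Pearson chi-squared statistic (without continuity correction) for a 2x2 table with nonzero margins, using the fact that with 1 degree of freedom the p-value equals erfc(sqrt(statistic / 2)).

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction. Assumes every row and column total is nonzero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, chosen only to exercise the functions.
metrics = classification_metrics(tp=40, fp=5, fn=5, tn=10)
stat, p_value = chi2_2x2(20, 10, 10, 20)
print(metrics)
print(stat, p_value)
```

In this framing, the rows of the 2x2 table would be the two groups being compared (for example, the LLM-labeled cohort and the expert-labeled subset) and the columns would be access versus no access; a table with more categories would need a general chi-squared routine instead.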

Results: Llama 3 70B achieved an F1 score of 0.912 (95% CI, 0.853 to 0.971), besting Llama 3 8B (F1: 0.811; 95% CI, 0.720 to 0.901) and significantly outperforming Llama 2 70B (F1: 0.644; 95% CI, 0.514 to 0.773). The test set target variable prevalence was 0.72. We observed no significant difference between the performance of Llama 3 70B and that of the average individual expert reviewer, nor between LLM-identified access distribution across the entire cohort and expert-identified distribution in the labeled subset.
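One way to read the reported prevalence is as a reference point for F1. The sketch below assumes that F1 is computed for the positive class and that the prevalence of 0.72 refers to that class; neither assumption is confirmed by the abstract, and the function name is hypothetical. A trivial classifier that labels every patient positive has recall 1 and precision equal to the prevalence, so its F1 follows directly.

```python
def always_positive_f1(prevalence):
    """F1 of a classifier that predicts the positive class for every case.

    Recall is 1 because no positive case is missed; precision equals the
    prevalence because every case is predicted positive.
    """
    precision = prevalence
    recall = 1.0
    return 2 * precision * recall / (precision + recall)


baseline = always_positive_f1(0.72)
print(round(baseline, 3))  # 0.837
```

Under these assumptions the baseline is 2 × 0.72 / 1.72 ≈ 0.837, which can serve as context when comparing the F1 scores reported for the three models.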

Conclusion: An open-source, locally deployable LLM effectively and efficiently identified germline genetic testing access from clinical notes. LLMs may enhance care quality and efficiency, while safeguarding sensitive data.


Conflict of interest statement

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians (Open Payments).

Zachary Burningham

Research Funding: AbbVie, Genentech/Roche, Pharmacyclics

Daphne Friedman

Stock and Other Ownership Interests: Biogen (I), CVS (I), Johnson & Johnson (JNJ) (I), United Health Group (I), Zoetis (I)

Research Funding: Johnson & Johnson/Janssen (Inst), Karyopharm Therapeutics (Inst)

Michael J. Kelley

Research Funding: Novartis (Inst), Bristol Myers Squibb (Inst), Regeneron (Inst), Genentech (Inst), EQRx (Inst), Mirati Therapeutics (Inst)

Open Payments Link: https://openpaymentsdata.cms.gov/physician/827136

Ahmad Halwani

Research Funding: Bristol Myers Squibb (Inst), Kyowa Hakko Kirin (Inst), Roche/Genentech (Inst), AbbVie/Genentech (Inst), AbbVie (Inst), Immune Design (Inst), miRagen (Inst), Amgen (Inst), Seagen (Inst), Genentech (Inst), Takeda (Inst), Pharmacyclics (Inst), Bayer (Inst)

Travel, Accommodations, Expenses: Pharmacyclics, AbbVie, Seagen, Immune Design

No other potential conflicts of interest were reported.

Figures

**FIG 1.**
Patient cohort. VA, Veterans Affairs.

**FIG 2.**
Case selection for large language model development and testing.


