A generalist vision-language foundation model for diverse biomedical tasks

Kai Zhang¹, Rong Zhou¹, Eashan Adhikarla¹, Zhiling Yan¹, Yixin Liu¹, Jun Yu¹, Zhengliang Liu², Xun Chen³, Brian D Davison¹, Hui Ren⁴, Jing Huang⁵,⁶, Chen Chen⁷, Yuyin Zhou⁸, Sunyang Fu⁹, Wei Liu¹⁰, Tianming Liu², Xiang Li¹¹, Yong Chen⁵,¹²,¹³,¹⁴, Lifang He¹⁵, James Zou¹⁶,¹⁷, Quanzheng Li⁴, Hongfang Liu⁹, Lichao Sun¹⁸

Affiliations

¹ Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA.
² School of Computing, University of Georgia, Athens, GA, USA.
³ Samsung Research America, Mountain View, CA, USA.
⁴ Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
⁵ Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA.
⁶ PolicyLab, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
⁷ Center for Research in Computer Vision, University of Central Florida, Orlando, FL, USA.
⁸ Department of Computer Science and Engineering, University of California, Santa Cruz, CA, USA.
⁹ McWilliams School of Biomedical Informatics, UTHealth, Houston, TX, USA.
¹⁰ Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ, USA.
¹¹ Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA. xli60@mgh.harvard.edu.
¹² The Center for Health AI and Synthesis of Evidence (CHASE), University of Pennsylvania, Philadelphia, PA, USA.
¹³ Penn Institute for Biomedical Informatics (IBI), Philadelphia, PA, USA.
¹⁴ Leonard Davis Institute of Health Economics, Philadelphia, PA, USA.
¹⁵ Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA. lih319@lehigh.edu.
¹⁶ Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA.
¹⁷ Department of Computer Science, Stanford University, Stanford, CA, USA.
¹⁸ Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA. lis221@lehigh.edu.

PMID: 39112796
DOI: 10.1038/s41591-024-03185-2

Kai Zhang et al. Nat Med. 2024 Nov;30(11):3129-3141. doi: 10.1038/s41591-024-03185-2. Epub 2024 Aug 7.

Abstract

Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize holistic information. Generalist AI holds the potential to address these limitations due to its versatility in interpreting different data types and generating tailored outputs for diverse needs. However, existing biomedical generalist AI solutions are typically heavyweight and closed source to researchers, practitioners and patients. Here, we describe BiomedGPT, the first open-source and lightweight vision-language foundation model, designed as a generalist capable of performing various biomedical tasks. BiomedGPT achieved state-of-the-art results in 16 out of 25 experiments while maintaining a computing-friendly model scale. We also conducted human evaluations to assess the capabilities of BiomedGPT in radiology visual question answering, report generation and summarization. BiomedGPT exhibits robust prediction ability with a low error rate of 3.8% in question answering, satisfactory performance with an error rate of 8.3% in writing complex radiology reports, and competitive summarization ability with a nearly equivalent preference score to human experts. Our method demonstrates that effective training with diverse data can lead to more practical biomedical AI for improving diagnosis and workflow efficiency.

Conflict of interest statement

The research was conducted independently of any commercial or financial relationships that could be construed as a potential conflict of interest. Although X.C. is employed by Samsung, the company was not involved in any aspect of this research. The other authors declare no competing interests.
