Large Language Models Are Less Effective at Clinical Prediction Tasks Than Locally Trained Machine Learning Models
Statement of Purpose
Over the past several decades, medicine has been increasingly aided by artificial intelligence (AI), particularly machine learning (ML). While model development has advanced, it is widely recognized that deploying ML models successfully often requires local, representative data, and not all institutions have the resources to implement ML effectively. Large language models (LLMs) have hinted at a potential to mitigate these challenges and fundamentally change how ML is integrated into medicine. Closed-source LLMs are pre-trained and can be queried conversationally, characteristics that reduce the technical friction of creating or using ML (or, more broadly, AI) in healthcare settings.
In this study, we evaluate the utility, privacy, and fairness of LLMs relative to traditional ML on two tasks: predicting the likelihood of patient discharge from the hospital within 24 hours, using electronic health record (EHR) data from Vanderbilt University Medical Center (VUMC), and predicting the likelihood of transfer to the intensive care unit (ICU) within 24 hours of triage in the emergency department (ED), using the public-use MIMIC-IV and MIMIC-IV ED datasets from Beth Israel Deaconess Medical Center (BIDMC).
Learning Objectives
- Design a multi-faceted LLM evaluation that includes predictive performance, output calibration, data privacy, and algorithmic fairness.
- Describe the advantages and disadvantages of LLMs and traditional ML for clinical prediction tasks.
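The evaluation dimensions in the first objective can be made concrete with standard metrics: AUROC for predictive performance, the Brier score for output calibration, and a subgroup performance gap for algorithmic fairness. The sketch below is a minimal illustration under assumed inputs (binary labels, predicted probabilities, and a binary sensitive attribute); it is not the study's actual evaluation pipeline, and the privacy dimension (e.g., membership-inference testing) is omitted for brevity.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss


def evaluate_model(y_true, y_prob, group):
    """Score one model along three of the four evaluation axes.

    y_true : binary outcomes (e.g., ICU transfer within 24 h of triage)
    y_prob : predicted probabilities from an LLM or a locally trained model
    group  : binary sensitive attribute used for the fairness comparison
    """
    y_true, y_prob, group = map(np.asarray, (y_true, y_prob, group))

    auroc = roc_auc_score(y_true, y_prob)      # predictive performance
    brier = brier_score_loss(y_true, y_prob)   # calibration (lower is better)

    # Fairness: absolute gap in AUROC between the two subgroups.
    auroc_gap = abs(
        roc_auc_score(y_true[group == 0], y_prob[group == 0])
        - roc_auc_score(y_true[group == 1], y_prob[group == 1])
    )
    return {"auroc": auroc, "brier": brier, "auroc_gap": auroc_gap}
```

Running the same function over an LLM's outputs and a traditional ML model's outputs yields directly comparable numbers on each axis, which is the comparison structure the objective describes.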
Additional Information
Disclosures
Presenter
The following presenter has no relevant financial relationship(s) with ineligible companies to disclose.
- Katherine E. Brown, PhD
AMIA Staff
The AMIA staff have no relevant financial relationship(s) with ineligible companies to disclose.
*All of the relevant financial relationships listed for these individuals have been mitigated.