Large language models are less effective at clinical prediction tasks than locally trained machine learning models.
To determine the extent to which current large language models (LLMs) can serve as substitutes for traditional machine learning (ML) as clinical predictors using data from electronic health records (EHRs), we investigated various factors that can impact their adoption, including overall performance, calibration, fairness, and resilience to privacy protections that reduce data fidelity.
Author(s): Brown, Katherine E; Yan, Chao; Li, Zhuohang; Zhang, Xinmeng; Collins, Benjamin X; Chen, You; Clayton, Ellen Wright; Kantarcioglu, Murat; Vorobeychik, Yevgeniy; Malin, Bradley A
DOI: 10.1093/jamia/ocaf038