Evaluating prompt and data perturbation sensitivity in large language models for radiology reports classification.
Large language models (LLMs) show promise for natural language processing tasks in healthcare. Because healthcare applications require high accuracy, understanding the limitations of these models is essential. The purpose of this study was to evaluate the performance of LLMs in classifying radiology reports for the presence of pulmonary embolism (PE) under various conditions, including different prompt designs and data perturbations.
Author(s): Sorin, Vera; Collins, Jeremy D; Bratt, Alex K; Kusmirek, Joanna E; Mugu, Vamshi K; Kline, Timothy L; Butler, Crystal L; Wood, Nadia G; Cook, Cole J; Korfiatis, Panagiotis
DOI: 10.1093/jamiaopen/ooaf073