A study of calibration as a measurement of trustworthiness of large language models in biomedical natural language processing.
To assess the calibration of 9 large language models (LLMs) within biomedical natural language processing (BioNLP) tasks, furthering understanding of trustworthiness and reliability in real-world settings.
Author(s): de Oliveira, Rodrigo; Garber, Matthew; Gwinnutt, James M; Rashidi, Emaan; Hwang, Jwu-Hsuan Shantina; Gilmour, William; Nanavati, Jay; Zine El Abidine, Khaldoun; Mack, Christina DeFilippo
DOI: 10.1093/jamiaopen/ooaf058