Comparing Large Language Models' Performances on Otolaryngology Knowledge Assessment Questions.
This study evaluates the performance of multiple large language models (LLMs) on specialized otolaryngology knowledge, comparing OpenAI's GPT-4 Turbo with 10 commercially available models to assess their potential utility in otolaryngology medical education. A total of 1,075 questions from OTO QUEST, the official self-assessment resource of the American Academy of Otolaryngology-Head and Neck Surgery, were administered to GPT-4 Turbo using a zero-shot approach. Accuracy was analyzed using logistic regression, adjusting for [...]
Author(s): Cook, Ryan; Kahan, Abner; Scharfenberger, Thomas; Tasoulas, Jason; Hawks-Ladds, Noah; Chouake, Robert; Jariwala, Sunit P; Arora, Shitij
DOI: 10.1055/a-2835-4634