Webinar Library

Levels of Clinical Evaluation for LLMs: Towards More Realistic Evaluations

Large language models (LLMs) hold immense promise for democratizing access to medical information and assisting physicians in delivering higher-quality care. However, realistic evaluations of LLMs in clinical contexts have been limited, with much focus placed on multiple-choice evaluations of clinical knowledge. In this talk, I will present a four-level framework for clinical evaluations, encompassing multiple-choice knowledge assessments, open-ended human ratings, offline human evaluations of real tasks, and online real-world studies within actual workflows. I will discuss the strengths and weaknesses of each approach and argue that advancing towards more realistic evaluations is crucial for realizing the full potential of LLMs.

Transforming Complex Information Into Compelling, Human Stories

Learn strategies for translating even the most technical information into compelling, human stories, so you can change the way people think, feel, and act. We’ll use sustainability communications as our lens for looking at messaging strategies that work (and those that don’t). We’ll also talk about the buzzwords and cliches you should avoid in your writing. Finally, we’ll practice what we’ve learned with a simple writing exercise that brings the concepts to life. This session requires audience participation, so bring your ideas, your questions, and something to write with.

Prevails and Mirages of Large Language Models in Clinical NLP

This talk will introduce the technologies powering LLMs, review where they have recently prevailed, and examine the mirages in the hype around LLM magic. Drawing on experience developing two clinical LLMs, GatorTron and GatorTronGPT, this talk will offer insight into potential applications of LLMs in clinical NLP and healthcare.

Reducing Diagnostic Delays in Acute Hepatic Porphyria Using Health Records Data and Machine Learning

Acute hepatic porphyria (AHP) is a rare but treatable condition with an average diagnostic delay of 15 years. Electronic health record (EHR) data and machine learning (ML) could help clinicians recognize AHP sooner. This study used structured and notes-based EHR data from UCSF and UCLA to develop models predicting who will be referred for AHP testing and who will test positive. The referral model achieved an F-score of 86% to 91%, and the diagnosis model achieved an F-score of 92%.