Working Group Webinar Library
Can NLP-Generated Data Be Used in Clinical Research?
Human language is complex and often equivocal. Unsurprisingly, even the most sophisticated natural language processing (NLP) algorithms inevitably make mistakes. The impact of these mistakes on the results of clinical research that uses NLP-generated data as an input is uncertain. In this talk, Alexander Turchin will discuss a recent study demonstrating that real-world evidence analyses are resilient to a moderate error rate in NLP-generated data, supporting the use of NLP in clinical research.
Patient Record Summarization in the Age of LLMs
Clinician burnout, driven in part by the increasing burden of EHR documentation, is a significant threat to healthcare quality. This talk will introduce SPEER, a novel approach to automatically generating clinically useful hospital-course summaries from raw clinical notes.
Behavioral Testing and Evaluation to Probe Language Models for Algorithmic Bias
With growing legal and scientific evidence for the importance of reducing model bias, both model developers and deployers need tools to quantify that bias. Unfortunately, algorithmic bias can take as many forms as there are implementations. In this talk, Paul M. Heider covers a range of clinical NLP use cases, such as de-identification and diagnosis prediction, highlighting the utility of behavioral testing and comparative evaluation methods for identifying the scope of a model's bias. These approaches can be leveraged at the training, testing, and evaluation stages to benefit both researchers doing de novo model development and community members tasked with choosing between multiple third-party models to deploy.
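To make the idea of behavioral testing concrete, the sketch below runs a CheckList-style invariance check: the same clinical sentence is perturbed along a demographic attribute, and divergent predictions flag potential bias. The template, the demographic groups, and the `predict_diagnosis` stub are hypothetical placeholders for illustration, not materials from the talk.

```python
# Minimal invariance test: vary only a demographic attribute in otherwise
# identical clinical text and check whether the model's output changes.
TEMPLATE = "The patient is a 54-year-old {group} presenting with chest pain."
GROUPS = ["white man", "Black man", "white woman", "Black woman"]

def predict_diagnosis(text: str) -> str:
    # Placeholder so the script runs end to end; swap in a call to the
    # real model under audit (e.g., a third-party classifier).
    return "acute coronary syndrome"

def invariance_test(template: str, groups: list[str]) -> dict[str, str]:
    # Generate one input per group and collect the model's prediction.
    return {g: predict_diagnosis(template.format(group=g)) for g in groups}

results = invariance_test(TEMPLATE, GROUPS)
if len(set(results.values())) > 1:
    print("Predictions differ across groups (potential bias):", results)
else:
    print("Predictions are invariant across groups.")
```

The same harness scales to batteries of templates and attributes, which makes it usable both during training and when comparing candidate third-party models side by side.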
ALNI/NIWG Joint Webinar: Using Python and an Open Source LLM to Facilitate Competency Mapping
This project leverages Python programming and an open-source large language model (LLM) to calculate semantic similarity scores between pairs of text strings. These scores are systematically recorded in an Excel workbook, enabling the organization of string pairings into ranked mappings. Dr. Macintosh employed this methodology to propose mappings between AACN graduate sub-competencies and course learning outcomes.
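A minimal sketch of such a pipeline, assuming the open-source sentence-transformers library for the embeddings and pandas (with openpyxl) for the Excel output; the model name and the competency/outcome strings are illustrative placeholders, not the materials from Dr. Macintosh's project.

```python
import pandas as pd
from sentence_transformers import SentenceTransformer, util

# A compact open-source embedding model (illustrative choice).
model = SentenceTransformer("all-MiniLM-L6-v2")

sub_competencies = ["Integrate informatics processes into nursing practice."]
course_outcomes = [
    "Apply health informatics tools to clinical decision making.",
    "Describe the history of professional nursing.",
]

# Embed both sets of strings and score every pairing by cosine similarity.
comp_emb = model.encode(sub_competencies, convert_to_tensor=True)
out_emb = model.encode(course_outcomes, convert_to_tensor=True)
scores = util.cos_sim(comp_emb, out_emb)

# Record each pairing's score, rank candidate mappings, and write to Excel.
rows = [
    {"sub_competency": c, "course_outcome": o, "score": float(scores[i][j])}
    for i, c in enumerate(sub_competencies)
    for j, o in enumerate(course_outcomes)
]
df = pd.DataFrame(rows).sort_values("score", ascending=False)
df.to_excel("competency_mappings.xlsx", index=False)
```

Sorting by score turns the raw pairwise similarities into the ranked candidate mappings described above, leaving the final accept/reject judgment to a human reviewer.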
Levels of Clinical Evaluation for LLMs: Towards More Realistic Evaluations
Large language models (LLMs) hold immense promise for democratizing access to medical information and assisting physicians in delivering higher-quality care. However, realistic evaluations of LLMs in clinical contexts have been limited, with much focus placed on multiple-choice evaluations of clinical knowledge. In this talk, I will present a four-level framework for clinical evaluations, encompassing multiple-choice knowledge assessments, open-ended human ratings, offline human evaluations of real tasks, and online real-world studies within actual workflows. I will discuss the strengths and weaknesses of each approach and argue that advancing towards more realistic evaluations is crucial for realizing the full potential of LLMs.