From Clinical Notes and Patient Narratives to Real-World Evidence: Case Studies and Evaluation Approaches for LLM-Based Clinical NLP Pipelines

Large language models (LLMs) are now the state-of-the-art backbone for many clinical NLP tasks, yet real-world systems in healthcare still depend on pipelines that combine LLM-based components with established NLP and knowledge-based methods. This talk will use unstructured text as a unifying theme to show how such pipelines transform both clinician-authored documentation and patient narratives into real-world evidence at scale.
I will first highlight systems built on electronic health records (EHRs) that attempt to derive structured phenotypes and research variables from free-text notes. I will then focus on two ongoing projects that use unstructured text for digital epidemiology: my NIH/NLM R01, which develops AI methods for large-scale studies of medication adherence and tolerability across five diseases using patient reports from multiple countries, and the ARPA‑H KronosRx initiative, which leverages EHRs and other data to improve drug safety prediction before and between clinical trials. These case studies illustrate how LLM-based clinical NLP pipelines turn patient- and clinician-authored narratives into cohesive, text-driven real-world evidence.
Drawing on these projects and the long-running SMM4H and HeARD shared tasks, I will outline practical approaches for evaluating LLM systems in clinical settings, emphasizing task-grounded, end-to-end evaluation that accounts for downstream use, fairness, and calibration.

Presenters

Graciela Gonzalez-Hernandez, PhD

Professor and Vice Chair, Department of Computational Biomedicine

Cedars-Sinai Medical Center
