
Traditional machine learning classifiers using structured representations of text, such as randomly initialized embeddings of concept unique identifiers (CUIs), have demonstrated strong performance in clinical risk prediction tasks. In prior work, we developed a CUI-based convolutional neural network substance misuse classifier trained on clinical notes for hospital-based screening. While effective, such models require extensive feature engineering and are limited in their semantic understanding. Recent advances in large language models (LLMs) enable richer contextualization of clinical narratives through prompt engineering and parameter-efficient tuning for computable phenotyping.
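
As a concrete illustration of the conventional approach, the sketch below outlines a convolutional classifier over randomly initialized CUI embeddings in PyTorch. All dimensions, vocabulary sizes, and class counts are illustrative assumptions, not the published implementation.

import torch
import torch.nn as nn

class CUIConvClassifier(nn.Module):
    """Toy CNN over CUI sequences; hyperparameters are placeholders."""
    def __init__(self, n_cuis=50_000, emb_dim=128, n_filters=100,
                 kernel_sizes=(3, 4, 5), n_classes=2):
        super().__init__()
        # Randomly initialized CUI embeddings, learned during training
        self.embed = nn.Embedding(n_cuis, emb_dim, padding_idx=0)
        # Parallel 1-D convolutions over the CUI sequence
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, cui_ids):                  # cui_ids: (batch, seq_len)
        x = self.embed(cui_ids).transpose(1, 2)  # (batch, emb_dim, seq_len)
        # Max-pool each convolution's feature map over the sequence
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # (batch, n_classes) logits

# Example: a batch of two notes, each encoded as a sequence of 200 CUI ids
logits = CUIConvClassifier()(torch.randint(1, 50_000, (2, 200)))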

In this study, we systematically compare conventional NLP classifiers with LLM-based approaches for clinical risk prediction across multiple domains. We evaluate performance on risk prediction versus next-word prediction tasks, identifying where each class of model excels. Our findings highlight the advantages of LLM-based approaches for retrieval, summarization, and information extraction, but also their limitations as classifiers for direct risk prediction. These results inform pragmatic considerations for deploying AI tools in health system operations and suggest opportunities for hybrid models that leverage the strengths of both paradigms.
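
For the LLM side of the comparison, one common way to adapt a pretrained transformer into a phenotype classifier is parameter-efficient tuning with low-rank adapters. The sketch below uses the Hugging Face transformers and peft libraries; the base checkpoint, target modules, and two-label setup are assumptions for illustration, not the models evaluated in this study.

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "bert-base-uncased"  # placeholder; a clinical model would be used in practice
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# LoRA adapters on the attention projections; the backbone weights stay frozen
lora = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16,
                  lora_dropout=0.1, target_modules=["query", "value"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter and classifier weights train

# Score a (synthetic) note snippet for the phenotype of interest
inputs = tokenizer("Patient reports daily alcohol use for the past year.",
                   return_tensors="pt")
logits = model(**inputs).logits  # shape (1, 2): class logits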

Presenter

Majid Afshar, MD
University of Wisconsin-Madison


Course Format(s): On Demand
Price: Free