Lessons learned on information retrieval in electronic health records: a comparison of embedding models and pooling strategies.
Applying large language models (LLMs) to the clinical domain is challenging because processing medical records is context-heavy. Retrieval-augmented generation (RAG) offers a solution by facilitating reasoning over large text sources. However, there are many parameters to optimize in the retrieval system alone. This paper presents an ablation study exploring how different embedding models and pooling methods affect information retrieval for the clinical domain.
Author(s): Myers, Skatje; Miller, Timothy A; Gao, Yanjun; Churpek, Matthew M; Mayampurath, Anoop; Dligach, Dmitriy; Afshar, Majid
DOI: 10.1093/jamia/ocae308