Can GPT-3.5 generate and code discharge summaries?
Presenter
Statement of Purpose
Medical document coding is the process of assigning labels from a structured label space – a terminology or an ontology, e.g., the International Classification of Diseases (ICD) – to medical documents (e.g., discharge summaries) in order to summarize the concepts relevant to a patient's journey (e.g., conditions or procedures) as structured data. This process, currently performed by human coders, is laborious, costly, and error-prone. In recent years, efforts have been made towards automating medical document coding with artificial neural network models. These early neural approaches under-utilize the rich information represented within the ontology (its structure, concept descriptions, and connections between ontologies), most notably by treating individual predictions as independent outputs and by evaluating predictions as flat. Furthermore, the label spaces within this task are large (on the order of thousands of labels) and follow a big-head long-tail label distribution, giving rise to few-shot and zero-shot scenarios.
In this talk we will investigate the usefulness of a general-domain Large Language Model, GPT-3.5, in the context of ICD coding. The main focus will be on GPT-3.5's ability to generate discharge summaries, their quality (according to the opinions of clinical professionals), and what they can be used for. We will comment on GPT-3.5's ability to perform clinical document coding compared to specialized neural network models. We will also briefly touch upon two ideas in automatic ICD coding developed as part of my thesis: ontology-driven hierarchical evaluation (for assessing the correctness of a model's prediction with respect to the structure of the label space) and data augmentation (in order to address the data sparsity issue).
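To make the contrast between flat and hierarchical evaluation concrete, here is a minimal Python sketch. It is not the method developed in the thesis; it assumes one common formulation in which both the gold and the predicted code sets are expanded with their ancestors before precision, recall, and F1 are computed. The `PARENT` map is a small hand-written, illustrative fragment of an ICD-10-like hierarchy, and all function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Illustrative child -> parent fragment of an ICD-10-like hierarchy.
# Assumption: hand-written for this example, not loaded from an official release.
PARENT: Dict[str, str] = {
    "E11.9": "E11",
    "E11.2": "E11",
    "E10.9": "E10",
    "E11": "E10-E14",
    "E10": "E10-E14",
    "I10": "I10-I15",
}


def with_ancestors(codes: Iterable[str], parent: Dict[str, str] = PARENT) -> Set[str]:
    """Return the codes together with every ancestor reachable via `parent`."""
    expanded: Set[str] = set()
    for code in codes:
        node = code
        while node is not None:
            expanded.add(node)
            node = parent.get(node)  # None once the top of the fragment is reached
    return expanded


def prf(gold: Set[str], pred: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1."""
    overlap = len(gold & pred)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def flat_prf(gold: Iterable[str], pred: Iterable[str]) -> Tuple[float, float, float]:
    """Flat evaluation: a prediction counts only on an exact code match."""
    return prf(set(gold), set(pred))


def hierarchical_prf(gold: Iterable[str], pred: Iterable[str]) -> Tuple[float, float, float]:
    """Hierarchical evaluation: shared ancestors earn partial credit."""
    return prf(with_ancestors(gold), with_ancestors(pred))


if __name__ == "__main__":
    gold = ["E11.9"]
    print("sibling, flat:         ", flat_prf(gold, ["E11.2"]))
    print("sibling, hierarchical: ", hierarchical_prf(gold, ["E11.2"]))
    print("unrelated, hierarchical:", hierarchical_prf(gold, ["I10"]))
```

Under this sketch, predicting a sibling code (`E11.2` instead of `E11.9`) scores zero under flat evaluation but receives partial credit hierarchically, because two of the three expanded nodes are shared; a prediction from an unrelated branch still scores zero.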
Learning Outcomes
After this talk, attendees should understand the strengths and weaknesses of GPT-3.5 in generating synthetic discharge summaries for document coding with the International Classification of Diseases, 10th revision (ICD-10); its value as a data generator for smaller artificial neural network models; and the quality of the codes GPT-3.5 produces for real discharge summaries. Attendees can then apply this understanding, along with our suggestions for future directions, in their own research into the generation or automatic coding of discharge summaries.