A Transformer-Based Pipeline for German Clinical Document De-Identification.
Commercially available large language models such as Chat Generative Pre-Trained Transformer (ChatGPT) cannot be applied to real patient data for data protection reasons. At the same time, manually de-identifying unstructured clinical data is tedious and time-consuming. Since transformer models can efficiently process and analyze large amounts of text data, our study aims to explore the impact of a large training dataset on the performance of [...]
Author(s): Arzideh, Kamyar; Baldini, Giulia; Winnekens, Philipp; Friedrich, Christoph M; Nensa, Felix; Idrissi-Yaghir, Ahmad; Hosch, René
DOI: 10.1055/a-2424-1989