Clinical document corpora - real ones, translated and synthetic substitutes, and assorted domain proxies: a survey of diversity in corpus design, with focus on German text data.
We survey clinical document corpora, with a focus on German textual data. Due to rigid data privacy legislation in Germany, these resources, with only a few exceptions, are stored in protected clinical data spaces and locked against clinic-external researchers. This situation stands in stark contrast to established workflows in natural language processing, where easy accessibility and reuse of (textual) data collections are common practice. Hence, alternative corpus designs [...]
Author(s): Hahn, Udo
DOI: 10.1093/jamiaopen/ooaf024