Evaluating shallow and deep learning strategies for the 2018 n2c2 shared task on clinical text classification.
Automated clinical phenotyping is challenging because word-based features quickly turn it into a high-dimensional problem, in which small, privacy-restricted training datasets might lead to overfitting. Pretrained embeddings might solve this issue by reusing input representation schemes trained on a larger dataset. We sought to evaluate shallow and deep learning text classifiers and the impact of pretrained embeddings on a small clinical dataset.
Author(s): Oleynik, Michel; Kugic, Amila; Kasáč, Zdenko; Kreuzthaler, Markus
DOI: 10.1093/jamia/ocz149