Deep learning-based NLP data pipeline for EHR-scanned document information extraction.
Scanned documents in electronic health records (EHR) have been a challenge for decades and are expected to remain one for the foreseeable future. Current approaches to processing them include image preprocessing, optical character recognition (OCR), and natural language processing (NLP). However, little work has evaluated how image preprocessing methods, NLP models, and document layout interact.
Author(s): Hsu, Enshuo; Malagaris, Ioannis; Kuo, Yong-Fang; Sultana, Rizwana; Roberts, Kirk
DOI: 10.1093/jamiaopen/ooac045