Improving natural language information extraction from cancer pathology reports using transfer learning and zero-shot string similarity.
We develop natural language processing (NLP) methods capable of accurately classifying tumor attributes from pathology reports given minimal labeled examples. Our hierarchical cancer-to-cancer transfer (HCTC) and zero-shot string similarity (ZSS) methods are designed to exploit shared information between cancers and auxiliary class features, respectively, to boost performance using enriched annotations that provide both location-based information and document-level labels for each pathology report.
Author(s): Park, Briton; Altieri, Nicholas; DeNero, John; Odisho, Anobel Y; Yu, Bin
DOI: 10.1093/jamiaopen/ooab085
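
Illustrative sketch (not the authors' implementation): the abstract names a zero-shot string similarity (ZSS) method but gives no algorithmic detail. The Python below shows one generic way string similarity could assign a label with no training examples for a class, by comparing a text span from a report against the class names themselves. The function names, the class list, the report snippet, and the use of difflib's SequenceMatcher ratio as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def zero_shot_label(span: str, class_names: list[str]) -> str:
    """Pick the class name most similar to the given text span.

    No labeled examples are used: the class names act as the only
    class features (an assumption made for this sketch).
    """
    return max(class_names, key=lambda name: similarity(span, name))


# Hypothetical class names and report snippet, for illustration only.
classes = ["adenocarcinoma", "squamous cell carcinoma", "urothelial carcinoma"]
span = "invasive squamous cell carcinoma, moderately differentiated"
print(zero_shot_label(span, classes))
```

In practice the span would come from the location-based annotations mentioned in the abstract, and the similarity measure could be swapped for any other string or embedding comparison.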