Practical implementation of an existing smoking detection pipeline and reduced support vector machine training corpus requirements.
This study aimed to reduce the reliance of support vector machine (SVM)-based clinical text analysis on large training datasets by grouping keyword features into categories. An enhanced Mayo smoking status detection pipeline was deployed, using a corpus of 709 annotated patient narratives. The pipeline was optimized for local data entry practices and lexicon. The SVM classifier was retrained using a grouped keyword approach to improve efficiency. Accuracy, precision, and F-measure of the unaltered and [...]
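The abstract does not say how keywords were grouped. The sketch below shows one plausible reading of a grouped keyword approach: each keyword is assigned to a category, and the classifier sees one count per category rather than one feature per keyword. This shrinks the feature space the SVM must learn from. The category names and keyword lists are hypothetical and are not the study's lexicon. The resulting vectors would then be passed to an SVM trainer (for example scikit-learn's `SVC`), which is not shown here.

```python
import re
from collections import Counter

# Hypothetical keyword groups for illustration only; the study's locally
# adapted lexicon is not listed in the abstract.
KEYWORD_GROUPS = {
    "CURRENT": ["smoker", "smokes", "currently smoking"],
    "PAST": ["ex-smoker", "former smoker", "quit smoking"],
    "NEVER": ["non-smoker", "nonsmoker", "never smoked"],
}

# Map each keyword to its group.
_KEYWORD_TO_GROUP = {kw: group for group, kws in KEYWORD_GROUPS.items() for kw in kws}

# One alternation regex with the longest keywords first, so "ex-smoker" and
# "non-smoker" are consumed before their substring "smoker" can match.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(kw) for kw in sorted(_KEYWORD_TO_GROUP, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def grouped_features(text: str) -> list[int]:
    """Return one count per keyword group (in KEYWORD_GROUPS order),
    instead of one feature per individual keyword."""
    counts = Counter(
        _KEYWORD_TO_GROUP[m.group(1).lower()] for m in _PATTERN.finditer(text)
    )
    return [counts[group] for group in KEYWORD_GROUPS]


if __name__ == "__main__":
    note = "Patient is an ex-smoker; quit smoking in 2001. Spouse is a smoker."
    print(grouped_features(note))  # [CURRENT, PAST, NEVER] counts
```

With three groups, every narrative becomes a three-dimensional vector no matter how many keywords the lexicon contains. Keeping the feature count fixed as the lexicon grows is one way a smaller annotated corpus could still suffice for retraining.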
Author(s): Khor, Richard; Yip, Wai-Kuan; Bressel, Mathias; Rose, William; Duchesne, Gillian; Foroudi, Farshad
DOI: 10.1136/amiajnl-2013-002090