Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.
Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selecting text features for phenotyping algorithms is slow and laborious, requiring extensive, iterative involvement by domain experts. This paper introduces a method for developing phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features that can be comparable to expert-curated features in classification accuracy.
Author(s): Yu, Sheng; Liao, Katherine P; Shaw, Stanley Y; Gainer, Vivian S; Churchill, Susanne E; Szolovits, Peter; Murphy, Shawn N; Kohane, Isaac S; Cai, Tianxi
DOI: 10.1093/jamia/ocv034