Comparing natural language processing representations of coded disease sequences for prediction in electronic health records.
Natural language processing (NLP) algorithms are increasingly being applied to obtain unsupervised representations of electronic health record (EHR) data, but their comparative performance at predicting clinical endpoints remains unclear. Our objective was to compare the performance of unsupervised representations of sequences of disease codes generated by bag-of-words versus sequence-based NLP algorithms at predicting clinically relevant outcomes.
Author(s): Beaney, Thomas; Jha, Sneha; Alaa, Asem; Smith, Alexander; Clarke, Jonathan; Woodcock, Thomas; Majeed, Azeem; Aylin, Paul; Barahona, Mauricio
DOI: 10.1093/jamia/ocae091