Using natural language processing to construct a metastatic breast cancer cohort from linked cancer registry and electronic medical records data
Most population-based cancer databases lack information on metastatic recurrence. Electronic medical records (EMR) and cancer registries contain complementary information on cancer diagnosis, treatment and outcome, yet they are rarely used together. To construct a cohort of metastatic breast cancer (MBC) patients, we applied natural language processing techniques within a semisupervised machine learning framework to linked EMR and California Cancer Registry (CCR) data.
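The abstract does not describe the model itself, so the following is only a generic illustration of what "semisupervised" can mean for clinical text. It is not the authors' method. It is a minimal sketch, assuming a self-training loop around a bag-of-words naive Bayes classifier, where a handful of hand-labeled notes (1 = metastatic recurrence, 0 = none) are used to pseudo-label confidently classified unlabeled notes. The functions `tokenize`, `NaiveBayes` and `self_train`, the 0.9 confidence threshold, and all example notes are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in for clinical NLP."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over labels 0/1."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_positive(self, doc):
        """Posterior P(label=1 | doc), computed in log space for stability."""
        total_docs = sum(self.doc_counts.values())
        log_post = {}
        for y in (0, 1):
            # Smoothed class prior, so an unseen class never gives log(0).
            log_p = math.log((self.doc_counts[y] + 1) / (total_docs + 2))
            denom = sum(self.counts[y].values()) + len(self.vocab)
            for tok in tokenize(doc):
                log_p += math.log((self.counts[y][tok] + 1) / denom)
            log_post[y] = log_p
        m = max(log_post.values())
        exp = {y: math.exp(v - m) for y, v in log_post.items()}
        return exp[1] / (exp[0] + exp[1])


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Hypothetical self-training loop: pseudo-label confident notes, then refit."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = NaiveBayes().fit(docs, labels)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for doc in pool:
            p = model.prob_positive(doc)
            if p >= threshold:
                confident.append((doc, 1))
            elif p <= 1 - threshold:
                confident.append((doc, 0))
            else:
                remaining.append(doc)
        if not confident:  # nothing new to learn from; stop early
            break
        docs += [d for d, _ in confident]
        labels += [y for _, y in confident]
        pool = remaining
        model = NaiveBayes().fit(docs, labels)
    return model


# Hypothetical example notes, not data from the study.
labeled = [
    ("new liver metastases consistent with metastatic breast cancer", 1),
    ("biopsy confirms bone metastasis from breast primary", 1),
    ("no evidence of disease on surveillance imaging", 0),
    ("patient remains without recurrence after adjuvant therapy", 0),
]
unlabeled = [
    "ct shows new metastases in liver and bone",
    "surveillance mammogram no evidence of recurrence",
]
model = self_train(labeled, unlabeled)
print(model.prob_positive("new bone metastases noted") > 0.5)
```

Self-training is chosen here only because it is one of the simplest semisupervised schemes to show in a few lines. In a real EMR-registry linkage the notes would first be joined to registry records by patient, and a far richer NLP pipeline and validation step would be needed.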
Author(s): Ling, Albee Y; Kurian, Allison W; Caswell-Jin, Jennifer L; Sledge, George W; Shah, Nigam H; Tamang, Suzanne R
DOI: 10.1093/jamiaopen/ooz040