Leveraging natural language processing to augment structured social determinants of health data in the electronic health record

Presenters

Kevin Lybarger, PhD
George Mason University
Nic Dobbins, MLIS
University of Washington, Seattle

Manager

Hanyin Wang
PhD Candidate
Northwestern University, Feinberg School of Medicine
Evanston, IL

Learning Objectives

Participants should be able to:

  • Describe sources of social determinants of health information in the electronic health record (EHR), differentiating between structured data sources and free-text clinical notes
  • Identify important attributes for characterizing social determinants of health, such as alcohol, drug, and tobacco use, living situation, and employment
  • Recognize the importance of incorporating information from the clinical narrative with structured data in the EHR to represent patients

Statement of Purpose

Social determinants of health (SDOH) impact health outcomes and are documented in the electronic health record (EHR) through both structured data and unstructured clinical notes. Clinical notes, however, often contain more comprehensive SDOH information than the structured data, detailing aspects such as status, severity, and temporality. This work has two primary objectives: (i) develop a natural language processing (NLP) information extraction model to capture detailed SDOH information and (ii) evaluate the information gain achieved by applying the SDOH extractor to clinical narratives and combining the extracted representations with existing structured data.

We developed a novel SDOH extractor using a deep learning entity and relation extraction architecture to characterize SDOH across various dimensions. In an EHR case study, we applied the SDOH extractor to a large clinical data set and compared the extracted SDOH information with existing structured data. The SDOH extractor achieved an F1 score of 0.86 on a withheld test set. In the EHR case study, we found that the extracted SDOH information complements existing structured data: 32% of homeless patients, 19% of current tobacco users, and 10% of drug users had these health risk factors documented only in the clinical narrative.
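
The comparison between extracted and structured data can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `narrative_only_share`, the patient identifiers, and the choice of denominator (all patients with the risk factor documented in either source) are assumptions made for illustration, and the toy data is invented and unrelated to the study results.

```python
def narrative_only_share(structured_ids, extracted_ids):
    """Fraction of patients with a risk factor whose only documentation
    is in the clinical narrative (i.e., found by the NLP extractor but
    absent from structured data).

    Assumption: the denominator is every patient with the risk factor
    documented in at least one of the two sources.
    """
    structured = set(structured_ids)
    extracted = set(extracted_ids)
    documented = structured | extracted      # patients known from either source
    if not documented:
        return 0.0                           # avoid division by zero
    narrative_only = extracted - structured  # patients missing from structured data
    return len(narrative_only) / len(documented)


# Hypothetical toy cohort (illustrative patient IDs, not study data).
structured_tobacco = {"p1", "p2", "p3", "p4"}   # flagged in structured fields
extracted_tobacco = {"p3", "p4", "p5"}          # flagged by the note extractor

share = narrative_only_share(structured_tobacco, extracted_tobacco)
print(f"{share:.0%} of tobacco users appear only in the narrative")
```

In this toy cohort, one of five documented patients appears only in the notes, so the share is 20%. A different denominator, such as only the patients found by the extractor, would give a different figure, so the definition should be stated alongside any reported percentage.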

Type: JAMIA Journal Club
Course Format(s): On Demand
Price: Free