JAMIA Journal Club Webinar - June 2023 | AMIA - American Medical Informatics Association

Leveraging natural language processing to augment structured social determinants of health data in the electronic health record

Presenters

Kevin Lybarger, PhD

George Mason University

Nic Dobbins, MLIS

University of Washington, Seattle

Manager

Hanyin Wang

PhD Candidate

Northwestern University, Feinberg School of Medicine

Evanston, IL

Learning Objectives

Participants should be able to:

Describe sources of social determinants of health information in the electronic health record (EHR), differentiating between structured data sources and free-text clinical notes
Identify important attributes for characterizing social determinants of health, such as alcohol, drug, and tobacco use, living situation, and employment
Recognize the importance of incorporating information from the clinical narrative with structured data in the EHR to represent patients

Statement of Purpose

Social determinants of health (SDOH) impact health outcomes and are documented in the electronic health record (EHR) through structured data and unstructured clinical notes. Compared with structured data, clinical notes often contain more comprehensive SDOH information, detailing aspects such as status, severity, and temporality. This work has two primary objectives: (i) develop a natural language processing (NLP) information extraction model to capture detailed SDOH information and (ii) evaluate the information gain achieved by applying the SDOH extractor to clinical narratives and combining the extracted representations with existing structured data.

We developed a novel SDOH extractor using a deep learning entity and relation extraction architecture to characterize SDOH across various dimensions. In an EHR case study, we applied the SDOH extractor to a large clinical data set and compared the extracted SDOH information with existing structured data. The SDOH extractor achieved 0.86 F1 on a withheld test set. In the EHR case study, we found that extracted SDOH information complements existing structured data: 32% of homeless patients, 19% of current tobacco users, and 10% of drug users had these health risk factors documented only in the clinical narrative.
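
To make the "documented only in the clinical narrative" figures concrete, the comparison between structured data and NLP-extracted data can be sketched as a set operation over patient identifiers. This is a minimal illustration, not the authors' analysis code: the function name `narrative_only_share`, the toy patient IDs, and the choice of denominator (all patients with the risk factor in either source) are assumptions made for the example.

```python
def narrative_only_share(structured_ids, extracted_ids):
    """Share of patients with a risk factor (found in either source)
    whose only documentation comes from the clinical narrative.

    Assumption: the denominator is the union of both sources.
    """
    structured = set(structured_ids)
    extracted = set(extracted_ids)
    combined = structured | extracted
    if not combined:
        return 0.0  # no patients with this risk factor in either source
    # Patients found by the extractor but absent from structured data.
    narrative_only = extracted - structured
    return len(narrative_only) / len(combined)


# Hypothetical, made-up patient IDs (not data from the study).
structured_tobacco = {"p1", "p2", "p3", "p4"}
extracted_tobacco = {"p3", "p4", "p5"}

share = narrative_only_share(structured_tobacco, extracted_tobacco)
print(f"{share:.0%} of tobacco users appear only in the narrative")
```

In this toy case, one of five patients is identified only through the extracted notes. Under the same assumption, a figure such as 19% for current tobacco users would mean that roughly one in five such patients would be missed by structured data alone.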