
Leveraging natural language processing to augment structured social determinants of health data in the electronic health record

Authors Kevin Lybarger and Nicholas J. Dobbins discuss this month’s JAMIA Journal Club selection.

Kevin Lybarger and others. Leveraging natural language processing to augment structured social determinants of health data in the electronic health record. Journal of the American Medical Informatics Association, 2023, ocad073.




Kevin Lybarger, PhD
George Mason University

Dr. Kevin Lybarger is an Assistant Professor in the Department of Information Sciences and Technology at George Mason University. His research interests combine data-driven machine learning and natural language processing (NLP) with important real-world problems. He investigates machine learning and NLP algorithm development and explores creative solutions to impactful use cases. His current research explores the intersection of NLP and clinical informatics, including extracting information from clinical text that can improve health care and advance clinical research. He earned a PhD in Electrical and Computer Engineering from the University of Washington, MS in Electrical and Computer Engineering from the University of Colorado Boulder, and BS in Electrical and Computer Engineering from Seattle University. He was a Postdoctoral Fellow at the University of Washington School of Medicine through the National Library of Medicine Biomedical Informatics Research Trainee Program.

Nic Dobbins, MLIS
University of Washington, Seattle

Nic Dobbins is the Principal Solutions Architect at UW Medicine Research IT and a PhD Candidate in biomedical informatics at the University of Washington. Nic’s research explores the intersections of cohort discovery, dynamic database query generation, question answering, data discovery, human-computer interaction and natural language processing (NLP) for real-world challenges. He is the creator of Leaf, a widely used open-source cohort discovery application at academic medical centers and commercial companies around the world. He previously earned an MLIS at the University of Washington Information School and BA in History and Japanese Language from the University of Minnesota.


Hanyin Wang
PhD Candidate
Northwestern University, Feinberg School of Medicine
Evanston, IL

Learning Objectives

Participants should be able to:

  • Describe sources of social determinants of health (SDOH) information in the electronic health record (EHR), differentiating between structured data sources and free-text clinical notes
  • Identify important attributes for characterizing social determinants of health, such as alcohol, drug, and tobacco use, living situation, and employment
  • Recognize the importance of incorporating information from the clinical narrative with structured data in the EHR to represent patients

Statement of Purpose

Social determinants of health (SDOH) impact health outcomes and are documented in the electronic health record (EHR) through structured data and unstructured clinical notes. However, clinical notes often contain more comprehensive SDOH information, detailing aspects such as status, severity, and temporality. This work has two primary objectives: i) develop a natural language processing (NLP) information extraction model to capture detailed SDOH information and ii) evaluate the information gain achieved by applying the SDOH extractor to clinical narratives and combining the extracted representations with existing structured data.

We developed a novel SDOH extractor using a deep learning entity and relation extraction architecture to characterize SDOH across various dimensions. In an EHR case study, we applied the SDOH extractor to a large clinical data set and compared the extracted SDOH information with existing structured data. The SDOH extractor achieved an F1 score of 0.86 on a withheld test set. In the EHR case study, we found that the extracted SDOH information complements existing structured data: 32% of homeless patients, 19% of current tobacco users, and 10% of drug users had these health risk factors documented only in the clinical narrative.
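The information-gain comparison above can be illustrated as a simple set comparison per risk factor. The sketch below is a hypothetical illustration, not the study's code: it assumes each source yields a set of patient IDs with a positive status for one risk factor, and it assumes the reported percentages are computed over all patients flagged by either source. The function name `narrative_only_share` and the patient IDs are made up.

```python
from typing import Set


def narrative_only_share(structured: Set[str], narrative: Set[str]) -> float:
    """Fraction of patients flagged by either source who are flagged only in notes.

    Hypothetical helper: `structured` holds patient IDs positive for a risk
    factor in structured EHR fields; `narrative` holds IDs positive according
    to an NLP extractor applied to clinical notes.
    """
    all_positive = structured | narrative  # patients identified by either source
    if not all_positive:
        return 0.0
    notes_only = narrative - structured  # documented only in the clinical narrative
    return len(notes_only) / len(all_positive)


if __name__ == "__main__":
    # Toy, made-up IDs for illustration only; not data from the study.
    structured_homeless = {"p1", "p2", "p3"}
    extracted_homeless = {"p2", "p3", "p4"}
    share = narrative_only_share(structured_homeless, extracted_homeless)
    print(f"{share:.2f}")  # prints 0.25
```

Using the union of both sources as the denominator treats every patient identified by either source as having the risk factor; a different denominator (for example, only NLP-identified patients) would yield a different percentage.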


  • 35-minute presentation by article author(s) considering salient features of the published study and its potential impact on practice
  • 25-minute discussion of questions submitted by listeners via the webinar tools and moderated by JAMIA Student Editorial Board members. 

Accreditation Statement

The American Medical Informatics Association is accredited by the Accreditation Council for Continuing Medical Education to provide continuing medical education for physicians.

Commercial Support

No commercial support was received for this activity. 

Dates and Times: -
Type: Webinar
Course Format(s): On Demand
Price: Free