
Risk Factor Extraction and A Comprehensive Review of Predictive Models

Pancreatic cancer (PC) is ranked as the 11th most common cancer in the world with 458,918 new cases in 2018. It is projected to be the second leading cause of cancer-related mortality in the United States by 2030. Most of the mortality is attributed to advanced stage at diagnosis, and hence, only a minority of patients (15-20%) are eligible for surgical resection. Earlier diagnosis of PC with localized disease correlates with improved survival. The low incidence of PC and lack of accurate biomarkers for early-stage disease have made effective screening challenging and hindered efforts to improve overall survival.

We adopt a comprehensive approach to the early detection of pancreatic cancer (PC), which includes Natural Language Processing (NLP), imaging, and blood-based biomarkers. Today, we will focus on our work involving NLP. Our research leverages NLP to analyze Electronic Health Record (EHR) data, extracting known risk factors to identify individuals at high risk of PC. This presentation will highlight recent advancements made by our team in developing NLP algorithms to automate the extraction of PC risk factors from unstructured clinical notes, along with a systematic review of the effectiveness of machine learning (ML) and artificial intelligence (AI) techniques applied to EHR data for predicting PC risk. The review also highlights the limitations of current studies and presents best-practice recommendations for developing AI/ML models that use EHR data to predict PC early.

Anup Kumar Mishra, Senior Data Scientist, Mayo Clinic
Shounak Majumder, Associate Professor, Mayo Clinic

Additional Information

Please refer to the listed publications for more information.

  • Sarwal D, Wang L, Gandhi S, Sagheb Hossein Pour E, Janssens LP, Delgado AM, Doering KA, Mishra AK, Greenwood JD, Liu H, Majumder S. Identification of pancreatic cancer risk factors from clinical notes using natural language processing. Pancreatology. 2024 Jun;24(4):572-578. doi: 10.1016/j.pan.2024.03.016. Epub 2024 Mar 26. PMID: 38693040.
  • Mishra AK, Chong B, Arunachalam SP, Oberg AL, Majumder S. Machine Learning Models for Pancreatic Cancer Risk Prediction Using Electronic Health Record Data - A Systematic Review and Assessment. Am J Gastroenterol. 2024 May 16. doi: 10.14309/ajg.0000000000002870. Epub ahead of print. PMID: 38752654.
Course Format(s): On Demand
Price: Free