AMIA’s Informatics Summit brings together researchers, academicians, and innovative thinkers across complementary thematic tracks in translational bioinformatics, clinical research informatics, and data science. Take a look at the top 15 sessions from this year’s Summit. Individual sessions are available for purchase separately. Take up to two years to claim CME.
What’s Included
- Field-defining content. Access sessions presented by the foremost leaders in health informatics from AMIA’s conferences.
- Get the full in-the-room experience. Every session includes a video of the presenter, slides, and audio.
- Easy access. Learn everywhere with AMIA’s easy-to-use online learning platform, compatible across devices.
- Earn CME/CNE. On Demand sessions are eligible for CME or CNE. Take up to two years to claim credit.
The Top 15 Sessions
Understanding the Clinical Modalities Important in NeuroDegenerative Disorders and Risk of Patient Injury Using Machine Learning and Survival Analysis
Falls among the elderly, especially those with neurodegenerative disorders (NDD), reduce life expectancy. The purpose of this study is to explore the role of machine learning on Electronic Health Records (EHR) data for time-to-event survival prediction of injuries, and the role of sensitive attributes (e.g., race, ethnicity, and sex) in these models. We used multiple survival analysis methods on a cohort of 29,045 patients aged 65 years and older treated at Penn Medicine for NDD, Mild Cognitive Impairment (MCI), or another disease. We compared the algorithms and explored how multiple modalities, specifically medications and laboratory tests, improve prediction of injuries among NDD patients. Overall, we found that medication features resulted in either increased or reduced Hazard Ratios (HR) depending on the NDD type. Being of Black race significantly increased the risk of fall/injury in models that included only medication and sensitive-attribute features; the combined model that used both modalities (medications and laboratory information) removed this relationship. Therefore, combining modalities in these survival models when predicting fall/injury risk among NDD and MCI individuals yields findings that are robust across racial and ethnic groups, with no biases apparent in the final combined-modality results. Furthermore, combining modalities (both medications and laboratory values) improved performance across multiple survival analysis methods when compared using the C-index.
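For readers who want a concrete sense of the evaluation described above, the sketch below compares Cox models built on different feature modalities by C-index using the lifelines library. It is a minimal illustration with synthetic data, not the authors' code; all feature names and values are invented.

```python
# Minimal sketch, not the authors' code: compare survival models built on different
# EHR feature modalities by concordance index (C-index) using lifelines.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
n = 1000
# Hypothetical feature blocks standing in for medication and laboratory modalities.
meds = pd.DataFrame(rng.normal(size=(n, 3)), columns=["med_a", "med_b", "med_c"])
labs = pd.DataFrame(rng.normal(size=(n, 2)), columns=["lab_x", "lab_y"])
time_to_injury = rng.exponential(scale=365.0, size=n)   # days until fall/injury
observed = rng.integers(0, 2, size=n)                   # 1 = injury observed, 0 = censored

def fit_and_score(features: pd.DataFrame) -> float:
    """Fit a Cox model on one feature set and return its C-index."""
    df = features.copy()
    df["time"], df["event"] = time_to_injury, observed
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    risk = cph.predict_partial_hazard(features)          # higher risk = earlier event
    return concordance_index(time_to_injury, -risk, observed)

print("medications only:   C-index =", round(fit_and_score(meds), 3))
print("medications + labs: C-index =", round(fit_and_score(pd.concat([meds, labs], axis=1)), 3))
```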
Learning Objectives
- Understanding how to address biases in Electronic Health Records (EHR) through the inclusion of sensitive attributes in model construction and evaluation
Speaker
- Mary Regina Boland, MA, MPhil, PhD, FAMIA (Saint Vincent College)
SLR: A Modified Logistic Regression Model with Sinkhorn Divergence for Alzheimer’s Disease Classification
Logistic regression is a widely used model in machine learning, particularly as a baseline for binary classification tasks due to its simplicity, effectiveness, and interpretability. It is especially powerful when dealing with categorical features. Despite its advantages, standard logistic regression fails to capture the distributional and geometric structure of data, especially when features are derived from structured spaces like brain imaging. For instance, in Voxel-Based Morphometry (VBM), measurements from distinct brain regions follow a clear spatial organization, which standard logistic regression cannot fully leverage. In this paper, we propose Sinkhorn Logistic Regression (SLR), a variant of logistic regression that incorporates the Sinkhorn divergence as a loss function. This adaptation enables the model to leverage geometric information about the data distribution, enhancing its performance on structured datasets.
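For reference, the Sinkhorn divergence used as the loss in SLR is conventionally defined from entropy-regularized optimal transport as follows; this is the standard textbook form, not a formula taken from the session itself.

```latex
% Entropy-regularized optimal transport between distributions \alpha and \beta,
% with ground cost C and regularization strength \varepsilon:
\mathrm{OT}_{\varepsilon}(\alpha,\beta)
  = \min_{\pi \in \Pi(\alpha,\beta)} \langle \pi, C \rangle
    + \varepsilon \, \mathrm{KL}\!\left(\pi \,\middle\|\, \alpha \otimes \beta\right)

% The (debiased) Sinkhorn divergence removes the entropic bias:
S_{\varepsilon}(\alpha,\beta)
  = \mathrm{OT}_{\varepsilon}(\alpha,\beta)
    - \tfrac{1}{2}\,\mathrm{OT}_{\varepsilon}(\alpha,\alpha)
    - \tfrac{1}{2}\,\mathrm{OT}_{\varepsilon}(\beta,\beta)
```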
Learning Objectives
- Recognize the significance and challenge for early diagnosis and disease progression monitoring of Alzheimer’s disease
- Understand the concept of optimal transport and Sinkhorn divergence
- Explain the development and methodology of a modified logistic regression model that incorporates Sinkhorn divergence
- Apply the modified logistic regression model to classify Alzheimer’s disease and evaluate its effectiveness
Speaker
- Li Shen, Ph.D. (University of Pennsylvania)
Phenotyping Cognitive Presentations in Alzheimer’s Disease: A Deep Clustering Approach
This study applied Deep Fusion Clustering Network (DFCN) to phenotype patients with clinician-diagnosed early-stage Alzheimer's disease (AD). When evaluated with data from the GERAS-US study, DFCN outperformed K-Prototype clustering in identifying patient subgroups with distinct baseline cognitive profiles and differing risks of cognitive decline within three years. These findings suggest that deep clustering techniques like DFCN can potentially enhance our understanding of the heterogeneity in disease progression of early AD.
Learning Objectives
- Learn about the approaches for applying and evaluating deep clustering techniques in phenotyping complex, mixed-type clinical data
- Understand the challenges in clustering complex clinical data and how advanced deep learning models outperform traditional clustering methods in addressing these challenges.
- Learn about the heterogeneity in patients with early-stage Alzheimer's Disease, including differences in impaired cognitive domains and risks of cognitive decline.
Speaker
- Jinying Chen, PhD (Boston University)
Leveraging Social Determinants of Health in Alzheimer’s Research Using LLM-Augmented Literature Mining and Knowledge Graphs
Growing evidence suggests that social determinants of health (SDoH), a set of nonmedical factors, affect individuals’ risks of developing Alzheimer’s disease (AD) and related dementias. Nevertheless, the etiological mechanisms underlying such relationships remain largely unclear, mainly due to difficulties in collecting relevant information. This study presents a novel, automated framework that leverages recent advancements of large language models (LLM) as well as classic natural language processing techniques to mine SDoH knowledge from extensive literature and to integrate it with AD-related biological entities extracted from the general-purpose knowledge graph PrimeKG. Utilizing graph neural networks, we performed link prediction tasks to evaluate the resultant SDoH-augmented knowledge graph. Our framework shows promise for enhancing knowledge discovery in AD and can be generalized to other SDoH-related research areas, offering a new tool for exploring the impact of social determinants on health outcomes. Our code is available at: GitHub Repository.
Learning Objectives
- Understand the importance of social determinants of health (SDoH) in the context of Alzheimer’s disease (AD) and its related dementias (ADRD) as nonmedical risk factors
- Discuss challenges and possible solutions in studying the effects of SDoH on AD etiology
- Appreciate the utility of new approaches based on the latest advancements of large language models (LLMs) and AI to unravel the relationship between SDoH and AD entities
- Learn a novel, automated framework that leverages LLMs to mine SDoH knowledge from extensive literature and integrate it with AD-related biological entities extracted from an established knowledge graph
Speakers
- Tianqi Shang, MS (University of Pennsylvania)
- Shu Yang, PhD (University of Pennsylvania)
Early Alzheimer's Detection Through Voice Analysis: Harnessing Locally Deployable LLMs via ADetectoLocum, a Privacy-Preserving Diagnostic System
Diagnosing Alzheimer's Disease (AD) early and cost-effectively is crucial. Recent advancements in Large Language Models (LLMs) like ChatGPT have made accurate, affordable AD detection feasible. Yet, HIPAA compliance and the challenge of integrating these models into hospital systems limit their use. Addressing these constraints, we introduce ADetectoLocum, an open-source LLM-equipped model designed for AD risk detection within hospital environments. This model evaluates AD risk through spontaneous patient speech, enhancing diagnostic processes without external data exchange. Our approach secures local deployment and significantly surpasses previous models in predictive accuracy for AD detection, especially in early-stage identification. ADetectoLocum therefore offers a reliable solution for AD diagnostics in healthcare institutions.
Learning Objectives
- Critically evaluate the feasibility and limitations of using locally deployable Large Language Models (LLMs) for early detection of Alzheimer’s Disease (AD), considering factors such as data privacy, accuracy trade-offs, and potential biases.
- Analyze the broader implications of AI-driven diagnostic tools in clinical practice, focusing on their reliability, ethical considerations, and practical challenges in real-world deployment.
- Assess the impact of deploying LLMs for AD detection on patient privacy, clinician decision-making, and healthcare outcomes.
- Develop an informed perspective on the ethical and practical challenges associated with integrating AI-based diagnostic tools into clinical workflows.
Speaker
- Genevieve Mortensen, B.S. (Indiana University)
GENEVIC: GENetic data Exploration and Visualization via Intelligent interactive Console
The generation of massive omics and phenotypic data has enabled investigators to study the genetic architecture and markers in many complex diseases; however, it poses a significant challenge in efficiently uncovering valuable knowledge. Here, we introduce GENEVIC, an AI-driven chat framework that tackles this challenge by bridging the gap between genetic data generation and biomedical knowledge discovery. Leveraging ChatGPT, we aim to make GENEVIC a biologist’s ‘copilot’. It automates the analysis, retrieval, and visualization of customized domain-specific genetic information, and integrates functionalities to generate protein interaction networks, enrich gene sets, and search scientific literature from PubMed, Google Scholar, and arXiv, making it a comprehensive tool for biomedical research. In its pilot phase, GENEVIC is assessed using a curated database that ranks genetic variants associated with Alzheimer’s disease, schizophrenia, and cognition, based on their effect weights from the Polygenic Score (PGS) Catalog, thus enabling researchers to prioritize genetic variants in complex diseases. The implementation of BrainGeneBot is set to transform genomic research for AD and other brain diseases by improving data accessibility, accelerating discovery processes, and refining the precision of genetic insights.
Learning Objectives
- Understand genetic association with complex disease and the methods for association studies.
- Learn the appropriate deep learning technologies for mining complex and heterogeneous genetic datasets.
- Assess the computational methods and resources for integrative studies of genetic markers in brain disease.
Speaker
- Zhongming Zhao, PhD (University of Texas Health Science Center at Houston)
Estimating single sample gene program dysregulation using latent factor causal graphs
Gene expression programs that establish and maintain specific cellular states are orchestrated through a regulatory network composed of transcription factors, cofactors, and chromatin regulators. Dysregulation of this network can lead to a broad range of diseases. In this work, we introduce LaGrACE, a novel method designed to estimate the magnitude of dysregulation of gene programs utilizing both omics data and clinical information. LaGrACE first learns gene programs, represented as latent factors, from gene expression data of a set of reference samples. Then, it facilitates grouping of samples exhibiting similar patterns of gene program dysregulation, thereby enhancing the discovery of underlying molecular mechanisms. We rigorously evaluated LaGrACE’s performance using synthetic data, breast cancer and chronic obstructive pulmonary disease (COPD) datasets, and single-cell RNA sequencing (scRNA-seq) datasets. Our findings demonstrate that LaGrACE is exceptionally robust in identifying biologically meaningful and prognostic subtypes. Additionally, it effectively discerns drug-response signals at a single-cell resolution. The COPD analysis revealed a new association between LEF1 and COPD molecular mechanisms and mortality. Collectively, these results underscore the utility of LaGrACE as a valuable tool for elucidating disease mechanisms.
Learning Objectives
- Explain the role of transcription factors, cofactors, and chromatin regulators in orchestrating gene expression programs and how their dysregulation contributes to disease.
- Describe the functionality and workflow of LaGrACE, including how it learns gene programs from gene expression data and identifies dysregulated patterns.
Speaker
- Panayiotis Benos, PhD (University of Florida)
Genotype and phenotype risk score analyses of genetically admixed multiple sclerosis patients in All of Us
Multiple sclerosis (MS) is a demyelinating disease influenced by genetic and environmental risk factors. Current research indicates that earlier diagnosis and treatment initiation are associated with improved long-term health outcomes. We developed a well-performing risk score model for MS based on genetic burden alone and demonstrate the utility of phenotype-based risk scoring. Combined genotype-phenotype risk models have the potential to aid in early screening and diagnosis of MS.
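As background on the genetic-burden score mentioned above, a polygenic risk score is simply a weighted sum of risk-allele dosages across variants. The toy example below uses entirely made-up weights and genotypes and is not the study's model.

```python
# Toy illustration (synthetic numbers, not the study's model): a polygenic risk score
# as the weighted sum of risk-allele dosages (0, 1, or 2 copies) across variants.
import numpy as np

effect_weights = np.array([0.12, -0.05, 0.30, 0.08])   # hypothetical per-variant weights
dosages = np.array([                                    # rows = individuals, cols = variants
    [0, 1, 2, 1],
    [2, 0, 1, 0],
    [1, 1, 0, 2],
])
prs = dosages @ effect_weights
print(prs)   # one score per individual; higher = greater estimated genetic burden
```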
Learning Objectives
- Identify various types of risk scores that can be derived from electronic health records (EHRs) to assess and predict complex diseases.
Speaker
- Mary Davis, PhD (Brigham Young University)
AI-driven model to bridge pathology image and transcriptomics
Computational pathology has emerged as a powerful tool for revolutionizing routine pathology through AI-driven analysis of pathology images. Recent advancements in omics technologies, such as spatial transcriptomics, have further enriched the field by providing detailed transcriptomic information alongside tissue histology. However, existing sequencing platforms lack the ability to effectively harness the synergies between tissue images and genomic data. To address this gap, we develop Thor, an AI-based infrastructure for seamless integration of histological and genomic analysis of tissues. Thor infers single-cell resolution spatial transcriptome through an anti-shrinking Markov diffusion method. Its effectiveness and versatility were validated through simulations, diverse datasets, and compelling case studies involving human carcinoma and heart failure samples. Thor enabled unbiased screening of breast cancer hallmarks and identification of fibrotic regions in myocardial infarction tissue. With an extensible framework for genomic and tissue image analysis accessible through an interactive web platform, Thor empowers researchers to understand biological structures and decipher disease pathogenesis, paving the way for significant advancements in research and clinical applications. Our code is available at: GitHub Repository.
Learning Objectives
- Explain how computational pathology and AI-driven analysis of pathology images enhance routine pathology and facilitate integration with omics data.
- Describe the functionality of Thor, including its use of an anti-shrinking Markov diffusion method to infer single-cell resolution spatial transcriptomes.
- Demonstrate how Thor’s interactive web platform integrates histological and genomic analysis to empower researchers in understanding biological structures and disease pathogenesis.
Speaker
- Guangyu Wang, PhD (Houston Methodist)
Clinical and Genomic Insights into Immune-Related Adverse Events
Immune checkpoint inhibitors (ICIs) have revolutionized cancer therapy by enhancing the immune system’s ability to target tumor cells, significantly improving survival outcomes in various cancers. However, ICIs are frequently associated with immune-related adverse events (irAEs), including acute kidney injury (ICI-AKI), which complicate patient management. Using data from the OneFlorida+ Clinical Research Network and the All of Us (AoU) cohort, this study identifies clinical and genetic risk factors for these adverse events. In the OneFlorida+ cohort of 6,526 ICI-treated patients, 56.2% developed irAEs, with younger patients, females, and those with comorbidities (e.g., myocardial infarction and renal disease) being at higher risk. Cancer type and treatment regimens also influenced irAE risk, with combined CTLA4+PD(L)1 inhibitors increasing the risk by 35%. Severe irAEs significantly impacted overall survival and the timing of irAE onset. The genetic analysis of 414 ICI-treated patients from the AoU cohort identified the rs16957301 variant in the PCCA gene as a significant risk marker for ICI-AKI in Caucasians. Patients with the risk genotypes (TC/CC) developed AKI significantly earlier (median: 3.6 months) than those with the reference genotype (TT, median: 7.0 months). The variant’s specificity to ICI-treated patients highlights its potential utility in personalized risk assessment. These findings emphasize the importance of integrating clinical and genomic insights to optimize ICI therapy. Identifying high-risk patients through genetic screening and tailored management strategies could mitigate adverse events and improve patient outcomes. Future research should validate these findings in diverse populations and explore underlying biological mechanisms.
Learning Objectives
- Identify clinical and genetic risk factors for irAEs and ICI-AKI, including demographic characteristics, comorbidities, cancer types, treatment regimens, and the rs16957301 variant in the PCCA gene.
- Analyze the influence of severe irAEs on overall survival and the timing of irAE onset in ICI-treated patients, based on data from the OneFlorida+ and All of Us (AoU) cohorts.
- Discuss the implications of integrating clinical and genomic insights to optimize ICI management and propose future research directions to validate findings and explore underlying biological mechanisms.
Speaker
- Qianqian Song, Ph.D. (University of Florida)
SmartState: An Automated Research Protocol Adherence System
Developing and enforcing study protocols is crucial in medical research, especially as interactions with participants become more intricate. Traditional rules-based systems struggle to provide the automation and flexibility required for real-time, personalized data collection. We introduce SmartState, a state-based system designed to act as a personal agent for each participant, continuously managing and tracking their unique interactions. Unlike traditional reporting systems, SmartState enables real-time, automated data collection with minimal oversight. By integrating large language models to distill conversations into structured data, SmartState reduces errors and safeguards data integrity through built-in protocol and participant auditing. We demonstrate its utility in research trials involving time-dependent participant interactions, addressing the increasing need for reliable automation in complex clinical studies.
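To make the state-based idea concrete, here is a minimal sketch of a per-participant state machine whose transitions are driven by a structured event label distilled from free-text conversation. The transition table, event names, and the stubbed distillation step are illustrative assumptions, not SmartState's implementation.

```python
# Minimal sketch (illustrative assumptions, not SmartState's implementation): a
# per-participant state machine advanced by events distilled from conversation text.
from dataclasses import dataclass, field

TRANSITIONS = {
    ("enrolled", "consent_confirmed"): "baseline_survey",
    ("baseline_survey", "survey_submitted"): "active_monitoring",
    ("active_monitoring", "adverse_event_reported"): "clinician_review",
}

@dataclass
class Participant:
    pid: str
    state: str = "enrolled"
    audit_log: list = field(default_factory=list)

def distill_event(message: str) -> str:
    """Stand-in for an LLM call that maps free text to a protocol event label."""
    return "consent_confirmed" if "consent" in message.lower() else "unknown"

def advance(participant: Participant, message: str) -> None:
    event = distill_event(message)
    next_state = TRANSITIONS.get((participant.state, event))
    if next_state is None:
        participant.audit_log.append(f"no transition for {event!r} from {participant.state!r}")
        return
    participant.audit_log.append(f"{participant.state} -> {next_state} via {event}")
    participant.state = next_state

p = Participant("P001")
advance(p, "I confirm my consent to participate.")
print(p.state, p.audit_log)
```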
Learning Objectives
- Understand how SmartState can be applied to their research
- Describe the benefits of state machines in improving research study compliance and verification
- Utilize large language models to interpret and extract conversational intent
Speaker
- Samuel Armstrong, MS (University of Kentucky)
Human-Centered Design of the Vanderbilt Algorithmovigilance Monitoring and Operations System
As AI adoption in healthcare grows, there is an increasing need for continuous monitoring after implementation, known as algorithmovigilance. While existing tools provide some support, few systems enable comprehensive proactive oversight and governance of AI across a healthcare system. This study outlines the human-centered design process used to develop the Vanderbilt Algorithmovigilance Monitoring and Operations System (VAMOS). We describe key insights and design recommendations to guide the development of robust algorithmovigilance tools for healthcare institutions.
Learning Objectives
- Describe key features and end-user needs for AI monitoring systems.
Speaker
- Megan Salwei, PhD (Vanderbilt University Medical Center)
From Scanner to Science: Reusing Clinically Acquired Medical Images for Research
Growth in the field of medical imaging research has revealed a need for larger volume and variety in available data. This need could be met using curated, clinically acquired data, but the process for getting these data from the scanners to the scientists is complex and lengthy. We present a manifest-driven, modular Extract, Transform, and Load (ETL) process named Locutus, designed to handle the difficulties inherent in reusing clinically acquired medical imaging data. Based on four foundational assumptions about medical data, research data, and communication, Locutus presents a five-phase workflow for downloading, de-identifying, and delivering unique requests for imaging data. To date, this workflow has been used to process over 27,000 imaging accessions for research use. This number is expected to grow as technical challenges are addressed, with the role of humans shifting from frequent intervention to regular monitoring.
Learning Objectives
- Describe the challenges involved in reusing clinically acquired medical imaging data for research purposes.
- Explain the modular Extract, Transform, and Load (ETL) process used by Locutus to facilitate the secure transfer of imaging data from scanners to scientists.
- Outline the five-phase workflow of Locutus for downloading, de-identifying, and delivering imaging data.
Speaker
- Remo M. S. Williams, MS (Children's Hospital of Philadelphia)
Empowering Precision Medicine for Rare Diseases through Cloud Infrastructure Refactoring
Rare diseases affect approximately 1 in 11 Americans, yet their diagnosis remains challenging due to limited clinical evidence, low awareness, and lack of definitive treatments. Our project aims to accelerate rare disease diagnosis by developing a comprehensive informatics framework leveraging data mining, semantic web technologies, deep learning, and graph-based embedding techniques. However, our on-premises computational infrastructure faces significant challenges in scalability, maintenance, and collaboration. This study focuses on developing and evaluating a cloud-based computing infrastructure to address these challenges. By migrating to a scalable, secure, and collaborative cloud environment, we aim to enhance data integration, support advanced predictive modeling for differential diagnosis, and facilitate widespread dissemination of research findings to stakeholders, the research community, and the public. The migration follows a reliable, standardized workflow designed to ensure minimal disruption and maintain data integrity for existing research projects.
Learning Objectives
- Explain the challenges in diagnosing rare diseases and the role of informatics in improving diagnosis.
- Describe how cloud-based computing enhances scalability, security, and collaboration in rare disease research.
Speaker
- Hui Li, PhD (University of Texas Health Science Center at Houston)
Data Governance for a Novel Pet-Patient Data Registry
Significant opportunities for understanding disease co-occurrence across species in coincident households remain untapped. We determined the feasibility of creating a pet-patient registry for analysis of health data from UCHealth patients and their pets who received care at the geographically adjacent Veterinary Teaching Hospital (CSU-VTH). We identified 12,115 matches, indicating that 29% of CSU-VTH clients, or a member of their household, were UCHealth patients. Given the favorable linkage results, we describe data governance considerations for establishing secure pet-patient registries.
Learning Objectives
- Determine key data governance considerations necessary for establishing and maintaining secure pet-patient registries.
- Identify the components of a data registry team and a governance team for EHR linkage.
- Understand registry oversight mechanisms and appropriate data management for ensuring the integrity, security, and accessibility of the information within a pet-patient registry.
Speaker
- Nadia Saklou, DVM, PhD (Colorado State University)
A Standardized Guideline for Assessing Extracted Electronic Health Records Cohorts: A Scoping Review
Assessing how accurately a cohort extracted from Electronic Health Records (EHR) represents the intended target population, or cohort fitness, is critical but often overlooked in secondary EHR data use. This scoping review aimed to (1) identify guidelines for assessing cohort fitness and (2) determine their thoroughness by examining whether they offer sufficient detail and computable methods for researchers. The review follows the JBI guidance for scoping reviews and was refined based on the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for scoping reviews (PRISMA-ScR) checklist. Searches were performed in Medline, Embase, and Scopus. From 1,904 results, 30 articles and 2 additional references were reviewed. Nine articles (28.13%) include a framework for evaluating cohort fitness, but only 5 (15.63%) contain sufficient detail and quantitative methodologies. Overall, a more comprehensive guideline providing best practices for measuring cohort fitness is still needed.
Learning Objectives
- Summarize the current state of literature on guidelines for evaluating Electronic Health Record (EHR) cohort fitness.
Speaker
- Nattanit Songthangtham, PhD Health Informatics (University of Minnesota Twin Cities)
Real-World Computable Phenotypes of Patient-Reported Disability in Multiple Sclerosis
Common tools to measure multiple sclerosis (MS) disability are rarely available in real-world clinical settings. Leveraging electronic health record (EHR) data and disability outcomes from two independent EHR-linked MS research registries, we aimed to develop, test, and validate computable phenotypes of patient-reported MS disability status. After multiple model iterations, a random forest model containing only ±6 months of codified EHR data reached potentially clinically actionable accuracy and concordance index while remaining the most pragmatic for clinical deployment. Our pragmatic computable phenotypes of patient-reported disability could improve MS patient monitoring at the point of care, enable large-scale clinical investigations, and may have clinical applications beyond MS.
Learning Objectives
- Develop, test, and validate computable phenotypes for patient-reported MS disability status using EHR data from multi-center, clinic-based MS cohorts.
- Assess the accuracy and clinical relevance of machine learning models trained on various EHR feature sets to ensure their applicability in real-world clinical settings.
- Explore the broader application of machine learning with multi-center EHR-registry data to generate computable phenotypes with the potential for clinical deployment.
Speaker
- Wen Zhu, M.D. (University of Pittsburgh)
Harnessing Diverse Populations to Advance Multiple Sclerosis Research
To advance MS research, we created a demographically diverse multiple sclerosis (MS) cohort from All of Us using an unsupervised approach (2,030 MS cases, 30% non-White). An MS polygenic risk score based on the existing MS genomic map predicted MS well for European ancestry but poorly for African ancestry. Known non-genetic MS risk factors (obesity, smoking, vitamin D deficiency) showed consistent associations across racial/ethnic groups. This resource helps increase knowledge of MS risk in diverse populations.
Learning Objectives
- Understand the importance of including demographically diverse populations (e.g., All of Us Research Program) in multiple sclerosis (MS) research for validating risk factors and advancing precision medicine.
- Learn the application of a novel unsupervised phenotyping method to identify MS cases more accurately than traditional rule-based methods.
- Evaluate the effectiveness of genetic (e.g., polygenic risk score) and non-genetic (e.g., obesity, smoking, vitamin D deficiency) predictors of MS susceptibility across different racial and ethnic groups.
Speaker
- Chen Hu, M.S. (University of Pittsburgh)
Development and Implementation of Electronic Phenotyping Algorithms for Precision Medicine: A Framework for EHR-Based Clinical Trial Recruitment
Germline genetic testing is increasingly recommended for conditions with genetic etiologies that influence medical management. However, it is underutilized due to barriers at the system, patient, and clinician levels. This study will use a hybrid cluster randomized trial to test nudges, informed by behavioral economics, aimed at increasing genetic testing uptake. Rapid-cycle optimization will ensure effective implementation in diverse healthcare settings.
Learning Objectives
- Recognize barriers to genetic testing at system, patient, and clinician levels.
- Assess behavioral nudges to increase genetic testing uptake using a randomized trial.
Speaker
- Anurag Verma (University of Pennsylvania)
A Phenotype Algorithm for Classification of Single Ventricle Physiology using Electronic Health Records
We developed a phenotyping algorithm for identifying individuals with single ventricle physiology based on data from the electronic health record. Our algorithm was developed using features extracted from a cohort of 1,020 patients with ferumoxytol-enhanced MRI scans seen at our institution. When evaluated on a separate, broader cohort of 2,500 patients with clinically-adjudicated congenital heart disease, our algorithm demonstrated an accuracy of 99.2% and sensitivity of 97.5%, exceeding the performance of existing published methods.
Learning Objectives
- Understand the challenges and complexities associated with diagnosing Single Ventricle Physiology (SVP) in congenital heart disease (CHD) patients.
- Comprehend the role of a phenotype algorithm in improving the classification and diagnosis of SVP using electronic health records (EHRs).
- Recognize how structured and unstructured data from EHRs can be leveraged to enhance diagnostic accuracy for rare and complex conditions like SVP.
Speaker
- Hang Xu, Ph.D (UCLA)
A Generalized Tool to Assess Algorithmic Fairness in Disease Phenotype Definitions
For evidence from observational studies to be reliable, researchers must ensure that the patient populations of interest are accurately defined. However, disease definitions can be extremely difficult to standardize and implement accurately across different datasets and study requirements. Furthermore, in this context, they must also ensure that populations are represented fairly to accurately reflect populations’ various demographic dynamics and to not overgeneralize across non-applicable populations. In this work, we present a generalized tool to assess the fairness of disease definitions by evaluating their implementation across common fairness metrics. Our approach calculates fairness metrics and provides a robust method to examine coarse and strongly intersecting populations across many characteristics. We highlight workflows when working with disease definitions, provide an example analysis using an OMOP CDM patient database, and discuss potential directions for future improvement and research.
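As a flavor of the kind of check such a tool performs, the sketch below computes phenotype case rates across intersecting demographic strata and a simple demographic-parity-style gap. It is a hand-rolled toy example with made-up data, not the tool described in the session.

```python
# Toy example (made-up data, not the session's tool): case rates of a phenotype
# definition across intersecting demographic strata, plus a simple parity gap.
import pandas as pd

cohort = pd.DataFrame({
    "phenotype_case": [1, 0, 1, 1, 0, 0, 1, 0],
    "sex":            ["F", "F", "M", "M", "F", "M", "F", "M"],
    "race":           ["Black", "White", "White", "Black", "White", "Black", "White", "Black"],
})

def case_rates(df: pd.DataFrame, attributes: list) -> pd.Series:
    """Phenotype case rate per (possibly intersectional) demographic stratum."""
    return df.groupby(attributes)["phenotype_case"].mean()

rates = case_rates(cohort, ["sex", "race"])
print(rates)
print("demographic-parity-style gap:", rates.max() - rates.min())
```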
Learning Objectives
- Understand what fairness within phenotype definitions involves, what fairness metrics are available, and ways to assess fairness and equity within observational health research settings.
Speaker
- Jacob Zelko, B.S. (Northeastern University Roux Institute)
Health-related quality of life (HRQoL) is a crucial dimension of healthcare outcomes. Evidence from the past two decades suggests that HRQoL is associated with specialty care outcomes, mortality, service utilization, and healthcare costs. Many HRQoL measures exist, but methodological and implementation challenges impede their use in primary care. Multiple barriers at the patient, provider, and practice levels hinder on-the-ground use of these screeners. Practical solutions will require the co-design and implementation of processes to identify at-risk patients and relay risk information to providers in a manner that is easily accessible, interpretable, and actionable at the point of care.
Multidisciplinary teams working to integrate various perspectives have the best chance for success. Our team includes experts in medicine, dissemination and implementation, information technology, software development, statistics, and artificial intelligence, which cover a variety of perspectives. Yet, important questions remain. Meaningful dialogue with diverse audience members can generate creative and pragmatic approaches to problem-solving.
Speakers
- Glenn Kotz, MD (MidValley Family Practice)
- Rodger Kessler, PhD, ABPP (University of Colorado)
- Alex Kotz, PhD Student (CU Anschutz)
- Stephanie Grim, MS, MPH (University of Colorado School of Medicine)
The Evolve to Next-Gen Accrual to Clinical Trials (ENACT) network (previously known as ACT) was established in 2015 with funding from NCATS. ENACT is a large federated network of EHR data repositories at 57 CTSA hubs that serves as an information superhighway for querying EHR data on >142M patients and providing data access to all CTSA hub investigators. Because a substantial portion of vital information resides within clinical texts, the utilization of Natural Language Processing (NLP) techniques is critical to fully leverage EHRs for clinical and translational research. However, to date, no large EHR network has implemented NLP pipelines and systems to fully utilize the text data. The ENACT NLP Working Group was established with the primary goal of ensuring that NLP pipelines are deployed network-wide and that NLP-derived concepts become accessible and searchable across the entire ENACT network. The working group consists of ten participating ENACT sites, split into several focus groups to pilot specific projects in different disease conditions. During this panel, we will introduce the current state of the ENACT NLP Working Group and share the practical strategies we developed and the lessons we learned during the process. We will share updates and lessons learned from three pilot projects, including housing status identification, delirium phenotype identification, and opioid disorder identification, with the AMIA community. This work will also benefit other large EHR networks, such as PCORnet and OHDSI, which are considering deploying NLP pipelines to unlock the potential of clinical texts.
Learning Objectives
- Describe the structure, goals, and current status of the ENACT NLP Working Group and its role in leveraging Natural Language Processing (NLP) to enhance the utility of clinical text data across the ENACT network.
- Explain the strategies and challenges associated with deploying NLP pipelines across a federated EHR network, including lessons learned from pilot projects.
- Analyze the outcomes of three pilot projects (housing status identification, delirium phenotype identification, and opioid disorder identification) and their implications for scaling NLP efforts in other large EHR networks.
- Identify opportunities for applying NLP techniques in clinical and translational research within other large-scale EHR networks, such as PCORnet and OHDSI.
Moderator
- Yanshan Wang, PhD (University of Pittsburgh)
Speakers
- Yanshan Wang, PhD (University of Pittsburgh)
- Sunyang Fu, PhD, MHI (UTHealth)
- Paul Heider, PhD (Medical University of South Carolina)
- Daniel Harris (University of Kentucky)
- Michele Morris, BA (University of Pittsburgh)
The National Institutes of Health All of Us Research Program has launched an innovative pilot to inform U.S. healthcare interoperability leaders in translational science on the use of health information exchanges (HIEs) and health information networks (HINs) in participant-consented research studies. The panel will share perspectives on the technical and policy challenges of acquiring electronic health record (EHR) data for research by leveraging trust frameworks established to support other purposes. The team will compare its HIN/HIE data acquisition approach and data quality to typical data warehouse approaches. The pilot is a proximal use case to inform the use of the emerging Trusted Exchange Framework and Common Agreement (TEFCA) for research purposes. The panel will discuss challenges such as validation of the participant’s authorization to share EHR records, identity proofing, data restrictions in state laws, C-CDA and FHIR to OMOP mapping, and exploration of novel implementations of FHIR.
Learning Objectives
- Articulate the value of health information exchange and health information networks for provisioning research participants' electronic health record data for research.
- Recognize the technical, legal, and regulatory challenges involved in making the connections and acquiring the data, and their solutions.
- Understand the applicability of the solutions to the All of Us Research Program and how they might benefit other research studies.
Moderator
- Melissa Haendel, PhD (University of North Carolina)
Speakers
- Christopher Chute, MD DrPH (Johns Hopkins University)
- Jay Nakashima (eHealth Exchange)
- William Hogan, MD (Medical College of Wisconsin)
- Josh Lemieux, BA (OCHIN)
Health Records to Improve Opioid Use Disorder Clinical Care
The purpose of this panel is to discuss with and inform researchers and practitioners about the differences between task-specific and general-purpose models, namely Named Entity Recognition (NER) models and Large Language Models (LLMs). NER is an important task in natural language processing. Due to the complexity and ever-changing nature of language, recognizing a named entity requires significant model training effort beyond the pre-trained models; LLMs, on the other hand, do not require such effort. The panel will elucidate the differences between task-specific NER models and LLMs on entity recognition tasks and showcase comparisons between the two approaches in various use cases. In addition, the panel will offer tips and lessons learned from working with both task-specific NER models and LLMs on a large corpus of texts.
Learning Objectives
- Identify key differences between Named Entity Recognition (NER) models and Large Language Models (LLMs) in training, adaptability, and performance.
- Analyze the effectiveness of NER models and LLMs across different use cases.
- Apply best practices for selecting, fine-tuning, and deploying NER models and LLMs.
Moderator
- Elise Berliner (Oracle Life Sciences)
Speakers
- Elise Berliner (Icahn School of Medicine at Mount Sinai)
- David Talby, Dr. (John Snow Labs)
- Hongfang Liu, PhD (University of Texas Health Science Center at Houston)
Unraveling Complex Temporal Patterns in EHRs via Robust Irregular Tensor Factorization
Electronic health records (EHRs) contain diverse patient data with varying visit frequencies, resulting in unaligned tensors in the time mode. While PARAFAC2 has been used for extracting meaningful medical concepts from EHRs, existing methods fail to capture non-linear and complex temporal patterns and struggle with missing entries. In this paper, we propose REPAR, an RNN Regularized Robust PARAFAC2 method to model complex temporal dependencies and enhance robustness in the presence of missing data. Our approach employs RNNs for temporal regularization and a low-rank constraint for robustness. We design a hybrid optimization framework that handles multiple regularizations and supports various data types. REPAR is evaluated on 3 real-world EHR datasets, demonstrating improved reconstruction and robustness under missing data. Two case studies further showcase REPAR's ability to extract meaningful dynamic phenotypes and enhance phenotype predictability from noisy temporal EHRs.
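For orientation, the underlying PARAFAC2 model that REPAR builds on is usually written as follows, with X_k denoting patient k's visits-by-features matrix; this is the standard textbook formulation, not the full REPAR objective with its RNN regularization and robustness terms.

```latex
% Standard PARAFAC2 factorization of a collection of matrices X_k (one per patient),
% with a shared feature-factor matrix V and patient-specific temporal factors U_k:
\begin{aligned}
X_k &\approx U_k \,\Sigma_k\, V^{\top}, \qquad k = 1,\dots,K,\\
\Sigma_k &= \operatorname{diag}(s_k), \qquad
U_k^{\top} U_k = \Phi \ \text{for all } k
\quad (\text{e.g. } U_k = Q_k H,\ Q_k^{\top} Q_k = I).
\end{aligned}
```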
Learning Objectives
- Explain how REPAR extends the PARAFAC2 framework by incorporating recurrent neural networks (RNNs) to model temporal dependencies in electronic health record (EHR) data.
- Analyze how REPAR effectively handles missing and irregular EHR data to improve data quality and model performance.
- Evaluate the ability of REPAR to extract clinically meaningful phenotypes for patient subgrouping and disease progression analysis.
- Apply the REPAR framework to identify patient subgroups and track disease progression based on complex, time-dependent clinical data.
Speaker
- Linghui Zeng, MS (Emory University)
Systematic Exploration of Hospital Cost Variability: A Conformal Prediction-Based Outlier Detection Method for Electronic Health Records
Marked variability in inpatient hospitalization costs poses significant challenges to healthcare quality, resource allocation, and patient outcomes. Traditional methods like Diagnosis-Related Groups (DRGs) aid in cost management but lack practical solutions for enhancing hospital care value. We introduce a novel methodology for outlier detection in Electronic Health Records (EHRs) using Conformal Prediction. This approach identifies and prioritizes areas for optimizing high-value care processes. Unlike conventional predictive models that neglect uncertainty, our method employs Conformal Quantile Regression (CQR) to generate robust prediction intervals, offering a comprehensive view of cost variability. By integrating Conformal Prediction with machine learning models, healthcare professionals can more accurately pinpoint opportunities for quality and efficiency improvements. Our framework systematically evaluates unexplained hospital cost variations and generates interpretable hypotheses for refining clinical practices associated with atypical costs. This data-driven approach offers a systematic method to generate clinically sound hypotheses that may inform processes to enhance care quality and optimize resource utilization.
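The sketch below shows the general shape of conformal quantile regression for cost outliers: fit lower and upper quantile regressors, calibrate an interval-widening term on held-out data, and flag observations outside the calibrated interval. It is a generic illustration with synthetic data and off-the-shelf scikit-learn models, not the authors' pipeline.

```python
# Generic CQR sketch (synthetic data, not the authors' pipeline): calibrated prediction
# intervals for hospital cost, flagging encounters outside the interval as candidate outliers.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 5))                              # hypothetical encounter features
y = np.exp(1.0 + X[:, 0] + 0.5 * rng.normal(size=3000))     # skewed "cost" outcome

X_tr, X_cal, X_te = X[:2000], X[2000:2500], X[2500:]
y_tr, y_cal, y_te = y[:2000], y[2000:2500], y[2500:]

alpha = 0.1  # target 90% intervals
q_lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_tr, y_tr)
q_hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_tr, y_tr)

# Conformity scores on the calibration split, then the CQR widening term.
scores = np.maximum(q_lo.predict(X_cal) - y_cal, y_cal - q_hi.predict(X_cal))
q = np.quantile(scores, np.ceil((1 - alpha) * (len(y_cal) + 1)) / len(y_cal))

lower, upper = q_lo.predict(X_te) - q, q_hi.predict(X_te) + q
outliers = (y_te < lower) | (y_te > upper)
print(f"flagged {outliers.mean():.1%} of test encounters as candidate cost outliers")
```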
Learning Objectives
- Describe the challenges posed by variability in inpatient hospitalization costs and the limitations of traditional cost management methods like Diagnosis-Related Groups (DRGs).
- Analyze the use of Conformal Quantile Regression (CQR) to generate robust prediction intervals that account for uncertainty in cost variability.
Speaker
- François Grolleau, MD, PhD (Stanford Center for Biomedical Informatics Research)
powerROC: An Interactive Web Tool for Sample Size Calculation in Assessing Models' Discriminative Abilities
Rigorous external validation is crucial for assessing the generalizability of prediction models, particularly by evaluating their discrimination (AUROC) on new data. This often involves comparing a new model's AUROC to that of an established reference model. However, many studies rely on arbitrary rules of thumb for sample size calculations, often resulting in underpowered analyses and unreliable conclusions. This paper reviews crucial concepts for accurate sample size determination in AUROC-based external validation studies, making the theory and practice more accessible to researchers and clinicians. We introduce powerROC, an open-source web tool designed to simplify these calculations, enabling both the evaluation of a single model and the comparison of two models. The tool offers guidance on selecting target precision levels and employs flexible approaches, leveraging either pilot data or user-defined probability distributions. We illustrate powerROC’s utility through a case study on hospital mortality prediction using the MIMIC database.
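As a back-of-the-envelope companion to the tool, the classic Hanley-McNeil approximation gives the standard error of an AUROC estimate for planned case/control counts; the function below implements that standard formula. It is not powerROC's method, which additionally supports two-model comparison and pilot-data-based approaches.

```python
# Hanley-McNeil (1982) approximation for the standard error of an AUROC estimate;
# a standard formula, shown here for rough precision planning only.
import math

def auroc_se(auc: float, n_pos: int, n_neg: int) -> float:
    q1 = auc / (2 - auc)
    q2 = 2 * auc * auc / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)
    return math.sqrt(var)

# Example: expected AUROC of 0.80 with 100 events and 900 non-events.
se = auroc_se(0.80, 100, 900)
print(f"approx. 95% CI half-width: {1.96 * se:.3f}")
```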
Learning Objectives
- Understand that in healthcare, robust external validation of prediction models often has greater impact than developing new ones
Speaker
- François Grolleau, MD, PhD (Stanford Center for Biomedical Informatics Research)
Temporal Rule Mining for Enhanced Risk Pattern Extraction: A Case Study with Acute Kidney Injury
Association rule mining is a widely used data mining technique for extracting knowledge from large datasets. Its application in healthcare involves uncovering meaningful patterns within electronic health records (EHR) to inform clinical decision-making and treatment strategies. However, most association rule mining studies overlook temporal information, potentially missing valuable patterns associated with specific time periods or events. In recent years, several methods have been developed to mine temporal association rules, offering improved predictive and descriptive capabilities. We propose a multi-step rule mining framework that utilizes a temporal pattern mining algorithm to extract actionable, temporal risk patterns for acute kidney injury (AKI) from EHR data. Our algorithm discovered around 26K rules with low support and high confidence, centered on 40 that were actionable. The derived rules have a median support of 0.057 and confidence of 0.49. We highlight selected rules, their potential etiology, and provide a network view of more specific actionable insights.
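To ground the support and confidence figures quoted above, the sketch below computes both quantities for a single temporal rule of the form "antecedent event precedes consequent event within a window." The event names, window, and patient-level counting convention are illustrative assumptions, not the authors' algorithm.

```python
# Illustrative sketch (not the authors' algorithm): support and confidence of a
# temporal rule "antecedent occurs, then consequent occurs within window_days".
from datetime import datetime

def temporal_rule_stats(patients, antecedent, consequent, window_days=30):
    """patients: dict mapping patient_id -> list of (event_name, timestamp) pairs."""
    n_total, n_antecedent, n_both = len(patients), 0, 0
    for events in patients.values():
        a_times = [t for name, t in events if name == antecedent]
        c_times = [t for name, t in events if name == consequent]
        if not a_times:
            continue
        n_antecedent += 1
        # The rule fires if the consequent follows some antecedent within the window.
        if any(0 <= (c - a).days <= window_days for a in a_times for c in c_times):
            n_both += 1
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

patients = {
    "p1": [("nephrotoxic_drug", datetime(2024, 1, 1)), ("AKI", datetime(2024, 1, 10))],
    "p2": [("nephrotoxic_drug", datetime(2024, 2, 1))],
    "p3": [("AKI", datetime(2024, 3, 5))],
}
print(temporal_rule_stats(patients, "nephrotoxic_drug", "AKI"))  # support ~0.33, confidence 0.5
```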
Learning Objectives
- Deploy temporal association rule mining on Electronic Health Record (EHR) data to generate actionable insights.
Speaker
- Ho Yin Chan, PhD (University of Florida)
Studying Veteran food insecurity longitudinally using electronic health record data and natural language processing
Learning Objectives
- Understand the role food insecurity plays in a patient's health and the role healthcare systems play in addressing it.
- Learn how food insecurity can be measured over time using electronic health record data.
Speaker
- Alec Chapman, MS (University of Utah)
Large Language Models in Biomedical Named Entity Recognition
Large language models (LLMs), like GPT-4, have revolutionized natural language processing (NLP), demonstrating exceptional performance across various tasks. However, their effectiveness in biomedical named entity recognition (BioNER) remains limited due to the need for domain-specific knowledge. This study focuses on fine-tuning general-domain LLMs, specifically Llama-2 models, for BioNER tasks. We convert five BioNER datasets from the BLURB benchmark into an instruction-following format to optimize fine-tuning. Our approach incorporates zero-shot prompting, Chain-of-Thought (CoT) reasoning, and a perplexity-based evaluation method. We evaluate the fine-tuned Llama-2 models on the AnatEM, BioNLP11EPI, and BioNLP13GE datasets, and our method consistently outperforms baseline models such as UniNER-7B, InstructUIE-11B, and BioLinkBERT. Furthermore, larger models like Llama2-13B demonstrate superior performance compared to smaller ones, highlighting the significance of model parameters. This study underscores the potential of instruction-tuned LLMs for BioNER tasks and opens avenues for their application in other biomedical NLP tasks.
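To illustrate the data-conversion step described above, the snippet below turns a BIO-tagged sentence into a generic instruction-following record. The prompt template and output format are assumptions for illustration, not the paper's exact template.

```python
# Illustrative conversion (assumed template, not the paper's exact format): turn a
# BIO-tagged BioNER example into an instruction-following record for fine-tuning.
def bio_to_instruction(tokens, tags, entity_type="Anatomy"):
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return {
        "instruction": f"List all {entity_type} entities in the sentence.",
        "input": " ".join(tokens),
        "output": "; ".join(spans) if spans else "None",
    }

example = bio_to_instruction(
    ["Lesions", "in", "the", "left", "temporal", "lobe", "were", "noted"],
    ["O", "O", "O", "B-Anatomy", "I-Anatomy", "I-Anatomy", "O", "O"],
)
print(example)
```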
Learning Objectives
- Understand how instruction-following fine-tuning enhances generative large language models (LLMs) for biomedical named entity recognition (BioNER) by simulating real-world scenarios through question-answering.
- Explore how advanced prompting strategies—such as zero-shot prompting and Chain-of-Thought reasoning—can improve model performance, and how perplexity-based evaluation can be used for model selection in BioNER tasks.
- Discuss the benefits of fine-tuning LLMs using instruction-following data as a promising approach to improving BioNER performance.
Speaker
- Cong Sun, Ph.D. (Weill Cornell Medicine)
Identifying Necrotizing Enterocolitis Diagnosis from Progress Notes Using Natural Language Processing and Classification Models
Necrotizing Enterocolitis (NEC) is a serious neonatal condition with high mortality and morbidity. This study utilized NLP to analyze progress notes, enhancing NEC patient identification accuracy and specificity. The method surpasses manual chart review and traditional cohort discovery approaches. Improving precision in patient classification significantly reduces the reliance on labor-intensive reviews, offering a scalable solution for NEC identification.
Learning Objectives
- Improve cohort discovery using clinical notes for the neonatal population.
Speaker
- Woo Yeon Park, MS (Johns Hopkins University)
A Multipronged Approach: Harnessing LLMs and NLP on Structured and Unstructured Data to Enhance Traditional Chart Review
Accurate and efficient chart review is crucial for extracting clinically relevant information, and it is performed for several purposes, from validation studies to care assessments. The manual review process is time consuming, costly, and prone to human error. By leveraging LLMs combined with practical NLP, we can enhance the chart review process in a meaningful way. At MGB, we developed a flexible “reasoning chain pipeline” using LLMs and NLP to improve specificity and sensitivity.
Learning Objectives
- Describe the limitations of traditional manual chart review and explain how AI technologies, including LLMs and NLP, can address these challenges.
- Demonstrate an understanding of the “reasoning chain pipeline” approach developed at MGB and evaluate its impact on the specificity and sensitivity of clinical data extraction.
Speaker
- Nich Wattanasin, MS (Mass General Brigham)
A Comparison of Rule-based, Machine Learning, and Large Language Model Methods for Extracting Adverse Events from Clinical Notes
Adverse event detection is a necessary component of clinical trial data collection and currently requires massive expenditure of effort in the form of manual chart review. NLP techniques can automate this effort, but their performance is uncertain within the context of clinical trial replicability. We developed a rule-based AE detection approach and evaluated it alongside an LLM and a previously piloted best-of-breed technique in notes for patients with mantle cell lymphoma.
Learning Objectives
- Understand different methods for evaluating and extracting adverse events from clinical trial notes.
Speaker
- Aashri Aggarwal, BA (Weill Cornell Medicine)
PCORnet® Studies: After 10 Years, What is Fit-for-Purpose Informatics?
Since 2014, the Patient-Centered Outcomes Research Institute (PCORI) has invested over $900 million in the infrastructure of PCORnet®, designed to empower people to make informed health care decisions by enabling clinical research that is faster, easier, and, most importantly, more relevant to their needs. PCORnet is a large, highly representative, national “network of networks” that collects data routinely gathered in a variety of health care settings, including hospitals, doctors’ offices, and community clinics. These electronic health record data are standardized against the PCORnet Common Data Model (CDM) and highly curated. This presentation will highlight ongoing research projects and programs to demonstrate the informatics needs of observational epidemiology, national-scale surveillance, and prospective trial research programs, as well as the perspectives of patient and clinician collaborators. It will showcase research programs aiming to utilize real-world data and underscore the importance of collaborations between clinical researchers, public health professionals, informaticists, clinicians, and patients in improving the future of embedded research.
Learning Objectives
- Identify key informatics challenges and solutions in observational epidemiology, national-scale surveillance, and prospective trial research programs.
- Evaluate the impact of collaborative approaches among clinical researchers, public health professionals, informaticists, clinicians, and patients on the success of research programs.
- Discuss ongoing research initiatives that address informatics needs and highlight the integration of patient and clinician perspectives to improve research relevance and effectiveness.
Moderator
- Greg Merritt, PhD (PatientisPartner)
Speakers
- Emily O'Brien, PhD (Duke University)
- Jason Block, MD, MPH (Harvard Medical School)
- Schuyler Jones, MD (Duke Clinical Research Institute)
- Kathleen McTigue, MD, MPH (University of Pittsburgh)
- Jason Block (PCORI)
The Role of AI in Policy Design: A Case Study on Social Determinants of Health
Although recent studies have identified how social determinants of health (SDoH) barriers co-occur to form high-risk subtypes, it is unclear how they can be translated into healthcare policy. Here we conduct a case study to explore, with a panel of policy experts, how evidence-based research on SDoH can be translated into healthcare policies, and the properties of artificial intelligence (AI) methods that facilitate such a translation. This understanding could help bridge the current gap between data scientists, who are knowledgeable about the rationality underlying the scientific process but have little knowledge of policy making, and policy analysts, who are well-versed in the rationality underlying the policy-making process but have little knowledge of AI methods. Such a nexus of AI and policy could help accelerate the translation of evidence-based research into policies with broad impact on patient care.
Learning Objectives
- Explain the challenges in translating evidence-based research on Social Determinants of Health (SDoH) into healthcare policies.
- Discuss strategies for leveraging AI and interdisciplinary collaboration to accelerate policy implementation that improves patient care and health equity.
Speaker
- Suresh Bhavnani, PhD (University of Texas Medical Branch)
Subtyping Social Determinants of Health in Cancer: Implications for Precision Healthcare Policies
Although mortality rates for many cancers have declined over the last 20 years, large disparities in cancer-related outcomes persist among subpopulations. Numerous studies in cancer have identified strong associations between specific social determinants of health (SDoH), such as income insecurity, and outcomes such as significantly lower rates of breast screening. However, most people experience multiple SDoH concurrently in their daily lives. For example, limited access to education, unstable employment, and lack of insurance tend to co-occur, leading to adverse outcomes such as delayed medical care and depression. Here we analyze how SDoH co-occur across all participants in the All of Us program with a cancer diagnosis, and the implications for designing precision policies that enable more targeted allocation of resources.
Learning Objectives
- Analyze how Social Determinants of Health (SDoH) co-occur to form distinct subtypes in the All of Us data.
- Estimate the risk associated with different SDoH subtypes and their impact on health outcomes.
- Evaluate the implications of identified SDoH subtypes for informing and shaping health equity policies.
Speaker
- Suresh Bhavnani, PhD (University of Texas Medical Branch)
SEDoH Information Extraction using Large Language Models
In this study, we evaluated the ability of ChatGPT-4o-mini to extract three social and environmental determinants of health (SEDoH) indicators (housing stability, substance use, and socio-economic status) from clinical notes, compared to a manually annotated reference standard, showing extraction with moderate accuracy, precision, and recall. The model exhibited moderate performance in identifying socio-economic status, highlighting its potential for use in standardizing and integrating SEDoH data into healthcare systems.
Learning Objectives
- Describe the application of Large Language Models (LLMs) in extracting Social and Environmental Determinants of Health (SEDoH) from unstructured clinical notes.
- Evaluate the benefits, challenges, and limitations of using LLMs for identifying SEDoH in clinical data.
- Analyze the potential future applications of LLMs in improving patient-level risk prediction and monitoring outcomes across diverse subgroups.
Speaker
- David Davila-Garcia, BS (Columbia University Department of Biomedical Informatics)
Investigating the Impact of Social Determinants of Health on Diagnostic Delays and Access to Antifibrotic Treatment in Idiopathic Pulmonary Fibrosis
Idiopathic pulmonary fibrosis (IPF) is a rare disease that is challenging to diagnose. Patients with IPF often spend years awaiting a diagnosis after the onset of initial respiratory symptoms, and only a small percentage receive antifibrotic treatment. In this study, we examine the associations between social determinants of health (SDoH) and two critical factors: time to IPF diagnosis following the onset of initial respiratory symptoms, and whether the patient receives antifibrotic treatment. To approximate individual SDoH characteristics, we extract demographic-specific averages from zip code-level data using the American Community Survey (via the U.S. Census Bureau API). Two classification models are constructed: logistic regression and XGBoost. The results indicate that for time-to-diagnosis, the top three SDoH factors are education, gender, and insurance coverage. Patients with higher education levels and better insurance are more likely to receive a quicker diagnosis, with males having an advantage over females. For antifibrotic treatment, the top three SDoH factors are insurance, gender, and race. Patients with better insurance coverage are more likely to receive antifibrotic treatment, with males and White patients having an advantage over females and patients of other races. This research may help address disparities in the diagnosis and treatment of IPF related to socioeconomic status.
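The sketch below shows the general two-classifier setup described above, assuming a hypothetical table that already joins patient outcomes to zip code-level ACS averages; the file name and column names are placeholders, not the study's variables.

```python
# Minimal sketch of the logistic regression / XGBoost comparison described above.
# The CSV file and feature names are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("ipf_sdoh_features.csv")  # hypothetical joined dataset
X = df[["education_index", "insurance_rate", "median_income", "gender", "race"]]
X = pd.get_dummies(X, columns=["gender", "race"], drop_first=True)
y = df["delayed_diagnosis"]  # 1 = diagnosis delayed beyond a chosen threshold

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("xgboost", XGBClassifier(n_estimators=200, eval_metric="logloss"))]:
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```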
Learning Objectives
- Identify the types of measurements used to assess the association between Social Determinants of Health (SDoH) and clinical outcomes.
Speaker
- Rui Li, PhD (UTHealth)
Enhancing Cross-Domain Generalizability in Social Determinants of Health Extraction with Prompt-Tuning Large Language Models
The progress in natural language processing (NLP) using large language models (LLMs) has greatly improved patient information extraction from clinical narratives. However, most methods based on the fine-tuning strategy have limited transfer learning ability for cross-domain applications. This study proposed a novel approach that employs a soft prompt-based learning architecture, which introduces trainable prompts to guide LLMs toward desired outputs. We examined two types of LLM architectures, including encoder-only GatorTron and decoder-only GatorTronGPT, and evaluated their performance for the extraction of social determinants of health (SDoH) using a cross-institution dataset from the 2022 n2c2 challenge and a cross-disease dataset from the University of Florida (UF) Health. The results show that decoder-only LLMs with prompt tuning achieved better performance in cross-domain applications. GatorTronGPT achieved the best F1 scores for both datasets, outperforming traditional fine-tuned GatorTron by 8.9% and 21.8% in a cross-institution setting, and 5.5% and 14.5% in a cross-disease setting.
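For readers unfamiliar with soft prompt tuning, the sketch below shows the general technique using Hugging Face PEFT; a small public model stands in for GatorTronGPT, and the initialization text is an assumption, not the paper's configuration.

```python
# Sketch of soft prompt tuning for a decoder-only LLM with Hugging Face PEFT.
# "gpt2" is a placeholder base model; the study used GatorTronGPT.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

base = "gpt2"  # placeholder for the decoder-only LLM
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Only the virtual prompt tokens are trained; the base LLM stays frozen.
config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Extract social determinants of health from the note:",
    num_virtual_tokens=20,
    tokenizer_name_or_path=base,
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # trainable parameters are a tiny fraction of the model
```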
Learning Objectives
- Describe the limitations of fine-tuning strategies in cross-domain applications of large language models (LLMs) for extracting patient information from clinical narratives.
- Evaluate the effectiveness of prompt-tuned GatorTronGPT in improving cross-domain performance over traditional fine-tuned models.
Speaker
- Cheng Peng, PhD (University of Florida)
Description and Real-World Outcomes of a Centralized Technology-based Solution to Improve Geospatial Data Capture and Enterprise Resiliency During Extreme Weather Events
We describe key components of an informatics-enabled, geospatially enriched framework to support operational resiliency and preserve continuity of care for a large, integrated healthcare enterprise. Real-world outcomes from Hurricane Beryl highlight the accelerated hyperlocal response enabled by precise geographic identifiers, which informed targeted actions and efficient resource distribution, including localized risk assessment, targeted emergency alerts, granular damage assessment, streamlined communication with local partners, and data-informed response and recovery plans. Key competencies required to execute this framework include a rich data foundation with interoperability; advanced analytics; connectivity in the healthcare ecosystem, including a nationwide community footprint; benefit design; and subject matter expertise.
Learning Objectives
- Describe the design and implementation of an informatics-enabled framework with enhanced geospatial capabilities for environmental preparedness.
- Explain how the framework improves enterprise resiliency in response to extreme weather events.
- Communicate the outcomes and potential impact of using geospatial informatics to enhance environmental preparedness and disaster response efforts.
Speaker
- Sean Horman, MPA (CVS Health)
Clinical phenotyping, which leverages real-world data primarily from electronic health records, is critical to clinical and translational science, clinical and quality registries, and improving direct patient care. While automated digital and electronic phenotyping methods have made significant advances and hold great promise, multiple gaps remain, including: reproducibility, clinical accuracy, and integration for knowledge generation and delivery of patient care. This panel will discuss these challenges and describe future trends for phenotyping along these important axes. Four expert panelists from leading health systems will provide information on the current state and describe their vision for the future of real-world phenotyping. Facilitated by a seasoned moderator, the panel and the audience will engage in a conversation on factors that influence their perspectives, the impacts of new phenotyping advancements and technologies, and what is needed to achieve next-generation capabilities. Discussants will review tensions between current and future approaches to phenotyping for real-world applications and the implications for informatics research, clinical quality, and value-based care. With a focus on multi-institutional efforts, panelists will describe potential barriers and facilitators to delivering next generation clinical phenotyping resources.
Learning Objectives
- Explain the role of clinical phenotyping in clinical and translational science, quality registries, and improving patient care by leveraging real-world data from electronic health records (EHRs).
- Identify key gaps in current automated phenotyping methods, including challenges related to reproducibility, clinical accuracy, and integration into clinical workflows.
- Describe emerging trends and future directions in real-world phenotyping, as outlined by experts from leading health systems.
- Analyze the implications of evolving phenotyping technologies for informatics research, clinical quality improvement, and value-based care models.
- Evaluate potential barriers and facilitators to implementing next-generation clinical phenotyping approaches in multi-institutional settings.
Moderator
- Genevieve Melton-Meaux, MD, PhD (University of Minnesota)
Speakers
- David Vawdrey, PhD (Geisinger)
- Marisa Conte, MLIS (University of Michigan)
- Rachel Richesson, PhD, MPH, FACMI (University of Michigan Medical School)
Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models
Predicting phenotypes with complex genetic bases from a small, interpretable set of variant features remains a challenging task. Conventionally, data-driven approaches are used for this task, yet the high-dimensional nature of genotype data makes analysis and prediction difficult. Motivated by the extensive knowledge encoded in pre-trained LLMs and their success in processing complex biomedical concepts, we set out to examine the ability of LLMs to perform feature selection and engineering for tabular genotype data within a novel knowledge-driven framework. We develop FREEFORM (Free-flow Reasoning and Ensembling for Enhanced Feature Output and Robust Modeling), designed with chain-of-thought and ensembling principles, to select and engineer features using the intrinsic knowledge of LLMs. Evaluated on two distinct genotype-phenotype datasets, genetic ancestry and hereditary hearing loss, the framework outperforms several data-driven methods, particularly in low-shot regimes. FREEFORM is available as an open-source framework on GitHub: https://github.com/PennShenLab/FREEFORM
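A hedged sketch of the general idea of LLM-driven feature selection with reasoning prompts and ensembling is shown below; it is not the FREEFORM API (see the linked repository for the actual framework), and the candidate variants, model, and prompt are illustrative.

```python
# Hedged sketch: ensembled LLM-driven variant selection (not the FREEFORM API).
import json
from collections import Counter
from openai import OpenAI  # assumes an API key in the environment

client = OpenAI()
CANDIDATE_SNPS = ["rs4988235", "rs1426654", "rs16891982", "rs3827760", "rs12913832"]

def select_once(phenotype: str) -> list[str]:
    prompt = (
        f"Think step by step about which of these variants are most informative for "
        f"predicting {phenotype}, then answer with a JSON object "
        f'{{"selected": [...]}} containing rsIDs only.\nCandidates: {CANDIDATE_SNPS}'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # encourage diversity across ensemble members
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)["selected"]

# Ensemble five independent selections and keep variants picked by a majority.
votes = Counter(snp for _ in range(5) for snp in select_once("genetic ancestry"))
print([snp for snp, n in votes.items() if n >= 3])
```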
Learning Objectives
- Analyze the challenges associated with applying data-driven approaches to high-dimensional genotype data.
- Evaluate the effectiveness of advanced feature selection and engineering techniques informed by the latest developments in large language models (LLMs).
- Compare conventional data-driven methods with LLM-based knowledge-driven approaches for reducing genetic features and mitigating overfitting.
- Apply a novel knowledge-driven framework that leverages chain-of-thought reasoning and ensembling principles to enhance genetic feature selection and improve phenotype prediction with limited data.
Speaker
- Joseph Lee, Bachelor of Science in Networked and Social Systems Engineering (University of Pennsylvania)
Inter-tissue coordination patterns of metabolic transcriptomes
Understanding inter-organ communication across the entire body is crucial for comprehending health and disease. We present a computational approach that allows us to define inter-tissue communication and a general coordination pattern of metabolic transcriptomes at a whole-body scale, applied to 19 human tissues and validated using external datasets. We reveal known and novel inter-tissue metabolic links and a significant global coregulation pattern. Our framework may apply to other types of transcriptomes and be used to detect changes across different conditions.
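One simple way to quantify inter-tissue coordination, sketched below on simulated data, is to correlate per-individual summaries of metabolic gene expression between tissue pairs; the data layout, tissues, and gene set are assumptions, not the paper's pipeline.

```python
# Illustrative sketch: cross-tissue correlation of metabolic expression summaries.
# Simulated data; not the study's actual computation.
from itertools import combinations
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
individuals = [f"ind{i}" for i in range(100)]
tissues = ["liver", "muscle", "adipose", "heart"]
metabolic_genes = [f"gene{g}" for g in range(50)]

# expression[tissue] is an individuals x genes matrix for that tissue.
expression = {t: pd.DataFrame(rng.normal(size=(100, 50)),
                              index=individuals, columns=metabolic_genes)
              for t in tissues}

# Summarize each tissue by its mean metabolic expression per individual,
# then correlate the summaries across individuals for every tissue pair.
summary = pd.DataFrame({t: expression[t].mean(axis=1) for t in tissues})
for a, b in combinations(tissues, 2):
    rho = summary[a].corr(summary[b], method="spearman")
    print(f"{a}-{b}: rho = {rho:.2f}")
```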
Learning Objectives
- Understand that metabolic transcriptomes are positively coordinated, highly connected, and form a significantly large community.
Speaker
- Judith Somekh, PhD (University of Haifa)
Evolution of Genomic Indicators for Pharmacogenomics: Retrospective Analysis and Implications for Knowledge Management
Pharmacogenomics (PGx) incorporates patient genetic data into pharmacotherapy guidelines to improve patient outcomes. Clinical decision support (CDS) systems rely on underlying knowledge bases, information models, and encoded rule logic to implement clinical guidelines. However, changes in PGx knowledge and result reporting standards necessitate continual maintenance of CDS rule logic and data reporting in electronic health records (EHRs). We reviewed over 12 years of PGx CDS implementation at Mayo Clinic, identifying three different methods of recording patient PGx data in multiple EHRs. Prior to enterprise-wide EHR convergence, each Mayo Clinic site followed task force-developed gene-drug guidelines to develop rules for annotating gene-phenotype data within patient allergy and problem lists. These annotations frequently lacked discrete genotype or provenance data, precluding detailed tracking of changes in each system. After EHR convergence, all Mayo Clinic sites used Genomic Indicator (GI) profiles (N=158) within an EHR module specifically designed to capture gene-phenotype information. Several post-implementation modification events incorporated new PGx knowledge, including adding new gene-drug indicator sets, updating genotype-phenotype specifications, and assigning haplotype enzyme activity score data for quantitative phenotypes. The incorporation of phenotype results from a large multi-gene panel resulted in the creation of 29 test-specific indicators, 12 of which were later removed or merged with previously established GIs due to the use of non-standardized nomenclature and classifications. Our results demonstrate the limitations of using pre-coordinated terms for complex and evolving knowledge and suggest the need for a robust knowledge model and standardized nomenclature to provide adequate data provenance and support genomic medicine at scale.
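As a hypothetical illustration of the kind of knowledge model the authors call for, the sketch below defines a genomic indicator record that retains discrete genotype, provenance, and nomenclature version; the fields and example values are assumptions, not Mayo Clinic's implementation.

```python
# Hypothetical data model for a genomic indicator with explicit provenance.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class GenomicIndicator:
    gene: str                       # e.g., "CYP2C19"
    diplotype: str                  # discrete genotype, e.g., "*1/*2"
    phenotype: str                  # e.g., "Intermediate Metabolizer"
    activity_score: Optional[float] # quantitative phenotype where applicable
    source_lab: str                 # provenance: who reported the result
    reported_on: date               # provenance: when it was reported
    nomenclature_version: str       # terminology release used for classification

indicator = GenomicIndicator(
    gene="CYP2C19", diplotype="*1/*2", phenotype="Intermediate Metabolizer",
    activity_score=None, source_lab="Example Lab", reported_on=date(2024, 5, 1),
    nomenclature_version="example-release-2024",
)
print(indicator)
```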
Learning Objectives
- Describe the role of pharmacogenomics (PGx) in integrating patient genetic data into pharmacotherapy guidelines to improve patient outcomes.
- Describe 3 types of events that can impact the design of genomic indicators.
- Identify 3-5 design decisions to consider when creating genomic indicators, which may result in more stable implementations.
Speaker
- Robert Freimuth, PhD (Mayo Clinic)
Not the Models You Are Looking For: An Evaluation of Performance, Privacy, and Fairness of LLMs in EHR Tasks
We use a private dataset derived from Vanderbilt University Medical Center's EHR to evaluate GPT-3.5, GPT-4, and traditional ML models, measuring predictive performance, output calibration, the privacy-utility tradeoff, and algorithmic fairness. Traditional ML vastly outperformed GPT-3.5 and GPT-4 with respect to predictive performance and output probability calibration. We find that traditional ML is much more robust to efforts to generalize demographic information than GPT-3.5 and GPT-4. Surprisingly, GPT-4 is the fairest model according to our selected metrics. These findings imply that additional research into LLMs is necessary before they are deployed as clinical prediction models.
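The sketch below illustrates two of the evaluation dimensions mentioned above, calibration and group fairness, on simulated model outputs; the metrics shown (Brier score and a demographic parity gap) are common choices, not necessarily the ones used in the study.

```python
# Sketch of calibration and group-fairness checks on hypothetical model outputs.
import numpy as np
import pandas as pd
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "y_true": rng.integers(0, 2, 1000),
    "y_prob": rng.uniform(0, 1, 1000),      # predicted probabilities from any model
    "group": rng.choice(["A", "B"], 1000),  # a sensitive attribute
})
df["y_pred"] = (df["y_prob"] >= 0.5).astype(int)

# Calibration: Brier score (lower is better).
print("Brier score:", brier_score_loss(df["y_true"], df["y_prob"]))

# Fairness: difference in positive prediction rates between groups.
rates = df.groupby("group")["y_pred"].mean()
print("Demographic parity difference:", abs(rates["A"] - rates["B"]))
```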
Learning Objectives
- Identify the concerns associated with using large language models (LLMs) as clinical prediction models and demonstrate how to assess their performance and reliability in clinical applications.
Speakers
- Katherine Brown, PhD (Vanderbilt University Medical Center)
Leveraging Open-Source Large-Language Model-Enabled Identification of Undiagnosed Patients with Rare Genetic Aortopathies
Hereditary aortopathies are often underdiagnosed, with many patients not receiving genetic testing until after a cardiac event. In this pilot study, we investigate the use of open-source LLMs to recommend genetic testing based on clinical notes. We evaluate the utility of injecting disease-specific knowledge into retrieval-augmented generation (RAG)-based and fine-tuned models. Our result of 93% accuracy using the base model alone surprisingly suggests that incorporating domain knowledge can sometimes hinder clinical model performance.
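A minimal sketch of knowledge injection via retrieval before prompting an open-source LLM is shown below; the knowledge snippets, note, and placeholder model are invented for illustration and do not reflect the study's pipeline.

```python
# Illustrative retrieval-then-prompt sketch (not the study's pipeline).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

knowledge = [
    "Marfan syndrome features include aortic root dilation and lens dislocation.",
    "Loeys-Dietz syndrome is associated with arterial tortuosity and bifid uvula.",
    "Family history of aortic dissection raises suspicion for hereditary aortopathy.",
]
note = "45-year-old with aortic root dilation, tall stature, and lens dislocation."

# Retrieve the most relevant disease-specific knowledge snippets for this note.
vec = TfidfVectorizer().fit(knowledge + [note])
sims = cosine_similarity(vec.transform([note]), vec.transform(knowledge))[0]
context = "\n".join(k for _, k in sorted(zip(sims, knowledge), reverse=True)[:2])

prompt = (f"Context:\n{context}\n\nNote:\n{note}\n\n"
          "Should this patient be referred for genetic aortopathy testing? Answer yes or no.")
generator = pipeline("text-generation", model="gpt2")  # placeholder open-source model
print(generator(prompt, max_new_tokens=20)[0]["generated_text"])
```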
Learning Objectives
- Identify the challenges involved in diagnosing rare genetic diseases.
- Explain how large language models (LLMs) can be used to screen patients for genetic aortopathies.
- Apply an open-source LLM-enabled recommender pipeline to make patient-level diagnostic predictions using clinical notes.
Speakers
- Zilinghan Li, Master of Science (Argonne National Laboratory)
Identifying Opioid Overdose and Opioid Use Disorder and Related Information from Clinical Narratives Using Large Language Models
Learning Objectives
- Understanding the critical public health implications of opioid overdose and opioid use disorder (OUD) in the United States.
- Identify opioid overdose, problematic opioid use, and other related concepts to facilitate studies aimed at countering the opioid crisis.
- Apply encoder-only large language models (LLMs) and decoder-based generative LLMs to extract opioid related information from clinical notes and understand the strengths and weaknesses of the two types of LLMs.
- Understand the cost-effective p-tuning algorithm for adapting encoder-based and decoder-based LLMs to patient information extraction.
Speakers
- Daniel Paredes, MS (University of Florida)
Exploring ChatGPT 3.5 for structured data extraction from oncological notes
In large-scale clinical informatics, there is a need to maximize the amount of usable data from electronic health records. With the adoption of large language models in HIPAA-secure environments, there is potential to use them to extract structured data from unstructured clinical notes. We explored how ChatGPT 3.5 could be used to supplement data in cancer research, assessing how GPT used clinical notes to answer six relevant clinical questions. Four prompt engineering strategies were used: zero-shot, zero-shot with context, few-shot, and few-shot with context. Few-shot prompting often decreased the accuracy of GPT outputs, and context did not consistently improve accuracy. GPT extracted patients' Gleason scores and ages with an F1 score of 0.99, and it identified whether patients received palliative care and whether patients were in pain with an F1 score of 0.86. This approach has the potential to increase interoperability between healthcare and clinical research.
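The sketch below shows how zero-shot and few-shot prompts of the kind compared above can be constructed; the example note, exemplar, and question are invented for illustration.

```python
# Sketch of zero-shot vs. few-shot prompt construction; examples are invented.
def zero_shot(note: str) -> str:
    return f"From the note below, report the Gleason score as a number.\n\nNote: {note}"

def few_shot(note: str, exemplars: list[tuple[str, str]]) -> str:
    # Prepend labeled exemplars so the model can imitate the expected answer format.
    shots = "\n\n".join(f"Note: {n}\nGleason score: {a}" for n, a in exemplars)
    return f"{shots}\n\nNote: {note}\nGleason score:"

exemplars = [("Biopsy shows Gleason 3+4=7 adenocarcinoma.", "7")]
note = "Pathology consistent with Gleason 4+4=8 prostate adenocarcinoma."
print(zero_shot(note))
print(few_shot(note, exemplars))
```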
Learning Objectives
- Recognize the potential and limitations of using ChatGPT to extract structured data from unstructured clinical notes.
Speakers
- Ty Skyles, BS candidate (Brigham Young University)
Enhancing Disease Detection in Radiology Reports Through Fine-tuning Lightweight LLM on Weak Labels
Despite significant progress in applying large language models (LLMs) to the medical domain, several limitations still prevent their practical application, among them constraints on model size and the lack of cohort-specific labeled datasets. In this work, we investigated the potential of improving a lightweight LLM, Llama 3.1-8B, through fine-tuning on datasets with synthetic labels. Two tasks are jointly trained by combining their respective instruction datasets. When the quality of the task-specific synthetic labels is relatively high (e.g., generated by GPT-4o), Llama 3.1-8B achieves satisfactory performance on the open-ended disease detection task, with a micro F1 score of 0.91. Conversely, when the quality of the task-relevant synthetic labels is relatively low (e.g., from the MIMIC-CXR dataset), fine-tuned Llama 3.1-8B is able to surpass its noisy teacher labels (micro F1 score of 0.67 vs. 0.63) when calibrated against curated labels, indicating the model's strong inherent capability. These findings demonstrate the potential of fine-tuning LLMs with synthetic labels, offering a promising direction for future research on LLM specialization in the medical domain.
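A hedged sketch of fine-tuning a causal LM on weakly labeled report/label pairs is shown below; a tiny public model stands in for Llama 3.1-8B, and the examples are invented, so this does not reproduce the study's training setup.

```python
# Hedged sketch: fine-tune a small causal LM on report/weak-label pairs.
# "distilgpt2" is a placeholder for Llama 3.1-8B; labels are invented.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Each example pairs a radiology report with a synthetic (weak) disease label.
examples = [
    {"text": "Report: Patchy right lower lobe opacity.\nDiseases: pneumonia"},
    {"text": "Report: Clear lungs, normal cardiac silhouette.\nDiseases: none"},
]
dataset = Dataset.from_list(examples).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=256), batched=True
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-weak-labels", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to=[]),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```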
Learning Objectives
- Identify key limitations preventing the practical application of large language models (LLMs) in the medical domain, including model size constraints and lack of cohort-specific labeled datasets.
- Describe the process of fine-tuning lightweight LLMs, such as Llama 3.1-8B, using datasets with synthetic labels to improve task-specific performance.
- Evaluate the impact of synthetic label quality on model performance, comparing outcomes from high-quality sources (e.g., GPT-4o) and lower-quality datasets (e.g., MIMIC-CXR).
- Analyze the ability of fine-tuned Llama 3.1-8B to surpass noisy teacher labels, demonstrating the model’s inherent capability in disease detection and related tasks.
Speakers
- Yishu Wei, PhD (Department of Population Health Sciences, Weill Cornell Medicine)
Predicting Antibiotic Resistance Patterns Using Sentence-BERT: A Machine Learning Approach
Antibiotic resistance poses a significant threat in inpatient settings, where it is associated with high mortality. Using MIMIC-III data, we generated Sentence-BERT embeddings from clinical notes and applied neural networks and XGBoost to predict antibiotic susceptibility. XGBoost achieved an average F1 score of 0.86, while neural networks scored 0.84. This study is among the first to use document embeddings to predict antibiotic resistance, offering a novel pathway for improving antimicrobial stewardship.
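The sketch below shows an embedding-plus-classifier pipeline of the kind described above, with toy notes and labels standing in for MIMIC-III data; the Sentence-BERT checkpoint is an assumption, not necessarily the one used in the study.

```python
# Sketch of a Sentence-BERT + XGBoost pipeline on toy notes and labels.
from sentence_transformers import SentenceTransformer
from xgboost import XGBClassifier

notes = [
    "Blood culture grew E. coli; started on ceftriaxone.",
    "Sputum culture positive for MRSA; vancomycin initiated.",
    "Urine culture grew pan-sensitive Klebsiella.",
    "Wound culture with ESBL-producing organism.",
]
resistant = [0, 1, 0, 1]  # toy labels: 1 = resistant to the empiric antibiotic

# Encode each note into a fixed-length Sentence-BERT embedding.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(notes)

clf = XGBClassifier(n_estimators=50, eval_metric="logloss").fit(X, resistant)
print(clf.predict(encoder.encode(["Culture grew MRSA; resistant to oxacillin."])))
```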
Learning Objectives
- Explain how Sentence-BERT embeddings and machine learning models, including neural networks and XGBoost, can be applied to predict antibiotic resistance patterns using clinical documentation and microbiology data.
Speakers
- Mahmoud Alwakeel, MD