Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models

Predicting phenotypes with complex genetic bases from a small, interpretable set of variant features remains a challenging task. Conventionally, data-driven approaches are used for this task, yet the high-dimensional nature of genotype data makes analysis and prediction difficult. Motivated by the extensive knowledge encoded in pre-trained LLMs and their success in processing complex biomedical concepts, we set out to examine the ability of LLMs to perform feature selection and engineering for tabular genotype data, using a novel knowledge-driven framework. We develop FREEFORM (Free-flow Reasoning and Ensembling for Enhanced Feature Output and Robust Modeling), designed with chain-of-thought and ensembling principles, to select and engineer features with the intrinsic knowledge of LLMs. Evaluated on two distinct genotype-phenotype datasets, genetic ancestry and hereditary hearing loss, this framework outperforms several data-driven methods, particularly in low-shot regimes. FREEFORM is available as an open-source framework on GitHub: https://github.com/PennShenLab/FREEFORM

Learning Objectives

  • Analyze the challenges associated with applying data-driven approaches to high-dimensional genotype data.
  • Evaluate the effectiveness of advanced feature selection and engineering techniques informed by the latest developments in large language models (LLMs).
  • Compare conventional data-driven methods with LLM-based knowledge-driven approaches for reducing genetic features and mitigating overfitting.
  • Apply a novel knowledge-driven framework that leverages chain-of-thought reasoning and ensembling principles to enhance genetic feature selection and improve phenotype prediction with limited data.

Speaker

  • Joseph Lee, Bachelor of Science in Networked and Social Systems Engineering (University of Pennsylvania)

Inter-tissue coordination patterns of metabolic transcriptomes

Understanding inter-organ communication across the entire body is crucial for comprehending health and disease. We present a computational approach that defines inter-tissue communication and a general coordination pattern of metabolic transcriptomes at a whole-body scale, applied to 19 human tissues and validated using external datasets. We reveal known and novel inter-tissue metabolic links and a significant global coregulation pattern. Our framework may apply to other types of transcriptomes and may be used to detect changes across different conditions.

Learning Objectives

  • Understand that metabolic transcriptomes are positively coordinated, highly connected, and form a significantly large community.

Speaker

  • Judith Somekh, PhD (University of Haifa)

Evolution of Genomic Indicators for Pharmacogenomics: Retrospective Analysis and Implications for Knowledge Management

Pharmacogenomics (PGx) incorporates patient genetic data into pharmacotherapy guidelines to improve patient outcomes. Clinical decision support (CDS) systems rely on underlying knowledge bases, information models, and encoded rule logic to implement clinical guidelines. However, changes in PGx knowledge and result reporting standards necessitate continual maintenance of CDS rule logic and data reporting in electronic health records (EHRs). We reviewed more than 12 years of PGx CDS implementation at Mayo Clinic, identifying three different methods of recording patient PGx data in multiple EHRs. Prior to enterprise-wide EHR convergence, each Mayo Clinic site followed task force-developed gene-drug guidelines to develop rules for annotating gene-phenotype data within patient allergy and problem lists. These annotations frequently lacked discrete genotype or provenance data, precluding detailed tracking of changes in each system. After EHR convergence, all Mayo Clinic sites used Genomic Indicator (GI) profiles (N=158) within an EHR module specifically designed to capture gene-phenotype information. Several post-implementation modification events incorporated new PGx knowledge, including adding new gene-drug indicator sets, updating genotype-phenotype specifications, and assigning haplotype enzyme activity score data for quantitative phenotypes. The incorporation of phenotype results from a large multi-gene panel resulted in the creation of 29 test-specific indicators, 12 of which were later removed or merged with previously established GIs due to the use of non-standardized nomenclature and classifications. Our results demonstrate the limitations of using pre-coordinated terms for complex and evolving knowledge and suggest the need for a robust knowledge model and standardized nomenclature to provide adequate data provenance and support genomic medicine at scale.

Learning Objectives

  • Describe the role of pharmacogenomics (PGx) in integrating patient genetic data into pharmacotherapy guidelines to improve patient outcomes.
  • Describe 3 types of events that can impact the design of genomic indicators.
  • Identify 3-5 design decisions to consider when creating genomic indicators, which may result in more stable implementations.

Speaker

  • Robert Freimuth, PhD (Mayo Clinic)

Continuing Education Credit

Physicians

The American Medical Informatics Association is accredited by the Accreditation Council for Continuing Medical Education (ACCME) to provide continuing medical education for physicians.

The American Medical Informatics Association designates this online enduring material for 1.0 AMA PRA Category 1™ credits. Physicians should claim only the credit commensurate with the extent of their participation in the activity.

Claim credit no later than March 10, 2028 or within two years of your purchase date, whichever is sooner. No credit will be issued after March 10, 2028.

ACHIPs™

AMIA Health Informatics Certified Professionals™ (ACHIPs™) can earn 1 professional development unit (PDU) per contact hour.

ACHIPs™ may use CME/CNE certificates or the ACHIPs™ Recertification Log to report 2024 Symposium sessions attended for ACHIPs™ Recertification.

Claim credit no later than March 10, 2028 or within two years of your purchase date, whichever is sooner. No credit will be issued after March 10, 2028.

FAQs

All content was recorded live at AMIA’s Annual Symposium event November 9-13, 2024, in San Francisco, CA. Plan now to join us for the next Annual Symposium!

Yes! Purchase the AMIA 2024 Annual Symposium On Demand Bundle to enjoy all recorded sessions available at the best value. Get the bundle.

Purchase the AMIA 2024 Annual Symposium On Demand Bundle for the best value on all top 20 sessions. Additional individual sessions are also available for purchase in the catalog.

Claim credit no later than January 20, 2028 or within two years of your purchase date, whichever is sooner. No credit will be issued after January 20, 2028.

Yes! AMIA 2024 Annual Symposium On Demand is available for anyone to purchase. Become an AMIA member before you purchase to receive exclusive member discounts. Join AMIA today.

We’re glad you asked! AMIA offers a variety of membership options, all with exclusive benefits and abundant networking opportunities. Choose the membership that’s right for you.

The audio-only format of all 20 sessions is available free of charge exclusively to AMIA members. Access the AMIA 2024 Annual Symposium On Demand Audio Library. Login required.

Join us at the next Annual Symposium and engage with leaders from across the health informatics field. Learn more.

Yes! You can claim Self-Study credit when you complete AMIA 2024 Annual Symposium On Demand sessions, in addition to claiming Live credit for attending the live event. View the full details on self-study accreditation for this product.

Yes. The AMIA 2024 Annual Symposium On Demand Bundle (Presenter, Slides, and Audio) may be purchased for 8 educational credits using your health system’s code at checkout. Individual sessions (Presenter, Slides, and Audio) may be purchased for 1 educational credit per session using your health system’s code at checkout.

Type: AMIA On Demand
Course Format(s): On Demand
Credits: 1.00 CME
Price: Member: $60, Nonmember: $85