Translational Bioinformatics

Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models

Predicting phenotypes with complex genetic bases from a small, interpretable set of variant features remains a challenging task. Conventionally, data-driven approaches are used for this task, yet the high-dimensional nature of genotype data makes analysis and prediction difficult. Motivated by the extensive knowledge encoded in pre-trained LLMs and their success in processing complex biomedical concepts, we set out to examine the ability of LLMs to perform feature selection and engineering for tabular genotype data with a novel knowledge-driven framework. We develop FREEFORM (Free-flow Reasoning and Ensembling for Enhanced Feature Output and Robust Modeling), designed with chain-of-thought and ensembling principles, to select and engineer features using the intrinsic knowledge of LLMs. Evaluated on two distinct genotype-phenotype datasets, genetic ancestry and hereditary hearing loss, this framework outperforms several data-driven methods, particularly in low-shot regimes. FREEFORM is available as an open-source framework on GitHub: https://github.com/PennShenLab/FREEFORM

Learning Objectives

Analyze the challenges associated with applying data-driven approaches to high-dimensional genotype data.
Evaluate the effectiveness of advanced feature selection and engineering techniques informed by the latest developments in large language models (LLMs).
Compare conventional data-driven methods with LLM-based knowledge-driven approaches for reducing genetic features and mitigating overfitting.
Apply a novel knowledge-driven framework that leverages chain-of-thought reasoning and ensembling principles to enhance genetic feature selection and improve phenotype prediction with limited data.

Speaker

Joseph Lee, Bachelor of Science in Networked and Social Systems Engineering (University of Pennsylvania)

Inter-tissue coordination patterns of metabolic transcriptomes

Understanding inter-organ communication across the entire body is crucial for comprehending health and disease. We present a computational approach that defines inter-tissue communication and a general coordination pattern of metabolic transcriptomes at a whole-body scale, applied to 19 human tissues and validated using external datasets. We reveal known and novel inter-tissue metabolic links and a significant global coregulation pattern. Our framework may apply to other types of transcriptomes and may be used to detect changes across different conditions.

Learning Objectives

Understand that metabolic transcriptomes are positively coordinated, highly connected, and form a significantly large community.

Speaker

Judith Somekh, PhD (University of Haifa)

Evolution of Genomic Indicators for Pharmacogenomics: Retrospective Analysis and Implications for Knowledge Management

Pharmacogenomics (PGx) incorporates patient genetic data into pharmacotherapy guidelines to improve patient outcomes. Clinical decision support (CDS) systems rely on underlying knowledge bases, information models, and encoded rule logic to implement clinical guidelines. However, changes in PGx knowledge and result reporting standards necessitate continual maintenance of CDS rule logic and data reporting in electronic health records (EHRs). We reviewed over 12 years of PGx CDS implementation at Mayo Clinic, identifying three different methods of recording patient PGx data in multiple EHRs. Prior to enterprise-wide EHR convergence, each Mayo Clinic site followed task-force-developed gene-drug guidelines to develop rules for annotating gene-phenotype data within patient allergy and problem lists. These annotations frequently lacked discrete genotype or provenance data, precluding detailed tracking of changes in each system. After EHR convergence, all Mayo Clinic sites used Genomic Indicator (GI) profiles (N=158) within an EHR module specifically designed to capture gene-phenotype information. Several post-implementation modification events incorporated new PGx knowledge, including adding new gene-drug indicator sets, updating genotype-phenotype specifications, and assigning haplotype enzyme activity score data for quantitative phenotypes. The incorporation of phenotype results from a large multi-gene panel resulted in the creation of 29 test-specific indicators, 12 of which were later removed or merged with previously established GIs due to the use of non-standardized nomenclature and classifications. Our results demonstrate the limitations of using pre-coordinated terms for complex and evolving knowledge and suggest the need for a robust knowledge model and standardized nomenclature to provide adequate data provenance and support genomic medicine at scale.

Learning Objectives

Describe the role of pharmacogenomics (PGx) in integrating patient genetic data into pharmacotherapy guidelines to improve patient outcomes.
Describe 3 types of events that can impact the design of genomic indicators.
Identify 3-5 design decisions to consider when creating genomic indicators, which may result in more stable implementations.

Speaker

Robert Freimuth, PhD (Mayo Clinic)

Continuing Education Credit

Physicians

The American Medical Informatics Association is accredited by the Accreditation Council for Continuing Medical Education (ACCME) to provide continuing medical education for physicians.

The American Medical Informatics Association designates this online enduring material for 1.0 AMA PRA Category 1™ credits. Physicians should claim only the credit commensurate with the extent of their participation in the activity.

Claim credit no later than March 10, 2028 or within two years of your purchase date, whichever is sooner. No credit will be issued after March 10, 2028.

ACHIPs™

AMIA Health Informatics Certified Professionals™ (ACHIPs™) can earn 1 professional development unit (PDU) per contact hour.

ACHIPs™ may use CME/CNE certificates or the ACHIPs™ Recertification Log to report 2025 Summit sessions attended for ACHIPs™ Recertification.

Claim credit within three years of the release date or within two years of your purchase date, whichever is sooner.

FAQs

When was the On Demand content recorded?

Content was recorded live at AMIA's Informatics Summit, March 10-13, 2025, in Pittsburgh, PA, and at AMIA’s Annual Symposium, November 9-13, 2024, in San Francisco, CA.

Plan now to join us for the next Annual Symposium or Informatics Summit!

How long is CME/CNE credit available if I complete this activity?

CME or CNE credit must be claimed no later than two years from the release date or within one year of your purchase date, whichever is sooner. No credit will be issued after that time.

I’m not an AMIA Member. Can I still purchase this content?

Yes! AMIA On Demand is available for anyone to purchase. Become an AMIA member before you purchase to receive exclusive member discounts. Join AMIA today.

How do I become an AMIA member?

We’re glad you asked! AMIA offers a variety of membership options, all with exclusive benefits and abundant networking opportunities. Choose the membership that’s right for you.

I only want to listen to the audio of these sessions. Is that format available?

The audio-only format of all sessions is available free of charge exclusively to AMIA members.

Access the audio recordings now (login required).

Where can I learn more about AMIA's live events?

Join us at the next AMIA event and engage with leaders from across the health informatics field.

I attended an AMIA event in person. Can I still claim CME/CNE credit if I complete these online sessions?

Yes! You can claim Self-Study credit when you complete AMIA On Demand sessions, in addition to claiming Live credit for attending the live event. View the full details on self-study accreditation for this product.

Can On Demand Sessions be purchased with Health System Membership educational credits?

Yes. The AMIA 2024 Annual Symposium On Demand Bundle (Presenter, Slides, and Audio) may be purchased for 8 educational credits using your health system’s code at checkout. Individual sessions (Presenter, Slides, and Audio) may be purchased for 1 educational credit per session using your health system’s code at checkout.