Language models (LMs) such as BERT and GPT have brought remarkable advancements to natural language processing (NLP). Nevertheless, privacy-sensitive domains, particularly the medical field, face challenges in training LMs due to limited data access and stringent privacy regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR).
In this context, federated learning (FL) presents a decentralized solution that enables collaborative learning while upholding data privacy. By allowing the sharing of model updates rather than raw data, healthcare providers can effectively build AI systems that leverage distributed clinical data in real-world scenarios.
The first portion of this presentation delves into the application of FL for clinical language models and highlights its potential impact on advancing AI in healthcare. In the second half, the presenters will discuss their work assessing the generalizability of CancerBERT models by evaluating their performance, in comparison with other benchmark models, on a clinical information extraction task using two corpora collected from two clinical institutes. Notably, the CancerBERT models proved the most generalizable, as they were better at learning the context of target phenotypes and at accommodating the textual variations encountered in clinical texts.
Watch the Recording
Presenters
Le Peng is a fifth-year PhD student in Computer Science and Engineering at the University of Minnesota, under the guidance of Dr. Ju Sun and Dr. Rui Zhang. His research interests span a wide range of machine learning topics, including computer vision, NLP, and AI for healthcare.
Sicheng Zhou is a fifth-year PhD student in Health Informatics at the University of Minnesota, supervised by Dr. Rui Zhang. His research focuses on developing NLP methods for clinical text information extraction, with applications to breast cancer.
Rui Zhang, PhD, FAMIA, is the Founding Chief of the Division of Computational Health Sciences and an Associate Professor in the Department of Surgery at the University of Minnesota Medical School. He is a McKnight Presidential Fellow and the Director of the NLP/IE research program. He is also Scientific Co-Director of the Innovative Methods & Data Science program within the Center for Learning Health Systems Science. His research interests are clinical NLP, text mining, literature-based discovery, and complementary and alternative medicine informatics. His research has been supported through multiple R01 projects funded by NCCIH, NIA, ODS, AHRQ, CISCO and Medtronic. Dr. Zhang served on multiple NIH study sections and the JAMIA Editorial Board.
