As part of the Department of Energy’s partnership with the National Cancer Institute, the Modeling Outcomes Using Surveillance Data and Scalable AI for Cancer (MOSSAIC) project aims to develop deep learning models to support near-real-time cancer surveillance at the population level. Through the NCI’s Surveillance, Epidemiology, and End Results (SEER) program, we deploy models for the automated coding of cancer cases across state and regional cancer registries throughout the United States.

A key challenge in developing AI models for clinical and health applications is the scarcity of shareable datasets, largely due to privacy concerns and regulations. While traditional privacy approaches such as redaction and de-identification are limited, synthetic data generation offers a promising alternative. This presentation evaluates the potential of generative AI with large language models to produce high-fidelity synthetic pathology reports while preserving privacy and maintaining utility.

Presenter

John Gounley
Oak Ridge National Laboratory