Should we synthesize more than we need: impact of synthetic data generation for high-dimensional cross-sectional medical data.
In medical research and education, generative artificial intelligence/machine learning (AI/ML) models that synthesize artificial medical data can enable the sharing of high-quality data while preserving patient privacy. Because such data is often high-dimensional, a relevant consideration is whether to synthesize the entire dataset when only a task-relevant subset is needed. This study evaluates how the number of variables in training impacts fidelity, utility, and privacy of the [...]
Author(s): Pilgram, Lisa; El Kababji, Samer; Liu, Dan; El Emam, Khaled
DOI: 10.1093/jamia/ocaf169