The detectability paradox: bilingual medical report generation with open-weight models and the limits of human oversight.
Automating medical report generation with large language models (LLMs) could significantly reduce physicians' documentation burden and improve healthcare efficiency. However, misuse of generative artificial intelligence in medical reporting can pose serious safety risks to patients. We addressed 2 questions: (1) What is the quality of medical reports generated by LLMs in English and French? and (2) Can human-written and LLM-generated medical reports be distinguished?
Author(s): Rouhizadeh, Hossein; Sandralegar, Abiram; Yazdani, Anthony; Feng, Weibo; Schreier, Oren; Ahn-Kim, Yonnou; Sirbal, Assiya; Pirelli, Valentino; Yang, Rui; Sveikata, Lukas; Tessitore, Elena; Liu, Nan; Bijlenga, Philippe; Teodoro, Douglas
DOI: 10.1093/jamia/ocag070