Deciphering genomic codes using advanced natural language processing techniques: a scoping review.
The vast and complex nature of human genomic sequencing data presents challenges for effective analysis. This review aims to investigate the application of natural language processing (NLP) techniques, particularly large language models (LLMs) and transformer architectures, in deciphering genomic codes, focusing on tokenization, transformer models, and regulatory annotation prediction. The goal of this review is to assess data and model accessibility in the most recent literature, gaining a better understanding [...]
Author(s): Cheng, Shuyan; Wei, Yishu; Zhou, Yiliang; Xu, Zihan; Wright, Drew N; Liu, Jinze; Peng, Yifan
DOI: 10.1093/jamia/ocaf029