1. Liu J, Yang M, Yu Y, Xu H, Wang T, Li K, Zhou X. Advancing bioinformatics with large language models: components, applications and perspectives. arXiv 2025; arXiv:2401.04155v2. [PMID: 38259343 PMCID: PMC10802675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Large language models (LLMs) are a class of artificial intelligence models based on deep learning, which have great performance in various tasks, especially in natural language processing (NLP). Large language models typically consist of artificial neural networks with numerous parameters, trained on large amounts of unlabeled input using self-supervised or semi-supervised learning. However, their potential for solving bioinformatics problems may even exceed their proficiency in modeling human language. In this review, we will provide a comprehensive overview of the essential components of large language models (LLMs) in bioinformatics, spanning genomics, transcriptomics, proteomics, drug discovery, and single-cell analysis. Key aspects covered include tokenization methods for diverse data types, the architecture of transformer models, the core attention mechanism, and the pre-training processes underlying these models. Additionally, we will introduce currently available foundation models and highlight their downstream applications across various bioinformatics domains. Finally, drawing from our experience, we will offer practical guidance for both LLM users and developers, emphasizing strategies to optimize their use and foster further innovation in the field.
Affiliation(s)
- Jiajia Liu
  - Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
- Mengyuan Yang
  - Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi’an Jiaotong University Health Science Center, Xi’an, China
- Yankai Yu
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
- Haixia Xu
  - Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
- Tiangang Wang
  - Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
- Kang Li
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Xiaobo Zhou
  - Center for Computational Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, 77030, USA
  - McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
2. Chen S, Liu M, Yi W, Li H, Yu Q. Micropeptides derived from long non-coding RNAs: Computational analysis and functional roles in breast cancer and other diseases. Gene 2025; 935:149019. [PMID: 39461573 DOI: 10.1016/j.gene.2024.149019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 10/08/2024] [Accepted: 10/16/2024] [Indexed: 10/29/2024]
Abstract
Long non-coding RNAs (lncRNAs), once thought to be mere transcriptional noise, are now revealing a hidden code. Recent advancements like ribosome sequencing have unveiled that many lncRNAs harbor small open reading frames and can potentially encode functional micropeptides. Emerging research suggests these micropeptides, not the lncRNAs themselves, play crucial roles in regulating homeostasis, inflammation, metabolism, and especially in breast cancer progression. This review delves into the rapidly evolving computational tools used to predict and validate lncRNA-encoded micropeptides. We then explore the diverse functions and mechanisms of action of these micropeptides in breast cancer pathogenesis, with a focus on their roles in various species. Ultimately, this review aims to illuminate the functional landscape of lncRNA-encoded micropeptides and their potential as therapeutic targets in cancer.
Affiliation(s)
- Saisai Chen
  - Department of Breast Surgery, The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei 230031, China
- Mengru Liu
  - Department of Infection, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230000, China
- Weizhen Yi
  - Department of Breast Surgery, The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei 230031, China
- Huagang Li
  - Department of Breast Surgery, The First Affiliated Hospital of Anhui University of Traditional Chinese Medicine, Hefei 230031, China
- Qingsheng Yu
  - Institute of Chinese Medicine Surgery, Anhui Academy of Chinese Medicine, Hefei 230031, China
3. Poloni JF, Oliveira FHS, Feltes BC. Localization is the key to action: regulatory peculiarities of lncRNAs. Front Genet 2024; 15:1478352. [PMID: 39737005 PMCID: PMC11683014 DOI: 10.3389/fgene.2024.1478352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2024] [Accepted: 11/27/2024] [Indexed: 01/01/2025] Open
Abstract
To understand the transcriptomic profile of an individual cell in a multicellular organism, we must comprehend its surrounding environment and the cellular space where distinct molecular stimuli responses are located. Contradicting the initial perception that RNAs were nonfunctional and that only a few could act in chromatin remodeling, over the last few decades, research has revealed that they are multifaceted, versatile regulators of most cellular processes. Among the various RNAs, long non-coding RNAs (lncRNAs) regulate multiple biological processes and can even impact cell fate. In this sense, the subcellular localization of lncRNAs is the primary determinant of their functions. It affects their behavior by limiting their potential molecular partners and the processes they can affect. The fine-tuned activity of lncRNAs is also tissue-specific and modulated by their cis and trans regulation. Hence, the spatial context of lncRNAs is crucial for understanding the regulatory networks that they influence and that influence them. Therefore, predicting a lncRNA's correct location is not just a technical challenge but a critical step in understanding the biological meaning of its activity. Hence, examining these peculiarities is crucial to researching and discussing lncRNAs. In this review, we debate the spatial regulation of lncRNAs and their tissue-specific roles and regulatory mechanisms. We also briefly highlight how bioinformatic tools can aid research in the area.
Affiliation(s)
- Bruno César Feltes
  - Department of Biophysics, Laboratory of DNA Repair and Aging, Institute of Biosciences, Federal University of Rio Grande do Sul, Porto Alegre, Rio Grande do Sul, Brazil
4. AlSaad R, Abd-Alrazaq A, Boughorbel S, Ahmed A, Renault MA, Damseh R, Sheikh J. Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook. J Med Internet Res 2024; 26:e59505. [PMID: 39321458 PMCID: PMC11464944 DOI: 10.2196/59505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Revised: 08/07/2024] [Accepted: 08/20/2024] [Indexed: 09/27/2024] Open
Abstract
In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to processing unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future research directions. By connecting these elements, we aimed to provide a comprehensive framework that links diverse aspects of M-LLMs, offering a unified vision for their future in health care. This approach aims to guide both future research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal data-driven medical practice. We anticipate that this work will spark further discussion and inspire the development of innovative approaches in the next generation of medical M-LLM systems.
Affiliation(s)
- Rawan AlSaad
  - Weill Cornell Medicine-Qatar, Education City, Doha, Qatar
- Sabri Boughorbel
  - Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Arfan Ahmed
  - Weill Cornell Medicine-Qatar, Education City, Doha, Qatar
- Rafat Damseh
  - Department of Computer Science and Software Engineering, United Arab Emirates University, Al Ain, United Arab Emirates
- Javaid Sheikh
  - Weill Cornell Medicine-Qatar, Education City, Doha, Qatar
5. Zhang Y. LncRNA-encoded peptides in cancer. J Hematol Oncol 2024; 17:66. [PMID: 39135098 PMCID: PMC11320871 DOI: 10.1186/s13045-024-01591-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Accepted: 08/05/2024] [Indexed: 08/15/2024] Open
Abstract
Long non-coding RNAs (lncRNAs), once considered transcriptional noise, have emerged as critical regulators of gene expression and key players in cancer biology. Recent breakthroughs have revealed that certain lncRNAs can encode small open reading frame (sORF)-derived peptides, which are now understood to contribute to the pathogenesis of various cancers. This review synthesizes current knowledge on the detection, functional roles, and clinical implications of lncRNA-encoded peptides in cancer. We discuss technological advancements in the detection and validation of sORFs, including ribosome profiling and mass spectrometry, which have facilitated the discovery of these peptides. The functional roles of lncRNA-encoded peptides in cancer processes such as gene transcription, translation regulation, signal transduction, and metabolic reprogramming are explored in various types of cancer. The clinical potential of these peptides is highlighted, with a focus on their utility as diagnostic biomarkers, prognostic indicators, and therapeutic targets. The challenges and future directions in translating these findings into clinical practice are also discussed, including the need for large-scale validation, development of sensitive detection methods, and optimization of peptide stability and delivery.
Affiliation(s)
- Yaguang Zhang
  - Laboratory of Gastrointestinal Tumor Epigenetics and Genomics, Frontiers Science Center for Disease-Related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, People's Republic of China
6. Adjeroh DA, Zhou X, Paschoal AR, Dimitrova N, Derevyanchuk EG, Shkurat TP, Loeb JA, Martinez I, Lipovich L. Challenges in LncRNA Biology: Views and Opinions. Noncoding RNA 2024; 10:43. [PMID: 39195572 DOI: 10.3390/ncrna10040043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Revised: 06/26/2024] [Accepted: 07/04/2024] [Indexed: 08/29/2024] Open
Abstract
This is a mini-review capturing the views and opinions of selected participants at the 2021 IEEE BIBM 3rd Annual LncRNA Workshop, held in Dubai, UAE. The views and opinions are expressed on five broad themes related to problems in lncRNA, namely, challenges in the computational analysis of lncRNAs, lncRNAs and cancer, lncRNAs in sports, lncRNAs and COVID-19, and lncRNAs in human brain activity.
Affiliation(s)
- Donald A Adjeroh
  - Lane Department of Computer Science and Electrical Engineering, West Virginia University (WVU), Morgantown, WV 26506, USA
- Xiaobo Zhou
  - Department of Bioinformatics and Systems Medicine, University of Texas Health Science Center, Houston, TX 77030, USA
- Alexandre Rossi Paschoal
  - Department of Computer Science, Bioinformatics and Pattern Recognition Group, Federal University of Technology-Paraná-UTFPR, Curitiba 86300-000, Brazil
  - Rosalind Franklin Institute, Harwell Science and Innovation Campus, Didcot OX11 0FA, UK
- Nadya Dimitrova
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06520, USA
- Tatiana P Shkurat
  - Department of Genetics, Southern Federal University, Rostov-on-Don 344090, Russia
- Jeffrey A Loeb
  - Department of Neurology and Rehabilitation, The Center for Clinical and Translational Science, The University of Illinois NeuroRepository, University of Illinois, Chicago, IL 60607, USA
- Ivan Martinez
  - Department of Microbiology, Immunology & Cell Biology, WVU Cancer Institute, West Virginia University (WVU) School of Medicine, Morgantown, WV 26505, USA
- Leonard Lipovich
  - Shenzhen Huayuan Biological Science Research Institute, Shenzhen Huayuan Biotechnology Co., Ltd., Shenzhen 518000, China
  - Center for Molecular Medicine and Genetics, School of Medicine, Wayne State University, Detroit, MI 48201, USA
  - College of Science, Mathematics and Technology, Wenzhou-Kean University, Wenzhou 325060, China
7. Abubakar M, Hajjaj M, Naqvi ZEZ, Shanawaz H, Naeem A, Padakanti SSN, Bellitieri C, Ramar R, Gandhi F, Saleem A, Abdul Khader AHS, Faraz MA. Non-Coding RNA-Mediated Gene Regulation in Cardiovascular Disorders: Current Insights and Future Directions. J Cardiovasc Transl Res 2024; 17:739-767. [PMID: 38092987 DOI: 10.1007/s12265-023-10469-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/21/2023] [Accepted: 11/23/2023] [Indexed: 09/04/2024]
Abstract
Cardiovascular diseases (CVDs) pose a significant burden on global health. Developing effective diagnostic, therapeutic, and prognostic indicators for CVDs is critical. This narrative review explores the role of select non-coding RNAs (ncRNAs) and provides an in-depth exploration of the roles of miRNAs, lncRNAs, and circRNAs in different aspects of CVDs, offering insights into their mechanisms and potential clinical implications. The review also sheds light on the diverse functions of ncRNAs, including their modulation of gene expression, epigenetic modifications, and signaling pathways. It comprehensively analyzes the interplay between ncRNAs and cardiovascular health, paving the way for potential novel interventions. Finally, the review provides insights into the methodologies used to investigate ncRNA-mediated gene regulation in CVDs, as well as the implications and challenges associated with translating ncRNA research into clinical applications. Considering the broader implications, this research opens avenues for interdisciplinary collaborations, enhancing our understanding of CVDs across scientific disciplines.
Affiliation(s)
- Muhammad Abubakar
  - Department of Internal Medicine, Ameer-Ud-Din Medical College, Lahore General Hospital, Lahore, Punjab, Pakistan
- Mohsin Hajjaj
  - Department of Internal Medicine, Jinnah Hospital, Lahore, Punjab, Pakistan
- Zil E Zehra Naqvi
  - Department of Internal Medicine, Jinnah Hospital, Lahore, Punjab, Pakistan
- Hameed Shanawaz
  - Department of Internal Medicine, Windsor University School of Medicine, Cayon, Saint Kitts and Nevis
- Ammara Naeem
  - Department of Cardiology, Heart & Vascular Institute, Dearborn, Michigan, USA
- Rajasekar Ramar
  - Department of Internal Medicine, Rajah Muthiah Medical College, Chidambaram, Tamil Nadu, India
- Fenil Gandhi
  - Department of Family Medicine, Lower Bucks Hospital, Bristol, PA, USA
- Ayesha Saleem
  - Department of Internal Medicine, Jinnah Hospital, Lahore, Punjab, Pakistan
- Muhammad Ahmad Faraz
  - Department of Forensic Medicine, Postgraduate Medical Institute, Lahore, Punjab, Pakistan
8. Zhang Q, Liu L. Novel insights into small open reading frame-encoded micropeptides in hepatocellular carcinoma: A potential breakthrough. Cancer Lett 2024; 587:216691. [PMID: 38360139 DOI: 10.1016/j.canlet.2024.216691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 01/13/2024] [Accepted: 01/27/2024] [Indexed: 02/17/2024]
Abstract
Traditionally, non-coding RNAs (ncRNAs) are regarded as a class of RNA transcripts that lack encoding capability; however, advancements in technology have revealed that some ncRNAs contain small open reading frames (sORFs) that are capable of encoding micropeptides of approximately 150 amino acids in length. sORF-encoded micropeptides (SEPs) have emerged as intriguing entities in hepatocellular carcinoma (HCC) research, shedding light on this previously unexplored realm. Recent studies have highlighted the regulatory functions of SEPs in the occurrence and progression of HCC. Some SEPs exhibit inhibitory effects on HCC, but others facilitate its development. This discovery has revolutionized the landscape of HCC research and clinical management. Here, we introduce the concept and characteristics of SEPs, summarize their associations with HCC, and elucidate their carcinogenic mechanisms in HCC metabolism, signaling pathways, cell proliferation, and metastasis. In addition, we propose a step-by-step workflow for the investigation of HCC-associated SEPs. Lastly, we discuss the challenges and prospects of applying SEPs in the diagnosis and treatment of HCC. This review aims to facilitate the discovery, optimization, and clinical application of HCC-related SEPs, inspiring the development of early diagnostic, individualized, and precision therapeutic strategies for HCC.
Affiliation(s)
- Qiangnu Zhang
  - Division of Hepatobiliary and Pancreas Surgery, Department of General Surgery, Shenzhen People's Hospital (The Second Clinical Medical College, Jinan University, The First Affiliated Hospital, Southern University of Science and Technology), 518020, Shenzhen, China
- Liping Liu
  - Division of Hepatobiliary and Pancreas Surgery, Department of General Surgery, Shenzhen People's Hospital (The Second Clinical Medical College, Jinan University, The First Affiliated Hospital, Southern University of Science and Technology), 518020, Shenzhen, China