1. He YJ, Liu PL, Wei T, Liu T, Li YF, Yang J, Fan WX. Artificial intelligence in kidney transplantation: a 30-year bibliometric analysis of research trends, innovations, and future directions. Ren Fail 2025;47:2458754. PMID: 39910843; PMCID: PMC11803763; DOI: 10.1080/0886022x.2025.2458754.
Abstract
Kidney transplantation is the definitive treatment for end-stage renal disease (ESRD), yet challenges persist in optimizing donor-recipient matching, postoperative care, and immunosuppressive strategies. This study employs bibliometric analysis to evaluate 890 publications from 1993 to 2023, using tools such as CiteSpace and VOSviewer, to identify global trends, research hotspots, and future opportunities in applying artificial intelligence (AI) to kidney transplantation. Our analysis highlights the United States as the leading contributor to the field, with significant outputs from Mayo Clinic and leading authors like Cheungpasitporn W. Key research themes include AI-driven advancements in donor matching, deep learning for post-transplant monitoring, and machine learning algorithms for personalized immunosuppressive therapies. The findings underscore a rapid expansion in AI applications since 2017, with emerging trends in personalized medicine, multimodal data fusion, and telehealth. This bibliometric review provides a comprehensive resource for researchers and clinicians, offering insights into the evolution of AI in kidney transplantation and guiding future studies toward transformative applications in transplantation science.
Affiliation(s)
- Ying Jia He: Department of Nephrology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan Province, China
- Pin Lin Liu: Department of Nephrology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan Province, China
- Tao Wei: Department of Library, Kunming Medical University, Kunming, Yunnan Province, China
- Tao Liu: Organ Transplantation Center, First Affiliated Hospital, Kunming Medical University, Kunming, Yunnan Province, China
- Yi Fei Li: Organ Transplantation Center, First Affiliated Hospital, Kunming Medical University, Kunming, Yunnan Province, China
- Jing Yang: Department of Nephrology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan Province, China
- Wen Xing Fan: Department of Nephrology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan Province, China
2. Lara-Abelenda FJ, Chushig-Muzo D, Peiro-Corbacho P, Wägner AM, Granja C, Soguero-Ruiz C. Personalized glucose forecasting for people with type 1 diabetes using large language models. Comput Methods Programs Biomed 2025;265:108737. PMID: 40188577; DOI: 10.1016/j.cmpb.2025.108737.
Abstract
BACKGROUND AND OBJECTIVE Type 1 Diabetes (T1D) is an autoimmune disease that requires exogenous insulin via Multiple Daily Injections (MDIs) or subcutaneous pumps to maintain targeted glucose levels. Despite advances in Continuous Glucose Monitoring (CGM), controlling glucose levels remains challenging. Large Language Models (LLMs) have produced impressive results in text processing, but their performance with other data modalities remains underexplored. The aim of this study is threefold: first, to evaluate the effectiveness of LLM-based models for glucose forecasting; second, to compare the performance of different models for predicting glucose in T1D individuals treated with MDIs and pumps; and lastly, to create a personalized approach based on patient-specific training and adaptive model selection. METHODS CGM data from the T1DEXI study were used for forecasting glucose levels. Predictive models were evaluated using the mean absolute error (MAE) and the root mean squared error (RMSE) at Prediction Horizons (PHs) of 60, 90, and 120 min. RESULTS For short-term PHs (60 and 90 min), the personalized approach achieved the best results, with average MAEs of 15.7 and 20.2 for MDIs and 15.2 and 17.2 for pumps. For the long-term PH (120 min), TIDE obtained an MAE of 19.8 for MDIs, whereas Patch-TST obtained an MAE of 18.5. CONCLUSION LLM-based models provided MAE values similar to those of state-of-the-art models but with reduced variability. The proposed personalized approach obtained the best results for short-term horizons. Our work contributes to developing personalized glucose prediction models for enhancing glycemic control and reducing diabetes-related complications.
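The study's headline numbers are MAE and RMSE at 60-, 90-, and 120-minute prediction horizons. The sketch below shows how those metrics are typically computed from 5-minute CGM samples; the signal, the last-observation-carried-forward baseline, and all values are toy placeholders, not the T1DEXI data or the models evaluated in the paper.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error (mg/dL)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root mean squared error (mg/dL)."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Toy CGM trace sampled every 5 minutes; a 60-minute horizon is 12 steps ahead.
rng = np.random.default_rng(0)
cgm = 120 + 25 * np.sin(np.linspace(0, 6, 400)) + rng.normal(0, 5, 400)

def naive_forecast(series, horizon_steps):
    """Last-observation-carried-forward baseline: predict the current reading."""
    return series[:-horizon_steps], series[horizon_steps:]  # (predictions, targets)

for minutes in (60, 90, 120):  # the prediction horizons used in the study
    steps = minutes // 5
    pred, target = naive_forecast(cgm, steps)
    print(f"PH={minutes} min  MAE={mae(target, pred):.1f}  RMSE={rmse(target, pred):.1f}")
```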
Affiliation(s)
- Francisco J Lara-Abelenda: Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Madrid, Spain
- David Chushig-Muzo: Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Madrid, Spain
- Pablo Peiro-Corbacho: Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Madrid, Spain
- Ana M Wägner: Instituto Universitario de Investigaciones Biomédicas y Sanitarias, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
- Conceição Granja: Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway
- Cristina Soguero-Ruiz: Department of Signal Theory and Communications, Telematics and Computing Systems, Rey Juan Carlos University, Madrid, Spain
3. Lo JH, Huang HP, Lo JS. LLM-based robot personality simulation and cognitive system. Sci Rep 2025;15:16993. PMID: 40379754; PMCID: PMC12084333; DOI: 10.1038/s41598-025-01528-8.
Abstract
The incorporation of personality into human-robot interaction enhances conversational dynamics and user experience. The deployment of ChatGPT-4 within a cognitive robot framework is designed using state-space realization to emulate specific personality traits, incorporating elements of emotion, motivation, visual attention, and both short-term and long-term memory. The encoding and retrieval of long-term memory are facilitated through document embedding techniques, while emotions are generated based on predictions of future events. This framework processes textual and visual information, responding or initiating actions in accordance with the configured personality settings and cognitive processes. The constancy and effectiveness of the personality simulation were compared with a human baseline and validated via two personality assessments: the International Personality Item Pool - Neuroticism, Extraversion and Openness (IPIP-NEO) and the Big Five personality test. The proposed personality model of the cognitive robot is designed using Kelly's role construct repertory and Cattell's 16 personality factors and preferences, which are analyzed for construct validity and compared with human subjects. Theory of mind is observed in the personality simulation, which performs better on second-order belief questions than other agents on the improved theory-of-mind dataset (ToMi). Based on the proposed methods, our robot, Mobi, is able to chat in line with its own personality, handle social conflicts, and understand the user's intent. Such simulations can achieve a high degree of human likeness, characterized by conversations that are flexible and imbued with intention.
Affiliation(s)
- Jia-Hsun Lo: Department of Mechanical Engineering, National Taiwan University, Taipei, Taiwan
- Han-Pang Huang: Department of Mechanical Engineering, National Taiwan University, Taipei, Taiwan
- Jie-Shih Lo: Department of Health Psychology, Chang Jung Christian University, Tainan, Taiwan
4. Guan H, Novoa-Laurentiev J, Zhou L. CD-Tron: Leveraging large clinical language model for early detection of cognitive decline from electronic health records. J Biomed Inform 2025;166:104830. PMID: 40320101; DOI: 10.1016/j.jbi.2025.104830.
Abstract
BACKGROUND Early detection of cognitive decline during the preclinical stage of Alzheimer's disease and related dementias (AD/ADRD) is crucial for timely intervention and treatment. Clinical notes in the electronic health record contain valuable information that can aid in the early identification of cognitive decline. In this study, we utilize advanced large clinical language models, fine-tuned on clinical notes, to improve the early detection of cognitive decline. METHODS We collected clinical notes from 2,166 patients spanning the 4 years preceding their initial mild cognitive impairment (MCI) diagnosis from the Enterprise Data Warehouse of Mass General Brigham. To train the model, we developed CD-Tron, built upon a large clinical language model that was fine-tuned using 4,949 expert-labeled note sections. For evaluation, the trained model was applied to 1,996 independent note sections to assess its performance on real-world unstructured clinical data. Additionally, we used explainable AI techniques, specifically SHAP values (SHapley Additive exPlanations), to interpret the model's predictions and provide insight into the most influential features. Error analysis was also conducted to further examine the model's predictions. RESULTS CD-Tron significantly outperforms baseline models, achieving notable improvements in precision, recall, and AUC metrics for detecting cognitive decline (CD). Tested on many real-world clinical notes, CD-Tron demonstrated high sensitivity with only one false negative, crucial for clinical applications prioritizing early and accurate CD detection. SHAP-based interpretability analysis highlighted key textual features contributing to model predictions, supporting transparency and clinician understanding. CONCLUSION CD-Tron offers a novel approach to early cognitive decline detection by applying large clinical language models to free-text EHR data. Pretrained on real-world clinical notes, it accurately identifies early cognitive decline and integrates SHAP for interpretability, enhancing transparency in predictions.
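CD-Tron reports SHAP-based attributions over note text. As a rough illustration of the idea only, the sketch below uses a tiny tf-idf plus logistic-regression stand-in (not the authors' clinical language model) with invented note snippets and labels, and scores each token by coefficient times tf-idf weight, the linear-model analogue of a SHAP attribution.

```python
# Hedged sketch: a minimal stand-in for interpreting a clinical note classifier.
# Notes, labels, and the model are illustrative placeholders, not CD-Tron.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

notes = [
    "patient reports increasing forgetfulness and word-finding difficulty",  # decline
    "repeats questions during the visit and misplaces objects at home",      # decline
    "no memory complaints, independent with all daily activities",           # no decline
    "cognition intact, alert and oriented, no concerns raised by family",    # no decline
]
labels = [1, 1, 0, 0]  # 1 = note section suggests cognitive decline

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(notes)
clf = LogisticRegression().fit(X, labels)

test_note = ["family notes forgetfulness and says the patient repeats questions"]
Xt = vectorizer.transform(test_note)
print(f"P(cognitive decline) = {clf.predict_proba(Xt)[0, 1]:.2f}")

# Rank tokens by their contribution toward the 'decline' class for this note.
contributions = Xt.toarray()[0] * clf.coef_[0]
tokens = vectorizer.get_feature_names_out()
for idx in np.argsort(contributions)[::-1][:5]:
    if contributions[idx] > 0:
        print(f"  {tokens[idx]:<15} +{contributions[idx]:.3f}")
```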
Affiliation(s)
- Hao Guan: Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- John Novoa-Laurentiev: Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA 02115, USA
- Li Zhou: Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
5. Wei B, Yao L, Hu X, Hu Y, Rao J, Ji Y, Dong Z, Duan Y, Wu X. Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study. J Med Internet Res 2025;27:e67883. PMID: 40209226; PMCID: PMC12022522; DOI: 10.2196/67883.
Abstract
BACKGROUND Ocular myasthenia gravis (OMG) is a neuromuscular disorder primarily affecting the extraocular muscles, leading to ptosis and diplopia. Effective patient education is crucial for disease management; however, in China, limited health care resources often restrict patients' access to personalized medical guidance. Large language models (LLMs) have emerged as potential tools to bridge this gap by providing instant, AI-driven health information. However, their accuracy and readability in educating patients with OMG remain uncertain. OBJECTIVE The purpose of this study was to systematically evaluate the effectiveness of multiple LLMs in the education of Chinese patients with OMG. Specifically, the validity of these models in answering OMG-related patient questions was assessed through accuracy, completeness, readability, usefulness, and safety, and patients' ratings of their usability and readability were analyzed. METHODS The study was conducted in two phases: 130 multiple-choice ophthalmology examination questions were input into 5 different LLMs. Their performance was compared with that of undergraduates, master's students, and ophthalmology residents. In addition, 23 common OMG-related patient questions were posed to 4 LLMs, and their responses were evaluated by ophthalmologists across 5 domains. In the second phase, 20 patients with OMG interacted with the 2 LLMs from the first phase, each asking 3 questions. Patients assessed the responses for satisfaction and readability, while ophthalmologists evaluated the responses again using the 5 domains. RESULTS ChatGPT o1-preview achieved the highest accuracy rate of 73% on the 130 ophthalmology examination questions, outperforming other LLMs and professional groups like undergraduates and master's students. For the 23 common OMG-related patient questions, ChatGPT o1-preview scored highest in correctness (4.44), completeness (4.44), helpfulness (4.47), and safety (4.6). GEMINI (Google DeepMind) provided the easiest-to-understand responses in readability assessments, while GPT-4o had the most complex responses, suitable for readers with higher education levels. In the second phase with 20 patients with OMG, ChatGPT o1-preview received higher satisfaction scores than Ernie 3.5 (Baidu; 4.40 vs 3.89, P=.002), although Ernie 3.5's responses were slightly more readable (4.31 vs 4.03, P=.01). CONCLUSIONS LLMs such as ChatGPT o1-preview may have the potential to enhance patient education. Addressing challenges such as misinformation risk, readability issues, and ethical considerations is crucial for their effective and safe integration into clinical practice.
Affiliation(s)
- Bin Wei: Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Lili Yao: Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Xin Hu: Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Yuxiang Hu: Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Jie Rao: Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Yu Ji: Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Zhuoer Dong: Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Yichong Duan: Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
- Xiaorong Wu: Jiangxi Medical College, The First Affiliated Hospital, Nanchang University, Nanchang, China
6. Eghbali N, Klochko C, Mahdi Z, Alhiari L, Lee J, Knisely B, Craig J, Ghassemi MM. Enhancing Radiology Clinical Histories Through Transformer-Based Automated Clinical Note Summarization. J Imaging Inform Med 2025. PMID: 40195229; DOI: 10.1007/s10278-025-01477-8.
Abstract
Insufficient clinical information provided in radiology requests, coupled with the cumbersome nature of electronic health records (EHRs), poses significant challenges for radiologists in extracting pertinent clinical data and compiling detailed radiology reports. Considering the challenges and time involved in navigating electronic medical records (EMR), an automated method to accurately compress the text while maintaining key semantic information could significantly enhance the efficiency of radiologists' workflow. The purpose of this study is to develop and demonstrate an automated tool for clinical note summarization with the goal of extracting the most pertinent clinical information for the radiological assessments. We adopted a transfer learning methodology from the natural language processing domain to fine-tune a transformer model for abstracting clinical reports. We employed a dataset consisting of 1000 clinical notes from 970 patients who underwent knee MRI, all manually summarized by radiologists. The fine-tuning process involved a two-stage approach starting with self-supervised denoising and then focusing on the summarization task. The model successfully condensed clinical notes by 97% while aligning closely with radiologist-written summaries evidenced by a 0.9 cosine similarity and a ROUGE-1 score of 40.18. In addition, statistical analysis, indicated by a Fleiss kappa score of 0.32, demonstrated fair agreement among specialists on the model's effectiveness in producing more relevant clinical histories compared to those included in the exam requests. The proposed model effectively summarized clinical notes for knee MRI studies, thereby demonstrating potential for improving radiology reporting efficiency and accuracy.
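The evaluation relies on ROUGE-1 against radiologist-written summaries and a cosine similarity of about 0.9. A minimal sketch of both metrics follows; the reference and candidate summaries are invented, and the cosine here is computed over tf-idf vectors, whereas the paper's figure may come from a different text representation.

```python
# Hedged sketch: ROUGE-1 F1 (unigram overlap) and tf-idf cosine similarity between a
# reference summary and a model summary. The two texts are invented examples.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap ROUGE-1 F1 score."""
    ref, cand = Counter(reference.lower().split()), Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "acute on chronic left knee pain after twisting injury, prior partial meniscectomy"
candidate = "left knee pain after a twisting injury, history of partial meniscectomy"

vectorizer = TfidfVectorizer().fit([reference, candidate])
cosine = cosine_similarity(vectorizer.transform([reference]),
                           vectorizer.transform([candidate]))[0, 0]
print(f"ROUGE-1 F1 = {rouge1_f1(reference, candidate):.2f}   cosine = {cosine:.2f}")
```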
7. García-Olea A, Domingo-Aldama AG, Merino M, Gojenola K, Goikoetxea J, Atutxa A, Ormaetxe JM. The Application of Deep Learning Tools on Medical Reports to Optimize the Input of an Atrial-Fibrillation-Recurrence Predictive Model. J Clin Med 2025;14:2297. PMID: 40217746; PMCID: PMC11989490; DOI: 10.3390/jcm14072297.
Abstract
Background: Artificial Intelligence (AI) techniques, particularly Deep Learning (DL) and Natural Language Processing (NLP), have seen exponential growth in the biomedical field. This study focuses on enhancing predictive models for atrial fibrillation (AF) recurrence by extracting valuable data from electronic health records (EHRs) and unstructured medical reports. Although existing models show promise, their reliability is hampered by inaccuracies in coded data, with significant false positives and false negatives impacting their performance. To address this, the authors propose an automated system using DL and NLP techniques to process medical reports, extract key predictive variables, and identify new AF cases. The main purpose is to improve dataset reliability so future predictive models can respond more accurately. Methods and Results: The study analyzed over one million discharge reports, applying regular expressions and DL tools to extract variables and identify AF onset. The performance of DL models, particularly a feedforward neural network combined with tf-idf, demonstrated high accuracy (0.986) in predicting AF onset. The application of DL tools on unstructured text reduced the error rate in AF identification by 50%, achieving an error rate of less than 2%. Conclusions: This work underscores the potential of AI in optimizing dataset accuracy to develop predictive models and consequently improving healthcare predictions, offering valuable insights for research groups utilizing secondary data for predictive analytics in this particular setting.
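The best-performing configuration reported here combines tf-idf features with a feedforward neural network. The sketch below wires that pipeline together with scikit-learn on a few invented discharge-report snippets; it is not the authors' architecture or data, only an illustration of tf-idf feeding a small multilayer perceptron for AF-onset classification.

```python
# Hedged sketch: tf-idf features plus a small feedforward network (scikit-learn's
# MLPClassifier) to flag discharge reports that document new atrial fibrillation.
# The report snippets and labels are invented; the authors' model and data differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

reports = [
    "first episode of atrial fibrillation with rapid ventricular response, anticoagulation started",
    "new onset afib noted on telemetry, electrical cardioversion performed",
    "sinus rhythm throughout admission, no arrhythmia documented",
    "admitted for community acquired pneumonia, ecg shows normal sinus rhythm",
]
labels = [1, 1, 0, 0]  # 1 = new AF documented in the report

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigram and bigram tf-idf features
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
model.fit(reports, labels)

# Toy prediction on an unseen report.
print(model.predict(["ecg on admission reveals new atrial fibrillation"]))
```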
Affiliation(s)
- Alain García-Olea: Biobizkaia Research Institute, Basurto University Hospital, 48013 Bilbao, Spain
- Ane G Domingo-Aldama: Hitz Group, Bilbao School of Engineering, University of the Basque Country, 48940 Bilbao, Spain
- Marcos Merino: Hitz Group, Bilbao School of Engineering, University of the Basque Country, 48940 Bilbao, Spain
- Koldo Gojenola: Hitz Group, Bilbao School of Engineering, University of the Basque Country, 48940 Bilbao, Spain
- Josu Goikoetxea: Hitz Group, Bilbao School of Engineering, University of the Basque Country, 48940 Bilbao, Spain
- Aitziber Atutxa: Hitz Group, Bilbao School of Engineering, University of the Basque Country, 48940 Bilbao, Spain
8. Casmin E, Oliveira R. Survey on Context-Aware Radio Frequency-Based Sensing. Sensors (Basel) 2025;25:602. PMID: 39943241; PMCID: PMC11820419; DOI: 10.3390/s25030602.
Abstract
Radio frequency (RF) spectrum sensing is critical for applications requiring precise object and posture detection and classification. This survey aims to provide a focused review of context-aware RF-based sensing, emphasizing its principles, advancements, and challenges. It specifically examines state-of-the-art techniques such as phased array radar, synthetic aperture radar, and passive RF sensing, highlighting their methodologies, data input domains, and spatial diversity strategies. The paper evaluates feature extraction methods and machine learning approaches used for detection and classification, presenting their accuracy metrics across various applications. Additionally, it investigates the integration of RF sensing with other modalities, such as inertial sensors, to enhance context awareness and improve performance. Challenges like environmental interference, scalability, and regulatory constraints are addressed, with insights into real-world mitigation strategies. The survey concludes by identifying emerging trends, practical applications, and future directions for advancing RF sensing technologies.
Affiliation(s)
- Eugene Casmin: Departamento de Engenharia Electrotécnica e de Computadores, Faculdade de Ciências e Tecnologia (FCT), Universidade Nova de Lisboa, 2829-516 Caparica, Portugal; Instituto de Telecomunicações, 1049-001 Lisbon, Portugal
- Rodolfo Oliveira: Departamento de Engenharia Electrotécnica e de Computadores, Faculdade de Ciências e Tecnologia (FCT), Universidade Nova de Lisboa, 2829-516 Caparica, Portugal; Instituto de Telecomunicações, 1049-001 Lisbon, Portugal
9. Zhao PC, Wei XX, Wang Q, Wang QH, Li JN, Shang J, Lu C, Shi JY. Single-step retrosynthesis prediction via multitask graph representation learning. Nat Commun 2025;16:814. PMID: 39827189; PMCID: PMC11742932; DOI: 10.1038/s41467-025-56062-y.
Abstract
Inferring appropriate synthesis reaction (i.e., retrosynthesis) routes for newly designed molecules is vital. Recently, computational methods have produced promising single-step retrosynthesis predictions. However, template-based methods are limited by the known synthesis templates; template-free methods are weakly interpretable; and semi-template-based methods are deficient with regard to utilizing the associations between chemical entities. To address these issues, this paper leverages the intra-associations between synthons, the inter-associations between synthons and leaving groups (LGs), and the intra-associations between LGs. It develops a multitask graph representation learning model for single-step retrosynthesis prediction (Retro-MTGR) to solve reaction centre deduction and LG identification simultaneously. A comparison with 16 state-of-the-art methods first demonstrates the superiority of Retro-MTGR. Then, its robustness and scalability and the contributions of its crucial components are validated. More importantly, it can determine whether a bond can be a reaction centre and which LGs are appropriate for a given synthon. The answers reflect underlying chemical synthesis rules, especially opposite electrical properties between chemical entities (e.g., reaction sites, synthons, and LGs). Finally, case studies demonstrate that the retrosynthesis routes inferred by Retro-MTGR are promising for single-step synthesis reactions. The code and data of this study are freely available at https://doi.org/10.5281/zenodo.14346324.
Affiliation(s)
- Peng-Cheng Zhao: School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Xue-Xin Wei: School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Qiong Wang: School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Qi-Hao Wang: School of Chemistry and Chemical Engineering, Northwestern Polytechnical University, Xi'an, China
- Jia-Ning Li: School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Jie Shang: School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Cheng Lu: Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Jian-Yu Shi: School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
10. Li X, Peng L, Wang YP, Zhang W. Open challenges and opportunities in federated foundation models towards biomedical healthcare. BioData Min 2025;18:2. PMID: 39755653; DOI: 10.1186/s13040-024-00414-9.
Abstract
This survey explores the transformative impact of foundation models (FMs) in artificial intelligence, focusing on their integration with federated learning (FL) in biomedical research. Foundation models such as ChatGPT, LLaMa, and CLIP, which are trained on vast datasets through methods including unsupervised pretraining, self-supervised learning, instructed fine-tuning, and reinforcement learning from human feedback, represent significant advancements in machine learning. These models, with their ability to generate coherent text and realistic images, are crucial for biomedical applications that require processing diverse data forms such as clinical reports, diagnostic images, and multimodal patient interactions. The incorporation of FL with these sophisticated models presents a promising strategy to harness their analytical power while safeguarding the privacy of sensitive medical data. This approach not only enhances the capabilities of FMs in medical diagnostics and personalized treatment but also addresses critical concerns about data privacy and security in healthcare. This survey reviews the current applications of FMs in federated settings, underscores the challenges, and identifies future research directions including scaling FMs, managing data diversity, and enhancing communication efficiency within FL frameworks. The objective is to encourage further research into the combined potential of FMs and FL, laying the groundwork for healthcare innovations.
Affiliation(s)
- Xingyu Li: Department of Computer Science, Tulane University, New Orleans, LA, USA
- Lu Peng: Department of Computer Science, Tulane University, New Orleans, LA, USA
- Yu-Ping Wang: Department of Biomedical Engineering, Tulane University, New Orleans, LA, USA
- Weihua Zhang: School of Computer Science, Fudan University, Shanghai, China
11. Yuan H, Hicks P, Ahmadian M, Johnson KA, Valtadoros L, Krishnan A. Annotating publicly-available samples and studies using interpretable modeling of unstructured metadata. Brief Bioinform 2024;26:bbae652. PMID: 39710433; DOI: 10.1093/bib/bbae652.
Abstract
Reusing massive collections of publicly available biomedical data can significantly impact knowledge discovery. However, these public samples and studies are typically described using unstructured plain text, hindering the findability and further reuse of the data. To combat this problem, we propose txt2onto 2.0, a general-purpose method based on natural language processing and machine learning for annotating biomedical unstructured metadata to controlled vocabularies of diseases and tissues. Compared to the previous version (txt2onto 1.0), which uses numerical embeddings as features, this new version uses words as features, resulting in improved interpretability and performance, especially when few positive training instances are available. Txt2onto 2.0 uses embeddings from a large language model during prediction to deal with unseen-yet-relevant words related to each disease and tissue term being predicted from the input text, thereby explaining the basis of every annotation. We demonstrate the generalizability of txt2onto 2.0 by accurately predicting disease annotations for studies from independent datasets, using proteomics and clinical trials as examples. Overall, our approach can annotate biomedical text regardless of experimental types or sources. Code, data, and trained models are available at https://github.com/krishnanlab/txt2onto2.0.
Affiliation(s)
- Hao Yuan: Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI 48823, United States; Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI 48823, United States
- Parker Hicks: Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- Mansooreh Ahmadian: Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- Kayla A Johnson: Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- Lydia Valtadoros: Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- Arjun Krishnan: Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
12. Cho HN, Jun TJ, Kim YH, Kang H, Ahn I, Gwon H, Kim Y, Seo J, Choi H, Kim M, Han J, Kee G, Park S, Ko S. Task-Specific Transformer-Based Language Models in Health Care: Scoping Review. JMIR Med Inform 2024;12:e49724. PMID: 39556827; PMCID: PMC11612605; DOI: 10.2196/49724.
Abstract
BACKGROUND Transformer-based language models have shown great potential to revolutionize health care by advancing clinical decision support, patient interaction, and disease prediction. However, despite their rapid development, the implementation of transformer-based language models in health care settings remains limited. This is partly due to the lack of a comprehensive review, which hinders a systematic understanding of their applications and limitations. Without clear guidelines and consolidated information, both researchers and physicians face difficulties in using these models effectively, resulting in inefficient research efforts and slow integration into clinical workflows. OBJECTIVE This scoping review addresses this gap by examining studies on medical transformer-based language models and categorizing them into 6 tasks: dialogue generation, question answering, summarization, text classification, sentiment analysis, and named entity recognition. METHODS We conducted a scoping review following the Cochrane scoping review protocol. A comprehensive literature search was performed across databases, including Google Scholar and PubMed, covering publications from January 2017 to September 2024. Studies involving transformer-derived models in medical tasks were included. Data were categorized into 6 key tasks. RESULTS Our key findings revealed both advancements and critical challenges in applying transformer-based models to health care tasks. For example, models like MedPIR involving dialogue generation show promise but face privacy and ethical concerns, while question-answering models like BioBERT improve accuracy but struggle with the complexity of medical terminology. The BioBERTSum summarization model aids clinicians by condensing medical texts but needs better handling of long sequences. CONCLUSIONS This review attempted to provide a consolidated understanding of the role of transformer-based language models in health care and to guide future research directions. By addressing current challenges and exploring the potential for real-world applications, we envision significant improvements in health care informatics. Addressing the identified challenges and implementing proposed solutions can enable transformer-based language models to significantly improve health care delivery and patient outcomes. Our review provides valuable insights for future research and practical applications, setting the stage for transformative advancements in medical informatics.
Affiliation(s)
- Ha Na Cho: Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
- Tae Joon Jun: Big Data Research Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul, Republic of Korea
- Young-Hak Kim: Division of Cardiology, Department of Information Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Heejun Kang: Division of Cardiology, Asan Medical Center, Seoul, Republic of Korea
- Imjin Ahn: Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
- Hansle Gwon: Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
- Yunha Kim: Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jiahn Seo: Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Heejung Choi: Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Minkyoung Kim: Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jiye Han: Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Gaeun Kee: Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
- Seohyun Park: Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
- Soyoung Ko: Department of Information Medicine, Asan Medical Center, Seoul, Republic of Korea
13. Torabi M, Haririan I, Foroumadi A, Ghanbari H, Ghasemi F. A deep learning model based on the BERT pre-trained model to predict the antiproliferative activity of anti-cancer chemical compounds. SAR QSAR Environ Res 2024;35:971-992. PMID: 39605280; DOI: 10.1080/1062936x.2024.2431486.
Abstract
Identifying new compounds with minimal side effects to enhance patients' quality of life is the ultimate goal of drug discovery. Due to the expensive and time-consuming nature of experimental investigations and the scarcity of data in traditional QSAR studies, deep transfer learning models, such as the BERT model, have recently been suggested. This study evaluated the model's performance in predicting the anti-proliferative activity of five cancer cell lines (HeLa, MCF7, MDA-MB231, PC3, and MDA-MB) using over 3,000 synthesized molecules from PubChem. The results indicated that the model could predict the class of designed small molecules with acceptable accuracy for most cell lines, except for PC3 and MDA-MB. The model's performance was further tested on an in-house dataset of approximately 25 small molecules per cell line, based on IC50 values. The model accurately predicted the biological activity class for HeLa with an accuracy of 0.77 ± 0.4 and demonstrated acceptable performance for MCF7 and MDA-MB231, with accuracy between 0.56 and 0.66. However, the results were less reliable for PC3 and HepG2. In conclusion, the ChemBERTa fine-tuned model shows potential for predicting outcomes on in-house datasets.
Affiliation(s)
- M Torabi: Biosensor Research Centre, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- I Haririan: Department of Pharmaceutics, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran; Department of Pharmaceutical Biomaterials and Medical Biomaterials Research Center (MBRC), Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran
- A Foroumadi: Department of Medicinal Chemistry, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran; Drug Design and Development Research Center, The Institute of Pharmaceutical Sciences (TIPS), Tehran University of Medical Sciences, Tehran, Iran
- H Ghanbari: Department of Medical Nanotechnology, School of Advanced Technologies in Medicine, Tehran University of Medical Sciences, Tehran, Iran
- F Ghasemi: Department of Bioinformatics and Systems Biology, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran; Bioinformatics Research Center, School of Pharmacy and Pharmaceutical Sciences, Isfahan University of Medical Sciences, Isfahan, Iran
14. Li R, Zhang L, Wang Z, Li X. FCSwinU: Fourier Convolutions and Swin Transformer UNet for Hyperspectral and Multispectral Image Fusion. Sensors (Basel) 2024;24:7023. PMID: 39517942; PMCID: PMC11548634; DOI: 10.3390/s24217023.
Abstract
The fusion of low-resolution hyperspectral images (LR-HSI) with high-resolution multispectral images (HR-MSI) provides a cost-effective approach to obtaining high-resolution hyperspectral images (HR-HSI). Existing methods primarily based on convolutional neural networks (CNNs) struggle to capture global features and do not adequately address the significant scale and spectral resolution differences between LR-HSI and HR-MSI. To tackle these challenges, our novel FCSwinU network leverages the spectral fast Fourier convolution (SFFC) module for spectral feature extraction and utilizes the Swin Transformer's self-attention mechanism for multi-scale global feature fusion. FCSwinU employs a UNet-like encoder-decoder framework to effectively merge spatiospectral features. The encoder integrates the Swin Transformer feature abstraction module (SwinTFAM) to encode pixel correlations and perform multi-scale transformations, facilitating the adaptive fusion of hyperspectral and multispectral data. The decoder then employs the Swin Transformer feature reconstruction module (SwinTFRM) to reconstruct the fused features, restoring the original image dimensions and ensuring the precise recovery of spatial and spectral details. Experimental results from three benchmark datasets and a real-world dataset robustly validate the superior performance of our method in both visual representation and quantitative assessment compared to existing fusion methods.
Affiliation(s)
- Rumei Li: College of Resource Environment and Tourism, Capital Normal University, No. 105, North Road of West 3rd Ring, Beijing 100048, China
- Liyan Zhang: College of Resource Environment and Tourism, Capital Normal University, No. 105, North Road of West 3rd Ring, Beijing 100048, China; Key Laboratory of 3-Dimensional Information Acquisition and Application, Ministry of Education, Capital Normal University, No. 105, North Road of West 3rd Ring, Beijing 100048, China
- Zun Wang: College of Resource Environment and Tourism, Capital Normal University, No. 105, North Road of West 3rd Ring, Beijing 100048, China
- Xiaojuan Li: College of Resource Environment and Tourism, Capital Normal University, No. 105, North Road of West 3rd Ring, Beijing 100048, China; Key Laboratory of 3-Dimensional Information Acquisition and Application, Ministry of Education, Capital Normal University, No. 105, North Road of West 3rd Ring, Beijing 100048, China
15. Korban M, Youngs P, Acton ST. A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection. IEEE Trans Pattern Anal Mach Intell 2024;46:6055-6069. PMID: 38483796; DOI: 10.1109/tpami.2024.3377192.
Abstract
This paper presents a novel spatiotemporal transformer network that introduces several original components to detect actions in untrimmed videos. First, the multi-feature selective semantic attention model calculates the correlations between spatial and motion features to model spatiotemporal interactions between different action semantics properly. Second, the motion-aware network encodes the locations of action semantics in video frames utilizing the motion-aware 2D positional encoding algorithm. Such a motion-aware mechanism memorizes the dynamic spatiotemporal variations in action frames that current methods cannot exploit. Third, the sequence-based temporal attention model captures the heterogeneous temporal dependencies in action frames. In contrast to standard temporal attention used in natural language processing, primarily aimed at finding similarities between linguistic words, the proposed sequence-based temporal attention is designed to determine both the differences and similarities between video frames that jointly define the meaning of actions. The proposed approach outperforms the state-of-the-art solutions on four spatiotemporal action datasets: AVA 2.2, AVA 2.1, UCF101-24, and EPIC-Kitchens.
16. Varela-Vega A, Posada-Reyes AB, Méndez-Cruz CF. Automatic extraction of transcriptional regulatory interactions of bacteria from biomedical literature using a BERT-based approach. Database (Oxford) 2024;2024:baae094. PMID: 39213391; PMCID: PMC11363960; DOI: 10.1093/database/baae094.
Abstract
Transcriptional regulatory networks (TRNs) give a global view of the regulatory mechanisms of bacteria to respond to environmental signals. These networks are published in biological databases as a valuable resource for experimental and bioinformatics researchers. Despite efforts to publish TRNs for diverse bacteria, many species still lack one, and many existing TRNs are incomplete. In addition, the manual extraction of information from biomedical literature ("literature curation") has been the traditional way to extract these networks, despite being demanding and time-consuming. Recently, language models based on pretrained transformers have been used to extract relevant knowledge from biomedical literature. Moreover, the benefit of fine-tuning a large pretrained model with new limited data for a specific task ("transfer learning") opens roads to address new problems of biomedical information extraction. Here, to alleviate this lack of knowledge and assist literature curation, we present a new approach based on the Bidirectional Encoder Representations from Transformers (BERT) architecture to classify transcriptional regulatory interactions of bacteria as a first step to extract TRNs from literature. The approach achieved a significant performance in a test dataset of sentences of Escherichia coli (F1-Score: 0.8685, Matthews correlation coefficient: 0.8163). The examination of model predictions revealed that the model learned different ways to express the regulatory interaction. The approach was evaluated to extract a TRN of Salmonella using 264 complete articles. The evaluation showed that the approach was able to accurately extract 82% of the network and that it extracted interactions absent from the curated data. To the best of our knowledge, the present study is the first effort to obtain a BERT-based approach to extract this specific kind of interaction. This approach is a starting point to address the limitations of reconstructing TRNs of bacteria and diseases of biological interest. Database URL: https://github.com/laigen-unam/BERT-trn-extraction.
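The core step is fine-tuning a BERT-style encoder to label sentences as describing a transcriptional regulatory interaction or not. The sketch below shows a generic Hugging Face sequence-classification loop; the bert-base-uncased checkpoint, the toy sentences, and the hyperparameters are placeholders rather than the authors' configuration, which likely used a biomedical checkpoint and a curated training set.

```python
# Hedged sketch: fine-tune a generic BERT checkpoint to flag sentences that describe
# a transcriptional regulatory interaction (1) versus background text (0).
# Checkpoint, sentences, labels, and hyperparameters are illustrative placeholders.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

sentences = [
    "FNR represses the expression of the ndh gene under anaerobic conditions.",  # interaction
    "CRP activates transcription of the lac operon when cAMP levels rise.",      # interaction
    "The plasmid was purified using a standard miniprep protocol.",              # background
    "Cells were grown overnight in LB medium at 37 degrees Celsius.",            # background
]
labels = torch.tensor([1, 1, 0, 0])

checkpoint = "bert-base-uncased"  # assumption; a biomedical checkpoint would be more typical
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

batch = tokenizer(sentences, padding=True, truncation=True, max_length=64, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):  # a few passes over the toy batch
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"epoch {epoch}  loss {out.loss.item():.3f}")

# Inference: probability that an unseen sentence describes a regulatory interaction.
model.eval()
test = tokenizer(["ArcA represses sodA transcription during anaerobic growth."],
                 return_tensors="pt")
with torch.no_grad():
    prob_interaction = torch.softmax(model(**test).logits, dim=-1)[0, 1].item()
print(f"P(regulatory interaction) = {prob_interaction:.2f}")
```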
Affiliation(s)
- Alfredo Varela-Vega: Programa de Genómica Computacional, Centro de Ciencias Genómicas, UNAM, Av. Universidad S/N Col. Chamilpa, Cuernavaca, Morelos 62210, México
- Ali-Berenice Posada-Reyes: Laboratorio de Microbiología, Inmunología y Salud Pública, Facultad de Estudios Superiores Cuautitlán, UNAM, Carretera Cuautitlán-Teoloyucan Km. 2.5, Xhala, Cuautitlán Izcalli, Estado de México 54714, México
- Carlos-Francisco Méndez-Cruz: Programa de Genómica Computacional, Centro de Ciencias Genómicas, UNAM, Av. Universidad S/N Col. Chamilpa, Cuernavaca, Morelos 62210, México
17. Bishal MM, Chowdory MRH, Das A, Kabir MA. COVIDHealth: A novel labeled dataset and machine learning-based web application for classifying COVID-19 discourses on Twitter. Heliyon 2024;10:e34103. PMID: 39100452; PMCID: PMC11295851; DOI: 10.1016/j.heliyon.2024.e34103.
Abstract
The COVID-19 pandemic has sparked widespread health-related discussions on social media platforms like Twitter (now named 'X'). However, the lack of labeled Twitter data poses significant challenges for theme-based classification and tweet aggregation. To address this gap, we developed a machine learning-based web application that automatically classifies COVID-19 discourses into five categories: health risks, prevention, symptoms, transmission, and treatment. We collected and labeled 6,667 COVID-19-related tweets using the Twitter API, and applied various feature extraction methods to extract relevant features. We then compared the performance of seven classical machine learning algorithms (Decision Tree, Random Forest, Stochastic Gradient Descent, Adaboost, K-Nearest Neighbor, Logistic Regression, and Linear SVC) and four deep learning techniques (LSTM, CNN, RNN, and BERT) for classification. Our results show that the CNN achieved the highest precision (90.41%), recall (90.4%), F1 score (90.4%), and accuracy (90.4%). The Linear SVC algorithm exhibited the highest precision (85.71%), recall (86.94%), and F1 score (86.13%) among classical machine learning approaches. Our study advances the field of health-related data analysis and classification, and offers a publicly accessible web-based tool for public health researchers and practitioners. This tool has the potential to help address public health challenges and enhance awareness during pandemics. The dataset and application are accessible at https://github.com/Bishal16/COVID19-Health-Related-Data-Classification-Website.
Affiliation(s)
- Mahathir Mohammad Bishal: Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chattogram, 4349, Bangladesh
- Md. Rakibul Hassan Chowdory: Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chattogram, 4349, Bangladesh
- Anik Das: Department of Computer Science, St. Francis Xavier University, Antigonish, B2G 2W5, NS, Canada
- Muhammad Ashad Kabir: School of Computing, Mathematics, and Engineering, Charles Sturt University, Bathurst, 2795, NSW, Australia
18. Madan S, Lentzen M, Brandt J, Rueckert D, Hofmann-Apitius M, Fröhlich H. Transformer models in biomedicine. BMC Med Inform Decis Mak 2024;24:214. PMID: 39075407; PMCID: PMC11287876; DOI: 10.1186/s12911-024-02600-5.
Abstract
Deep neural networks (DNNs) have fundamentally revolutionized the artificial intelligence (AI) field. The transformer model is a type of DNN that was originally used for natural language processing tasks and has since gained increasing attention for processing various kinds of sequential data, including biological sequences and structured electronic health records. Along with this development, transformer-based models such as BioBERT, MedBERT, and MassGenie have been trained and deployed by researchers to answer various scientific questions originating in the biomedical domain. In this paper, we review the development and application of transformer models for analyzing various biomedical datasets, including biomedical text, protein sequences, structured longitudinal medical data, and biomedical images as well as graphs. We also look at explainable AI strategies that help to interpret the predictions of transformer-based models. Finally, we discuss the limitations and challenges of current models and point out emerging research directions.
Affiliation(s)
- Sumit Madan: Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany; Institute of Computer Science, University of Bonn, Bonn, 53115, Germany
- Manuel Lentzen: Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany; Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, 53115, Germany
- Johannes Brandt: School of Medicine, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany
- Daniel Rueckert: School of Medicine, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany; School of Computation, Information and Technology, Technical University Munich, Munich, Germany; Department of Computing, Imperial College London, London, UK
- Martin Hofmann-Apitius: Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany; Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, 53115, Germany
- Holger Fröhlich: Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53757, Germany; Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn, Bonn, 53115, Germany
19. Wada S, Takeda T, Okada K, Manabe S, Konishi S, Kamohara J, Matsumura Y. Oversampling effect in pretraining for bidirectional encoder representations from transformers (BERT) to localize medical BERT and enhance biomedical BERT. Artif Intell Med 2024;153:102889. PMID: 38728811; DOI: 10.1016/j.artmed.2024.102889.
Abstract
BACKGROUND Pretraining large-scale neural language models on raw texts has made a significant contribution to improving transfer learning in natural language processing. With the introduction of transformer-based language models, such as bidirectional encoder representations from transformers (BERT), the performance of information extraction from free text has improved significantly in both the general and medical domains. However, it is difficult to train specific BERT models to perform well in domains for which few large, high-quality databases are publicly available. OBJECTIVE We hypothesized that this problem could be addressed by oversampling a domain-specific corpus and using it for pretraining with a larger corpus in a balanced manner. In the present study, we verified our hypothesis by developing pretraining models using our method and evaluating their performance. METHODS Our proposed method was based on the simultaneous pretraining of models with knowledge from distinct domains after oversampling. We conducted three experiments in which we generated (1) an English biomedical BERT from a small biomedical corpus, (2) a Japanese medical BERT from a small medical corpus, and (3) an enhanced biomedical BERT pretrained with complete PubMed abstracts in a balanced manner. We then compared their performance with those of conventional models. RESULTS Our English BERT pretrained using both general and small medical domain corpora performed sufficiently well for practical use on the biomedical language understanding evaluation (BLUE) benchmark. Moreover, our proposed method was more effective than the conventional methods for each biomedical corpus of the same corpus size in the general domain. Our Japanese medical BERT outperformed the other BERT models built using a conventional method for almost all the medical tasks. The model demonstrated the same trend as that of the first experiment in English. Further, our enhanced biomedical BERT model, which was not pretrained on clinical notes, achieved superior clinical and biomedical scores on the BLUE benchmark with an increase of 0.3 points in the clinical score and 0.5 points in the biomedical score. These scores were above those of the models trained without our proposed method. CONCLUSIONS Well-balanced pretraining using oversampled instances derived from a corpus appropriate for the target task allowed us to construct a high-performance BERT model.
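The proposal is to oversample a small domain corpus so that pretraining draws on the general and domain corpora in a balanced way. The sketch below illustrates only that sampling idea; the corpus contents are synthetic placeholders, and real pretraining would feed the resulting stream into a masked-language-modeling pipeline rather than print it.

```python
# Hedged sketch of the balanced-sampling idea: draw each pretraining instance from the
# general or the domain corpus with equal probability, which implicitly oversamples
# the smaller domain corpus. Corpus contents are synthetic placeholders.
import random

def balanced_pretraining_stream(general_corpus, domain_corpus, n_instances, seed=0):
    """Yield pretraining sentences, choosing the source corpus with probability 0.5."""
    rng = random.Random(seed)
    for _ in range(n_instances):
        source = general_corpus if rng.random() < 0.5 else domain_corpus
        yield rng.choice(source)

general = [f"general-domain sentence {i}" for i in range(100_000)]
medical = [f"medical-domain sentence {i}" for i in range(500)]  # small domain corpus

sample = list(balanced_pretraining_stream(general, medical, n_instances=6))
print(sample)
```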
Affiliation(s)
- Shoya Wada: Department of Medical Informatics, Osaka University Graduate School of Medicine, Japan
- Toshihiro Takeda: Department of Medical Informatics, Osaka University Graduate School of Medicine, Japan
- Katsuki Okada: Department of Medical Informatics, Osaka University Graduate School of Medicine, Japan
- Shirou Manabe: Department of Medical Informatics, Osaka University Graduate School of Medicine, Japan
- Shozo Konishi: Department of Medical Informatics, Osaka University Graduate School of Medicine, Japan
- Yasushi Matsumura: Department of Medical Informatics, Osaka University Graduate School of Medicine, Japan
20
|
Invernici F, Bernasconi A, Ceri S. Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and Validation. J Med Internet Res 2024; 26:e52655. [PMID: 38814687 PMCID: PMC11176882 DOI: 10.2196/52655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2023] [Revised: 03/06/2024] [Accepted: 03/30/2024] [Indexed: 05/31/2024] Open
Abstract
BACKGROUND Since the beginning of the COVID-19 pandemic, >1 million studies have been collected within the COVID-19 Open Research Dataset, a corpus of manuscripts created to accelerate research against the disease. Their related abstracts hold a wealth of information that remains largely unexplored and difficult to search due to its unstructured nature. Keyword-based search is the standard approach, which allows users to retrieve the documents of a corpus that contain (all or some of) the words in a target list. This type of search, however, does not provide visual support to the task and is not suited to expressing complex queries or compensating for missing specifications. OBJECTIVE This study aims to consider small graphs of concepts and exploit them for expressing graph searches over existing COVID-19-related literature, leveraging the increasing use of graphs to represent and query scientific knowledge and providing a user-friendly search and exploration experience. METHODS We considered the COVID-19 Open Research Dataset corpus and summarized its content by annotating the publications' abstracts using terms selected from the Unified Medical Language System and the Ontology of Coronavirus Infectious Disease. Then, we built a co-occurrence network that includes all relevant concepts mentioned in the corpus, establishing connections when their mutual information is relevant. A sophisticated graph query engine was built to allow the identification of the best matches of graph queries on the network. It also supports partial matches and suggests potential query completions using shortest paths. RESULTS We built a large co-occurrence network, consisting of 128,249 entities and 47,198,965 relationships; the GRAPH-SEARCH interface allows users to explore the network by formulating or adapting graph queries; it produces a bibliography of publications, which are globally ranked; and each publication is further associated with the specific parts of the query that it explains, thereby allowing the user to understand each aspect of the matching. CONCLUSIONS Our approach supports the process of query formulation and evidence search upon a large text corpus; it can be reapplied to any scientific domain where documents corpora and curated ontologies are made available.
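A rough sketch of the underlying data structure, assuming abstracts have already been annotated with ontology concepts: build a concept co-occurrence network and use shortest paths to suggest query completions. A simple count threshold stands in for the paper's mutual-information criterion, and the actual graph query engine is far richer than this.

```python
from itertools import combinations
import networkx as nx

def build_cooccurrence_network(annotated_abstracts, min_count=2):
    """Nodes are ontology concepts found in each abstract; an edge is added
    when two concepts co-occur in enough abstracts (a count threshold standing
    in for the mutual-information criterion)."""
    pair_counts = {}
    for concepts in annotated_abstracts:              # e.g. ["covid-19", "fever", ...]
        for a, b in combinations(sorted(set(concepts)), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1
    g = nx.Graph()
    for (a, b), c in pair_counts.items():
        if c >= min_count:
            g.add_edge(a, b, weight=c)
    return g

def suggest_completion(g, source, target):
    """Suggest intermediate concepts via a shortest path, mirroring the
    query-completion idea."""
    return nx.shortest_path(g, source, target) if nx.has_path(g, source, target) else None

abstracts = [["covid-19", "fever", "cough"],
             ["covid-19", "fever", "vaccine"],
             ["fever", "vaccine"]]
net = build_cooccurrence_network(abstracts)
print(suggest_completion(net, "covid-19", "vaccine"))   # ['covid-19', 'fever', 'vaccine']
```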
Collapse
Affiliation(s)
- Francesco Invernici
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
| | - Anna Bernasconi
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
| | - Stefano Ceri
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
| |
Collapse
|
21
|
Paranou D, Chatzigoulas A, Cournia Z. Using deep learning and large protein language models to predict protein-membrane interfaces of peripheral membrane proteins. BIOINFORMATICS ADVANCES 2024; 4:vbae078. [PMID: 39559823 PMCID: PMC11572487 DOI: 10.1093/bioadv/vbae078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/07/2024] [Revised: 05/04/2024] [Accepted: 05/26/2024] [Indexed: 11/20/2024]
Abstract
Motivation Characterizing interactions at the protein-membrane interface is crucial as abnormal peripheral protein-membrane attachment is involved in the onset of many diseases. However, a limiting factor in studying and understanding protein-membrane interactions is that the membrane-binding domains of peripheral membrane proteins (PMPs) are typically unknown. By applying artificial intelligence techniques in the context of natural language processing (NLP), the accuracy and prediction time for protein-membrane interface analysis can be significantly improved compared to existing methods. Here, we assess whether NLP and protein language models (pLMs) can be used to predict membrane-interacting amino acids for PMPs. Results We utilize available experimental data and generate protein embeddings from two pLMs (ProtTrans and ESM) to train classifier models. Overall, the results provide a first proof-of-concept study and demonstrate the promising potential of using deep learning and pLMs to predict protein-membrane interfaces for PMPs faster, with similar accuracy, and without the need for 3D structural data compared to existing tools. Availability and implementation The code is available at https://github.com/zoecournia/pLM-PMI. All data are available in the Supplementary material.
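The core recipe (embed residues with a pLM, then train a standard classifier) can be sketched as below. The embeddings are assumed to have been exported beforehand from ProtTrans or ESM; random arrays stand in for them here, and the embedding dimension and classifier choice are illustrative, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Stand-ins for per-residue pLM embeddings (n_residues x embedding_dim) and
# binary labels marking membrane-interacting residues.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 1280))          # hypothetical embedding dimension
y = rng.integers(0, 2, size=5000)          # 1 = membrane-interacting residue

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("MCC:", matthews_corrcoef(y_te, clf.predict(X_te)))
```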
Collapse
Affiliation(s)
- Dimitra Paranou
- Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens 15784, Greece
| | | | - Zoe Cournia
- Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens 15784, Greece
| |
Collapse
|
22
|
Wang B, Lian Y, Xiong X, Zhou H, Liu Z, Zhou X. DCT-net: Dual-domain cross-fusion transformer network for MRI reconstruction. Magn Reson Imaging 2024; 107:69-79. [PMID: 38237693 DOI: 10.1016/j.mri.2024.01.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2023] [Revised: 12/26/2023] [Accepted: 01/14/2024] [Indexed: 01/22/2024]
Abstract
Current challenges in Magnetic Resonance Imaging (MRI) include long acquisition times and motion artifacts. To address these issues, under-sampled k-space acquisition has gained popularity as a fast imaging method. However, recovering fine details from under-sampled data remains challenging. In this study, we introduce a pioneering deep learning approach, namely DCT-Net, designed for dual-domain MRI reconstruction. DCT-Net seamlessly integrates information from the image domain (IRM) and frequency domain (FRM), utilizing a novel Cross Attention Block (CAB) and Fusion Attention Block (FAB). These innovative blocks enable precise feature extraction and adaptive fusion across both domains, resulting in a significant enhancement of the reconstructed image quality. The adaptive interaction and fusion mechanisms of CAB and FAB contribute to the method's effectiveness in capturing distinctive features and optimizing image reconstruction. Comprehensive ablation studies have been conducted to assess the contributions of these modules to reconstruction quality and accuracy. Experimental results on the FastMRI (2023) and Calgary-Campinas (2021) datasets demonstrate the superiority of our MRI reconstruction framework over other representative methods (most published in 2022 or 2023) in both qualitative and quantitative evaluations. This holds for knee and brain datasets under 4× and 8× accelerated imaging scenarios.
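A toy PyTorch sketch of the dual-domain idea follows: one branch refines the under-sampled image, another refines its k-space (frequency) representation obtained via torch.fft, and the two results are fused. A 1x1 convolution stands in for the cross/fusion attention blocks of the published model; layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class DualDomainBlock(nn.Module):
    """Minimal dual-domain block: image branch + k-space branch + fusion."""
    def __init__(self, ch=1):
        super().__init__()
        self.img_branch = nn.Conv2d(ch, ch, 3, padding=1)
        self.k_branch = nn.Conv2d(2 * ch, 2 * ch, 3, padding=1)   # real+imag channels
        self.fuse = nn.Conv2d(2 * ch, ch, 1)                      # stand-in for attention fusion

    def forward(self, x):                        # x: (B, 1, H, W) under-sampled image
        img_feat = self.img_branch(x)
        k = torch.fft.fft2(x)                    # frequency-domain view of the input
        k_feat = self.k_branch(torch.cat([k.real, k.imag], dim=1))
        k_img = torch.fft.ifft2(torch.complex(k_feat[:, :1], k_feat[:, 1:])).real
        return self.fuse(torch.cat([img_feat, k_img], dim=1))

x = torch.randn(2, 1, 64, 64)
print(DualDomainBlock()(x).shape)                # torch.Size([2, 1, 64, 64])
```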
Collapse
Affiliation(s)
- Bin Wang
- National Institute of Metrology, Beijing 100029, China; Key Laboratory of Metrology Digitalization and Digital Metrology for State Market Regulation, Beijing 100029, China; School of Printing and Packaging Engineering, Beijing Institute of Graphic Communication, Beijing 102600, China
| | - Yusheng Lian
- School of Printing and Packaging Engineering, Beijing Institute of Graphic Communication, Beijing 102600, China
| | - Xingchuang Xiong
- National Institute of Metrology, Beijing 100029, China; Key Laboratory of Metrology Digitalization and Digital Metrology for State Market Regulation, Beijing 100029, China.
| | - Han Zhou
- School of Printing and Packaging Engineering, Beijing Institute of Graphic Communication, Beijing 102600, China
| | - Zilong Liu
- National Institute of Metrology, Beijing 100029, China; Key Laboratory of Metrology Digitalization and Digital Metrology for State Market Regulation, Beijing 100029, China.
| | - Xiaohao Zhou
- State Key Laboratory of Infrared Physics, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China.
| |
Collapse
|
23
|
Huang MS, Han JC, Lin PY, You YT, Tsai RTH, Hsu WL. Surveying biomedical relation extraction: a critical examination of current datasets and the proposal of a new resource. Brief Bioinform 2024; 25:bbae132. [PMID: 38609331 PMCID: PMC11014787 DOI: 10.1093/bib/bbae132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 11/06/2023] [Accepted: 03/02/2023] [Indexed: 04/14/2024] Open
Abstract
Natural language processing (NLP) has become an essential technique in various fields, offering a wide range of possibilities for analyzing data and developing diverse NLP tasks. In the biomedical domain, understanding the complex relationships between compounds and proteins is critical, especially in the context of signal transduction and biochemical pathways. Among these relationships, protein-protein interactions (PPIs) are of particular interest, given their potential to trigger a variety of biological reactions. To improve the ability to predict PPI events, we propose the protein event detection dataset (PEDD), which comprises 6823 abstracts, 39 488 sentences and 182 937 gene pairs. Our PEDD dataset has been utilized in the AI CUP Biomedical Paper Analysis competition, where systems are challenged to predict 12 different relation types. In this paper, we review the state-of-the-art relation extraction research and provide an overview of the PEDD's compilation process. Furthermore, we present the results of the PPI extraction competition and evaluate several language models' performances on the PEDD. This paper's outcomes will provide a valuable roadmap for future studies on protein event detection in NLP. By addressing this critical challenge, we hope to enable breakthroughs in drug discovery and enhance our understanding of the molecular mechanisms underlying various diseases.
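For readers unfamiliar with how such relation types are typically predicted, the sketch below fine-tunes a generic transformer for 12-way relation classification with the two entities wrapped in marker tokens. The model name, marker tokens, example sentence, and label id are placeholders, not the PEDD competition baselines.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "bert-base-cased"                         # placeholder backbone
tok = AutoTokenizer.from_pretrained(name)
tok.add_special_tokens({"additional_special_tokens": ["[E1]", "[/E1]", "[E2]", "[/E2]"]})
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=12)
model.resize_token_embeddings(len(tok))          # account for the new marker tokens

sent = "[E1] MAPK1 [/E1] phosphorylates [E2] RSK2 [/E2] in the signalling cascade."
batch = tok(sent, return_tensors="pt", truncation=True)
labels = torch.tensor([3])                       # hypothetical relation id
out = model(**batch, labels=labels)
out.loss.backward()                              # one fine-tuning step (optimizer omitted)
```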
Collapse
Affiliation(s)
- Ming-Siang Huang
- Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan
- National Institute of Cancer Research, National Health Research Institutes, Tainan, Taiwan
- Department of Computer Science and Information Engineering, College of Information and Electrical Engineering, Asia University, Taichung, Taiwan
| | - Jen-Chieh Han
- Intelligent Information Service Research Laboratory, Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
| | - Pei-Yen Lin
- Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan
| | - Yu-Ting You
- Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan
| | - Richard Tzong-Han Tsai
- Intelligent Information Service Research Laboratory, Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
- Center for Geographic Information Science, Research Center for Humanities and Social Sciences, Academia Sinica, Taipei, Taiwan
| | - Wen-Lian Hsu
- Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, Asia University, New Taipei City, Taiwan
- Department of Computer Science and Information Engineering, College of Information and Electrical Engineering, Asia University, Taichung, Taiwan
| |
Collapse
|
24
|
Jahan I, Laskar MTR, Peng C, Huang JX. A comprehensive evaluation of large Language models on benchmark biomedical text processing tasks. Comput Biol Med 2024; 171:108189. [PMID: 38447502 DOI: 10.1016/j.compbiomed.2024.108189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2023] [Revised: 02/14/2024] [Accepted: 02/18/2024] [Indexed: 03/08/2024]
Abstract
Recently, Large Language Models (LLMs) have demonstrated impressive capability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has investigated their capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of LLMs on benchmark biomedical tasks. For this purpose, a comprehensive evaluation of 4 popular LLMs in 6 diverse biomedical tasks across 26 datasets has been conducted. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain. Interestingly, our evaluation shows that on biomedical datasets with smaller training sets, zero-shot LLMs even outperform the current state-of-the-art models that were fine-tuned only on the training sets of those datasets. This suggests that pre-training on large text corpora makes LLMs quite specialized even in the biomedical domain. We also find that no single LLM outperforms all others across every task; the performance of different LLMs varies depending on the task. While their performance is still quite poor in comparison to the biomedical models that were fine-tuned on large training sets, our findings demonstrate that LLMs have the potential to be a valuable tool for various biomedical tasks that lack large annotated data.
Collapse
Affiliation(s)
- Israt Jahan
- Department of Biology, York University, Canada; Information Retrieval and Knowledge Management Research Lab, York University, Canada.
| | - Md Tahmid Rahman Laskar
- School of Information Technology, York University, Canada; Information Retrieval and Knowledge Management Research Lab, York University, Canada; Dialpad Inc., Canada.
| | - Chun Peng
- Department of Biology, York University, Canada.
| | - Jimmy Xiangji Huang
- School of Information Technology, York University, Canada; Information Retrieval and Knowledge Management Research Lab, York University, Canada.
| |
Collapse
|
25
|
Zhu L, Wang L, Yang Z, Xu P, Yang S. PPSNO: A Feature-Rich SNO Sites Predictor by Stacking Ensemble Strategy from Protein Sequence-Derived Information. Interdiscip Sci 2024; 16:192-217. [PMID: 38206557 DOI: 10.1007/s12539-023-00595-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2023] [Revised: 11/20/2023] [Accepted: 11/21/2023] [Indexed: 01/12/2024]
Abstract
Protein S-nitrosylation (SNO) is a significant post-translational modification that affects the stability, activity, cellular localization, and function of proteins. Therefore, highly accurate prediction of SNO sites aids in grasping biological function mechanisms. In this work, we constructed a predictor, named PPSNO, for forecasting protein SNO sites using a stacked ensemble learning strategy. PPSNO integrates multiple machine learning techniques into an ensemble model, enhancing its predictive accuracy. First, we established benchmark datasets by collecting SNO sites from various sources, including literature, databases, and other predictors. Second, various techniques for feature extraction are applied to derive characteristics from protein sequences, which are subsequently amalgamated into the PPSNO predictor for training. Five-fold cross-validation experiments show that PPSNO outperformed existing predictors, such as PSNO, PreSNO, pCysMod, DeepNitro, RecSNO, and Mul-SNO. The PPSNO predictor achieved an impressive accuracy of 92.8%, an area under the curve (AUC) of 96.1%, a Matthews correlation coefficient (MCC) of 81.3%, an F1-score of 85.6%, a sensitivity (SN) of 79.3%, a specificity (SP) of 97.7%, and an average precision (AP) of 92.2%. We also employed ROC curves, PR curves, and radar plots to show the superior performance of PPSNO. Our study shows that fused protein sequence features and two-layer stacked ensemble models can improve the accuracy of predicting SNO sites, which can aid in comprehending cellular processes and disease mechanisms. The codes and data are available at https://github.com/serendipity-wly/PPSNO.
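The two-layer stacking idea (several base learners feeding a meta-classifier) can be illustrated with scikit-learn as follows. A synthetic feature matrix replaces the sequence-derived features, and the specific base learners and meta-model are assumptions rather than the exact PPSNO configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for sequence-derived features of candidate SNO sites.
X, y = make_classification(n_samples=800, n_features=60, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),   # second-layer meta-classifier
    cv=5,
)
print("5-fold accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```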
Collapse
Affiliation(s)
- Lun Zhu
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China
| | - Liuyang Wang
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China
| | - Zexi Yang
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China
| | - Piao Xu
- College of Economics and Management, Nanjing Forestry University, Nanjing, 210037, China
| | - Sen Yang
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China.
- The Affiliated Changzhou No. 2 People's Hospital of Nanjing Medical University, Changzhou, 213164, China.
| |
Collapse
|
26
|
Cascella M, Semeraro F, Montomoli J, Bellini V, Piazza O, Bignami E. The Breakthrough of Large Language Models Release for Medical Applications: 1-Year Timeline and Perspectives. J Med Syst 2024; 48:22. [PMID: 38366043 PMCID: PMC10873461 DOI: 10.1007/s10916-024-02045-3] [Citation(s) in RCA: 46] [Impact Index Per Article: 46.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2023] [Accepted: 02/10/2024] [Indexed: 02/18/2024]
Abstract
Within the domain of Natural Language Processing (NLP), Large Language Models (LLMs) represent sophisticated models engineered to comprehend, generate, and manipulate text resembling human language on an extensive scale. They are transformer-based deep learning architectures, obtained through the scaling of model size, pretraining of corpora, and computational resources. The potential healthcare applications of these models primarily involve chatbots and interaction systems for clinical documentation management, and medical literature summarization (Biomedical NLP). The challenge in this field lies in the research for applications in diagnostic and clinical decision support, as well as patient triage. Therefore, LLMs can be used for multiple tasks within patient care, research, and education. Throughout 2023, there has been an escalation in the release of LLMs, some of which are applicable in the healthcare domain. This remarkable output is largely the effect of the customization of pre-trained models for applications like chatbots, virtual assistants, or any system requiring human-like conversational engagement. As healthcare professionals, we recognize the imperative to stay at the forefront of knowledge. However, keeping abreast of the rapid evolution of this technology is practically unattainable, and, above all, understanding its potential applications and limitations remains a subject of ongoing debate. Consequently, this article aims to provide a succinct overview of the recently released LLMs, emphasizing their potential use in the field of medicine. Perspectives for a more extensive range of safe and effective applications are also discussed. The upcoming evolutionary leap involves the transition from an AI-powered model primarily designed for answering medical questions to a more versatile and practical tool for healthcare providers such as generalist biomedical AI systems for multimodal-based calibrated decision-making processes. On the other hand, the development of more accurate virtual clinical partners could enhance patient engagement, offering personalized support, and improving chronic disease management.
Collapse
Affiliation(s)
- Marco Cascella
- Anesthesia and Pain Medicine, Department of Medicine, Surgery and Dentistry "Scuola Medica Salernitana", University of Salerno, Via S. Allende, Baronissi, 84081, Italy
| | - Federico Semeraro
- Department of Anesthesia, Intensive Care and Prehospital Emergency, Maggiore Hospital Carlo Alberto Pizzardi, Bologna, Italy
| | - Jonathan Montomoli
- Department of Anesthesia and Intensive Care, Infermi Hospital, AUSL Romagna, Viale Settembrini 2, Rimini, 47923, Italy
| | - Valentina Bellini
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Viale Gramsci 14, Parma, 43126, Italy.
| | - Ornella Piazza
- Anesthesia and Pain Medicine, Department of Medicine, Surgery and Dentistry "Scuola Medica Salernitana", University of Salerno, Via S. Allende, Baronissi, 84081, Italy
| | - Elena Bignami
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Viale Gramsci 14, Parma, 43126, Italy
| |
Collapse
|
27
|
Huang L, Xu Y, Wang S, Sang L, Ma H. SRT: Swin-residual transformer for benign and malignant nodules classification in thyroid ultrasound images. Med Eng Phys 2024; 124:104101. [PMID: 38418029 DOI: 10.1016/j.medengphy.2024.104101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2023] [Revised: 12/13/2023] [Accepted: 01/01/2024] [Indexed: 03/01/2024]
Abstract
With the advancement of deep learning technology, computer-aided diagnosis (CAD) is playing an increasing role in the field of medical diagnosis. In particular, the emergence of Transformer-based models has led to a wider application of computer vision technology in the field of medical image processing. In thyroid disease assessment, the diagnosis of benign and malignant thyroid nodules based on the TI-RADS classification is greatly influenced by the subjective judgment of ultrasonographers and also places an extremely heavy workload on them. To address this, we propose the Swin-Residual Transformer (SRT) in this paper, which incorporates residual blocks and triplet loss into the Swin Transformer (SwinT). It improves the sensitivity to global and localized features of thyroid nodules and better distinguishes small feature differences. In our exploratory experiments, the SRT model achieves an accuracy of 0.8832 with an AUC of 0.8660, outperforming state-of-the-art convolutional neural network (CNN) and Transformer models. Ablation experiments further demonstrate the improved performance in the thyroid nodule classification task after introducing residual blocks and triplet loss. These results validate the potential of the proposed SRT model to improve the diagnosis of thyroid nodules from ultrasound images. It also provides a feasible safeguard against excessive puncture sampling of thyroid nodules in future clinical diagnosis.
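A rough sketch of combining a Swin backbone with a triplet objective for benign/malignant classification is shown below. The residual blocks and the exact loss weighting of the published SRT model are not reproduced; the 0.5 weight, image size, and use of the torchvision Swin-T backbone are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import swin_t

backbone = swin_t(weights=None)                  # Swin-T used as a feature extractor
feat_dim = backbone.head.in_features
backbone.head = nn.Identity()
classifier = nn.Linear(feat_dim, 2)              # benign vs. malignant

ce, triplet = nn.CrossEntropyLoss(), nn.TripletMarginLoss(margin=1.0)
anchor, positive, negative = (torch.randn(4, 3, 224, 224) for _ in range(3))
labels = torch.randint(0, 2, (4,))

fa, fp, fn_ = backbone(anchor), backbone(positive), backbone(negative)
loss = ce(classifier(fa), labels) + 0.5 * triplet(fa, fp, fn_)   # 0.5 is an assumed weight
loss.backward()
```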
Collapse
Affiliation(s)
- Long Huang
- Department of Oncology, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, 330006, China.
| | - Yanran Xu
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110169, China.
| | - Shuhuan Wang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110169, China.
| | - Liang Sang
- Department of Ultrasound, The First Hospital of China Medical University, Shenyang, Liaoning, 110001, China.
| | - He Ma
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110169, China; National University of Singapore (Suzhou) Research Institute, Suzhou, Jiangsu, 215123, China.
| |
Collapse
|
28
|
Wegner P, Balabin H, Ay MC, Bauermeister S, Killin L, Gallacher J, Hofmann-Apitius M, Salimi Y. Semantic Harmonization of Alzheimer's Disease Datasets Using AD-Mapper. J Alzheimers Dis 2024; 99:1409-1423. [PMID: 38759012 PMCID: PMC11191441 DOI: 10.3233/jad-240116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/18/2024] [Indexed: 05/19/2024]
Abstract
Background Despite numerous past endeavors for the semantic harmonization of Alzheimer's disease (AD) cohort studies, an automatic tool has yet to be developed. Objective As cohort studies form the basis of data-driven analysis, harmonizing them is crucial for cross-cohort analysis. We aimed to accelerate this task by constructing an automatic harmonization tool. Methods We created a common data model (CDM) through cross-mapping data from 20 cohorts, three CDMs, and ontology terms, which was then used to fine-tune a BioBERT model. Finally, we evaluated the model using three previously unseen cohorts and compared its performance to a string-matching baseline model. Results Here, we present our AD-Mapper interface for automatic harmonization of AD cohort studies, which outperformed a string-matching baseline on previously unseen cohort studies. We showcase our CDM comprising 1218 unique variables. Conclusion AD-Mapper leverages semantic similarities in naming conventions across cohorts to improve mapping performance.
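The core mapping step can be illustrated as follows: embed cohort variable names and CDM variable names, then assign each cohort variable to its nearest CDM entry by cosine similarity. A general-purpose sentence encoder stands in for AD-Mapper's fine-tuned BioBERT model, and the variable names below are invented examples.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in for the fine-tuned BioBERT
cdm = ["Mini-Mental State Examination total score",
       "CSF amyloid beta 42",
       "Years of education"]
cohort_vars = ["MMSE_total", "abeta42_csf", "educ_years"]

cdm_emb = model.encode(cdm, convert_to_tensor=True)
for var in cohort_vars:
    sims = util.cos_sim(model.encode(var, convert_to_tensor=True), cdm_emb)[0]
    print(var, "->", cdm[int(sims.argmax())])     # semantic match instead of string match
```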
Collapse
Affiliation(s)
- Philipp Wegner
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
| | - Helena Balabin
- Department of Neurosciences, Laboratory for Cognitive Neurology, KU Leuven, Leuven, Belgium
- Department of Computer Science, Language Intelligence and Information Retrieval Lab, KU Leuven, Leuven, Belgium
| | - Mehmet Can Ay
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | - Sarah Bauermeister
- Department of Psychiatry, Warneford Hospital, University of Oxford, Oxford, UK
| | - Lewis Killin
- SYNAPSE Research Management Partners, Barcelona, Spain
| | - John Gallacher
- Department of Psychiatry, Warneford Hospital, University of Oxford, Oxford, UK
| | - Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | - Yasamin Salimi
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
| | | | | | | | | | | |
Collapse
|
29
|
Nachtegael C, De Stefani J, Lenaerts T. A study of deep active learning methods to reduce labelling efforts in biomedical relation extraction. PLoS One 2023; 18:e0292356. [PMID: 38100453 PMCID: PMC10723703 DOI: 10.1371/journal.pone.0292356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2023] [Accepted: 09/19/2023] [Indexed: 12/17/2023] Open
Abstract
Automatic biomedical relation extraction (bioRE) is an essential task in biomedical research in order to generate high-quality labelled data that can be used for the development of innovative predictive methods. However, building such fully labelled, high-quality bioRE data sets of adequate size for the training of state-of-the-art relation extraction models is hindered by an annotation bottleneck due to limitations on the time and expertise of researchers and curators. We show here how Active Learning (AL) plays an important role in resolving this issue and positively improves bioRE tasks, effectively overcoming the labelling limits inherent to a data set. Six different AL strategies are benchmarked on seven bioRE data sets, using PubMedBERT as the base model, evaluating their area under the learning curve (AULC) as well as intermediate results measurements. The results demonstrate that uncertainty-based strategies, such as Least-Confident or Margin Sampling, statistically perform better than other types of AL strategies in terms of F1-score, accuracy and precision. However, in terms of recall, a diversity-based strategy, called Core-set, outperforms all strategies. AL strategies are shown to reduce the annotation need (in order to reach a performance at par with training on all data) from 6% to 38%, depending on the data set, with the Margin Sampling and Least-Confident Sampling strategies moreover obtaining the best AULCs compared to the Random Sampling baseline. We show through the experiments the importance of using AL methods to reduce the amount of labelling needed to construct high-quality data sets leading to optimal performance of deep learning models. The code and data sets to reproduce all the results presented in the article are available at https://github.com/oligogenic/Deep_active_learning_bioRE.
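A least-confident active-learning loop of the kind benchmarked here can be sketched in a few lines: train on a small labelled pool, score the unlabelled pool, and move the least confident examples to the labelled set. Synthetic data and a linear model stand in for the bioRE corpora and PubMedBERT; pool and query sizes are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=40, random_state=0)
labelled = list(range(50))                         # small seed set
unlabelled = list(range(50, 2000))

for al_round in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labelled], y[labelled])
    conf = clf.predict_proba(X[unlabelled]).max(axis=1)   # confidence of the top class
    query = np.argsort(conf)[:25]                          # 25 least-confident examples
    picked = {unlabelled[i] for i in query}
    labelled += list(picked)                               # the oracle supplies y[picked]
    unlabelled = [i for i in unlabelled if i not in picked]
    print(f"round {al_round}: {len(labelled)} labelled, acc={clf.score(X, y):.3f}")
```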
Collapse
Affiliation(s)
- Charlotte Nachtegael
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, Bruxelles, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Bruxelles, Belgium
| | - Jacopo De Stefani
- Machine Learning Group, Université Libre de Bruxelles, Bruxelles, Belgium
- Technology, Policy and Management Faculty, Technische Universiteit Delft, Delft, Netherlands
| | - Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, Bruxelles, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Bruxelles, Belgium
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, Bruxelles, Belgium
| |
Collapse
|
30
|
Romano MF, Shih LC, Paschalidis IC, Au R, Kolachalama VB. Large Language Models in Neurology Research and Future Practice. Neurology 2023; 101:1058-1067. [PMID: 37816646 PMCID: PMC10752640 DOI: 10.1212/wnl.0000000000207967] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Accepted: 09/06/2023] [Indexed: 10/12/2023] Open
Abstract
Recent advancements in generative artificial intelligence, particularly using large language models (LLMs), are gaining increased public attention. We provide a perspective on the potential of LLMs to analyze enormous amounts of data from medical records and gain insights on specific topics in neurology. In addition, we explore use cases for LLMs, such as early diagnosis, supporting patients and caregivers, and acting as an assistant for clinicians. We point to the potential ethical and technical challenges raised by LLMs, such as concerns about privacy and data security, potential biases in the data used for model training, and the need for careful validation of results. Researchers must consider these challenges and take steps to address them to ensure that their work is conducted in a safe and responsible manner. Despite these challenges, LLMs offer promising opportunities for improving the care and treatment of various neurologic disorders.
Collapse
Affiliation(s)
- Michael F Romano
- From the Department of Medicine (M.F.R., R.A., V.B.K.), Boston University Chobanian & Avedisian School of Medicine, MA; Department of Radiology and Biomedical Imaging (M.F.R.), University of California, San Francisco; Department of Neurology (L.C.S., R.A.), Boston University Chobanian & Avedisian School of Medicine; Department of Electrical and Computer Engineering (I.C.P.), Division of Systems Engineering, and Department of Biomedical Engineering; Faculty of Computing and Data Sciences (I.C.P., V.B.K.), Boston University; Department of Anatomy and Neurobiology (R.A.); The Framingham Heart Study, Boston University Chobanian & Avedisian School of Medicine; Department of Epidemiology, Boston University School of Public Health; Boston University Alzheimer's Disease Research Center (R.A.); and Department of Computer Science (V.B.K.), Boston University, MA
| | - Ludy C Shih
- From the Department of Medicine (M.F.R., R.A., V.B.K.), Boston University Chobanian & Avedisian School of Medicine, MA; Department of Radiology and Biomedical Imaging (M.F.R.), University of California, San Francisco; Department of Neurology (L.C.S., R.A.), Boston University Chobanian & Avedisian School of Medicine; Department of Electrical and Computer Engineering (I.C.P.), Division of Systems Engineering, and Department of Biomedical Engineering; Faculty of Computing and Data Sciences (I.C.P., V.B.K.), Boston University; Department of Anatomy and Neurobiology (R.A.); The Framingham Heart Study, Boston University Chobanian & Avedisian School of Medicine; Department of Epidemiology, Boston University School of Public Health; Boston University Alzheimer's Disease Research Center (R.A.); and Department of Computer Science (V.B.K.), Boston University, MA
| | - Ioannis C Paschalidis
- From the Department of Medicine (M.F.R., R.A., V.B.K.), Boston University Chobanian & Avedisian School of Medicine, MA; Department of Radiology and Biomedical Imaging (M.F.R.), University of California, San Francisco; Department of Neurology (L.C.S., R.A.), Boston University Chobanian & Avedisian School of Medicine; Department of Electrical and Computer Engineering (I.C.P.), Division of Systems Engineering, and Department of Biomedical Engineering; Faculty of Computing and Data Sciences (I.C.P., V.B.K.), Boston University; Department of Anatomy and Neurobiology (R.A.); The Framingham Heart Study, Boston University Chobanian & Avedisian School of Medicine; Department of Epidemiology, Boston University School of Public Health; Boston University Alzheimer's Disease Research Center (R.A.); and Department of Computer Science (V.B.K.), Boston University, MA
| | - Rhoda Au
- From the Department of Medicine (M.F.R., R.A., V.B.K.), Boston University Chobanian & Avedisian School of Medicine, MA; Department of Radiology and Biomedical Imaging (M.F.R.), University of California, San Francisco; Department of Neurology (L.C.S., R.A.), Boston University Chobanian & Avedisian School of Medicine; Department of Electrical and Computer Engineering (I.C.P.), Division of Systems Engineering, and Department of Biomedical Engineering; Faculty of Computing and Data Sciences (I.C.P., V.B.K.), Boston University; Department of Anatomy and Neurobiology (R.A.); The Framingham Heart Study, Boston University Chobanian & Avedisian School of Medicine; Department of Epidemiology, Boston University School of Public Health; Boston University Alzheimer's Disease Research Center (R.A.); and Department of Computer Science (V.B.K.), Boston University, MA
| | - Vijaya B Kolachalama
- From the Department of Medicine (M.F.R., R.A., V.B.K.), Boston University Chobanian & Avedisian School of Medicine, MA; Department of Radiology and Biomedical Imaging (M.F.R.), University of California, San Francisco; Department of Neurology (L.C.S., R.A.), Boston University Chobanian & Avedisian School of Medicine; Department of Electrical and Computer Engineering (I.C.P.), Division of Systems Engineering, and Department of Biomedical Engineering; Faculty of Computing and Data Sciences (I.C.P., V.B.K.), Boston University; Department of Anatomy and Neurobiology (R.A.); The Framingham Heart Study, Boston University Chobanian & Avedisian School of Medicine; Department of Epidemiology, Boston University School of Public Health; Boston University Alzheimer's Disease Research Center (R.A.); and Department of Computer Science (V.B.K.), Boston University, MA.
| |
Collapse
|
31
|
Dong H, Donegan S, Shah M, Chi Y. A lightweight transformer for faster and robust EBSD data collection. Sci Rep 2023; 13:21253. [PMID: 38040823 PMCID: PMC10692076 DOI: 10.1038/s41598-023-47936-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Accepted: 11/19/2023] [Indexed: 12/03/2023] Open
Abstract
Three-dimensional electron back-scattered diffraction (EBSD) microscopy is a critical tool in many applications in materials science, yet its data quality can fluctuate greatly during the arduous collection process, particularly via serial sectioning. Fortunately, 3D EBSD data is inherently sequential, opening up the opportunity to use transformers, state-of-the-art deep learning architectures that have made breakthroughs in a plethora of domains, for data processing and recovery. To be more robust to errors and to accelerate 3D EBSD data collection, we introduce a two-step method that recovers missing slices in a 3D EBSD volume, using an efficient transformer model and a projection algorithm to process the transformer's outputs. Overcoming the computational and practical hurdles of deep learning with scarce high-dimensional data, we train this model using only synthetic 3D EBSD data with self-supervision and obtain superior recovery accuracy on real 3D EBSD data, compared to existing methods.
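The slice-recovery idea can be sketched as a masked-prediction problem over a sequence of flattened slices: replace the missing slice with a learned token and let a transformer encoder predict its content. Dimensions, depth, and the loss below are assumptions; the published model and its projection step differ.

```python
import torch
import torch.nn as nn

class SliceRecoveryTransformer(nn.Module):
    """Recover a missing slice in a sequential volume (toy sketch)."""
    def __init__(self, d=256, depth=4, heads=4):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d))
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(d, d)

    def forward(self, slices, missing_idx):
        # slices: (batch, n_slices, d); the missing slice is replaced by the mask token
        x = slices.clone()
        x[:, missing_idx] = self.mask_token.squeeze(0).squeeze(0)
        h = self.encoder(x)
        return self.head(h[:, missing_idx])      # prediction for the missing slice

model = SliceRecoveryTransformer()
vol = torch.randn(2, 16, 256)                    # stand-in for 16 flattened EBSD slices
pred = model(vol, missing_idx=7)
loss = nn.functional.mse_loss(pred, vol[:, 7])   # self-supervised reconstruction target
loss.backward()
```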
Collapse
Affiliation(s)
- Harry Dong
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, 15289, USA.
| | - Sean Donegan
- Air Force Research Laboratory, Materials and Manufacturing Directorate, Wright-Patterson AFB, Dayton, 45433, USA
| | - Megna Shah
- Air Force Research Laboratory, Materials and Manufacturing Directorate, Wright-Patterson AFB, Dayton, 45433, USA
| | - Yuejie Chi
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, 15289, USA
| |
Collapse
|
32
|
Chen L, Qi Y, Wu A, Deng L, Jiang T. TeaBERT: An Efficient Knowledge Infused Cross-Lingual Language Model for Mapping Chinese Medical Entities to the Unified Medical Language System. IEEE J Biomed Health Inform 2023; 27:6029-6038. [PMID: 37703167 DOI: 10.1109/jbhi.2023.3315143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/15/2023]
Abstract
Medical entity normalization is an important task for medical information processing. The Unified Medical Language System (UMLS), a well-developed medical terminology system, is crucial for medical entity normalization. However, the UMLS primarily consists of English medical terms. For languages other than English, such as Chinese, a significant challenge for normalizing medical entities is the lack of robust terminology systems. To address this issue, we propose a translation-enhancing training strategy that incorporates the translation and synonym knowledge of the UMLS into a language model using a contrastive learning approach. In this work, we propose a cross-lingual pre-trained language model called TeaBERT, which can align synonymous Chinese and English medical entities across languages at the concept level. As the evaluation results show, the TeaBERT language model outperformed previous cross-lingual language models with Acc@5 values of 92.54%, 87.14% and 84.77% on the ICD10-CN, CHPO and RealWorld-v2 datasets, respectively. It also achieved a new state-of-the-art cross-lingual entity mapping performance without fine-tuning. The translation-enhancing strategy is applicable to other languages that face a similar challenge due to the absence of well-developed medical terminology systems.
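The contrastive alignment of translation pairs can be illustrated with an InfoNCE-style objective: in a batch of paired Chinese/English term embeddings, each term should be closest to its own translation. This is a generic sketch of the translation-enhancing idea, not the exact TeaBERT training objective; the temperature and dimensions are assumptions.

```python
import torch
import torch.nn.functional as F

def translation_contrastive_loss(zh_emb, en_emb, temperature=0.05):
    """Symmetric InfoNCE over paired term embeddings; positives lie on the diagonal."""
    zh = F.normalize(zh_emb, dim=-1)
    en = F.normalize(en_emb, dim=-1)
    logits = zh @ en.t() / temperature            # (batch, batch) similarity matrix
    targets = torch.arange(zh.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# toy usage with random stand-ins for encoder outputs of paired terms
zh = torch.randn(8, 256, requires_grad=True)
en = torch.randn(8, 256, requires_grad=True)
translation_contrastive_loss(zh, en).backward()
```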
Collapse
|
33
|
Türkmen H, Dikenelli O, Eraslan C, Çallı MC, Özbek SS. BioBERTurk: Exploring Turkish Biomedical Language Model Development Strategies in Low-Resource Setting. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2023; 7:433-446. [PMID: 37927378 PMCID: PMC10620363 DOI: 10.1007/s41666-023-00140-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2022] [Revised: 03/06/2023] [Accepted: 07/28/2023] [Indexed: 11/07/2023]
Abstract
Pretrained language models augmented with in-domain corpora show impressive results in biomedicine and clinical Natural Language Processing (NLP) tasks in English. However, there has been minimal work in low-resource languages. Although some pioneering works have shown promising results, many scenarios still need to be explored to engineer effective pretrained language models in biomedicine for low-resource settings. This study introduces the BioBERTurk family, four pretrained models in Turkish for biomedicine. To evaluate the models, we also introduce a labeled dataset for classifying radiology reports of head CT examinations. Two parts of the reports, impressions and findings, are evaluated separately to observe the performance of models on longer and less informative text. We compared the models with the Turkish BERT (BERTurk) pretrained on general-domain text, multilingual BERT (mBERT), and LSTM+attention-based baseline models. The first model, initialized from BERTurk and then further pretrained on a biomedical corpus, performs statistically better than BERTurk, multilingual BERT, and the baseline for both datasets. The second model continues to pretrain BERTurk using only radiology Ph.D. theses to test the effect of task-related text. This model slightly outperformed all models on the impression dataset and showed that using only radiology-related data for continual pretraining could be effective. The third model continues to pretrain by adding radiology theses to the biomedical corpus but does not show a statistically meaningful difference for either dataset. The final model combines the radiology and biomedical corpora with the corpus of BERTurk and pretrains a BERT model from scratch. This model is the worst-performing model of the BioBERTurk family, even worse than BERTurk and multilingual BERT.
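Continual pretraining of an existing checkpoint on in-domain text, the strategy behind the better-performing BioBERTurk variants, can be sketched with the Hugging Face Trainer as below. The checkpoint name is a public Turkish BERT model used here as a plausible stand-in, and the two example sentences replace the biomedical/radiology corpora.

```python
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

name = "dbmdz/bert-base-turkish-cased"           # assumed stand-in for BERTurk
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

texts = ["Radyoloji raporu örneği ...", "Biyomedikal metin örneği ..."]
ds = Dataset.from_dict({"text": texts}).map(
    lambda b: tok(b["text"], truncation=True, max_length=128), batched=True)
collator = DataCollatorForLanguageModeling(tok, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments("continual-pretraining-demo", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to=[]),
    train_dataset=ds,
    data_collator=collator,
)
trainer.train()                                  # continues masked-language-model training
```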
Collapse
Affiliation(s)
- Hazal Türkmen
- Department of Computer Engineering, Ege University, 35100 İzmir, Turkey
| | - Oğuz Dikenelli
- Department of Computer Engineering, Ege University, 35100 İzmir, Turkey
| | - Cenk Eraslan
- Department of Radiology, Ege University, 35100 İzmir, Turkey
| | | | | |
Collapse
|
34
|
Zhang Y, Liu C, Liu M, Liu T, Lin H, Huang CB, Ning L. Attention is all you need: utilizing attention in AI-enabled drug discovery. Brief Bioinform 2023; 25:bbad467. [PMID: 38189543 PMCID: PMC10772984 DOI: 10.1093/bib/bbad467] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Revised: 11/03/2023] [Accepted: 11/25/2023] [Indexed: 01/09/2024] Open
Abstract
Recently, the attention mechanism and derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and Artificial Intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.
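For reference, the operation all of these models build on is scaled dot-product attention, softmax(QK^T / sqrt(d)) V, shown here in a minimal self-attention form; the token count and dimension are arbitrary.

```python
import torch

def scaled_dot_product_attention(q, k, v):
    """softmax(QK^T / sqrt(d)) V, returning both outputs and attention weights."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    weights = torch.softmax(scores, dim=-1)      # how much each query attends to each key
    return weights @ v, weights

x = torch.randn(10, 64)                          # e.g. 10 atom/token embeddings of dim 64
out, attn = scaled_dot_product_attention(x, x, x)   # self-attention
print(out.shape, attn.shape)                     # torch.Size([10, 64]) torch.Size([10, 10])
```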
Collapse
Affiliation(s)
- Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Caiqi Liu
- Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Key Laboratory of Molecular Oncology of Heilongjiang Province, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
| | - Mujiexin Liu
- Chongqing Key Laboratory of Sichuan-Chongqing Co-construction for Diagnosis and Treatment of Infectious Diseases Integrated Traditional Chinese and Western Medicine, College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Tianyuan Liu
- Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
| | - Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Cheng-Bing Huang
- School of Computer Science and Technology, Aba Teachers University, Aba, China
| | - Lin Ning
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
| |
Collapse
|
35
|
Holmes J, Liu Z, Zhang L, Ding Y, Sio TT, McGee LA, Ashman JB, Li X, Liu T, Shen J, Liu W. Evaluating large language models on a highly-specialized topic, radiation oncology physics. Front Oncol 2023; 13:1219326. [PMID: 37529688 PMCID: PMC10388568 DOI: 10.3389/fonc.2023.1219326] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2023] [Accepted: 06/12/2023] [Indexed: 08/03/2023] Open
Abstract
Purpose We present the first study to investigate Large Language Models (LLMs) in answering radiation oncology physics questions. Because popular exams like AP Physics, LSAT, and GRE have large test-taker populations and ample test preparation resources in circulation, they may not allow for accurately assessing the true potential of LLMs. This paper proposes evaluating LLMs on a highly-specialized topic, radiation oncology physics, which may be more pertinent to scientific and medical communities in addition to being a valuable benchmark of LLMs. Methods We developed an exam consisting of 100 radiation oncology physics questions based on our expertise. Four LLMs, ChatGPT (GPT-3.5), ChatGPT (GPT-4), Bard (LaMDA), and BLOOMZ, were evaluated against medical physicists and non-experts. The performance of ChatGPT (GPT-4) was further explored by being asked to explain first, then answer. The deductive reasoning capability of ChatGPT (GPT-4) was evaluated using a novel approach (substituting the correct answer with "None of the above choices is the correct answer."). A majority vote analysis was used to approximate how well each group could score when working together. Results ChatGPT GPT-4 outperformed all other LLMs and medical physicists, on average, with improved accuracy when prompted to explain before answering. ChatGPT (GPT-3.5 and GPT-4) showed a high level of consistency in its answer choices across a number of trials, whether correct or incorrect, a characteristic that was not observed in the human test groups or Bard (LaMDA). In evaluating deductive reasoning ability, ChatGPT (GPT-4) demonstrated surprising accuracy, suggesting the potential presence of an emergent ability. Finally, although ChatGPT (GPT-4) performed well overall, its intrinsic properties did not allow for further improvement when scoring based on a majority vote across trials. In contrast, a team of medical physicists were able to greatly outperform ChatGPT (GPT-4) using a majority vote. Conclusion This study suggests a great potential for LLMs to work alongside radiation oncology experts as highly knowledgeable assistants.
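The majority-vote analysis mentioned above amounts to scoring, per question, the most common answer across repeated trials (or across group members) against the key, as in the sketch below; the answer letters are made up, not the study's exam.

```python
from collections import Counter

def majority_vote_score(trial_answers, answer_key):
    """Fraction of questions where the per-question majority answer matches the key."""
    votes = [Counter(answers).most_common(1)[0][0] for answers in zip(*trial_answers)]
    return sum(v == k for v, k in zip(votes, answer_key)) / len(answer_key)

trials = [["A", "C", "B", "D"],      # trial 1
          ["A", "B", "B", "D"],      # trial 2
          ["C", "B", "B", "A"]]      # trial 3
print(majority_vote_score(trials, answer_key=["A", "B", "B", "D"]))   # 1.0
```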
Collapse
Affiliation(s)
- Jason Holmes
- Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ, United States
| | - Zhengliang Liu
- School of Computing, The University of Georgia, Athens, GA, United States
| | - Lian Zhang
- Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ, United States
| | - Yuzhen Ding
- Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ, United States
| | - Terence T. Sio
- Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ, United States
| | - Lisa A. McGee
- Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ, United States
| | - Jonathan B. Ashman
- Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ, United States
| | - Xiang Li
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
| | - Tianming Liu
- School of Computing, The University of Georgia, Athens, GA, United States
| | - Jiajian Shen
- Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ, United States
| | - Wei Liu
- Department of Radiation Oncology, Mayo Clinic, Phoenix, AZ, United States
| |
Collapse
|
36
|
Tian Y, Zhang W, Duan L, McDonald W, Osgood N. Comparison of pretrained transformer-based models for influenza and COVID-19 detection using social media text data in Saskatchewan, Canada. Front Digit Health 2023; 5:1203874. [PMID: 37448834 PMCID: PMC10338115 DOI: 10.3389/fdgth.2023.1203874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2023] [Accepted: 06/02/2023] [Indexed: 07/15/2023] Open
Abstract
Background The use of social media data provides an opportunity to complement traditional influenza and COVID-19 surveillance methods for the detection and control of outbreaks and informing public health interventions. Objective The first aim of this study is to investigate the degree to which Twitter users disclose health experiences related to influenza and COVID-19 that could be indicative of recent plausible influenza cases or symptomatic COVID-19 infections. Second, we seek to use the Twitter datasets to train and evaluate the classification performance of Bidirectional Encoder Representations from Transformers (BERT) and variant language models in the context of influenza and COVID-19 infection detection. Methods We constructed two Twitter datasets using a keyword-based filtering approach on English-language tweets collected from December 2016 to December 2022 in Saskatchewan, Canada. The influenza-related dataset comprised tweets filtered with influenza-related keywords from December 13, 2016, to March 17, 2018, while the COVID-19 dataset comprised tweets filtered with COVID-19 symptom-related keywords from January 1, 2020, to June 22, 2021. The Twitter datasets were cleaned, and each tweet was annotated by at least two annotators as to whether it suggested recent plausible influenza cases or symptomatic COVID-19 cases. We then assessed the classification performance of pre-trained transformer-based language models, including BERT-base, BERT-large, RoBERTa-base, RoBERTa-large, BERTweet-base, BERTweet-covid-base, BERTweet-large, and COVID-Twitter-BERT (CT-BERT) models, on each dataset. To address the notable class imbalance, we experimented with both oversampling and undersampling methods. Results The influenza dataset had 1129 out of 6444 (17.5%) tweets annotated as suggesting recent plausible influenza cases. The COVID-19 dataset had 924 out of 11939 (7.7%) tweets annotated as inferring recent plausible COVID-19 cases. When compared against other language models on the COVID-19 dataset, CT-BERT performed the best, achieving the highest scores for recall (94.8%), F1 (94.4%), and accuracy (94.6%). For the influenza dataset, the BERTweet models exhibited better performance. Our results also showed that applying data balancing techniques such as oversampling or undersampling did not lead to improved model performance. Conclusions Utilizing domain-specific language models for monitoring users' health experiences related to influenza and COVID-19 on social media shows improved classification performance and has the potential to supplement real-time disease surveillance.
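Since over/under-sampling did not help here, one common alternative for this kind of imbalance (roughly 8-18% positive tweets) is to weight the loss by inverse class frequency when fine-tuning a BERT-style classifier, as sketched below; the counts and logits are illustrative stand-ins.

```python
import numpy as np
import torch
import torch.nn as nn

counts = np.array([11015, 924])                  # negative vs. positive tweets (illustrative)
weights = torch.tensor(counts.sum() / (2 * counts), dtype=torch.float)
criterion = nn.CrossEntropyLoss(weight=weights)  # rarer class contributes a larger loss

logits = torch.randn(16, 2, requires_grad=True)  # stand-in for classifier outputs
labels = torch.randint(0, 2, (16,))
criterion(logits, labels).backward()
```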
Collapse
Affiliation(s)
| | | | | | | | - Nathaniel Osgood
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
| |
Collapse
|
37
|
Greco CM, Simeri A, Tagarelli A, Zumpano E. Transformer-based Language Models for Mental Health issues: a Survey. Pattern Recognit Lett 2023. [DOI: 10.1016/j.patrec.2023.02.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/17/2023]
|
38
|
Kothuru S, Santhanavijayan A. Identifying COVID-19 english informative tweets using limited labelled data. SOCIAL NETWORK ANALYSIS AND MINING 2023; 13:25. [PMID: 36686376 PMCID: PMC9844936 DOI: 10.1007/s13278-023-01025-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2022] [Revised: 12/28/2022] [Accepted: 01/07/2023] [Indexed: 01/19/2023]
Abstract
Identifying COVID-19 informative tweets is very useful in building monitoring systems to track the latest updates. Existing approaches to identifying informative tweets rely on a large number of labelled tweets to achieve good performance. As labelling is an expensive and laborious process, there is a need to develop approaches that can identify COVID-19 informative tweets using limited labelled data. In this paper, we propose a simple yet novel labelled-data-efficient approach that achieves a state-of-the-art (SOTA) F1-score of 91.23 on the WNUT COVID-19 dataset using just 1000 tweets (14.3% of the full training set). Our approach starts with limited labelled data, augments it using data augmentation methods, and then fine-tunes the model on the augmented data set. It is the first work to address the task of identifying COVID-19 English informative tweets using limited labelled data while still achieving new SOTA performance.
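A toy augmentation step in the spirit of this label-efficient recipe is shown below: extra copies of each labelled tweet are generated by randomly swapping words for synonyms before fine-tuning. The synonym dictionary and probabilities are tiny illustrative stand-ins; the paper's augmentation methods may differ.

```python
import random

def synonym_augment(tweet, synonyms, n_copies=3, p=0.2, seed=0):
    """Return n_copies variants of a tweet with random synonym substitutions."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_copies):
        words = [rng.choice(synonyms[w]) if w in synonyms and rng.random() < p else w
                 for w in tweet.split()]
        out.append(" ".join(words))
    return out

syn = {"cases": ["infections", "patients"], "reported": ["confirmed", "recorded"]}
print(synonym_augment("120 new cases reported in the city today", syn))
```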
Collapse
Affiliation(s)
- Srinivasulu Kothuru
- Department of Computer Science and Engineering, National Institute of Technology, Thuvakudi, Tiruchirappalli, Tamil Nadu 620015 India
| | - A. Santhanavijayan
- Department of Computer Science and Engineering, National Institute of Technology, Thuvakudi, Tiruchirappalli, Tamil Nadu 620015 India
| |
Collapse
|
39
|
End-to-End Transformer-Based Models in Textual-Based NLP. AI 2023. [DOI: 10.3390/ai4010004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023] Open
Abstract
Transformer architectures are highly expressive because they use self-attention mechanisms to encode long-range dependencies in the input sequences. In this paper, we present a literature review on Transformer-based (TB) models, providing a detailed overview of each model in comparison to the Transformer's standard architecture. This survey focuses on TB models used in the field of Natural Language Processing (NLP) for textual-based tasks. We begin with an overview of the fundamental concepts at the heart of the success of these models. Then, we classify them based on their architecture and training mode. We compare the advantages and disadvantages of popular techniques in terms of architectural design and experimental value. Finally, we discuss open research directions and potential future work to help solve current TB application challenges in NLP.
Collapse
|
40
|
Vachmanus S, Noraset T, Piyanonpong W, Rattananukrom T, Tuarob S. DeepMetaForge: A Deep Vision-Transformer Metadata-Fusion Network for Automatic Skin Lesion Classification. IEEE ACCESS 2023; 11:145467-145484. [DOI: 10.1109/access.2023.3345225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/05/2025]
Affiliation(s)
- Sirawich Vachmanus
- Faculty of Information and Communication Technology, Mahidol University, Nakhon Pathom, Thailand
| | - Thanapon Noraset
- Faculty of Information and Communication Technology, Mahidol University, Nakhon Pathom, Thailand
| | - Waritsara Piyanonpong
- Division of Dermatology, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
| | - Teerapong Rattananukrom
- Division of Dermatology, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
| | - Suppawong Tuarob
- Faculty of Information and Communication Technology, Mahidol University, Nakhon Pathom, Thailand
| |
Collapse
|
41
|
Serna García G, Al Khalaf R, Invernici F, Ceri S, Bernasconi A. CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning. Gigascience 2022; 12:giad036. [PMID: 37222749 PMCID: PMC10205000 DOI: 10.1093/gigascience/giad036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2022] [Revised: 04/11/2023] [Accepted: 04/27/2023] [Indexed: 05/25/2023] Open
Abstract
BACKGROUND Literature about SARS-CoV-2 widely discusses the effects of variations that have spread in the past 3 years. Such information is dispersed in the texts of several research articles, hindering the possibility of practically integrating it with related datasets (e.g., millions of SARS-CoV-2 sequences available to the community). We aim to fill this gap by mining literature abstracts to extract, for each variant/mutation, its related effects (in epidemiological, immunological, clinical, or viral kinetics terms), labeled with higher/lower levels relative to the nonmutated virus. RESULTS The proposed framework comprises (i) the provisioning of abstracts from a COVID-19-related big data corpus (CORD-19) and (ii) the identification of mutation/variant effects in abstracts using a GPT2-based prediction model. The above techniques enable the prediction of mutations/variants with their effects and levels in 2 distinct scenarios: (i) the batch annotation of the most relevant CORD-19 abstracts and (ii) the on-demand annotation of any user-selected CORD-19 abstract through the CoVEffect web application (http://gmql.eu/coveffect), which assists expert users with semiautomated data labeling. On the interface, users can inspect the predictions and correct them; user inputs can then extend the training dataset used by the prediction model. Our prototype model was trained through a carefully designed process, using a minimal and highly diversified pool of samples. CONCLUSIONS The CoVEffect interface serves for the assisted annotation of abstracts, allowing the download of curated datasets for further use in data integration or analysis pipelines. The overall framework can be adapted to resolve similar unstructured-to-structured text translation tasks, which are typical of biomedical domains.
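The annotation component rests on prompting a fine-tuned GPT-2 model. The generic sketch below shows the shape of such a call with the off-the-shelf gpt2 checkpoint; the prompt format and the trained CoVEffect weights are not reproduced here, so this is illustrative only.
```python
# Illustrative sketch only: generic GPT-2 text generation in the spirit of
# abstract-to-structured-effect annotation. The example prompt format is an
# assumption, not the CoVEffect model's actual input format.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

abstract = "The E484K substitution reduced neutralization by convalescent sera."
prompt = (
    "Abstract: " + abstract + "\n"
    "Extract: mutation, effect category, level (higher/lower)\n"
    "Answer:"
)
result = generator(prompt, max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])
```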
Collapse
Affiliation(s)
- Giuseppe Serna García
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
| | - Ruba Al Khalaf
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
| | - Francesco Invernici
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
| | - Stefano Ceri
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
| | - Anna Bernasconi
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
| |
Collapse
|
42
|
Chandler M, Jain S, Halman J, Hong E, Dobrovolskaia MA, Zakharov AV, Afonin KA. Artificial Immune Cell, AI-cell, a New Tool to Predict Interferon Production by Peripheral Blood Monocytes in Response to Nucleic Acid Nanoparticles. SMALL (WEINHEIM AN DER BERGSTRASSE, GERMANY) 2022; 18:e2204941. [PMID: 36216772 PMCID: PMC9671856 DOI: 10.1002/smll.202204941] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/11/2022] [Revised: 09/15/2022] [Indexed: 06/16/2023]
Abstract
Nucleic acid nanoparticles, or NANPs, rationally designed to communicate with the human immune system, can offer innovative therapeutic strategies to overcome the limitations of traditional nucleic acid therapies. Each set of NANPs is unique in their architectural parameters and physicochemical properties, which together with the type of delivery vehicles determine the kind and the magnitude of their immune response. Currently, there are no predictive tools that would reliably guide the design of NANPs to the desired immunological outcome, a step crucial for the success of personalized therapies. Through a systematic approach investigating physicochemical and immunological profiles of a comprehensive panel of various NANPs, the research team develops and experimentally validates a transformer-based computational model able to predict the immune activities of NANPs. It is anticipated that the freely accessible computational tool, called an "artificial immune cell" or AI-cell, will aid in addressing the current critical public health challenges related to safety criteria of nucleic acid therapies in a timely manner and promote the development of novel biomedical tools.
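The abstract does not disclose the AI-cell architecture in detail. Purely as a generic illustration, the sketch below shows how a small transformer encoder can map a sequence of nanoparticle descriptors to a single predicted immunostimulation score; the feature layout, dimensions, and pooling are assumptions, not the published model.
```python
# Generic sketch (not the published AI-cell architecture): a small transformer
# encoder regressor over a sequence of assumed NANP descriptor tokens.
import torch
import torch.nn as nn

class DescriptorRegressor(nn.Module):
    def __init__(self, n_features=8, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):                  # x: (batch, seq_len, n_features)
        h = self.encoder(self.embed(x))    # contextualized descriptor tokens
        return self.head(h.mean(dim=1))    # pool over the sequence, predict one value

model = DescriptorRegressor()
fake_batch = torch.randn(4, 10, 8)         # 4 NANPs, 10 descriptor tokens each
print(model(fake_batch).shape)             # torch.Size([4, 1])
```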
Collapse
Affiliation(s)
- Morgan Chandler
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Sankalp Jain
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD 20850, USA
| | - Justin Halman
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Enping Hong
- Nanotechnology Characterization Lab, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| | - Marina A. Dobrovolskaia
- Nanotechnology Characterization Lab, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| | - Alexey V. Zakharov
- National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD 20850, USA
| | - Kirill A. Afonin
- Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| |
Collapse
|
43
|
Naseem U, Dunn AG, Khushi M, Kim J. Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT. BMC Bioinformatics 2022; 23:144. [PMID: 35448946 PMCID: PMC9022356 DOI: 10.1186/s12859-022-04688-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2021] [Accepted: 03/31/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The abundance of biomedical text data coupled with advances in natural language processing (NLP) is resulting in novel biomedical NLP (BioNLP) applications. These NLP applications, or tasks, are reliant on the availability of domain-specific language models (LMs) that are trained on a massive amount of data. Most of the existing domain-specific LMs adopted the bidirectional encoder representations from transformers (BERT) architecture, which has limitations, and their generalizability is unproven as there is an absence of baseline results among common BioNLP tasks. RESULTS We present 8 variants of BioALBERT, a domain-specific adaptation of a lite bidirectional encoder representations from transformers (ALBERT), trained on biomedical (PubMed and PubMed Central) and clinical (MIMIC-III) corpora and fine-tuned for 6 different tasks across 20 benchmark datasets. Experiments show that a large variant of BioALBERT trained on PubMed outperforms the state-of-the-art on named-entity recognition (+ 11.09% BLURB score improvement), relation extraction (+ 0.80% BLURB score), sentence similarity (+ 1.05% BLURB score), document classification (+ 0.62% F1-score), and question answering (+ 2.83% BLURB score). It represents a new state-of-the-art in 5 out of 6 benchmark BioNLP tasks. CONCLUSIONS The large variant of BioALBERT trained on PubMed achieved a higher BLURB score than previous state-of-the-art models on 5 of the 6 benchmark BioNLP tasks. Depending on the task, 5 different variants of BioALBERT outperformed previous state-of-the-art models on 17 of the 20 benchmark datasets, showing that our model is robust and generalizable in the common BioNLP tasks. We have made BioALBERT freely available, which will help the BioNLP community avoid the computational cost of training and establish a new set of baselines for future efforts across a broad range of BioNLP tasks.
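As a minimal illustration of the kind of task head BioALBERT is fine-tuned with, the sketch below loads a generic ALBERT checkpoint for token classification (e.g., biomedical NER). Here albert-base-v2 is a stand-in, not one of the released BioALBERT checkpoints, and the three-label tag set is an assumption.
```python
# Minimal sketch: an ALBERT encoder with a token-classification head, the kind
# of setup used for NER fine-tuning. Checkpoint and label count are stand-ins.
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
model = AutoModelForTokenClassification.from_pretrained("albert-base-v2", num_labels=3)  # e.g., B/I/O

enc = tokenizer("Metformin reduced HbA1c in patients with type 2 diabetes.", return_tensors="pt")
logits = model(**enc).logits          # (1, seq_len, 3): per-token tag scores
print(logits.argmax(dim=-1))          # predicted tag index for each subword token
```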
Collapse
Affiliation(s)
- Usman Naseem
- School of Computer Science, The University of Sydney, Sydney, Australia.
| | - Adam G Dunn
- Biomedical Informatics and Digital Health and Faculty of Medicine and Health, School of Medical Sciences, The University of Sydney, Sydney, Australia
| | - Matloob Khushi
- School of Computer Science, The University of Sydney, Sydney, Australia; School of EAST, University of Suffolk, Ipswich, UK
| | - Jinman Kim
- School of Computer Science, The University of Sydney, Sydney, Australia
| |
Collapse
|
44
|
Kalyan KS, Rajasekharan A, Sangeetha S. AMMU: A survey of transformer-based biomedical pretrained language models. J Biomed Inform 2021; 126:103982. [PMID: 34974190 DOI: 10.1016/j.jbi.2021.103982] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2021] [Revised: 12/12/2021] [Accepted: 12/20/2021] [Indexed: 01/04/2023]
Abstract
Transformer-based pretrained language models (PLMs) have started a new era in modern natural language processing (NLP). These models combine the power of transformers, transfer learning, and self-supervised learning (SSL). Following the success of these models in the general domain, the biomedical research community has developed various in-domain PLMs, starting from BioBERT to the latest BioELECTRA and BioALBERT models. We believe there is a need for a paper that provides a comprehensive survey of the various transformer-based biomedical pretrained language models (BPLMs). In this survey, we start with a brief overview of foundational concepts such as self-supervised learning, embedding layers, and transformer encoder layers. We discuss core concepts of transformer-based PLMs, including pretraining methods, pretraining tasks, fine-tuning methods, and various embedding types specific to the biomedical domain. We introduce a taxonomy for transformer-based BPLMs and then discuss all the models. We discuss various challenges and present possible solutions. We conclude by highlighting some of the open issues which will drive the research community to further improve transformer-based BPLMs. The list of all the publicly available transformer-based BPLMs along with their links is provided at https://mr-nlp.github.io/posts/2021/05/transformer-based-biomedical-pretrained-language-models-list/.
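Most BPLMs covered by this survey share the masked-language-modelling pretraining objective. The short sketch below demonstrates it with a fill-mask pipeline on a general-domain checkpoint; a biomedical checkpoint such as BioBERT could be swapped in if available.
```python
# Illustrative only: the masked-language-modelling objective behind most BPLMs,
# shown with a general-domain BERT checkpoint as a stand-in.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")
for pred in fill("The patient was treated with [MASK] for hypertension."):
    # each prediction carries the proposed token and its probability
    print(f"{pred['token_str']:>15}  {pred['score']:.3f}")
```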
Collapse
|