1. Li YH, Li YL, Wei MY, Li GY. Innovation and challenges of artificial intelligence technology in personalized healthcare. Sci Rep 2024; 14:18994. [PMID: 39152194 PMCID: PMC11329630 DOI: 10.1038/s41598-024-70073-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Accepted: 08/12/2024] [Indexed: 08/19/2024] Open
Abstract
As the burgeoning field of Artificial Intelligence (AI) continues to permeate the fabric of healthcare, particularly in the realms of patient surveillance and telemedicine, a transformative era beckons. This manuscript endeavors to unravel the intricacies of recent AI advancements and their profound implications for reconceptualizing the delivery of medical care. Through the introduction of innovative instruments such as virtual assistant chatbots, wearable monitoring devices, predictive analytic models, personalized treatment regimens, and automated appointment systems, AI is not only amplifying the quality of care but also empowering patients and fostering a more interactive dynamic between the patient and the healthcare provider. Yet, this progressive infiltration of AI into the healthcare sphere grapples with a plethora of challenges hitherto unseen. The exigent issues of data security and privacy, the specter of algorithmic bias, the requisite adaptability of regulatory frameworks, and the matter of patient acceptance and trust in AI solutions demand immediate and thoughtful resolution. Establishing stringent and far-reaching policies, ensuring technological impartiality, and cultivating patient confidence are paramount to ensuring that AI-driven enhancements in healthcare service provision remain both ethically sound and efficient. In conclusion, we advocate for an expansion of research efforts aimed at navigating the ethical complexities inherent to an evolving technological landscape, catalyzing policy innovation, and devising AI applications that are not only clinically effective but also earn the trust of the patient populace. By melding expertise across disciplines, we stand at the threshold of an era wherein AI's role in healthcare is both ethically unimpeachable and conducive to elevating the global health quotient.
Affiliation(s)
- Yu-Hao Li
  - International School, Beijing University of Posts and Telecommunications, Beijing, 100876, China
- Yu-Lin Li
  - Department of Ophthalmology, The Second Norman Bethune Hospital of Jilin University, Changchun, 130000, China
- Mu-Yang Wei
  - Department of Ophthalmology, The Second Norman Bethune Hospital of Jilin University, Changchun, 130000, China
- Guang-Yu Li
  - Department of Ophthalmology, The Second Norman Bethune Hospital of Jilin University, Changchun, 130000, China
2. Chen M, Wu Y, Wingerd B, Liu Z, Xu J, Thakkar S, Pedersen TJ, Donnelly T, Mann N, Tong W, Wolfinger RD, Bao W. Automatic text classification of drug-induced liver injury using document-term matrix and XGBoost. Front Artif Intell 2024; 7:1401810. [PMID: 38887604 PMCID: PMC11181907 DOI: 10.3389/frai.2024.1401810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2024] [Accepted: 05/17/2024] [Indexed: 06/20/2024] Open
Abstract
Introduction: Regulatory agencies generate a vast amount of textual data in the review process. For example, drug labeling serves as a valuable resource for regulatory agencies, such as the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA), to communicate drug safety and effectiveness information to healthcare professionals and patients. Drug labeling also serves as a resource for pharmacovigilance and drug safety research. Automated text classification would significantly improve the analysis of drug labeling documents and conserve reviewer resources. Methods: We used artificial intelligence to classify drug-induced liver injury (DILI)-related content from drug labeling documents based on FDA's DILIrank dataset. We employed text mining and XGBoost models and utilized the Preferred Terms of Medical queries for adverse event standards to simplify the elimination of common words and phrases while retaining medical standard terms for FDA and EMA drug label datasets. Then, we constructed a document-term matrix using weights computed by Term Frequency-Inverse Document Frequency (TF-IDF) for each included word/term/token. Results: The automatic text classification model exhibited robust performance in predicting DILI, achieving cross-validation AUC scores exceeding 0.90 for both drug labels from FDA and EMA and literature abstracts from the Critical Assessment of Massive Data Analysis (CAMDA). Discussion: The text mining and XGBoost functions demonstrated in this study can be applied to other text processing and classification tasks.
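As an illustration of the TF-IDF-weighted document-term matrix described in the Methods above, the following is a minimal Python sketch, not the authors' implementation. It assumes whitespace tokenization, relative term frequency, and the unsmoothed weight idf = ln(N/df); the function name `tfidf_matrix` and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, matrix): one row per document, one column per term.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [
            (counts[term] / len(toks)) * math.log(n_docs / df[term]) if toks else 0.0
            for term in vocab
        ]
        matrix.append(row)
    return vocab, matrix


# Hypothetical toy "labeling" snippets, for illustration only.
docs = [
    "hepatotoxicity and jaundice reported",
    "headache reported",
    "jaundice and liver failure",
]
vocab, matrix = tfidf_matrix(docs)
```

Each row of `matrix` could then serve as a feature vector for a gradient-boosted classifier such as XGBoost; that step is omitted here to keep the sketch dependency-free.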
Affiliation(s)
- Minjun Chen
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Yue Wu
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Byron Wingerd
  - JMP Statistical Discovery LLC, Cary, NC, United States
- Zhichao Liu
  - Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, CT, United States
- Joshua Xu
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Shraddha Thakkar
  - Department of Pharmaceutical Sciences, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Tom Donnelly
  - JMP Statistical Discovery LLC, Cary, NC, United States
- Nicholas Mann
  - Department of Mathematics, The University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Wenjun Bao
  - JMP Statistical Discovery LLC, Cary, NC, United States
3. Jonnakuti VS, Wagner EJ, Maletić-Savatić M, Liu Z, Yalamanchili HK. PolyAMiner-Bulk is a deep learning-based algorithm that decodes alternative polyadenylation dynamics from bulk RNA-seq data. Cell Reports Methods 2024; 4:100707. [PMID: 38325383 PMCID: PMC10921021 DOI: 10.1016/j.crmeth.2024.100707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 04/13/2023] [Accepted: 01/11/2024] [Indexed: 02/09/2024]
Abstract
Alternative polyadenylation (APA) is a key post-transcriptional regulatory mechanism; yet, its regulation and impact on human diseases remain understudied. Existing bulk RNA sequencing (RNA-seq)-based APA methods predominantly rely on predefined annotations, severely limiting their ability to decode novel tissue- and disease-specific APA changes. Furthermore, they only account for the most proximal and distal cleavage and polyadenylation sites (C/PASs). Deconvoluting overlapping C/PASs and the inherently noisy 3' UTR coverage in bulk RNA-seq data pose additional challenges. To overcome these limitations, we introduce PolyAMiner-Bulk, an attention-based deep learning algorithm that accurately recapitulates C/PAS sequence grammar, resolves overlapping C/PASs, captures non-proximal-to-distal APA changes, and generates visualizations to illustrate APA dynamics. Evaluation on multiple datasets demonstrates the performance merit of PolyAMiner-Bulk, which accurately identifies more APA changes than other methods. With the growing importance of APA and the abundance of bulk RNA-seq data, PolyAMiner-Bulk establishes a robust paradigm of APA analysis.
Affiliation(s)
- Venkata Soumith Jonnakuti
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA; Program in Quantitative and Computational Biology, Baylor College of Medicine, Houston, TX 77030, USA; Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
- Eric J Wagner
  - Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642, USA
- Mirjana Maletić-Savatić
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
- Zhandong Liu
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA; Program in Quantitative and Computational Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Hari Krishna Yalamanchili
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA; USDA/ARS Children's Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
4. Mostafa F, Chen M. Computational models for predicting liver toxicity in the deep learning era. Frontiers in Toxicology 2024; 5:1340860. [PMID: 38312894 PMCID: PMC10834666 DOI: 10.3389/ftox.2023.1340860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Accepted: 12/22/2023] [Indexed: 02/06/2024] Open
Abstract
Drug-induced liver injury (DILI) is a severe adverse reaction caused by drugs and may result in acute liver failure and even death. Many efforts have centered on mitigating risks associated with potential DILI in humans. Among these, quantitative structure-activity relationship (QSAR) modeling has proven to be a valuable tool for early-stage hepatotoxicity screening. Its advantages include no requirement for physical substances and rapid delivery of results. Deep learning (DL) has advanced rapidly in recent years and has been used to develop QSAR models. This review discusses the use of DL in predicting DILI, focusing on the development of QSAR models employing extensive chemical structure datasets alongside their corresponding DILI outcomes. We undertake a comprehensive evaluation of various DL methods, comparing them with traditional machine learning (ML) approaches, and explore the strengths and limitations of DL techniques regarding their interpretability, scalability, and generalization. Overall, our review underscores the potential of DL methodologies to enhance DILI prediction and provides insights into future avenues for developing predictive models to mitigate DILI risk in humans.
Affiliation(s)
- Fahad Mostafa
  - Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, United States
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
- Minjun Chen
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, United States
5. Wu L, Gray M, Dang O, Xu J, Fang H, Tong W. RxBERT: Enhancing drug labeling text mining and analysis with AI language modeling. Exp Biol Med (Maywood) 2023; 248:1937-1943. [PMID: 38166420 PMCID: PMC10798181 DOI: 10.1177/15353702231220669] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/30/2023] [Accepted: 11/02/2023] [Indexed: 01/04/2024] Open
Abstract
The US drug labeling document contains essential information on drug efficacy and safety, making it a crucial regulatory resource for Food and Drug Administration (FDA) drug reviewers. Due to its extensive volume and the presence of free text, conventional text mining analyses have encountered challenges in processing these data. Recent advances in artificial intelligence (AI) for natural language processing (NLP) have provided an unprecedented opportunity to identify key information from drug labeling, thereby enhancing safety reviews and support for regulatory decisions. We developed RxBERT, a Bidirectional Encoder Representations from Transformers (BERT) model pretrained on FDA human prescription drug labeling documents for an enhanced application of drug labeling documents in both research and drug review. RxBERT was derived from BioBERT with further training on human prescription drug labeling documents. RxBERT was demonstrated in several tasks using regulatory datasets, including those involved in the National Institute of Standards and Technology Text Analysis Conference dataset (NIST TAC dataset), the FDA Adverse Drug Event Evaluation Dataset (ADE Eval dataset), and the classification of texts from submission packages into labeling sections (US Drug Labeling dataset). RxBERT reached an F1-score of 86.5 in both TAC and ADE Eval classification and a prediction accuracy of 87% for the US Drug Labeling dataset. Overall, RxBERT was shown to be competitive with or better than other NLP approaches such as BERT and BioBERT. In summary, we developed RxBERT, a transformer-based model specific to drug labeling that outperformed the original BERT model. RxBERT has the potential to assist research scientists and FDA reviewers in better processing and utilizing drug labeling information toward the advancement of drug effectiveness and safety for public health. This proof-of-concept study also demonstrated a potential pathway to customized large language models (LLMs) tailored to sensitive regulatory documents for internal application.
Affiliation(s)
- Leihong Wu
  - Division of Bioinformatics and Biostatistics, FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
- Magnus Gray
  - Division of Bioinformatics and Biostatistics, FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
- Oanh Dang
  - Office of Surveillance and Epidemiology, FDA Center for Drug Evaluation and Research, Silver Spring, MD 20993, USA
- Joshua Xu
  - Division of Bioinformatics and Biostatistics, FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
- Hong Fang
  - Office of Scientific Coordination, FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
- Weida Tong
  - Division of Bioinformatics and Biostatistics, FDA National Center for Toxicological Research, Jefferson, AR 72079, USA
6. Sinha K, Ghosh N, Sil PC. A Review on the Recent Applications of Deep Learning in Predictive Drug Toxicological Studies. Chem Res Toxicol 2023; 36:1174-1205. [PMID: 37561655 DOI: 10.1021/acs.chemrestox.2c00375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
Drug toxicity prediction is an important step in ensuring patient safety during drug design studies. While traditional preclinical studies have historically relied on animal models to evaluate toxicity, recent advances in deep-learning approaches have shown great promise in advancing drug safety science and reducing animal use in preclinical studies. However, deep-learning-based approaches also face challenges in handling large biological data sets, model interpretability, and regulatory acceptance. In this review, we provide an overview of recent developments in deep-learning-based approaches for predicting drug toxicity, highlighting their potential advantages over traditional methods and the need to address their limitations. Deep-learning models have demonstrated excellent performance in predicting toxicity outcomes from various data sources such as chemical structures, genomic data, and high-throughput screening assays. The potential of deep learning for automated feature engineering is also discussed. This review emphasizes the need to address ethical concerns related to the use of deep learning in drug toxicity studies, including the reduction of animal use and ensuring regulatory acceptance. Furthermore, emerging applications of deep learning in drug toxicity prediction, such as predicting drug-drug interactions and toxicity in rare subpopulations, are highlighted. The integration of deep-learning-based approaches with traditional methods is discussed as a way to develop more reliable and efficient predictive models for drug safety assessment, paving the way for safer and more effective drug discovery and development. Overall, this review highlights the critical role of deep learning in predictive toxicology and drug safety evaluation, emphasizing the need for continued research and development in this rapidly evolving field. 
By addressing the limitations of traditional methods, leveraging the potential of deep learning for automated feature engineering, and addressing ethical concerns, deep-learning-based approaches have the potential to revolutionize drug toxicity prediction and improve patient safety in drug discovery and development.
Affiliation(s)
- Krishnendu Sinha
  - Department of Zoology, Jhargram Raj College, Jhargram 721507, West Bengal, India
- Nabanita Ghosh
  - Department of Zoology, Maulana Azad College, Kolkata 700013, West Bengal, India
- Parames C Sil
  - Division of Molecular Medicine, Bose Institute, Kolkata 700054, West Bengal, India
7. Wu L, Ali S, Ali H, Brock T, Xu J, Tong W. NeuroCORD: A Language Model to Facilitate COVID-19-Associated Neurological Disorder Studies. International Journal of Environmental Research and Public Health 2022; 19:9974. [PMID: 36011614 PMCID: PMC9408703 DOI: 10.3390/ijerph19169974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 08/03/2022] [Accepted: 08/05/2022] [Indexed: 06/15/2023]
Abstract
COVID-19 can lead to multiple severe outcomes including neurological and psychological impacts. However, it is challenging to manually scan hundreds of thousands of COVID-19 articles on a regular basis. To update our knowledge, provide sound science to the public, and communicate effectively, it is critical to have an efficient means of following the most current published data. In this study, we developed a language model to search abstracts using the most advanced artificial intelligence (AI) to accurately retrieve articles on COVID-19-associated neurological disorders. We applied this NeuroCORD model to the largest benchmark dataset of COVID-19, CORD-19. We found that the model developed on the training set yielded 94% prediction accuracy on the test set. This result was subsequently verified by two experts in the field. In addition, when applied to 96,000 non-labeled articles that were published after 2020, the NeuroCORD model accurately identified approximately 3% of them to be relevant for the study of COVID-19-associated neurological disorders, while only 0.5% were retrieved using conventional keyword searching. In conclusion, NeuroCORD provides an opportunity to profile neurological disorders resulting from COVID-19 in a rapid and efficient fashion, and its general framework could be used to study other COVID-19-related emerging health issues.
Affiliation(s)
- Leihong Wu
  - National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA
- Syed Ali
  - National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA
- Heather Ali
  - Department of Internal Medicine, University of Arkansas for Medical Sciences, 4301 West Markham, Little Rock, AR 72205, USA
- Tyrone Brock
  - National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA
  - Department of Mathematics and Computer Science, University of Arkansas at Pine Bluff, 1200 University Drive, Pine Bluff, AR 71601, USA
- Joshua Xu
  - National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA
- Weida Tong
  - National Center for Toxicological Research, Food and Drug Administration, 3900 NCTR Rd., Jefferson, AR 72079, USA
8. Katritsis NM, Liu A, Youssef G, Rathee S, MacMahon M, Hwang W, Wollman L, Han N. dialogi: Utilising NLP With Chemical and Disease Similarities to Drive the Identification of Drug-Induced Liver Injury Literature. Front Genet 2022; 13:894209. [PMID: 36017500 PMCID: PMC9395939 DOI: 10.3389/fgene.2022.894209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Accepted: 06/17/2022] [Indexed: 11/13/2022] Open
Abstract
Drug-Induced Liver Injury (DILI), despite its low occurrence rate, can cause severe side effects or even lead to death. Thus, it is one of the leading causes for terminating the development of new drugs and restricting the use of drugs already in circulation. Moreover, its multifactorial nature, combined with a clinical presentation that often mimics other liver diseases, complicates the identification of DILI-related (or “positive”) literature, which remains the main medium for sourcing results from clinical practice and experimental studies. This work, a contribution to the “Literature AI for DILI Challenge” of the Critical Assessment of Massive Data Analysis (CAMDA) 2021, presents an automated pipeline for distinguishing between DILI-positive and DILI-negative publications. We used Natural Language Processing (NLP) to filter out the uninformative parts of a text, and to identify and extract mentions of chemicals and diseases. We combined that information with small-molecule and disease embeddings, which are capable of capturing chemical and disease similarities, to improve classification performance. The former were directly sourced from the Chemical Checker (CC). For the latter, we collected data that encode different aspects of disease similarity from the National Library of Medicine’s (NLM) Medical Subject Headings (MeSH) thesaurus and the Comparative Toxicogenomics Database (CTD). Following a procedure similar to the one used in the CC, vector representations for diseases were learnt and evaluated. Two Neural Network (NN) classifiers were developed: a baseline model that accepts texts as input and an extended model that also utilises chemical and disease embeddings. We trained, validated, and tested the classifiers through a Nested Cross-Validation (NCV) scheme with 10 outer and 5 inner folds. During this, the baseline and extended models performed virtually identically, with F1-scores of 95.04 ± 0.61% and 94.80 ± 0.41%, respectively. Upon validation on an external, withheld dataset that is meant to assess classifier generalisability, the extended model achieved an F1-score of 91.14 ± 1.62%, outperforming its baseline counterpart, which scored 88.30 ± 2.44%. We make further comparisons between the classifiers and discuss future improvements and directions, including utilising chemical and disease embeddings for visualisation and exploratory analysis of the DILI-positive literature.
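The nested cross-validation scheme mentioned above (10 outer folds, 5 inner folds) can be sketched as an index-splitting routine. This is a minimal Python illustration under stated assumptions, not the authors' pipeline: folds are assigned round-robin without shuffling or stratification, and the function names `k_fold_indices` and `nested_cv_splits` are hypothetical.

```python
def k_fold_indices(indices, k):
    """Distribute indices round-robin into k folds of nearly equal size."""
    folds = [[] for _ in range(k)]
    for position, idx in enumerate(indices):
        folds[position % k].append(idx)
    return folds


def nested_cv_splits(n_samples, outer_k=10, inner_k=5):
    """Yield (outer_fold, test, inner_splits).

    Each outer fold is held out once as the test set. The remaining samples
    are split again into inner folds, giving (train, validation) pairs that
    would be used for model selection before scoring on the outer test set.
    """
    outer_folds = k_fold_indices(list(range(n_samples)), outer_k)
    for o, test in enumerate(outer_folds):
        development = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
        inner_folds = k_fold_indices(development, inner_k)
        inner_splits = []
        for v, validation in enumerate(inner_folds):
            train = [i for f, fold in enumerate(inner_folds) if f != v for i in fold]
            inner_splits.append((train, validation))
        yield o, test, inner_splits


# Hypothetical dataset size, for illustration only.
splits = list(nested_cv_splits(100))
```

Keeping the outer test fold out of every inner split is what lets the outer scores estimate generalisation after hyperparameter tuning; a real experiment would normally shuffle and stratify the indices first.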
Affiliation(s)
- Nicholas M. Katritsis
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
- Anika Liu
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
- Gehad Youssef
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Sanjay Rathee
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Méabh MacMahon
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Centre for Therapeutics Discovery, LifeArc, Stevenage, United Kingdom
- Woochang Hwang
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Lilly Wollman
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Namshik Han
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Cambridge Centre for AI in Medicine, University of Cambridge, Cambridge, United Kingdom
*Correspondence: Nicholas M. Katritsis; Namshik Han
9. Rathee S, MacMahon M, Liu A, Katritsis NM, Youssef G, Hwang W, Wollman L, Han N. DILI_C: An AI-Based Classifier to Search for Drug-Induced Liver Injury Literature. Front Genet 2022; 13:867946. [PMID: 35846129 PMCID: PMC9277181 DOI: 10.3389/fgene.2022.867946] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/01/2022] [Accepted: 04/11/2022] [Indexed: 01/15/2023] Open
Abstract
Drug-induced liver injury (DILI) is a class of adverse drug reactions (ADR) that causes problems in both clinical and research settings. It is the most frequent cause of acute liver failure in the majority of Western countries and is a major cause of attrition of novel drug candidates. Manual trawling of the literature is the main route of deriving information on DILI from research studies, which makes it an inefficient process prone to human error. Therefore, an automated AI model capable of retrieving DILI-related articles from the huge ocean of literature could be invaluable for the drug discovery community. In this study, we built an artificial intelligence (AI) model combining the power of natural language processing (NLP) and machine learning (ML) to address this problem. This model uses NLP to filter out meaningless text (e.g., stop words) and uses customized functions to extract relevant keywords as singletons, pairs, and triplets. These keywords are processed by an apriori pattern mining algorithm to extract relevant patterns, which are used to estimate initial weightings for an ML classifier. Along with pattern importance and frequency, an FDA-approved drug list mentioning DILI adds extra confidence in classification. The combined power of these methods builds a DILI classifier (DILI_C), with 94.91% cross-validation and 94.14% external validation accuracy. To make DILI_C as accessible as possible, including to researchers without coding experience, an R Shiny app capable of classifying single or multiple entries for DILI was developed and made available at https://researchmind.co.uk/diliclassifier/. Additionally, a GitHub link (https://github.com/sanjaysinghrathi/DILI-Classifier) for the app source code and an ISMB extended video talk (https://www.youtube.com/watch?v=j305yIVi_f8) are available as supplementary materials.
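The keyword-pattern step described above can be illustrated with an Apriori-style frequent-pattern count. The following is a minimal Python sketch under stated assumptions, not the published classifier (which is distributed as an R Shiny app and may differ in detail): each abstract is assumed to be already reduced to a set of keywords, patterns are limited to singletons, pairs, and triplets, and support is the fraction of abstracts containing the pattern. The function name, threshold, and toy keyword sets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(keyword_sets, min_support, max_size=3):
    """Return {pattern: support} for keyword patterns of size 1..max_size.

    Apriori pruning: a pattern of size k is counted only if every one of its
    (k-1)-sized sub-patterns was frequent in the previous pass.
    """
    n = len(keyword_sets)
    frequent = {}
    previous = set()
    for size in range(1, max_size + 1):
        counts = Counter()
        for keywords in keyword_sets:
            for combo in combinations(sorted(keywords), size):
                if size == 1 or all(
                    sub in previous for sub in combinations(combo, size - 1)
                ):
                    counts[combo] += 1
        previous = {p for p, c in counts.items() if c / n >= min_support}
        if not previous:
            break
        frequent.update({p: counts[p] / n for p in previous})
    return frequent


# Hypothetical keyword sets extracted from four abstracts.
abstracts = [
    {"liver", "injury", "drug"},
    {"liver", "injury"},
    {"drug", "dose"},
    {"liver", "injury", "drug"},
]
patterns = frequent_patterns(abstracts, min_support=0.5)
```

In this toy input the pair `("injury", "liver")` has support 0.75, while `("dose",)` falls below the threshold and is pruned along with every pattern that contains it. How such supports would be turned into initial classifier weights is not specified in the abstract and is left out here.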
Affiliation(s)
- Sanjay Rathee
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Meabh MacMahon
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - LifeArc, Stevenage, United Kingdom
- Anika Liu
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
- Nicholas M Katritsis
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
- Gehad Youssef
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Woochang Hwang
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Lilly Wollman
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Namshik Han
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Cambridge Centre for AI in Medicine, University of Cambridge, Cambridge, United Kingdom
10. Chicco D, Jurman G. An Invitation to Greater Use of Matthews Correlation Coefficient in Robotics and Artificial Intelligence. Front Robot AI 2022; 9:876814. [PMID: 35402520 PMCID: PMC8993212 DOI: 10.3389/frobt.2022.876814] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/15/2022] [Accepted: 03/07/2022] [Indexed: 11/17/2022] Open
Affiliation(s)
- Davide Chicco
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Canada
- Giuseppe Jurman
  - Data Science for Health Unit, Fondazione Bruno Kessler, Trento, Italy