1. Goumas G, Vlachothanasi EN, Fradelos EC, Mouliou DS. Biosensors, Artificial Intelligence Biosensors, False Results and Novel Future Perspectives. Diagnostics (Basel) 2025; 15:1037. [PMID: 40310427 PMCID: PMC12025796 DOI: 10.3390/diagnostics15081037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2025] [Revised: 04/09/2025] [Accepted: 04/16/2025] [Indexed: 05/02/2025] [Open Access]
Abstract
Medical biosensors have set the basis of medical diagnostics, and Artificial Intelligence (AI) has boosted diagnostics to a great extent. However, false results are evident in every method, so it is crucial to identify the reasons behind a possible false result in order to control its occurrence. This is the first critical state-of-the-art review article to discuss all the commonly used biosensor types and the reasons that can give rise to potential false results. Furthermore, AI is discussed in parallel with biosensors and their misdiagnoses, and again some reasons for possible false results are discussed. Finally, an expert opinion with further future perspectives is presented based on general expert insights, in order for some false diagnostic results of biosensors and AI biosensors to be surpassed.
Affiliation(s)
- Georgios Goumas
  - School of Public Health, University of West Attica, 12243 Athens, Greece
- Efthymia N. Vlachothanasi
  - Laboratory of Clinical Nursing, Department of Nursing, University of Thessaly Larissa, 41334 Larissa, Greece
- Evangelos C. Fradelos
  - Laboratory of Clinical Nursing, Department of Nursing, University of Thessaly Larissa, 41334 Larissa, Greece
2. Ambreen S, Umar M, Noor A, Jain H, Ali R. Advanced AI and ML frameworks for transforming drug discovery and optimization: With innovative insights in polypharmacology, drug repurposing, combination therapy and nanomedicine. Eur J Med Chem 2025; 284:117164. [PMID: 39721292 DOI: 10.1016/j.ejmech.2024.117164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024] [Indexed: 12/28/2024]
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are transforming drug discovery by overcoming traditional challenges like high costs, time-consuming, and frequent failures. AI-driven approaches streamline key phases, including target identification, lead optimization, de novo drug design, and drug repurposing. Frameworks such as deep neural networks (DNNs), convolutional neural networks (CNNs), and deep reinforcement learning (DRL) models have shown promise in identifying drug targets, optimizing delivery systems, and accelerating drug repurposing. Generative adversarial networks (GANs) and variational autoencoders (VAEs) aid de novo drug design by creating novel drug-like compounds with desired properties. Case studies, such as DDR1 kinase inhibitors designed using generative models and CDK20 inhibitors developed via structure-based methods, highlight AI's ability to produce highly specific therapeutics. Models like SNF-CVAE and DeepDR further advance drug repurposing by uncovering new therapeutic applications for existing drugs. Advanced ML algorithms enhance precision in predicting drug efficacy, toxicity, and ADME-Tox properties, reducing development costs and improving drug-target interactions. AI also supports polypharmacology by optimizing multi-target drug interactions and enhances combination therapy through predictions of drug synergies and antagonisms. In nanomedicine, AI models like CURATE.AI and the Hartung algorithm optimize personalized treatments by predicting toxicological risks and real-time dosing adjustments with high accuracy. Despite its potential, challenges like data quality, model interpretability, and ethical concerns must be addressed. High-quality datasets, transparent models, and unbiased algorithms are essential for reliable AI applications. As AI continues to evolve, it is poised to revolutionize drug discovery and personalized medicine, advancing therapeutic development and patient care.
Affiliation(s)
- Subiya Ambreen
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Mohammad Umar
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Aaisha Noor
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Himangini Jain
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Ruhi Ali
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
3. Caniceiro AB, Orzeł U, Rosário-Ferreira N, Filipek S, Moreira IS. Leveraging Artificial Intelligence in GPCR Activation Studies: Computational Prediction Methods as Key Drivers of Knowledge. Methods Mol Biol 2025; 2870:183-220. [PMID: 39543036 DOI: 10.1007/978-1-0716-4213-9_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2024]
Abstract
G protein-coupled receptors (GPCRs) are key molecules involved in cellular signaling and are attractive targets for pharmacological intervention. This chapter is designed to explore the range of algorithms used to predict GPCRs' activation states, while also examining the pharmaceutical implications of these predictions. Our primary objective is to show how artificial intelligence (AI) is key in GPCR research to reveal the intricate dynamics of activation and inactivation processes, shedding light on the complex regulatory mechanisms of this vital protein family. We describe several computational strategies that leverage diverse structural data from the Protein Data Bank, molecular dynamic simulations, or ligand-based methods to predict the activation states of GPCRs. We demonstrate how the integration of AI into GPCR research not only enhances our understanding of their dynamic properties but also presents immense potential for driving pharmaceutical research and development, offering promising new avenues in the search for newer, better therapeutic agents.
Affiliation(s)
- Ana B Caniceiro
  - Department of Life Sciences, University of Coimbra, Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Urszula Orzeł
  - Department of Life Sciences, University of Coimbra, Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Nícia Rosário-Ferreira
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- Sławomir Filipek
  - Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Irina S Moreira
  - Department of Life Sciences, University of Coimbra, Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
4. Bueso-Bordils JI, Antón-Fos GM, Martín-Algarra R, Alemán-López PA. Overview of Computational Toxicology Methods Applied in Drug and Green Chemical Discovery. J Xenobiot 2024; 14:1901-1918. [PMID: 39728409 DOI: 10.3390/jox14040101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Revised: 11/20/2024] [Accepted: 12/02/2024] [Indexed: 12/28/2024] [Open Access]
Abstract
In the field of computational chemistry, computer models are quickly and cheaply constructed to predict toxicology hazards and results, with no need for test material or animals as these computational predictions are often based on physicochemical properties of chemical structures. Multiple methodologies are employed to support in silico assessments based on machine learning (ML) and deep learning (DL). This review introduces the development of computational toxicology, focusing on ML and DL and emphasizing their importance in the field of toxicology. A fine balance between target potency, selectivity, absorption, distribution, metabolism, excretion, toxicity (ADMET) and clinical safety properties should be achieved to discover a potential new drug. It is advantageous to perform virtual predictions as early as possible in drug development processes, even before a molecule is synthesized. Currently, there are numerous commercially available and free web-based programs for toxicity prediction, which can be used to construct various predictive models. The key features of the QSAR method are also outlined, and the selection of appropriate physicochemical descriptors is a prerequisite for robust predictions. In addition, examples of open-source tools applied to toxicity prediction are included, as well as examples of the application of different computational methods for the prediction of toxicity in drug design and environmental toxicology.
Affiliation(s)
- Jose I Bueso-Bordils
  - Pharmacy Department, CEU Cardenal Herrera University, CEU Universities C/Ramón y Cajal s/n, Alfara del Patriarca, 46115 Valencia, Spain
- Gerardo M Antón-Fos
  - Pharmacy Department, CEU Cardenal Herrera University, CEU Universities C/Ramón y Cajal s/n, Alfara del Patriarca, 46115 Valencia, Spain
- Rafael Martín-Algarra
  - Pharmacy Department, CEU Cardenal Herrera University, CEU Universities C/Ramón y Cajal s/n, Alfara del Patriarca, 46115 Valencia, Spain
- Pedro A Alemán-López
  - Pharmacy Department, CEU Cardenal Herrera University, CEU Universities C/Ramón y Cajal s/n, Alfara del Patriarca, 46115 Valencia, Spain
5. Borowa A, Rymarczyk D, Żyła M, Kańdula M, Sánchez-Fernández A, Rataj K, Struski Ł, Tabor J, Zieliński B. Decoding phenotypic screening: A comparative analysis of image representations. Comput Struct Biotechnol J 2024; 23:1181-1188. [PMID: 38510976 PMCID: PMC10951426 DOI: 10.1016/j.csbj.2024.02.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 02/26/2024] [Accepted: 02/26/2024] [Indexed: 03/22/2024] [Open Access]
Abstract
Biomedical imaging techniques such as high content screening (HCS) are valuable for drug discovery, but high costs limit their use to pharmaceutical companies. To address this issue, The JUMP-CP consortium released a massive open image dataset of chemical and genetic perturbations, providing a valuable resource for deep learning research. In this work, we aim to utilize the JUMP-CP dataset to develop a universal representation model for HCS data, mainly data generated using U2OS cells and CellPainting protocol, using supervised and self-supervised learning approaches. We propose an evaluation protocol that assesses their performance on mode of action and property prediction tasks using a popular phenotypic screening dataset. Results show that the self-supervised approach that uses data from multiple consortium partners provides representation that is more robust to batch effects whilst simultaneously achieving performance on par with standard approaches. Together with other conclusions, it provides recommendations on the training strategy of a representation model for HCS images.
Affiliation(s)
- Adriana Borowa
  - Jagiellonian University, Faculty of Mathematics and Computer Science, Kraków, Poland
  - Jagiellonian University, Doctoral School of Exact and Natural Sciences, Kraków, Poland
  - Ardigen SA, Kraków, Poland
- Dawid Rymarczyk
  - Jagiellonian University, Faculty of Mathematics and Computer Science, Kraków, Poland
  - Ardigen SA, Kraków, Poland
- Łukasz Struski
  - Jagiellonian University, Faculty of Mathematics and Computer Science, Kraków, Poland
- Jacek Tabor
  - Jagiellonian University, Faculty of Mathematics and Computer Science, Kraków, Poland
- Bartosz Zieliński
  - Jagiellonian University, Faculty of Mathematics and Computer Science, Kraków, Poland
  - Ardigen SA, Kraków, Poland
6. Zhuo Y, Song Z, Ge Z. Security Versus Accuracy: Trade-Off Data Modeling to Safe Fault Classification Systems. IEEE Trans Neural Netw Learn Syst 2024; 35:12095-12106. [PMID: 37028378 DOI: 10.1109/tnnls.2023.3251999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
While the data-driven fault classification systems have achieved great success and been widely deployed, machine-learning-based models have recently been shown to be unsafe and vulnerable to tiny perturbations, i.e., adversarial attack. For the safety-critical industrial scenarios, the adversarial security (i.e., adversarial robustness) of the fault system should be taken into serious consideration. However, security and accuracy are intrinsically conflicting, which is a trade-off issue. In this article, we first study this new trade-off issue in the design of fault classification models and solve it from a brand new view, hyperparameter optimization (HPO). Meanwhile, to reduce the computational expense of HPO, we propose a new multiobjective (MO), multifidelity (MF) Bayesian optimization (BO) algorithm, MMTPE. The proposed algorithm is evaluated on safety-critical industrial datasets with the mainstream machine learning (ML) models. The results show that the following hold: 1) MMTPE is superior to other advanced optimization algorithms in both efficiency and performance and 2) fault classification models with optimized hyperparameters are competitive with advanced adversarially defensive methods. Moreover, insights into the model security are given, including the model intrinsic security properties and the correlations between hyperparameters and security.
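The security-versus-accuracy trade-off described in this abstract can be pictured as a set of non-dominated (Pareto-optimal) hyperparameter settings. The sketch below is not the MMTPE algorithm from the paper; it only filters a handful of made-up candidates, assuming each one has already been scored for clean accuracy and adversarial robustness (both higher is better). The names `Candidate`, `pareto_front`, and the scores are hypothetical.

```python
from typing import List, Tuple

# (label, accuracy, robustness); both scores are assumed to be higher-is-better.
Candidate = Tuple[str, float, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    front = []
    for name, acc, rob in candidates:
        # A candidate is dominated if another one is at least as good on both
        # objectives and strictly better on at least one of them.
        dominated = any(
            (a >= acc and r >= rob) and (a > acc or r > rob)
            for _, a, r in candidates
        )
        if not dominated:
            front.append((name, acc, rob))
    return front


# Made-up scores for four hypothetical hyperparameter settings.
configs = [
    ("A", 0.95, 0.40),
    ("B", 0.90, 0.70),
    ("C", 0.85, 0.65),  # worse than B on both objectives
    ("D", 0.80, 0.90),
]
front_names = [name for name, _, _ in pareto_front(configs)]
print(front_names)
```

Settings A, B, and D each win on one objective and lose on the other, which is the kind of conflict a multiobjective optimizer has to expose rather than resolve to a single answer.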
7. Zheng Y, Ma Y, Xiong Q, Zhu K, Weng N, Zhu Q. The role of artificial intelligence in the development of anticancer therapeutics from natural polyphenols: Current advances and future prospects. Pharmacol Res 2024; 208:107381. [PMID: 39218422 DOI: 10.1016/j.phrs.2024.107381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Revised: 08/06/2024] [Accepted: 08/26/2024] [Indexed: 09/04/2024]
Abstract
Natural polyphenols, abundant in the human diet, are derived from a wide variety of sources. Numerous preclinical studies have demonstrated their significant anticancer properties against various malignancies, making them valuable resources for drug development. However, traditional experimental methods for developing anticancer therapies from natural polyphenols are time-consuming and labor-intensive. Recently, artificial intelligence has shown promising advancements in drug discovery. Integrating AI technologies into the development process for natural polyphenols can substantially reduce development time and enhance efficiency. In this study, we review the crucial roles of natural polyphenols in anticancer treatment and explore the potential of AI technologies to aid in drug development. Specifically, we discuss the application of AI in key stages such as drug structure prediction, virtual drug screening, prediction of biological activity, and drug-target protein interaction, highlighting the potential to revolutionize the development of natural polyphenol-based anticancer therapies.
Affiliation(s)
- Ying Zheng
  - Division of Abdominal Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, No.37 Guoxue Alley, Chengdu, Sichuan 610041, China
- Yifei Ma
  - Division of Abdominal Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, No.37 Guoxue Alley, Chengdu, Sichuan 610041, China
- Qunli Xiong
  - Division of Abdominal Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, No.37 Guoxue Alley, Chengdu, Sichuan 610041, China
- Kai Zhu
  - Department of Medical Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fujian 350011, PR China
- Ningna Weng
  - Department of Medical Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fujian 350011, PR China
- Qing Zhu
  - Division of Abdominal Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, No.37 Guoxue Alley, Chengdu, Sichuan 610041, China
8. Reddy A, Reddy RP, Roghani AK, Garcia RI, Khemka S, Pattoor V, Jacob M, Reddy PH, Sehar U. Artificial intelligence in Parkinson's disease: Early detection and diagnostic advancements. Ageing Res Rev 2024; 99:102410. [PMID: 38972602 DOI: 10.1016/j.arr.2024.102410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Accepted: 07/04/2024] [Indexed: 07/09/2024]
Abstract
Parkinson's disease (PD) is the second most common neurodegenerative disorder, globally affecting men and women at an exponentially growing rate, with currently no cure. Disease progression starts when dopaminergic neurons begin to die. In PD, the loss of neurotransmitter, dopamine is responsible for the overall communication of neural cells throughout the body. Clinical symptoms of PD are slowness of movement, involuntary muscular contractions, speech & writing changes, lessened automatic movement, and chronic tremors in the body. PD occurs in both familial and sporadic forms and modifiable and non-modifiable risk factors and socioeconomic conditions cause PD. Early detectable diagnostics and treatments have been developed in the last several decades. However, we still do not have precise early detectable biomarkers and therapeutic agents/drugs that prevent and/or delay the disease process. Recently, artificial intelligence (AI) science and machine learning tools have been promising in identifying early detectable markers with a greater rate of accuracy compared to past forms of treatment and diagnostic processes. Artificial intelligence refers to the intelligence exhibited by machines or software, distinct from the intelligence observed in humans that is based on neural networks in a form and can be used to diagnose the longevity and disease severity of disease. The term Machine Learning or Neural Networks is a blanket term used to identify an emerging technology that is created to work in the way of a "human brain" using many intertwined neurons to achieve the same level of raw intelligence as that of a brain. These processes have been used for neurodegenerative diseases such as Parkinson's disease and Alzheimer's disease, to assess the severity of the patient's condition. In the current article, we discuss the prevalence and incidence of PD, and currently available diagnostic biomarkers and therapeutic strategies. 
We also highlighted currently available artificial intelligence science and machine learning tools and their applications to detect disease and develop therapeutic interventions.
Affiliation(s)
- Aananya Reddy
  - Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA; Lubbock High School, Lubbock, TX 79401, USA
- Ruhananhad P Reddy
  - Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA; Lubbock High School, Lubbock, TX 79401, USA
- Aryan Kia Roghani
  - Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA; Frenship High School, Lubbock, TX 79382, USA
- Ricardo Isaiah Garcia
  - Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA
- Sachi Khemka
  - Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA
- Vasanthkumar Pattoor
  - Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA; University of South Florida, Tampa, FL 33620, USA
- Michael Jacob
  - Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA; Department of Biology, The University of Texas at San Antonio, San Antonio, TX 78249, USA
- P Hemachandra Reddy
  - Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA; Nutritional Sciences Department, College of Human Medicine, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA; Public Health Department of Graduate School of Biomedical Sciences, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA; Department of Speech, Language and Hearing Services, School Health Professions, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA; Department of Pharmacology and Neuroscience, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA
- Ujala Sehar
  - Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA
9. Li T, Mao J, Yu J, Zhao Z, Chen M, Yao Z, Fang L, Hu B. Fully automated classification of pulmonary nodules in positron emission tomography-computed tomography imaging using a two-stage multimodal learning approach. Quant Imaging Med Surg 2024; 14:5526-5540. [PMID: 39144014 PMCID: PMC11320548 DOI: 10.21037/qims-24-234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Accepted: 06/17/2024] [Indexed: 08/16/2024]
Abstract
Background: Lung cancer is a malignant tumor, for which pulmonary nodules are considered to be significant indicators. Early recognition and timely treatment of pulmonary nodules can contribute to improving the survival rate of patients with cancer. Positron emission tomography-computed tomography (PET/CT) is a noninvasive, fusion imaging technique that can obtain both functional and structural information of lung regions. However, studies of pulmonary nodules based on computer-aided diagnosis have primarily focused on the nodule level due to a reliance on the annotation of nodules, which is superficial and unable to contribute to the actual clinical diagnosis. The aim of this study was thus to develop a fully automated classification framework for a more comprehensive assessment of pulmonary nodules in PET/CT imaging data.
Methods: We developed a two-stage multimodal learning framework for the diagnosis of pulmonary nodules in PET/CT imaging. In this framework, Stage I focuses on pulmonary parenchyma segmentation using a pretrained U-Net and PET/CT registration. Stage II aims to extract, integrate, and recognize image-level and feature-level features by employing the three-dimensional (3D) Inception-residual net (ResNet) convolutional block attention module architecture and a dense-voting fusion mechanism.
Results: In the experiments, the proposed model's performance was comprehensively validated using a set of real clinical data, achieving mean scores of 89.98%, 89.21%, 84.75%, 93.38%, 86.83%, and 0.9227 for accuracy, precision, recall, specificity, F1 score, and area under curve values, respectively.
Conclusions: This paper presents a two-stage multimodal learning approach for the automatic diagnosis of pulmonary nodules. The findings reveal that the main reason for limiting model performance is the nonsolitary property of nodules in pulmonary nodule diagnosis, providing direction for future research.
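For readers unfamiliar with the metrics reported above, the following minimal sketch shows how accuracy, precision, recall, specificity, and F1 score follow from a binary confusion matrix. It is an illustration only, not the authors' evaluation code: the function name `binary_metrics` and the counts are made up, all denominators are assumed to be non-zero, and area under the curve is omitted because it needs per-sample scores rather than counts.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of actual positives that are found
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),  # fraction of actual negatives that are found
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, not taken from the paper.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for metric_name, value in example.items():
    print(f"{metric_name}: {value:.4f}")
```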
Affiliation(s)
- Tongtong Li
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, China
  - Gansu Provincial Key Laboratory of Wearable Computing, Lanzhou University, Lanzhou, China
- Junfeng Mao
  - Department of Nuclear Medicine, The 940th Hospital of Joint Logistics Support Force of Chinese People’s Liberation Army, Lanzhou, China
  - School of Basic Medical Sciences, Gansu University of Traditional Chinese Medicine, Lanzhou, China
- Jiandong Yu
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, China
  - Gansu Provincial Key Laboratory of Wearable Computing, Lanzhou University, Lanzhou, China
- Ziyang Zhao
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, China
  - Gansu Provincial Key Laboratory of Wearable Computing, Lanzhou University, Lanzhou, China
- Miao Chen
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, China
  - Gansu Provincial Key Laboratory of Wearable Computing, Lanzhou University, Lanzhou, China
- Zhijun Yao
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, China
  - Gansu Provincial Key Laboratory of Wearable Computing, Lanzhou University, Lanzhou, China
- Lei Fang
  - Department of Nuclear Medicine, Taikang Tongji (Wuhan) Hospital, Wuhan, China
- Bin Hu
  - School of Information Science and Engineering, Lanzhou University, Lanzhou, China
  - Gansu Provincial Key Laboratory of Wearable Computing, Lanzhou University, Lanzhou, China
  - School of Medical Technology, Beijing Institute of Technology, Beijing, China
  - CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Joint Research Center for Cognitive Neurosensor Technology of Lanzhou University & Institute of Semiconductors, Chinese Academy of Sciences, Lanzhou, China
10. Pawar SB, Deshmukh NK, Jadhav SB. Hybrid deep learning technique for COX-2 inhibition bioactivity detection against breast cancer disease. Biomed Eng Lett 2024; 14:631-647. [PMID: 39512384 PMCID: PMC11538098 DOI: 10.1007/s13534-024-00355-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 01/03/2024] [Accepted: 01/24/2024] [Indexed: 11/15/2024] [Open Access]
Abstract
This study addresses detecting COX-2 inhibition in breast cancer, targeting its role in tumor growth. The primary goal is to develop an efficient technique for precise COX-2 inhibition bioactivity detection, with implications for identifying anti-cancer compounds and advancing breast cancer therapies. The proposed methodology uses the UNet architecture for feature extraction, enhancing accuracy. A modified chicken swarm optimization (MCSO) algorithm addresses data dimensionality, optimizing features. An improved Laguerre neural network (ILNN) classifies COX-2 inhibition bioactivity. Validation is performed using the ChEMBL database. The research evaluates the accuracy, precision, recall, F-measure, Matthews' correlation coefficient (MCC), and Dice coefficient of the proposed method. These metrics are compared against those of contemporary methods to assess the efficiency and effectiveness of the developed technique. The study underscores the hybrid deep learning method's significance in accurately detecting COX-2 inhibition bioactivity against breast cancer. Results highlight its potential as a valuable tool in breast cancer drug discovery.
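Two of the metrics named in this abstract, Matthews' correlation coefficient (MCC) and the Dice coefficient, are less familiar than accuracy. The sketch below computes both from binary confusion-matrix counts, assuming the usual definitions; it is not the authors' evaluation code, and the function names and counts are hypothetical. For binary labels the Dice coefficient is numerically the same as the F-measure.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def dice(tp: int, fp: int, fn: int) -> float:
    """Dice coefficient: overlap between predicted and actual positives."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical imbalanced data set: 10 actives among 100 compounds.
tp, fp, tn, fn = 5, 5, 85, 5
accuracy = (tp + tn) / (tp + fp + tn + fn)
mcc_score = mcc(tp, fp, tn, fn)
dice_score = dice(tp, fp, fn)
print(f"accuracy={accuracy:.2f} mcc={mcc_score:.3f} dice={dice_score:.2f}")
```

With these made-up counts the accuracy looks high while MCC and Dice stay moderate, which is one reason bioactivity studies on imbalanced data tend to report them alongside accuracy.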
Affiliation(s)
- Sahebrao B. Pawar
  - School of Computational Sciences, Swami Ramanand Teerth, Marathvada University, Nanded, India
- N. K. Deshmukh
  - School of Computational Sciences, Swami Ramanand Teerth, Marathvada University, Nanded, India
- Sharad B. Jadhav
  - School of Computational Sciences, Swami Ramanand Teerth, Marathvada University, Nanded, India
11. Kumar N, Acharya V. Advances in machine intelligence-driven virtual screening approaches for big-data. Med Res Rev 2024; 44:939-974. [PMID: 38129992 DOI: 10.1002/med.21995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 07/15/2023] [Accepted: 10/29/2023] [Indexed: 12/23/2023]
Abstract
Virtual screening (VS) is an integral and ever-evolving domain of drug discovery framework. The VS is traditionally classified into ligand-based (LB) and structure-based (SB) approaches. Machine intelligence or artificial intelligence has wide applications in the drug discovery domain to reduce time and resource consumption. In combination with machine intelligence algorithms, VS has emerged into revolutionarily progressive technology that learns within robust decision orders for data curation and hit molecule screening from large VS libraries in minutes or hours. The exponential growth of chemical and biological data has evolved as "big-data" in the public domain demands modern and advanced machine intelligence-driven VS approaches to screen hit molecules from ultra-large VS libraries. VS has evolved from an individual approach (LB and SB) to integrated LB and SB techniques to explore various ligand and target protein aspects for the enhanced rate of appropriate hit molecule prediction. Current trends demand advanced and intelligent solutions to handle enormous data in drug discovery domain for screening and optimizing hits or lead with fewer or no false positive hits. Following the big-data drift and tremendous growth in computational architecture, we presented this review. Here, the article categorized and emphasized individual VS techniques, detailed literature presented for machine learning implementation, modern machine intelligence approaches, and limitations and deliberated the future prospects.
Affiliation(s)
- Neeraj Kumar
  - Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research, Ghaziabad, India
- Vishal Acharya
  - Artificial Intelligence for Computational Biology Lab (AICoB), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
  - Academy of Scientific and Innovative Research, Ghaziabad, India
12. Kim JK, Chang MC. Convolutional neural network algorithm trained on lumbar spine radiographs to predict outcomes of transforaminal epidural steroid injection for lumbosacral radicular pain from spinal stenosis. Sci Rep 2024; 14:8490. [PMID: 38605170 PMCID: PMC11009393 DOI: 10.1038/s41598-024-59288-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Accepted: 04/09/2024] [Indexed: 04/13/2024] [Open Access]
Abstract
Little is known about the therapeutic outcomes of transforaminal epidural steroid injection (TFESI) in patients with lumbosacral radicular pain due to lumbar spinal stenosis (LSS). Using lumbar spine radiographs as input data, we trained a convolutional neural network (CNN) to predict therapeutic outcomes after lumbar TFESI in patients with lumbosacral radicular pain caused by LSS. We retrospectively recruited 193 patients for this study. The lumbar spine radiographs included anteroposterior, lateral, and bilateral (left and right) oblique views. We cropped each lumbar spine radiograph to a square that included the vertebra at the level where the TFESI was performed and the vertebrae immediately below and above that level. Output data were divided into "favorable outcome" (≥ 50% reduction in the numeric rating scale [NRS] score at 2 months post-TFESI) and "poor outcome" (< 50% reduction in the NRS score at 2 months post-TFESI). Using these input and output data, we developed a CNN model for predicting TFESI outcomes. The area under the curve of our model was 0.920, and its accuracy was 87.2%. Our CNN model has an excellent capacity for predicting therapeutic outcomes after lumbar TFESI in patients with lumbosacral radicular pain induced by LSS.
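The outcome rule stated in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: `tfesi_outcome` and `accuracy` are hypothetical helper names, and the sketch assumes the reduction is computed relative to the pre-injection NRS score.

```python
def tfesi_outcome(nrs_before, nrs_after):
    """Label an outcome with the >= 50% NRS-reduction rule from the abstract.

    Assumption: the reduction is measured relative to the baseline score.
    """
    if nrs_before <= 0:
        raise ValueError("baseline NRS must be positive to compute a relative reduction")
    reduction = (nrs_before - nrs_after) / nrs_before
    return "favorable" if reduction >= 0.5 else "poor"


def accuracy(predicted, actual):
    """Fraction of predicted labels that match the reference labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

An NRS drop from 8 to 4 is exactly a 50% reduction, so it falls on the "favorable" side of the threshold.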
Affiliation(s)
- Jeoung Kun Kim
- Department of Business Administration, School of Business, Yeungnam University, Gyeongsan-si, Republic of Korea
- Min Cheol Chang
- Department of Physical Medicine and Rehabilitation, College of Medicine, Yeungnam University, 317-1, Daemyungdong, Namku, Daegu, 705-717, Republic of Korea.
|
13
|
Malashin I, Tynchenko V, Gantimurov A, Nelyub V, Borodulin A. Optimizing Neural Networks for Chemical Reaction Prediction: Insights from Methylene Blue Reduction Reactions. Int J Mol Sci 2024; 25:3860. [PMID: 38612671 PMCID: PMC11011334 DOI: 10.3390/ijms25073860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 03/24/2024] [Accepted: 03/28/2024] [Indexed: 04/14/2024] Open
Abstract
This paper offers a thorough investigation of hyperparameter tuning for neural network architectures using datasets encompassing various combinations of Methylene Blue (MB) Reduction by Ascorbic Acid (AA) reactions with different solvents and concentrations. The aim is to predict coefficients of decay plots for MB absorbance, shedding light on the complex dynamics of chemical reactions. Our findings reveal that the optimal model, determined through our investigation, consists of five hidden layers, each with sixteen neurons and employing the Swish activation function. This model yields an NMSE of 0.05, 0.03, and 0.04 for predicting the coefficients A, B, and C, respectively, in the exponential decay equation $A + B \cdot e^{-x/C}$. These findings contribute to the realm of drug design based on machine learning, providing valuable insights into optimizing chemical reaction predictions.
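The decay model and the error metric named in this abstract can be illustrated with a short Python sketch. The function names are hypothetical, and the NMSE definition used here (mean squared error divided by the variance of the observed values) is an assumption: the abstract does not say which normalization the authors applied.

```python
import math


def decay(x, a, b, c):
    """Evaluate the exponential decay model A + B * exp(-x / C)."""
    return a + b * math.exp(-x / c)


def nmse(predicted, observed):
    """Normalized MSE, assuming normalization by the variance of the observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n
    variance = sum((o - mean_obs) ** 2 for o in observed) / n
    if variance == 0:
        raise ValueError("observed values have zero variance; NMSE is undefined")
    return mse / variance
```

At x = 0 the model reduces to A + B, and as x grows it approaches A, which is a quick sanity check on any fitted coefficients.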
Affiliation(s)
- Vadim Tynchenko
- Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia; (I.M.); (A.G.); (V.N.); (A.B.)
|
14
|
Thapa V, Galande AS, Ram GHP, John R. TIE-GANs: single-shot quantitative phase imaging using transport of intensity equation with integration of GANs. J Biomed Opt 2024; 29:016010. [PMID: 38293292 PMCID: PMC10826717 DOI: 10.1117/1.jbo.29.1.016010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 11/18/2023] [Accepted: 01/09/2024] [Indexed: 02/01/2024]
Abstract
Significance: Artificial intelligence (AI) has become a prominent technology in computational imaging over the past decade. The speed and label-free nature of quantitative phase imaging (QPI) make it a promising candidate for AI investigation. Although interferometric methods are potentially effective, their implementation involves complex experimental platforms and computationally intensive reconstruction procedures. Hence, non-interferometric methods, such as the transport of intensity equation (TIE), are preferred over interferometric methods.
Aim: The TIE method, despite its effectiveness, is tedious because it requires the acquisition of many images at varying defocus planes. The proposed methodology can generate a phase image from a single intensity image using generative adversarial networks (GANs). We present a method called TIE-GANs to overcome the multi-shot scheme of conventional TIE.
Approach: The present investigation employs the TIE as a QPI methodology, which requires reduced experimental and computational effort. TIE is also used for dataset preparation. The proposed method captures images from different defocus planes for training. Our approach uses an image-to-image translation technique based on GANs to produce phase maps. The main contribution of this work is the introduction of GANs with TIE (TIE-GANs), which can give better phase reconstruction results with shorter computation times. This is the first time GANs have been proposed for TIE phase retrieval.
Results: The system was characterized with microbeads of 4 μm size, and the structural similarity index (SSIM) for microbeads was found to be 0.98. We demonstrated the application of the proposed method with oral cells, which yielded a maximum SSIM value of 0.95. The key characteristics include mean squared error and peak signal-to-noise ratio values of 140 and 26.42 dB for oral cells and 100 and 28.10 dB for microbeads.
Conclusions: The proposed methodology can generate a phase image from a single intensity image. Our method is feasible for digital cytology because of its reported high SSIM value. Our approach can handle defocused images: it can take an intensity image from any defocus plane within the provided range and generate a phase map.
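The two error metrics reported in this abstract are related by a standard formula, sketched below in Python. The helper names are hypothetical, and the peak value of 255 (8-bit images) is an assumption, since the abstract does not state the intensity range used.

```python
import math


def mean_squared_error(image_a, image_b):
    """MSE between two images given as flat sequences of pixel values."""
    if len(image_a) != len(image_b) or not image_a:
        raise ValueError("images must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(image_a, image_b)) / len(image_a)


def psnr_db(mse, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE).

    Assumption: max_value=255 corresponds to 8-bit images.
    """
    if mse <= 0:
        raise ValueError("PSNR is undefined for a non-positive MSE")
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under the 8-bit assumption, an MSE of 100 corresponds to roughly 28.1 dB, which is in line with the microbead figures quoted above; small differences from reported values can arise if the metrics were averaged per image.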
Affiliation(s)
- Vikas Thapa
- Indian Institute of Technology Hyderabad, Medical Optics and Sensors Laboratory, Department of Biomedical Engineering, Hyderabad, Telangana, India
- Ashwini Subhash Galande
- Indian Institute of Technology Hyderabad, Medical Optics and Sensors Laboratory, Department of Biomedical Engineering, Hyderabad, Telangana, India
- Gurram Hanu Phani Ram
- Indian Institute of Technology Hyderabad, Medical Optics and Sensors Laboratory, Department of Biomedical Engineering, Hyderabad, Telangana, India
- Renu John
- Indian Institute of Technology Hyderabad, Medical Optics and Sensors Laboratory, Department of Biomedical Engineering, Hyderabad, Telangana, India
|
15
|
Li Y, Fan Z, Rao J, Chen Z, Chu Q, Zheng M, Li X. An overview of recent advances and challenges in predicting compound-protein interaction (CPI). Med Rev (2021) 2023; 3:465-486. [PMID: 38282802 PMCID: PMC10808869 DOI: 10.1515/mr-2023-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 08/30/2023] [Indexed: 01/30/2024]
Abstract
Compound-protein interactions (CPIs) are critical in drug discovery for identifying therapeutic targets, drug side effects, and repurposing existing drugs. Machine learning (ML) algorithms have emerged as powerful tools for CPI prediction, offering notable advantages in cost-effectiveness and efficiency. This review provides an overview of recent advances in both structure-based and non-structure-based CPI prediction ML models, highlighting their performance and achievements. It also offers insights into CPI prediction-related datasets and evaluation benchmarks. Lastly, the article presents a comprehensive assessment of the current landscape of CPI prediction, elucidating the challenges faced and outlining emerging trends to advance the field.
Affiliation(s)
- Yanbei Li
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Zhehuan Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Jingxin Rao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Zhiyi Chen
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Qinyu Chu
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
|
16
|
Pinto J, Ramos JRC, Costa RS, Rossell S, Dumas P, Oliveira R. Hybrid deep modeling of a CHO-K1 fed-batch process: combining first-principles with deep neural networks. Front Bioeng Biotechnol 2023; 11:1237963. [PMID: 37744245 PMCID: PMC10515724 DOI: 10.3389/fbioe.2023.1237963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Accepted: 08/22/2023] [Indexed: 09/26/2023] Open
Abstract
Introduction: Hybrid modeling combining First-Principles with machine learning is becoming a pivotal methodology for Biopharma 4.0 enactment. Chinese Hamster Ovary (CHO) cells, being the workhorse for industrial glycoprotein production, have been the object of several hybrid modeling studies. Most previous studies pursued a shallow hybrid modeling approach based on three-layered Feedforward Neural Networks (FFNNs) combined with macroscopic material balance equations. Only recently has the hybrid modeling field begun incorporating deep learning into its framework, with significant gains in descriptive and predictive power.
Methods: This study compares, for the first time, deep and shallow hybrid modeling in a CHO process development context. Data from 24 fed-batch cultivations of a CHO-K1 cell line expressing a target glycoprotein, comprising 30 measured state variables over time, were used to compare both methodologies. Hybrid models with varying FFNN depths (3-5 layers) were systematically compared using two training methodologies. The classical training is based on the Levenberg-Marquardt algorithm, indirect sensitivity equations and cross-validation. The deep learning is based on the Adaptive Moment Estimation Method (ADAM), stochastic regularization and semidirect sensitivity equations.
Results and conclusion: The results point to a systematic generalization improvement of deep hybrid models over shallow hybrid models. Overall, the training and testing errors decreased by 14.0% and 23.6%, respectively, when applying the deep methodology. The Central Processing Unit (CPU) time for training the deep hybrid model increased by 31.6%, mainly due to the higher FFNN complexity. The final deep hybrid model is shown to predict the dynamics of the 30 state variables within the error bounds in every test experiment. Notably, the deep hybrid model could predict the metabolic shifts in key metabolites (e.g., lactate, ammonium, glutamine and glutamate) in the test experiments. We expect deep hybrid modeling to accelerate the deployment of high-fidelity digital twins in the biopharma sector in the near future.
Affiliation(s)
- José Pinto
- LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, NOVA University Lisbon, Caparica, Portugal
- João R. C. Ramos
- LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, NOVA University Lisbon, Caparica, Portugal
- Rafael S. Costa
- LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, NOVA University Lisbon, Caparica, Portugal
- Rui Oliveira
- LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, NOVA University Lisbon, Caparica, Portugal
|
17
|
Shi Y, Zhang X, Yang Y, Cai T, Peng C, Wu L, Zhou L, Han J, Ma M, Zhu W, Xu Z. D3CARP: a comprehensive platform with multiple-conformation based docking, ligand similarity search and deep learning approaches for target prediction and virtual screening. Comput Biol Med 2023; 164:107283. [PMID: 37536095 DOI: 10.1016/j.compbiomed.2023.107283] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/08/2023] [Revised: 07/15/2023] [Accepted: 07/28/2023] [Indexed: 08/05/2023]
Abstract
Resource- and time-consuming biological experiments are unavoidable in traditional drug discovery, which has directly driven the evolution of various computational algorithms and tools for drug-target interaction (DTI) prediction. To improve prediction reliability, a comprehensive platform is highly desirable, as some previously reported webservers are small in scale, single-method, or even out of service. In this study, we integrated multiple-conformation based docking, 2D/3D ligand similarity search and deep learning approaches to construct a comprehensive webserver, namely D3CARP, for target prediction and virtual screening. Specifically, 9352 conformations with positive control of 1970 targets were used for molecular docking, and approximately 2 million target-ligand pairs were used for 2D/3D ligand similarity search and deep learning. In addition, the positive compounds were added as references, and related diseases of therapeutic targets were annotated for further disease-based DTI study. The accuracies of the molecular docking and deep learning approaches were 0.44 and 0.89, respectively, and the average accuracy of five ligand similarity searches was 0.94. The strengths of D3CARP include support for multiple computational methods, ensemble docking, use of positive controls as references, cross-validation of predicted outcomes, diverse disease types, and broad applicability in drug discovery. D3CARP is freely accessible at https://www.d3pharma.com/D3CARP/index.php.
Affiliation(s)
- Yulong Shi
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Xinben Zhang
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Yanqing Yang
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Tingting Cai
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Cheng Peng
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Leyun Wu
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Liping Zhou
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Jiaxin Han
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Minfei Ma
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China
- Weiliang Zhu
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China.
- Zhijian Xu
- State Key Laboratory of Drug Research; Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Pharmacy, University of Chinese Academy of Sciences, Beijing 100049, China.
|
18
|
Ylipää E, Chavan S, Bånkestad M, Broberg J, Glinghammar B, Norinder U, Cotgreave I. hERG-toxicity prediction using traditional machine learning and advanced deep learning techniques. Curr Res Toxicol 2023; 5:100121. [PMID: 37701072 PMCID: PMC10493507 DOI: 10.1016/j.crtox.2023.100121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 08/24/2023] [Accepted: 08/30/2023] [Indexed: 09/14/2023] Open
Abstract
The rise of artificial intelligence (AI) based algorithms has attracted considerable interest in the pharmaceutical development field. Our study demonstrates the use of traditional machine learning techniques such as random forest (RF), support-vector machine (SVM), extreme gradient boosting (XGBoost), and deep neural network (DNN), as well as advanced deep learning techniques such as gated recurrent unit-based DNN (GRU-DNN) and graph neural network (GNN), for predicting human ether-à-go-go related gene (hERG) derived toxicity. Using the largest hERG dataset derived to date, we used 203,853 and 87,366 compounds for training and testing the models, respectively. The results show that GNN, SVM, XGBoost, DNN, RF, and GRU-DNN all performed well, with validation set AUC ROC scores of 0.96, 0.95, 0.95, 0.94, 0.94 and 0.94, respectively. The GNN was found to be the top performing model based on predictive power and generalizability. The GNN technique requires no feature engineering steps and only minimal human intervention. The GNN approach may serve as a basis for comprehensive automation in predictive toxicology. We believe that the models presented here may serve as a promising tool, both for academic institutes and for pharmaceutical industries, in predicting hERG liability in new molecular structures.
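The AUC ROC scores compared in this abstract can be understood through the rank interpretation of the metric: the probability that a randomly chosen positive compound is scored above a randomly chosen negative one. The Python sketch below illustrates that definition; it is not the authors' evaluation code, `roc_auc` is a hypothetical name, and labels are assumed to be encoded as 1 (toxic) and 0 (non-toxic).

```python
def roc_auc(labels, scores):
    """AUC ROC via pairwise comparison of positive and negative scores.

    Ties count as half a win. This is O(n_pos * n_neg), which is fine for a
    small illustration but not for datasets of hundreds of thousands of rows.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported range of 0.94 to 0.96 indicates that the models rank toxic compounds above non-toxic ones in the large majority of pairs.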
Affiliation(s)
- Erik Ylipää
- Computer Systems Unit, Research Institutes of Sweden RISE, Kista 164 40, Sweden
- Swapnil Chavan
- Unit of Chemical and Pharmaceutical Toxicology, Research Institutes of Sweden RISE, Södertalje 151 36, Sweden
- Maria Bånkestad
- Computer Systems Unit, Research Institutes of Sweden RISE, Kista 164 40, Sweden
- Johan Broberg
- Computer Systems Unit, Research Institutes of Sweden RISE, Kista 164 40, Sweden
- Björn Glinghammar
- Preclinical Development & Translational Medicine, Swedish Orphan Biovitrum AB, Solna 171 65, Sweden
- Ulf Norinder
- Department of Computer and Systems Sciences, Stockholm University, Kista 164 07, Sweden
- Ian Cotgreave
- Unit of Chemical and Pharmaceutical Toxicology, Research Institutes of Sweden RISE, Södertalje 151 36, Sweden
|
19
|
Flanary VL, Fisher JL, Wilk EJ, Howton TC, Lasseigne BN. Computational Advancements in Cancer Combination Therapy Prediction. JCO Precis Oncol 2023; 7:e2300261. [PMID: 37824797 PMCID: PMC12012855 DOI: 10.1200/po.23.00261] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/24/2023] [Revised: 07/20/2023] [Accepted: 08/15/2023] [Indexed: 10/14/2023] Open
Abstract
Given the high attrition rate of de novo drug discovery and limited efficacy of single-agent therapies in cancer treatment, combination therapy prediction through in silico drug repurposing has risen as a time- and cost-effective alternative for identifying novel and potentially efficacious therapies for cancer. The purpose of this review is to provide an introduction to computational methods for cancer combination therapy prediction and to summarize recent studies that implement each of these methods. A systematic search of the PubMed database was performed, focusing on studies published within the past 10 years. Our search included reviews and articles of ongoing and retrospective studies. We prioritized articles with findings that suggest considerations for improving combination therapy prediction methods over providing a meta-analysis of all currently available cancer combination therapy prediction methods. Computational methods used for drug combination therapy prediction in cancer research include networks, regression-based machine learning, classifier machine learning models, and deep learning approaches. Each method class has its own advantages and disadvantages, so careful consideration is needed to determine the most suitable class when designing a combination therapy prediction method. Future directions to improve current combination therapy prediction technology include incorporation of disease pathobiology, drug characteristics, patient multiomics data, and drug-drug interactions to determine maximally efficacious and tolerable drug regimens for cancer. As computational methods improve in their capability to integrate patient, drug, and disease data, more comprehensive models can be developed to more accurately predict safe and efficacious combination drug therapies for cancer and other complex diseases.
Affiliation(s)
- Victoria L. Flanary
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, USA
- Jennifer L. Fisher
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, USA
- Elizabeth J. Wilk
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, USA
- Timothy C. Howton
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, USA
- Brittany N. Lasseigne
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, USA
|
20
|
Abbas F, Zhang F, Ismail M, Khan G, Iqbal J, Alrefaei AF, Albeshr MF. Optimizing Machine Learning Algorithms for Landslide Susceptibility Mapping along the Karakoram Highway, Gilgit Baltistan, Pakistan: A Comparative Study of Baseline, Bayesian, and Metaheuristic Hyperparameter Optimization Techniques. Sensors (Basel) 2023; 23:6843. [PMID: 37571627 PMCID: PMC10422586 DOI: 10.3390/s23156843] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/24/2023] [Revised: 07/19/2023] [Accepted: 07/25/2023] [Indexed: 08/13/2023]
Abstract
Machine learning algorithms have found extensive use in numerous fields and applications. One important aspect of using these algorithms effectively is tuning the hyperparameters to match the specific task at hand. The selection and configuration of hyperparameters directly impact the performance of machine learning models. Achieving optimal hyperparameter settings often requires a deep understanding of the underlying models and the appropriate optimization techniques. While many automatic optimization techniques are available, each with its own advantages and disadvantages, this article focuses on hyperparameter optimization for well-known machine learning models. It explores cutting-edge optimization methods such as metaheuristic algorithms, deep learning-based optimization, Bayesian optimization, and quantum optimization; it focuses mainly on metaheuristic and Bayesian optimization techniques and provides guidance on applying them to different machine learning algorithms. The article also presents real-world applications of hyperparameter optimization by conducting tests on spatial data collections for landslide susceptibility mapping. Based on the experimental results, both Bayesian optimization and metaheuristic algorithms showed promising performance compared to baseline algorithms. For instance, the metaheuristic algorithm boosted the random forest model's overall accuracy by 5% and 3% relative to the baseline optimization methods GS and RS, respectively, and by 4% and 2% relative to the baseline optimization methods GA and PSO. Additionally, for models like KNN and SVM, Bayesian methods with Gaussian processes had good results. When compared to the baseline algorithms RS and GS, the accuracy of the KNN model was enhanced by BO-TPE by 1% and 11%, respectively, and by BO-GP by 2% and 12%, respectively. For SVM, BO-TPE outperformed GS and RS by 6%, while BO-GP improved results by 5%. The paper thoroughly discusses the reasons behind the efficiency of these algorithms. By successfully identifying appropriate hyperparameter configurations, this research paper aims to assist researchers, spatial data analysts, and industrial users in developing machine learning models more effectively. The findings and insights provided in this paper can contribute to enhancing the performance and applicability of machine learning algorithms in various domains.
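The two baseline strategies mentioned in this abstract can be sketched in plain Python, assuming that GS and RS denote grid search and random search (the abstract does not expand the abbreviations). Everything here is hypothetical: `toy_objective` is a made-up stand-in for a validation accuracy, and the hyperparameter names and grid values are illustrative rather than taken from the paper.

```python
import itertools
import random

# Hypothetical search space for a KNN-like model.
GRID = {"n_neighbors": [1, 3, 5, 7, 9], "weight_power": [0.0, 0.5, 1.0]}


def toy_objective(params):
    """Made-up score that peaks at n_neighbors=7, weight_power=0.5."""
    k, w = params["n_neighbors"], params["weight_power"]
    return 1.0 - 0.01 * (k - 7) ** 2 - 0.1 * (w - 0.5) ** 2


def grid_search(objective, grid):
    """Evaluate every combination in the grid and keep the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(objective, grid, n_trials, seed=0):
    """Evaluate n_trials randomly sampled combinations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Grid search is exhaustive over the listed values, so its cost grows multiplicatively with each added hyperparameter, whereas random search fixes the budget up front. Bayesian and metaheuristic methods, which the paper focuses on, instead use earlier evaluations to decide where to sample next.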
Affiliation(s)
- Farkhanda Abbas
- School of Computer Science, China University of Geosciences, Wuhan 430074, China;
- Feng Zhang
- School of Computer Science, China University of Geosciences, Wuhan 430074, China;
- Muhammad Ismail
- Department of Computer Science, Karakoram International University, Gilgit 15100, Pakistan;
- Garee Khan
- School of Geography, Karakoram International University, Gilgit 15100, Pakistan;
- Javed Iqbal
- School of Environmental Studies, China University of Geosciences, Wuhan 430074, China;
- Abdulwahed Fahad Alrefaei
- Department of Zoology, College of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia; (A.F.A.); (M.F.A.)
- Mohammed Fahad Albeshr
- Department of Zoology, College of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia; (A.F.A.); (M.F.A.)
|
21
|
Hwang J, Lustig N, Jung M, Lee JH. Autoencoder and restricted Boltzmann machine for transfer learning in functional magnetic resonance imaging task classification. Heliyon 2023; 9:e18086. [PMID: 37519689 PMCID: PMC10372668 DOI: 10.1016/j.heliyon.2023.e18086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/07/2022] [Revised: 05/18/2023] [Accepted: 07/06/2023] [Indexed: 08/01/2023] Open
Abstract
Deep neural networks (DNNs) have been adopted widely as classifiers for functional magnetic resonance imaging (fMRI) data, advancing beyond traditional machine learning models. Consequently, transfer learning of the pre-trained DNN becomes crucial to enhance DNN classification performance, specifically by alleviating an overfitting issue that occurs when a substantial number of DNN parameters are fitted to a relatively small number of fMRI samples. In this study, we first systematically compared the two most popularly used, unsupervised pretraining models for resting-state fMRI (rfMRI) volume data to pre-train the DNNs, namely autoencoder (AE) and restricted Boltzmann machine (RBM). The group in-brain mask used when training AE and RBM displayed a sizable overlap ratio with Yeo's seven functional brain networks (FNs). The parcellated FNs obtained from the RBM were fine-grained compared to those from the AE. The pre-trained AE and RBM served as the weight parameters of the first of the two hidden DNN layers, and the DNN fulfilled the task classifier role for fMRI (tfMRI) data in the Human Connectome Project (HCP). We tested two transfer learning schemes: (1) fixing and (2) fine-tuning the DNN's pre-trained AE or RBM weights. The DNN with transfer learning was compared to a baseline DNN, trained using random initial weights. Overall, DNN classification performance from the transfer learning proved superior when the pre-trained RBM weights were fixed and when the pre-trained AE weights were fine-tuned (average error rates: 14.8% for fixed RBM, 15.1% fine-tuned AE, and 15.5% for the baseline model) compared to the alternative scenarios of DNN transfer learning schemes. Moreover, the optimal transfer learning scheme between the fixed RBM and fine-tuned AE varied according to seven task conditions in the HCP. 
Nonetheless, the computational load reduced substantially for the fixed-weight-based transfer learning compared to the fine-tuning-based transfer learning (e.g., the number of weight parameters for the fixed-weight-based DNN model reduced to 1.9% compared with a baseline/fine-tuned DNN model). Our findings suggest that weight initialization at the DNN's first layer using RBM-based pre-trained weights provides the most promising approach when the whole-brain fMRI volume supports associated task classification. We believe that our proposed scheme could be applied to a variety of task conditions to improve their classification performance and to utilize computational resources efficiently using our AE/RBM-based pre-trained weights compared to random initial weights for DNN training.
Affiliation(s)
- Jong-Hwan Lee
- Corresponding author. Department of Brain and Cognitive Engineering, Korea University, Anam-ro 145, Seongbuk-gu, Seoul 02841, South Korea.
| |
Collapse
|
22
|
Nguyen DC, Ishikawa Y. On predicting annual output energy of 4-terminal perovskite/silicon tandem PV cells for building integrated photovoltaic application using machine learning. Heliyon 2023; 9:e18097. [PMID: 37539179 PMCID: PMC10395358 DOI: 10.1016/j.heliyon.2023.e18097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 07/04/2023] [Accepted: 07/06/2023] [Indexed: 08/05/2023] Open
Abstract
Building integrated photovoltaic (BIPV), based on tandem PV cells, is considered a new alternative for combining solar energy with buildings. Accurately predicting the BIPV-harvested annual output energy ($E_{out,annual}$) is crucial for evaluating the BIPV performance. Machine learning (ML) is a potential candidate for solving such a problem without the time-consuming process of experimental investigations. This contribution proposes an artificial neural network (ANN) to predict the $E_{out,annual}$ of 4-terminal perovskite/silicon (psk/Si) PV cells under realistic environmental conditions. The input variables of the proposed model consist of the input solar irradiance ($P_{in}$), incident light's angle ($A_{in}$), the PV module's temperature ($T_{mod}$), the psk absorber's thickness ($Th_{psk}$), and the psk absorber's bandgap ($B_{psk}$). The input data were received from the simulated results. This work also evaluates the degree of importance of each input variable and optimizes the architecture of the ANN using the surrogate algorithm before predictions. The optimized ANN-3 (three hidden layers) model shows superior performance indicators, including a mean squared error of MSE = 0.02283, correlation coefficient R = 0.99999, and Willmott's index of agreement $I_w$ = 0.99999. Consequently, the predicted highest $E_{out,annual}$ at $B_{psk}$ of 1.71 eV is 297.73, 115.01, 193.98, and 97.6 kWh/m² for the rooftop, east, south, and west facades, respectively.
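The three performance indicators named in the abstract can be computed as in the following minimal sketch. It assumes the standard definitions (mean squared error, Pearson correlation, and Willmott's 1981 index of agreement); the abstract does not state which variant of the index was used, and the sample values are invented for illustration.

```python
import math

def mse(obs, pred):
    """Mean squared error between observed and predicted values."""
    return sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs)

def pearson_r(obs, pred):
    """Pearson correlation coefficient R."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def willmott_index(obs, pred):
    """Willmott's index of agreement: 1 - SSE / potential error."""
    mo = sum(obs) / len(obs)
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))
    potential = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / potential

# Hypothetical annual energies in kWh/m^2 (not taken from the paper)
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]

print(mse(observed, predicted))
print(pearson_r(observed, predicted))
print(willmott_index(observed, predicted))
```

A value of the index close to 1 indicates that prediction errors are small relative to the spread of the data around the observed mean.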
Affiliation(s)
- Dong C. Nguyen
  - College of Science and Engineering, Aoyama Gakuin University, Sagamihara, Kanagawa 252-5258, Japan
  - Institute of Materials Science, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Cau Giay, Hanoi 100000, Viet Nam
- Yasuaki Ishikawa
  - College of Science and Engineering, Aoyama Gakuin University, Sagamihara, Kanagawa 252-5258, Japan

23
Dutschmann TM, Kinzel L, Ter Laak A, Baumann K. Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation. J Cheminform 2023; 15:49. [PMID: 37118768 PMCID: PMC10142532 DOI: 10.1186/s13321-023-00709-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/03/2022] [Accepted: 03/10/2023] [Indexed: 04/30/2023] Open
Abstract
It is insightful to report, in addition to the prediction itself, an estimator that describes how certain a model is in that prediction. For regression tasks, most approaches implement a variation of the ensemble method, with few exceptions. Instead of a single estimator, a group of estimators yields several predictions for an input. The uncertainty can then be quantified by measuring the disagreement between the predictions, for example by the standard deviation. In theory, ensembles should not only provide uncertainties but also boost the predictive performance by reducing errors arising from variance. Despite the development of novel methods, ensembles are still considered the "gold standard" for quantifying the uncertainty of regression models. Subsampling-based methods to obtain ensembles can be applied to all models, regardless of whether they are related to deep learning or traditional machine learning. However, little attention has been given to the question of whether the ensemble method is applicable to virtually all scenarios occurring in the field of cheminformatics. In a widespread and diversified attempt, ensembles are evaluated for 32 datasets of different sizes and modeling difficulty, ranging from physicochemical properties to biological activities. For increasing ensemble sizes with up to 200 members, the predictive performance as well as the applicability as uncertainty estimator are shown for all combinations of five modeling techniques and four molecular featurizations. Useful recommendations were derived for practitioners regarding the success and minimum size of ensembles, depending on whether predictive performance or uncertainty quantification is of more importance for the task at hand.
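The idea of measuring disagreement between ensemble members can be sketched as follows. This is a toy illustration under stated assumptions: the members are one-dimensional least-squares lines rather than the five modeling techniques evaluated in the paper, the fold assignment is a simple interleaving, and the data are invented.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def kfold_ensemble(xs, ys, k):
    """Train k members, each on the data with one fold held out."""
    models = []
    for fold in range(k):
        kept = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        train_x, train_y = zip(*kept)
        models.append(fit_line(train_x, train_y))
    return models

def predict_with_uncertainty(models, x):
    """Return the ensemble mean and the standard deviation across members."""
    preds = [slope * x + intercept for slope, intercept in models]
    return statistics.fmean(preds), statistics.stdev(preds)

# Hypothetical toy data: a noisy line, not taken from the paper
xs = [float(i) for i in range(10)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.2]
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]

models = kfold_ensemble(xs, ys, k=5)
mean_pred, std_pred = predict_with_uncertainty(models, 4.5)
print(f"prediction = {mean_pred:.3f} +/- {std_pred:.3f}")
```

The ensemble mean serves as the prediction and the standard deviation as the uncertainty estimate; averaging over members is also what reduces the variance component of the error.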
Affiliation(s)
- Thomas-Martin Dutschmann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Lennart Kinzel
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Antonius Ter Laak
  - Bayer AG, Research & Development, Pharmaceuticals, Muellerstrasse 178, 13353, Berlin, Germany
- Knut Baumann
  - Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany.

24
A novel medical text classification model with Kalman filter for clinical decision making. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]

25
Bongers BJ, Sijben HJ, Hartog PBR, Tarnovskiy A, IJzerman AP, Heitman LH, van Westen GJP. Proteochemometric Modeling Identifies Chemically Diverse Norepinephrine Transporter Inhibitors. J Chem Inf Model 2023; 63:1745-1755. [PMID: 36926886 PMCID: PMC10052348 DOI: 10.1021/acs.jcim.2c01645] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 03/18/2023]
Abstract
Solute carriers (SLCs) are relatively underexplored compared to other prominent protein families such as kinases and G protein-coupled receptors. However, proteins from the SLC family play an essential role in various diseases. One such SLC is the high-affinity norepinephrine transporter (NET/SLC6A2). In contrast to most other SLCs, the NET has been relatively well studied. However, the chemical space of known ligands has a low chemical diversity, making it challenging to identify chemically novel ligands. Here, a computational screening pipeline was developed to find new NET inhibitors. The approach increases the chemical space to model for NETs using the chemical space of related proteins that were selected utilizing similarity networks. Prior proteochemometric models added data from related proteins, but here we use a data-driven approach to select the optimal proteins to add to the modeled data set. After optimizing the data set, the proteochemometric model was optimized using stepwise feature selection. The final model was created using a two-step approach combining several proteochemometric machine learning models through stacking. This model was applied to the extensive virtual compound database of Enamine, from which the top predicted 22,000 of the 600 million virtual compounds were clustered to end up with 46 chemically diverse candidates. A subselection of 32 candidates was synthesized and subsequently tested using an impedance-based assay. There were five hit compounds identified (hit rate 16%) with sub-micromolar inhibitory potencies toward NET, which are promising for follow-up experimental research. This study demonstrates a data-driven approach to diversify known chemical space to identify novel ligands and is to our knowledge the first to select this set based on the sequence similarity of related targets.
Affiliation(s)
- Brandon J Bongers
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Huub J Sijben
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Peter B R Hartog
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Adriaan P IJzerman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Laura H Heitman
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands
  - Oncode Institute, Jaarbeursplein 6, Utrecht 3521 AL, The Netherlands
- Gerard J P van Westen
  - Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, Leiden 2333 CC, The Netherlands

26
Kour S, Biswas I, Sheoran S, Arora S, Sheela P, Duppala SK, Murthy DK, Pawar SC, Singh H, Kumar D, Prabhu D, Vuree S, Kumar R. Artificial intelligence and nanotechnology for cervical cancer treatment: Current status and future perspectives. J Drug Deliv Sci Technol 2023. [DOI: 10.1016/j.jddst.2023.104392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]

27
Kumar N, Acharya V. Machine intelligence-guided selection of optimized inhibitor for human immunodeficiency virus (HIV) from natural products. Comput Biol Med 2023; 153:106525. [PMID: 36603433 DOI: 10.1016/j.compbiomed.2022.106525] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/28/2022] [Revised: 11/28/2022] [Accepted: 12/31/2022] [Indexed: 01/03/2023]
Abstract
The human immunodeficiency virus (HIV) binds to the cluster of differentiation 4 (CD4) receptor and either of the entry co-receptors (CCR5 and CXCR4), and then unloads the viral genome, reverse transcriptase, and integrase enzymes within the host cell. The co-receptors facilitate the entry of the virus and vital enzymes, leading to replication and pre-maturation of viral particles within the host. The protease enzyme transforms the immature viral vesicles into the mature virion. The pivotal role of the co-receptors and enzymes in homeostasis and growth makes them crucial targets for anti-HIV drug discovery, and the availability of X-ray crystal structures is an asset. Here, we used the machine intelligence-driven framework (A-HIOT) to identify and optimize target-based potential hit molecules for five significant protein targets from the ZINC15 database (natural products dataset). Following validation with dynamic motion behavior analysis and molecular dynamics simulation, the optimized hits were evaluated using in silico ADMET filtration. Furthermore, three molecules were screened, optimized, and validated: ZINC00005328058 for CCR5 and protease, ZINC000254014855 for CXCR4 and integrase, and ZINC000000538471 for reverse transcriptase. In clinical trials, the ZINC000254014855 and ZINC000254014855 were passed in primary screens for vif-HIV-1, and we reported the specific receptor as well as interactions. As a result, the validated molecules may be investigated further in experimental studies targeting specific receptors in order to design and synergize an anti-HIV regimen.
Affiliation(s)
- Neeraj Kumar
  - Functional Genomics and Complex System Lab, HiCHiCoB, Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, 176061, Himachal Pradesh, India; Academy of Scientific and Innovative Research, Ghaziabad, 201002, India.
- Vishal Acharya
  - Functional Genomics and Complex System Lab, HiCHiCoB, Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, 176061, Himachal Pradesh, India; Academy of Scientific and Innovative Research, Ghaziabad, 201002, India.

28
Rodríguez-Pérez R, Trunzer M, Schneider N, Faller B, Gerebtzoff G. Multispecies Machine Learning Predictions of In Vitro Intrinsic Clearance with Uncertainty Quantification Analyses. Mol Pharm 2023; 20:383-394. [PMID: 36437712 DOI: 10.1021/acs.molpharmaceut.2c00680] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 11/29/2022]
Abstract
In pharmaceutical research, compounds are optimized for metabolic stability to avoid overly rapid elimination of the drug. Intrinsic clearance (CLint) measured in liver microsomes or hepatocytes is an important parameter during lead optimization. In this work, machine learning models were developed to relate the compound structure to microsomal metabolic stability and predict CLint for new compounds. A multitask (MT) learning architecture was introduced to model the CLint of six species simultaneously, yielding a multispecies machine learning model. MT graph neural network (MT-GNN) regression was identified as the top-performing method, and an ensemble of 10 MT-GNN models was evaluated prospectively. Geometric mean fold errors were consistently smaller than 2-fold. Moreover, high precision values were obtained in the prediction of "high" (>300 μL/min/mg) and "low" (<100 μL/min/mg) CLint compounds. Precision values ranged from 80 to 94% for low CLint predictions and from 75 to 97% for high CLint predictions, depending on the species. Uncertainty on experimental values and model predictions was systematically quantified. Experimental variability (aleatoric uncertainty) of all historical Novartis in vitro clearance experiments was analyzed. Interestingly, MT-GNN models' performance approached assays' experimental variability. Moreover, uncertainty estimation in predictions (epistemic uncertainty) enabled identifying predictions associated with lower and higher error. Taken together, our manuscript combines a multispecies deep learning model and large-scale uncertainty analyses to improve CLint predictions and facilitate early informed decisions for compound prioritization.
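The two evaluation measures mentioned above can be sketched as follows. The geometric mean fold error is assumed here to follow its usual definition, 10 raised to the mean absolute log10 ratio of predicted to observed values; the class thresholds come from the abstract, while the "medium" label for values in between and the sample values are assumptions made for illustration.

```python
import math

def gmfe(observed, predicted):
    """Geometric mean fold error: 10 ** mean(|log10(predicted / observed)|)."""
    logs = [abs(math.log10(p / o)) for o, p in zip(observed, predicted)]
    return 10 ** (sum(logs) / len(logs))

def classify_clint(value, low=100.0, high=300.0):
    """Bin a CLint value (uL/min/mg) using the thresholds from the abstract."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "medium"  # label for the in-between range is an assumption

def precision(observed, predicted, label):
    """Fraction of compounds predicted as `label` whose observed class matches."""
    flagged = [o for o, p in zip(observed, predicted) if classify_clint(p) == label]
    if not flagged:
        return float("nan")
    return sum(classify_clint(o) == label for o in flagged) / len(flagged)

# Hypothetical CLint values in uL/min/mg (not taken from the paper)
observed = [50.0, 150.0, 400.0, 350.0]
predicted = [60.0, 80.0, 320.0, 500.0]

print(gmfe(observed, predicted))
print(precision(observed, predicted, "low"))
print(precision(observed, predicted, "high"))
```

A GMFE of 1 means perfect agreement, and a value below 2 means predictions are on average within a factor of two of the measurement in either direction.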
Affiliation(s)
- Markus Trunzer
  - Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland
- Nadine Schneider
  - Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland
- Bernard Faller
  - Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland
- Grégori Gerebtzoff
  - Novartis Institutes for Biomedical Research, Novartis Campus, Basel CH-4002, Switzerland

29
On the ability of machine learning methods to discover novel scaffolds. J Mol Model 2022; 29:22. [PMID: 36574054 DOI: 10.1007/s00894-022-05359-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 10/21/2022] [Indexed: 12/28/2022]
Abstract
The recent advances in the application of machine learning to drug discovery have made it a 'hot topic' for research, with hundreds of academic groups and companies integrating machine learning into their drug discovery projects. Nevertheless, there remains great uncertainty regarding the most appropriate ways to evaluate the relative performance of these powerful methods against more traditional cheminformatics approaches, and many pitfalls remain for the unwary. In 2020, researchers at MIT (Stokes et al., Cell 180(4), 688-702, 2020) reported the discovery of a new compound with antibacterial activity, halicin, through the use of a neural network machine learning method. A robust ability to identify new active chemotypes through computational methods would be very useful. In this study, we have used the Stokes et al. dataset to compare the performance of this method to two other approaches, Mapping of Activity Through Dichotomic Scores (MADS) by Todeschini et al. (J Chemom 32(4):e2994, 2018) and Random Matrix Theory (RMT) by Lee et al. (Proc Natl Acad Sci 116(9):3373-3378, 2019). Our results demonstrate that all three methods are capable of predicting halicin as an active antibacterial compound, but that this result is dependent on the dataset composition, pre-processing and the molecular fingerprint used. We have further assessed overall performance as determined by several performance metrics. We also investigated the scaffold hopping potential of the methods by modifying the dataset by removal of the β-lactam and fluoroquinolone chemotypes. MADS and RMT are able to identify actives in the test set that contained these substructures. This ability arises because of high scoring fragments of the withheld chemotypes that are in common with other active antibiotic classes. Interestingly, MADS is relatively better compared to the other two methods based on general predictive performance.

30
Kim JI, Maguire F, Tsang KK, Gouliouris T, Peacock SJ, McAllister TA, McArthur AG, Beiko RG. Machine Learning for Antimicrobial Resistance Prediction: Current Practice, Limitations, and Clinical Perspective. Clin Microbiol Rev 2022; 35:e0017921. [PMID: 35612324 PMCID: PMC9491192 DOI: 10.1128/cmr.00179-21] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Indexed: 12/17/2022] Open
Abstract
Antimicrobial resistance (AMR) is a global health crisis that poses a great threat to modern medicine. Effective prevention strategies are urgently required to slow the emergence and further dissemination of AMR. Given the availability of data sets encompassing hundreds or thousands of pathogen genomes, machine learning (ML) is increasingly being used to predict resistance to different antibiotics in pathogens based on gene content and genome composition. A key objective of this work is to advocate for the incorporation of ML into front-line settings but also highlight the further refinements that are necessary to safely and confidently incorporate these methods. The question of what to predict is not trivial given the existence of different quantitative and qualitative laboratory measures of AMR. ML models typically treat genes as independent predictors, with no consideration of structural and functional linkages; they also may not be accurate when new mutational variants of known AMR genes emerge. Finally, to have the technology trusted by end users in public health settings, ML models need to be transparent and explainable to ensure that the basis for prediction is clear. We strongly advocate that the next set of AMR-ML studies should focus on the refinement of these limitations to be able to bridge the gap to diagnostic implementation.
Affiliation(s)
- Jee In Kim
  - Faculty of Computer Science, Dalhousie University, Halifax, Canada
  - Institute for Comparative Genomics, Dalhousie University, Halifax, Canada
  - Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, Canada
- Finlay Maguire
  - Faculty of Computer Science, Dalhousie University, Halifax, Canada
  - Institute for Comparative Genomics, Dalhousie University, Halifax, Canada
  - Department of Community Health and Epidemiology, Faculty of Medicine, Dalhousie University, Halifax, Canada
  - Shared Hospital Laboratory, Toronto, Canada
  - Sunnybrook Research Institute, Sunnybrook Health Sciences Centre, Toronto, Canada
- Kara K. Tsang
  - London School of Hygiene & Tropical Medicine, London, United Kingdom
- Theodore Gouliouris
  - Department of Medicine, University of Cambridge, Cambridge, United Kingdom
  - Clinical Microbiology and Public Health Laboratory, Public Health England, Cambridge, United Kingdom
  - Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom
- Sharon J. Peacock
  - Department of Medicine, University of Cambridge, Cambridge, United Kingdom
- Tim A. McAllister
  - Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Lethbridge, Canada
- Andrew G. McArthur
  - David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Canada
  - M.G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Canada
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Canada
- Robert G. Beiko
  - Faculty of Computer Science, Dalhousie University, Halifax, Canada
  - Institute for Comparative Genomics, Dalhousie University, Halifax, Canada

31
Sun Y, Jiao Y, Shi C, Zhang Y. Deep learning-based molecular dynamics simulation for structure-based drug design against SARS-CoV-2. Comput Struct Biotechnol J 2022; 20:5014-5027. [PMID: 36091720 PMCID: PMC9448712 DOI: 10.1016/j.csbj.2022.09.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/14/2022] [Revised: 08/03/2022] [Accepted: 09/03/2022] [Indexed: 11/26/2022] Open
Abstract
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2), has led to a global pandemic. Deep learning (DL) technology and molecular dynamics (MD) simulation are two mainstream computational approaches for investigating the geometric, chemical and structural features of proteins and guiding the relevant drug design. Despite a large number of research papers focusing on drug design for SARS-CoV-2 using DL architectures, it remains unclear how the binding energy of the protein-protein/ligand complex dynamically evolves, which is also vital for drug development. In addition, traditional deep neural networks usually have obvious deficiencies in predicting the interaction sites as protein conformation changes. In this review, we introduce the latest progress in DL and DL-based MD simulation approaches in structure-based drug design (SBDD) for SARS-CoV-2, which could address the problems of protein structure and binding prediction, drug virtual screening, molecular docking and complex evolution. Furthermore, the current challenges and future directions of DL-based MD simulation for SBDD are also discussed.
Affiliation(s)
- Yao Sun
  - School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Yanqi Jiao
  - School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Chengcheng Shi
  - State Key Lab of Urban Water Resource and Environment, School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Yang Zhang
  - School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China

32
A general deep hybrid model for bioreactor systems: Combining first principles with deep neural networks. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2022.107952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

33
Fabietti M, Mahmud M, Lotfi A, Kaiser MS. ABOT: an open-source online benchmarking tool for machine learning-based artefact detection and removal methods from neuronal signals. Brain Inform 2022; 9:19. [PMID: 36048345 PMCID: PMC9437165 DOI: 10.1186/s40708-022-00167-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 07/22/2022] [Indexed: 11/10/2022] Open
Abstract
Brain signals are recorded using different techniques to aid an accurate understanding of brain function and to treat its disorders. Untargeted internal and external sources contaminate the acquired signals during the recording process. Often termed as artefacts, these contaminations cause serious hindrances in decoding the recorded signals; hence, they must be removed to facilitate unbiased decision-making for a given investigation. Due to the complex and elusive manifestation of artefacts in neuronal signals, computational techniques serve as powerful tools for their detection and removal. Machine learning (ML) based methods have been successfully applied in this task. Due to ML's popularity, many articles are published every year, making it challenging to find, compare and select the most appropriate method for a given experiment. To this end, this paper presents ABOT (Artefact removal Benchmarking Online Tool) as an online benchmarking tool which allows users to compare existing ML-driven artefact detection and removal methods from the literature. The characteristics and related information about the existing methods have been compiled as a knowledgebase (KB) and presented through a user-friendly interface with interactive plots and tables for users to search it using several criteria. Key characteristics extracted from over 120 articles from the literature have been used in the KB to help compare the specific ML models. To comply with the FAIR (Findable, Accessible, Interoperable and Reusable) principle, the source code and documentation of the toolbox have been made available via an open-access repository.
Affiliation(s)
- Marcos Fabietti
  - Department of Computer Science, Nottingham Trent University, Clifton Lane, Nottingham, NG11 8NS, UK
- Mufti Mahmud
  - Department of Computer Science, Nottingham Trent University, Clifton Lane, Nottingham, NG11 8NS, UK.
  - Medical Technologies Innovation Facility, Nottingham Trent University, Clifton Lane, Nottingham, NG11 8NS, UK.
  - Computing and Informatics Research Centre, Nottingham Trent University, Clifton Lane, Nottingham, NG11 8NS, UK.
- Ahmad Lotfi
  - Department of Computer Science, Nottingham Trent University, Clifton Lane, Nottingham, NG11 8NS, UK
- M Shamim Kaiser
  - Institute of Information Technology, Jahangirnagar University, Dhaka, 1342, Savar, Bangladesh

34
Ali F, Kumar H, Patil S, Ahmad A, Babour A, Daud A. Deep-GHBP: Improving prediction of Growth Hormone-binding proteins using deep learning model. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103856] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]

35
Hao S, Hu X, Feng Z, Sun K, You X, Wang Z, Yang C. Prediction of metal ion ligand binding residues by adding disorder value and propensity factors based on deep learning algorithm. Front Genet 2022; 13:969412. [PMID: 36035120 PMCID: PMC9402973 DOI: 10.3389/fgene.2022.969412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 07/04/2022] [Indexed: 11/13/2022] Open
Abstract
Proteins need to interact with different ligands to perform their functions. Among these ligands, metal ions are a major class. At present, the prediction of protein metal ion ligand binding residues remains a challenge. In this study, we selected the Zn²⁺, Cu²⁺, Fe²⁺, Fe³⁺, Co²⁺, Mn²⁺, Ca²⁺ and Mg²⁺ metal ion ligands from the BioLip database as the research objects. Based on the amino acids, the physicochemical properties and predicted structural information, we introduced the disorder value as a feature parameter. In addition, based on the component information, position weight matrix and information entropy, we introduced the propensity factor as a prediction parameter. Then, we used a deep neural network algorithm for the prediction. Furthermore, we optimized the hyper-parameters of the deep learning algorithm and obtained better results than the previous IonSeq method.
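One of the sequence-derived quantities mentioned above, information entropy, can be illustrated with a per-position Shannon entropy over aligned residue windows. This is only a sketch of the general concept: the abstract does not specify how entropy enters the features, and the window sequences and function names below are invented.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residue distribution at one position."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def positional_entropy(windows):
    """Entropy at each position across equally long residue windows."""
    return [column_entropy(col) for col in zip(*windows)]

# Hypothetical three-residue windows centred on candidate binding residues
windows = ["HCH", "HDE", "HCK", "HDH"]

print(positional_entropy(windows))
```

A position with low entropy is strongly conserved across the windows, while a position with high entropy carries little residue preference.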
Affiliation(s)
- Sixi Hao
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
- Xiuzhen Hu
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
  - *Correspondence: Xiuzhen Hu; Zhenxing Feng
- Zhenxing Feng
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
  - *Correspondence: Xiuzhen Hu; Zhenxing Feng
- Kai Sun
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
- Xiaoxiao You
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
- Ziyang Wang
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China
- Caiyun Yang
  - College of Sciences, Inner Mongolia University of Technology, Hohhot, China
  - Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, China

36
Sreenivasan AP, Harrison PJ, Schaal W, Matuszewski DJ, Kultima K, Spjuth O. Predicting protein network topology clusters from chemical structure using deep learning. J Cheminform 2022; 14:47. [PMID: 35841114 PMCID: PMC9284831 DOI: 10.1186/s13321-022-00622-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 06/06/2022] [Indexed: 11/10/2022] Open
Abstract
Comparing chemical structures to infer protein targets and functions is a common approach, but basing comparisons on chemical similarity alone can be misleading. Here we present a methodology for predicting target protein clusters using deep neural networks. The model is trained on clusters of compounds based on similarities calculated from combined compound-protein and protein-protein interaction data using a network topology approach. We compare several deep learning architectures including both convolutional and recurrent neural networks. The best performing method, the recurrent neural network architecture MolPMoFiT, achieved an F1 score approaching 0.9 on a held-out test set of 8907 compounds. In addition, in-depth analysis on a set of eleven well-studied chemical compounds with known functions showed that predictions were justifiable for all but one of the chemicals. Four of the compounds, similar in their molecular structure but with dissimilarities in their function, revealed advantages of our method compared to using chemical similarity.
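For readers unfamiliar with the reported metric, the sketch below computes an F1 score for a multi-class cluster prediction task. The abstract does not say how the score was averaged across clusters, so macro-averaging is an assumption here, and the cluster labels are invented.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score of one class from true-positive, false-positive, false-negative counts."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro-averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)

# Hypothetical protein-cluster labels for six held-out compounds
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "B", "C", "C", "C"]

print(round(macro_f1(y_true, y_pred), 4))
```

Macro-averaging weights every cluster equally, which matters when cluster sizes are imbalanced; a micro-averaged score would instead be dominated by the largest clusters.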
Collapse
Affiliation(s)
- Akshai P Sreenivasan
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden.,Department of Medical Sciences, Uppsala University, Uppsala, Sweden
| | - Philip J Harrison
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
| | - Wesley Schaal
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
| | - Damian J Matuszewski
- Centre for Image Analysis, Department of Information Technology, Uppsala University, Uppsala, Sweden
| | - Kim Kultima
- Department of Medical Sciences, Uppsala University, Uppsala, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden.
| |
|
37
|
One-Day-Ahead Solar Irradiation and Windspeed Forecasting with Advanced Deep Learning Techniques. ENERGIES 2022. [DOI: 10.3390/en15124361] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/01/2023]
Abstract
In recent years, demand for electric energy has steadily increased; therefore, the integration of renewable energy sources (RES) at a large scale into power systems is a major concern. Wind and solar energy are among the most widely used alternative sources of energy. However, there is intense variability both in solar irradiation and even more in windspeed, which causes solar and wind power generation to fluctuate highly. As a result, the penetration of RES technologies into electricity networks is a difficult task. Therefore, more accurate solar irradiation and windspeed one-day-ahead forecasting is crucial for safe and reliable operation of electrical systems, the management of RES power plants, and the supply of high-quality electric power at the lowest possible cost. Clouds’ influence on solar irradiation forecasting, data categorization per month for successive years due to the similarity of patterns of solar irradiation per month during the year, and relative seasonal similarity of windspeed patterns have not been taken into consideration in previous work. In this study, three deep learning techniques, i.e., multi-head CNN, multi-channel CNN, and encoder–decoder LSTM, were adopted for medium-term windspeed and solar irradiance forecasting based on a real-time measurement dataset and were compared with two well-known conventional methods, i.e., RegARMA and NARX. Utilization of a walk-forward validation forecast strategy was combined, firstly with a recursive multistep forecast strategy and secondly with a multiple-output forecast strategy, using a specific cloud index introduced for the first time. Moreover, the similarity of patterns of solar irradiation per month during the year and the relative seasonal similarity of windspeed patterns in a timeseries measurements dataset for several successive years demonstrates that they contribute to very high one-day-ahead windspeed and solar irradiation forecasting performance.
|
38
|
Electromagnetic Modulation Signal Classification Using Dual-Modal Feature Fusion CNN. ENTROPY 2022; 24:e24050700. [PMID: 35626583 PMCID: PMC9142120 DOI: 10.3390/e24050700] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/11/2022] [Revised: 05/08/2022] [Accepted: 05/12/2022] [Indexed: 11/17/2022]
Abstract
AMC (automatic modulation classification) plays a vital role in spectrum monitoring and electromagnetic abnormal signal detection. Up to now, few studies have focused on the complementarity between features of different modalities and the importance of the feature fusion mechanism in the AMC method. This paper proposes a dual-modal feature fusion convolutional neural network (DMFF-CNN) for AMC that fully exploits the complementarity between different modal features. DMFF-CNN combines Gramian angular field (GAF) image coding and in-phase/quadrature (IQ) data with CNNs. Firstly, the original signal is converted into images by GAF, and the GAF images are used as the input of ResNet50. Secondly, the signal is converted into IQ data, which serve as the input of a complex-valued network (CV-CNN) for feature extraction. Furthermore, a dual-modal feature fusion mechanism (DMFF) is proposed to fuse the dual-modal features extracted by GAF-ResNet50 and CV-CNN. The fused feature is used as the input of DMFF-CNN for model training to achieve AMC of multi-type signals. In the evaluation stage, the advantages of the proposed DMFF mechanism and its accuracy improvement over other feature fusion algorithms are discussed. The experiments show that the method performs better than others, including some state-of-the-art methods, has superior robustness at a low signal-to-noise ratio (SNR), and reaches an average classification accuracy of 92.1% on the dataset signals. The DMFF-CNN proposed in this paper provides a new path for the AMC field.
|
39
|
Naga D, Muster W, Musvasva E, Ecker GF. Off-targetP ML: an open source machine learning framework for off-target panel safety assessment of small molecules. J Cheminform 2022; 14:27. [PMID: 35525988 PMCID: PMC9077900 DOI: 10.1186/s13321-022-00603-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/29/2021] [Accepted: 03/26/2022] [Indexed: 11/10/2022] Open
Abstract
Unpredicted drug safety issues constitute the majority of failures in the pharmaceutical industry according to several studies. Some of these preclinical safety issues could be attributed to the non-selective binding of compounds to targets other than their intended therapeutic target, causing undesired adverse events. Consequently, pharmaceutical companies routinely run in-vitro safety screens to detect off-target activities prior to preclinical and clinical studies. Here we present an open source machine learning framework aimed at predicting the activities of ~4000 compounds against our in-house panel of 50 off-targets, directly from their structure. This framework is intended to guide chemists in the drug design process prior to synthesis and to accelerate drug discovery. We also present a set of ML approaches that require minimal programming experience for deployment. The workflow incorporates different ML approaches such as deep learning and automated machine learning. It also accommodates common issues faced in bioactivity prediction, such as data imbalance, inter-target duplicated measurements and duplicated public compound identifiers. Throughout the workflow development, we explore and compare the capability of Neural Networks and AutoML in constructing prediction models for fifty off-targets of different protein classes, different dataset sizes, and high class imbalance. Outcomes from different methods are compared in terms of efficiency and efficacy. The most important challenges and factors impacting model construction and performance, along with suggestions on how to overcome such challenges, are also discussed.
Affiliation(s)
- Doha Naga
- Roche Pharma Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland; Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
| | - Wolfgang Muster
- Roche Pharma Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
| | - Eunice Musvasva
- Roche Pharma Research & Early Development, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
| | - Gerhard F Ecker
- Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria.
| |
|
40
|
Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany; Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
| | - Filip Miljković
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany; Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
| | - Jürgen Bajorath
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
| |
|
41
|
Zhang J, Wang Q, Shen W. Hyper-parameter optimization of multiple machine learning algorithms for molecular property prediction using hyperopt library. Chin J Chem Eng 2022. [DOI: 10.1016/j.cjche.2022.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
42
|
A Carbonate Reservoir Prediction Method Based on Deep Learning and Multiparameter Joint Inversion. ENERGIES 2022. [DOI: 10.3390/en15072506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Deep-water carbonate reservoirs are currently the focus of global oil and gas production activities. The characterization of strongly heterogeneous carbonate reservoirs, especially the prediction of fluids in deep-water presalt carbonate reservoirs, exposes difficulties in reservoir inversion due to their complex structures and weak seismic signals. Therefore, a multiparameter joint inversion method is proposed to comprehensively utilize the information of different seismic angle gathers and the simultaneous inversion of multiple seismic parameters. Compared with the commonly used simultaneous constrained sparse-pulse inversion method, the multiparameter joint inversion method can characterize thinner layers that are consistent with data and can obtain higher-resolution presalt reservoir results. Based on the results of multiparameter joint inversion, in this paper, we further integrate the long short-term memory network algorithm to predict the porosity of presalt reef reservoirs. Compared with a fully connected neural network based on the backpropagation algorithm, the porosity results are in better agreement with the new log porosity curves, with the average porosity of the four wells increasing from 89.48% to 97.76%. The results show that the method, which is based on deep learning and multiparameter joint inversion, can more accurately identify porosity and has good application prospects in the prediction of carbonate reservoirs with complex structures.
|
43
|
Mohammed KK, Hassanien AE, Afify HM. Classification of Ear Imagery Database using Bayesian Optimization based on CNN-LSTM Architecture. J Digit Imaging 2022; 35:947-961. [PMID: 35296939 PMCID: PMC9485378 DOI: 10.1007/s10278-022-00617-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/15/2021] [Revised: 02/25/2022] [Accepted: 02/27/2022] [Indexed: 11/28/2022] Open
Abstract
External and middle ear conditions are diagnosed using a digital otoscope. The clinical diagnosis of ear conditions suffers from limited accuracy due to the heavy dependency on otolaryngologist expertise, patient complaints, blurring of the otoscopic images, and the complexity of lesion definition. There is a strong need for improved diagnosis algorithms based on otoscopic image processing. This paper presents an ear diagnosis approach based on a convolutional neural network (CNN) for feature extraction and long short-term memory (LSTM) as the classifier. However, the accuracy of the suggested LSTM model may be decreased by the omission of a hyperparameter tuning process. Therefore, Bayesian optimization is used to select the hyperparameters and improve the classification results of the LSTM network. This study is based on an ear imagery database that consists of four categories: normal, myringosclerosis, earwax plug, and chronic otitis media (COM). This study used 880 otoscopic images, divided into 792 training images and 88 testing images, to evaluate the performance of the approach. The evaluation metrics for ear condition classification are accuracy, sensitivity, specificity, and positive predictive value (PPV), each expressed as a percentage. The findings yielded a classification accuracy of 100%, a sensitivity of 100%, a specificity of 100%, and a PPV of 100% for the testing database. Finally, the proposed approach shows how to find the best hyperparameters with Bayesian optimization for reliable diagnosis of ear conditions using the LSTM architecture. This approach demonstrates that CNN-LSTM, which has not been used in previous studies for classifying ear diseases, achieves higher performance and lower training time than CNN. Consequently, the proposed approach can serve as a useful and reliable automatic tool for improving the classification and prediction of various ear pathologies.
Affiliation(s)
- Kamel K Mohammed
- Center for Virus Research and Studies, Al Azhar University, Cairo, Egypt; Scientific Research Group in Egypt (SRGE), Cairo, Egypt
| | - Aboul Ella Hassanien
- Faculty of Computers and Information, Cairo University, Giza, Egypt; Scientific Research Group in Egypt (SRGE), Cairo, Egypt
| | - Heba M Afify
- Systems and Biomedical Engineering Department, Higher Institute of Engineering in Shorouk Academy, Al Shorouk City, Cairo, Egypt; Scientific Research Group in Egypt (SRGE), Cairo, Egypt
| |
|
44
|
Lee BD, Gitter A, Greene CS, Raschka S, Maguire F, Titus AJ, Kessler MD, Lee AJ, Chevrette MG, Stewart PA, Britto-Borges T, Cofer EM, Yu KH, Carmona JJ, Fertig EJ, Kalinin AA, Signal B, Lengerich BJ, Triche TJ, Boca SM. Ten quick tips for deep learning in biology. PLoS Comput Biol 2022; 18:e1009803. [PMID: 35324884 PMCID: PMC8946751 DOI: 10.1371/journal.pcbi.1009803] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 12/05/2022] Open
Affiliation(s)
- Benjamin D. Lee
- In-Q-Tel Labs, Arlington, Virginia, United States of America
- School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
| | - Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
| | - Sebastian Raschka
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Finlay Maguire
- Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Alexander J. Titus
- University of New Hampshire, Manchester, New Hampshire, United States of America
- Bioeconomy.XYZ, Manchester, New Hampshire, United States of America
| | - Michael D. Kessler
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
| | - Alexandra J. Lee
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Marc G. Chevrette
- Wisconsin Institute for Discovery and Department of Plant Pathology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Paul Allen Stewart
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida, United States of America
| | - Thiago Britto-Borges
- Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, Heidelberg, Germany
- Department of Internal Medicine III (Cardiology, Angiology, and Pneumology), University Hospital Heidelberg, Heidelberg, Germany
| | - Evan M. Cofer
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, New Jersey, United States of America
| | - Kun-Hsing Yu
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Pathology, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
| | - Juan Jose Carmona
- Philips Healthcare, Cambridge, Massachusetts, United States of America
| | - Elana J. Fertig
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Biomedical Engineering, Department of Applied Mathematics and Statistics, Convergence Institute, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Alexandr A. Kalinin
- Medical Big Data Group, Shenzhen Research Institute of Big Data, Shenzhen, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Brandon Signal
- School of Medicine, College of Health and Medicine, University of Tasmania, Hobart, Australia
| | - Benjamin J. Lengerich
- Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
| | - Timothy J. Triche
- Center for Epigenetics, Van Andel Research Institute, Grand Rapids, Michigan, United States of America
- Department of Pediatrics, College of Human Medicine, Michigan State University, East Lansing, Michigan, United States of America
- Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
| | - Simina M. Boca
- Innovation Center for Biomedical Informatics, Georgetown University Medical Center, District of Columbia, United States of America
- Department of Oncology, Georgetown University Medical Center, Washington, DC, United States of America
- Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, DC, United States of America
- Cancer Prevention and Control Program, Lombardi Comprehensive Cancer Center, Washington, DC, United States of America
| |
|
45
|
Studying the Effects of Cold Plasma Phosphorus Using Physiological and Digital Image Processing Techniques. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:8332737. [PMID: 35281947 PMCID: PMC8913142 DOI: 10.1155/2022/8332737] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/08/2021] [Revised: 12/26/2021] [Accepted: 01/10/2022] [Indexed: 01/20/2023]
Abstract
The goal of this study is to see how cold plasma affects rabbit bone tissue infected with osteoporosis. The study is divided into three groups: control, infected, and treated. The rabbits were subjected to cold plasma for five minutes in a room with a microwave plasma voltage of “175 V” and a gas flow of “2.” A histopathological photograph of infected bone cells is obtained to demonstrate the influence of plasma on infected bone cells, as well as the extent of destruction and the effect of plasma therapy before and after exposure. The findings of the study show that plasma has a clear impact on Ca and vitamin D levels. The levels of osteocalcin and alkaline phosphatase (ALP) respond to the cold plasma as well. Image processing techniques (second-order gray level matrix) with textural features are employed as additional evidence. The outcome gives good treatment indicators, and the image processing result corresponds to the biological result.
|
46
|
Liu X, Lu D, Zhang A, Liu Q, Jiang G. Data-Driven Machine Learning in Environmental Pollution: Gains and Problems. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:2124-2133. [PMID: 35084840 DOI: 10.1021/acs.est.1c06157] [Citation(s) in RCA: 151] [Impact Index Per Article: 50.3] [Indexed: 06/14/2023]
Abstract
The complexity and dynamics of the environment make it extremely difficult to directly predict and trace the temporal and spatial changes in pollution. In the past decade, the unprecedented accumulation of data, the development of high-performance computing power, and the rise of diverse machine learning (ML) methods provide new opportunities for environmental pollution research. The ML methodology has been used in satellite data processing to obtain ground-level concentrations of atmospheric pollutants, pollution source apportionment, and spatial distribution modeling of water pollutants. However, unlike the active practices of ML in chemical toxicity prediction, advanced algorithms such as deep neural networks in environmental process studies of pollutants are still deficient. In addition, over 40% of the environmental applications of ML go to air pollution, and its application range and acceptance in other aspects of environmental science remain to be increased. The use of ML methods to revolutionize environmental science and its problem-solving scenarios has its own challenges. Several issues should be taken into consideration, such as the tradeoff between model performance and interpretability, prerequisites of the machine learning model, model selection, and data sharing.
Affiliation(s)
- Xian Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
| | - Dawei Lu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
| | - Aiqian Zhang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
- School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, People's Republic of China
- College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Institute of Environment and Health, Jianghan University, Wuhan 430056, People's Republic of China
| | - Qian Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
- College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Institute of Environment and Health, Jianghan University, Wuhan 430056, People's Republic of China
| | - Guibin Jiang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, People's Republic of China
- School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, People's Republic of China
| |
|
47
|
A New Hybrid Based on Long Short-Term Memory Network with Spotted Hyena Optimization Algorithm for Multi-Label Text Classification. MATHEMATICS 2022. [DOI: 10.3390/math10030488] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 02/06/2023]
Abstract
Multi-Label Text Classification (MLTC) is an essential task in natural language processing. The purpose of MLTC is to assign multiple labels to each document. Traditional text classification methods, such as machine learning, usually involve data scattering and fail to discover relationships between data. With the development of deep learning algorithms, many authors have used deep learning in MLTC. In this paper, a novel model called Spotted Hyena Optimizer (SHO)-Long Short-Term Memory (SHO-LSTM) for MLTC, based on the LSTM network and the SHO algorithm, is proposed. In the LSTM network, the Skip-gram method is used to embed words into the vector space. The new model uses the SHO algorithm to optimize the initial weights of the LSTM network. Adjusting the weight matrix in LSTM is a major challenge: if the neuron weights are accurate, the accuracy of the output will be higher. The SHO algorithm is a population-based meta-heuristic algorithm based on the mass hunting behavior of spotted hyenas. In this algorithm, each solution of the problem is coded as a hyena. The hyenas then approach the optimal solution by following the leader hyena. Four datasets (RCV1-v2, EUR-Lex, Reuters-21578, and Bookmarks) are used to evaluate the proposed model. The assessments demonstrate that the proposed model has a higher accuracy rate than LSTM, Genetic Algorithm-LSTM (GA-LSTM), Particle Swarm Optimization-LSTM (PSO-LSTM), Artificial Bee Colony-LSTM (ABC-LSTM), Harmony Algorithm Search-LSTM (HAS-LSTM), and Differential Evolution-LSTM (DE-LSTM). The improvement in accuracy of the SHO-LSTM model over LSTM on the four datasets is 7.52%, 7.12%, 1.92%, and 4.90%, respectively.
|
48
|
Sankara Narayanan P, Runthala A. Accurate computational evolution of proteins and its dependence on deep learning and machine learning strategies. BIOCATAL BIOTRANSFOR 2022. [DOI: 10.1080/10242422.2022.2030317] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]
|
49
|
Sun K, Hu X, Feng Z, Wang H, Lv H, Wang Z, Zhang G, Xu S, You X. Predicting Ca2+ and Mg2+ ligand binding sites by deep neural network algorithm. BMC Bioinformatics 2022; 22:324. [PMID: 35045825 PMCID: PMC8772041 DOI: 10.1186/s12859-021-04250-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2021] [Accepted: 06/09/2021] [Indexed: 11/25/2022] Open
Abstract
Background: Alkaline earth metal ions are important protein-binding ligands in the human body, and it is of great significance to predict their binding residues. Results: In this paper, Mg2+ and Ca2+ ligands are taken as the research objects. Based on the characteristic parameters of protein sequences, amino acids, physicochemical characteristics of amino acids and predicted structural information, a deep neural network algorithm is used to predict the binding sites of proteins. By optimizing the hyper-parameters of the deep learning algorithm, the prediction results under fivefold cross-validation are better than those of the Ionseq method. In addition, to further verify the performance of the proposed model, an undersampling data processing method is adopted, and the prediction results on an independent test set are better than those obtained by the support vector machine algorithm. Conclusions: An efficient method for predicting Mg2+ and Ca2+ ligand binding sites is presented.
Affiliation(s)
- Kai Sun
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China; Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, People's Republic of China
| | - Xiuzhen Hu
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China; Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, People's Republic of China
| | - Zhenxing Feng
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China; Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, People's Republic of China
| | - Hongbin Wang
- College of Data Science and Application, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China
| | - Haotian Lv
- College of Data Science and Application, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China
| | - Ziyang Wang
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China; Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, People's Republic of China
| | - Gaimei Zhang
- Hohhot First Hospital, Hohhot, 010051, People's Republic of China
| | - Shuang Xu
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China; Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, People's Republic of China
| | - Xiaoxiao You
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, People's Republic of China; Inner Mongolia Key Laboratory of Statistical Analysis Theory for Life Data and Neural Network Modeling, Hohhot, People's Republic of China
| |
|
50
|
Abstract
Quantitative structure-activity relationship (QSAR) models are routinely applied computational tools in the drug discovery process. QSAR models are regression or classification models that predict the biological activities of molecules based on the features derived from their molecular structures. These models are usually used to prioritize a list of candidate molecules for future laboratory experiments and to help chemists gain better insights into how structural changes affect a molecule's biological activities. Developing accurate and interpretable QSAR models is therefore of the utmost importance in the drug discovery process. Deep neural networks, which are powerful supervised learning algorithms, have shown great promise for addressing regression and classification problems in various research fields, including the pharmaceutical industry. In this chapter, we briefly review the applications of deep neural networks in QSAR modeling and describe commonly used techniques to improve model performance.
|