1
Halder A, Saha B, Roy M, Majumder S. A novel deep sequential learning architecture for drug drug interaction prediction using DDINet. Sci Rep 2025; 15:9337. [PMID: 40102542 PMCID: PMC11920219 DOI: 10.1038/s41598-025-93952-z] [Received: 10/26/2024] [Accepted: 03/11/2025] [Indexed: 03/20/2025]
Abstract
Drug-drug interactions (DDIs) present considerable challenges in healthcare, often resulting in adverse effects or decreased therapeutic efficacy. This article proposes a novel deep sequential learning architecture called DDINet to predict and classify DDIs between pairs of drugs based on different mechanisms, including excretion, absorption, metabolism, and excretion rate (higher serum level). Chemical features such as Hall Smart, amino acid count, and carbon types are extracted from each drug pair and used as input to the proposed model. DDINet incorporates an attention mechanism and deep sequential learning architectures, such as long short-term memory (LSTM) and gated recurrent unit (GRU). It uses the Rcpi toolkit to extract biochemical features of drugs from their chemical composition in Simplified Molecular-Input Line-Entry System (SMILES) format. Experiments are conducted on publicly available DDI datasets from DrugBank and Kaggle. The model's efficacy in predicting and classifying DDIs is evaluated using various performance measures. The experimental results show that DDINet outperformed eight counterpart techniques, achieving [Formula: see text] overall accuracy, which is also statistically confirmed by confidence interval tests and paired t-tests. This architecture may serve as an effective computational technique for mechanism-wise DDI prediction and as a complementary tool to reduce costly wet-lab experiments for DDI prediction and classification.
Affiliation(s)
- Anindya Halder
  - Department of Computer Application, School of Technology, North-Eastern Hill University, Tura Campus, Tura, Meghalaya, 794002, India.
- Biswanath Saha
  - Department of Computer Application, School of Technology, North-Eastern Hill University, Tura Campus, Tura, Meghalaya, 794002, India.
- Moumita Roy
  - Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, 741235, India.
- Sukanta Majumder
  - Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, 741235, India.
2
Yang Y, Cheng F. Artificial intelligence streamlines scientific discovery of drug-target interactions. Br J Pharmacol 2025. [PMID: 39843168 DOI: 10.1111/bph.17427] [Received: 07/22/2024] [Revised: 10/04/2024] [Accepted: 11/01/2024] [Indexed: 01/24/2025]
Abstract
Drug discovery is a complicated process through which new therapeutics are identified to prevent and treat specific diseases. Identification of drug-target interactions (DTIs) is a pivotal step in drug discovery and development. The traditional process of drug discovery, especially identification of DTIs, is marked by high costs of experimental assays and low success rates. Computational methods have emerged as indispensable tools, especially those employing artificial intelligence (AI) methods, which could streamline the process, thereby reducing costs and time consumption and potentially increasing success rates. In this review, we focus on the application of AI techniques in DTI prediction. Specifically, we commence with a comprehensive overview of drug discovery and development, along with systematic prediction and validation of DTIs. We proceed to highlight the prominent databases and toolkits used in developing AI methods for DTI prediction, as well as methodologies for evaluating their efficacy. We further extend the exploration into three primary types of state-of-the-art AI methods used in DTI prediction: classical machine learning, deep learning, and network-based methods. Finally, we summarize the key findings and outline the current challenges and future directions that AI methods face in scientific drug discovery and development.
Affiliation(s)
- Yuxin Yang
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
- Feixiong Cheng
  - Cleveland Clinic Genome Center, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
3
Pimtawong T, Ren J, Lee J, Lee HM, Na D. A review on computational models for predicting protein solubility. J Microbiol 2025; 63:e.2408001. [PMID: 39895070 DOI: 10.71150/jm.2408001] [Received: 08/27/2024] [Accepted: 10/29/2024] [Indexed: 02/04/2025]
Abstract
Protein solubility is a critical factor in the production of recombinant proteins, which are widely used in various industries, including pharmaceuticals, diagnostics, and biotechnology. Predicting protein solubility remains a challenging task due to the complexity of protein structures and the multitude of factors influencing solubility. Recent advances in computational methods, particularly those based on machine learning, have provided powerful tools for predicting protein solubility, thereby reducing the need for extensive experimental trials. This review provides an overview of current computational approaches to predict protein solubility. We discuss the datasets, features, and algorithms employed in these models. The review aims to bridge the gap between computational predictions and experimental validations, fostering the development of more accurate and reliable solubility prediction models that can significantly enhance recombinant protein production.
Affiliation(s)
- Teerapat Pimtawong
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Jun Ren
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Jingyu Lee
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Hyang-Mi Lee
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
- Dokyun Na
  - Department of Biomedical Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
4
Wang Z, Wu J, Zheng M, Geng C, Zhen B, Zhang W, Wu H, Xu Z, Xu G, Chen S, Li X. StaPep: An Open-Source Toolkit for Structure Prediction, Feature Extraction, and Rational Design of Hydrocarbon-Stapled Peptides. J Chem Inf Model 2024; 64:9361-9373. [PMID: 39503524 DOI: 10.1021/acs.jcim.4c01718] [Indexed: 12/11/2024]
Abstract
All-hydrocarbon stapled peptides, with their covalent side-chain constraints, provide enhanced proteolytic stability and membrane permeability, making them superior to linear peptides. However, tools for extracting structural and physicochemical descriptors to predict the properties of hydrocarbon-stapled peptides are lacking. To address this, we present StaPep, a Python-based toolkit for generating 3D structures and calculating 21 features for hydrocarbon-stapled peptides. StaPep supports peptides containing two non-standard amino acids (norleucine and 2-aminoisobutyric acid) and six non-natural anchoring residues (S3, S5, S8, R3, R5, and R8), with customization options for other non-standard amino acids. We showcase StaPep's utility through three case studies. The first generates 3D structures of these peptides with a mean RMSD of 1.62 ± 0.86, offering essential structural insights for drug design and biological activity prediction. The second develops machine learning models based on calculated molecular features to differentiate between membrane-permeable and non-permeable stapled peptides, achieving an AUC of 0.93. The third constructs regression models to predict the antimicrobial activity of stapled peptides against Escherichia coli, with a Pearson correlation of 0.84. StaPep's pipeline spans data retrieval, structure generation, feature calculation, and machine learning modeling for hydrocarbon-stapled peptides. The source codes and data set are freely available on Github: https://github.com/dahuilangda/stapep_package.
Affiliation(s)
- Zhe Wang
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - Hangzhou VicrobX Biotech Co., Ltd., Hangzhou 310018, China
- Jianping Wu
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou 311215, China
- Mengjun Zheng
  - School of Pharmacy, Second Military Medical University, Shanghai 200433, China
- Chenchen Geng
  - School of Pharmacy, Second Military Medical University, Shanghai 200433, China
- Borui Zhen
  - School of Pharmacy, Second Military Medical University, Shanghai 200433, China
- Wei Zhang
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
  - Hangzhou VicrobX Biotech Co., Ltd., Hangzhou 310018, China
- Hui Wu
  - Huadong Medicine Co., Ltd., Hangzhou 310015, China
- Zhengyang Xu
  - School of Pharmacy, Second Military Medical University, Shanghai 200433, China
- Gang Xu
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
- Si Chen
  - School of Medicine, Shanghai University, Shanghai 200444, China
- Xiang Li
  - School of Pharmacy, Second Military Medical University, Shanghai 200433, China
5
Contreras-Torres E, Marrero-Ponce Y. MD-LAIs Software: Computing Whole-Sequence and Amino Acid-Level "Embeddings" for Peptides and Proteins. J Chem Inf Model 2024; 64:8665-8672. [PMID: 39552512 DOI: 10.1021/acs.jcim.3c01189] [Indexed: 11/19/2024]
Abstract
Several computational tools have been developed to calculate sequence-based molecular descriptors (MDs) for peptides and proteins. However, these tools have certain limitations: 1) They generally lack capabilities for curating input data. 2) Their outputs often exhibit significant overlap. 3) There is limited availability of MDs at the amino acid (aa) level. 4) They lack flexibility in computing specific MDs. To address these issues, we developed MD-LAIs (Molecular Descriptors from Local Amino acid Invariants), Java-based software designed to compute both whole-sequence and aa-level MDs for peptides and proteins. These MDs are generated by applying aggregation operators (AOs) to macromolecular vectors containing the chemical-physical and structural properties of aas. The set of AOs includes both nonclassical (e.g., Minkowski norms) and classical AOs (e.g., Radial Distribution Function). Classical AOs capture neighborhood structural information at different k levels, while nonclassical AOs are applied using a sliding window to generalize the aa-level output. A weighting system based on fuzzy membership functions is also included to account for the contributions of individual aas. MD-LAIs features: 1) a module for data curation tasks, 2) a feature selection module, 3) projects of highly relevant MDs, and 4) low-dimensional lists of informative global and aa-level MDs. Overall, we expect that MD-LAIs will be a valuable tool for encoding protein or peptide sequences. The software is freely available as a stand-alone system on GitHub (https://github.com/Grupo-Medicina-Molecular-y-Traslacional/MD_LAIS).
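The aggregation idea described in this abstract can be illustrated with a minimal sketch. It is not the MD-LAIs implementation (which the abstract describes as Java-based). It assumes the Kyte-Doolittle hydropathy scale as the single amino acid property, uses a plain Minkowski norm as the aggregation operator, and slides a fixed-size window to obtain amino acid-level values. The function names, the default window size, and the choice of property scale are hypothetical.

```python
# Illustrative sketch only; names and defaults are assumptions, not MD-LAIs code.
# Kyte-Doolittle hydropathy values stand in for one amino acid property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_vector(sequence):
    """Map each residue to its property value (raises KeyError on unknown letters)."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]


def minkowski_norm(values, p=2):
    """Aggregate a list of numbers with the Minkowski norm of order p."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def whole_sequence_descriptor(sequence, p=2):
    """One number for the entire sequence."""
    return minkowski_norm(property_vector(sequence), p)


def sliding_window_descriptors(sequence, window=3, p=2):
    """One number per window position, as a stand-in for amino acid-level output."""
    values = property_vector(sequence)
    return [
        minkowski_norm(values[i:i + window], p)
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    peptide = "ACDKLV"
    print(whole_sequence_descriptor(peptide, p=1))
    print(sliding_window_descriptors(peptide, window=3, p=2))
```

A sequence shorter than the window yields an empty list here; a real tool would need an explicit policy for that case and for non-standard residues.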
Affiliation(s)
- Ernesto Contreras-Torres
  - Norwegian Cruise Line Holdings Limited, Corporate Center Drive, Miami, Florida 33216, United States
  - Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México 03920, México
- Yovani Marrero-Ponce
  - Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México 03920, México
  - Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito 170157 Pichincha, Ecuador
6
Julian W, Sergeeva O, Cao W, Wu C, Erokwu B, Flask C, Zhang L, Wang X, Basilion J, Yang S, Lee Z. Searching for Protein Off-Targets of Prostate-Specific Membrane Antigen-Targeting Radioligands in the Salivary Glands. Cancer Biother Radiopharm 2024; 39:721-732. [PMID: 39268679 PMCID: PMC11824224 DOI: 10.1089/cbr.2024.0066] [Indexed: 09/17/2024]
Abstract
Background: Prostate-specific membrane antigen (PSMA)-targeted radioligand therapies represent a highly effective treatment for metastatic prostate cancer. However, high and sustained uptake of PSMA ligands in the salivary glands leads to dose-limiting dry mouth (xerostomia), especially with α-emitters. PSMA expression and histologic analysis could not directly explain the toxicity, suggesting a potential off-target mediator of uptake. In this study, we searched for possible off-target non-PSMA protein(s) in the salivary glands. Methods: A machine-learning-based quantitative structure-activity relationship (QSAR) model was built to seek possible off-target(s). The target candidates predicted by the model were further analyzed for salivary protein expression and for structural homology at key regions required for PSMA-ligand binding. Furthermore, cellular binding assays were performed using multiple cell lines with high expression of the candidate proteins and low expression of PSMA. Finally, PSMA knockout (PSMA-/-) mice were scanned by small-animal PET/MR using [68Ga]Ga-PSMA-11 for in vivo validation. Results: Screening with the trained QSAR model did not yield a solid off-target protein, which was corroborated in part by cellular binding assays. Imaging using PSMA-/- mice further demonstrated markedly reduced PSMA-radioligand uptake in the salivary glands. Conclusion: Uptake of the PSMA-targeted radioligands in the salivary glands remains primarily PSMA-mediated. Further investigations are needed to illustrate a seemingly different process of uptake and retention in the salivary glands than that in prostate cancer.
Affiliation(s)
- William Julian
  - Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Olga Sergeeva
  - Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Wei Cao
  - Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Chunying Wu
  - Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Bernadette Erokwu
  - Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Chris Flask
  - Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Lifang Zhang
  - Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Xinning Wang
  - Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
  - Biomedical Engineering Department, Case Western Reserve University, Cleveland, Ohio, USA
- James Basilion
  - Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
  - Biomedical Engineering Department, Case Western Reserve University, Cleveland, Ohio, USA
- Sichun Yang
  - Nutrition Department, Case Western Reserve University, Cleveland, Ohio, USA
- Zhenghong Lee
  - Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
  - Biomedical Engineering Department, Case Western Reserve University, Cleveland, Ohio, USA
7
Ancuceanu R, Popovici PC, Drăgănescu D, Busnatu Ș, Lascu BE, Dinu M. QSAR Regression Models for Predicting HMG-CoA Reductase Inhibition. Pharmaceuticals (Basel) 2024; 17:1448. [PMID: 39598360 PMCID: PMC11597356 DOI: 10.3390/ph17111448] [Received: 10/07/2024] [Revised: 10/27/2024] [Accepted: 10/28/2024] [Indexed: 11/29/2024]
Abstract
BACKGROUND/OBJECTIVES: HMG-CoA reductase is an enzyme that regulates the initial stage of cholesterol synthesis, and its inhibitors are widely used in the treatment of cardiovascular diseases. METHODS: We have created a set of quantitative structure-activity relationship (QSAR) models for human HMG-CoA reductase inhibitors using nested cross-validation as the primary validation method. To develop the QSAR models, we employed various machine learning regression algorithms, feature selection methods, and fingerprints or descriptor datasets. RESULTS: We built and evaluated a total of 300 models, selecting 21 that demonstrated good performance (coefficient of determination, R² ≥ 0.70 or concordance correlation coefficient, CCC ≥ 0.85). Six of these top-performing models met both performance criteria and were used to construct five ensemble models. We identified the descriptors most important in explaining HMG-CoA inhibition for each of the six best-performing models. We used the top models to search through over 220,000 chemical compounds from a large database (ZINC 15) for potential new inhibitors. Only a small fraction (237 out of approximately 220,000 compounds) had reliable predictions with mean pIC50 values ≥ 8 (IC50 values ≤ 10 nM). Our SVM-based ensemble model predicted IC50 values < 10 nM for roughly 0.08% of the screened compounds. We have also illustrated the potential applications of these QSAR models in understanding the cholesterol-lowering activities of herbal extracts, such as those reported for an extract prepared from the Iris × germanica rhizome. CONCLUSIONS: Our QSAR models can accurately predict human HMG-CoA reductase inhibitors. They have the potential to accelerate the discovery of novel cholesterol-lowering agents and may also be applied to understand the mechanisms underlying the reported cholesterol-lowering activities of herbal extracts.
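The thresholds quoted in this abstract can be made concrete with a short sketch. It shows why a pIC50 of 8 corresponds to an IC50 of 10 nM, and how the two selection metrics (R² and the concordance correlation coefficient) could be computed and compared against the stated cut-offs of 0.70 and 0.85. The function names are hypothetical, the CCC uses Lin's usual definition with population variances, and nothing here reproduces the authors' actual pipeline.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); with IC50 given in nM this is 9 - log10(IC50)."""
    return 9.0 - math.log10(ic50_nm)


def _mean(values):
    return sum(values) / len(values)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    mean_true = _mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def concordance_cc(y_true, y_pred):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    n = len(y_true)
    mean_t, mean_p = _mean(y_true), _mean(y_pred)
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def passes_both(y_true, y_pred, r2_min=0.70, ccc_min=0.85):
    """True when a model meets both cut-offs quoted in the abstract."""
    return r_squared(y_true, y_pred) >= r2_min and concordance_cc(y_true, y_pred) >= ccc_min


if __name__ == "__main__":
    print(pic50_from_ic50_nm(10.0))   # IC50 of 10 nM
    observed = [6.1, 7.4, 8.2, 5.9, 7.0]   # made-up pIC50 values for illustration
    predicted = [6.0, 7.1, 8.0, 6.3, 7.2]
    print(r_squared(observed, predicted), concordance_cc(observed, predicted))
```

Unlike Pearson correlation, the CCC penalizes a systematic offset between predicted and observed values, which is one reason it is often reported alongside R² for regression models.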
Affiliation(s)
- Robert Ancuceanu
  - Department of Pharmaceutical Botany and Cell Biology, Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 020021 Bucharest, Romania
- Patriciu Constantin Popovici
  - Department of Pharmaceutical Botany and Cell Biology, Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 020021 Bucharest, Romania
- Doina Drăgănescu
  - Department of Pharmaceutical Physics, Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 020021 Bucharest, Romania
- Ștefan Busnatu
  - Department of Cardiology, Carol Davila University of Medicine and Pharmacy, 020021 Bucharest, Romania
  - Emergency Hospital “Bagdasar-Arseni”, 050474 Bucharest, Romania
- Beatrice Elena Lascu
  - Department of Pharmaceutical Botany and Cell Biology, Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 020021 Bucharest, Romania
- Mihaela Dinu
  - Department of Pharmaceutical Botany and Cell Biology, Faculty of Pharmacy, Carol Davila University of Medicine and Pharmacy, 020021 Bucharest, Romania
8
Adolph C, Hards K, Williams ZC, Cheung CY, Keighley LM, Jowsey WJ, Kyte M, Inaoka DK, Kita K, Mackenzie JS, Steyn AJC, Li Z, Yan M, Tian GB, Zhang T, Ding X, Furkert DP, Brimble MA, Hickey AJR, McNeil MB, Cook GM. Identification of Chemical Scaffolds That Inhibit the Mycobacterium tuberculosis Respiratory Complex Succinate Dehydrogenase. ACS Infect Dis 2024; 10:3496-3515. [PMID: 39268963 DOI: 10.1021/acsinfecdis.3c00655] [Indexed: 09/15/2024]
Abstract
Drug-resistant Mycobacterium tuberculosis is a significant cause of infectious disease morbidity and mortality for which new antimicrobials are urgently needed. Inhibitors of mycobacterial respiratory energy metabolism have emerged as promising next-generation antimicrobials, but a number of targets remain unexplored. Succinate dehydrogenase (SDH), a focal point in mycobacterial central carbon metabolism and respiratory energy production, is required for growth and survival in M. tuberculosis under a number of conditions, highlighting the potential of inhibitors targeting mycobacterial SDH enzymes. To advance SDH as a novel drug target in M. tuberculosis, we utilized a combination of biochemical screening and in-silico deep learning technologies to identify multiple chemical scaffolds capable of inhibiting mycobacterial SDH activity. Antimicrobial susceptibility assays show that lead inhibitors are bacteriostatic agents with activity against wild-type and drug-resistant strains of M. tuberculosis. Mode of action studies on lead compounds demonstrate that the specific inhibition of SDH activity dysregulates mycobacterial metabolism and respiration and results in the secretion of intracellular succinate. Interaction assays demonstrate that the chemical inhibition of SDH activity potentiates the activity of other bioenergetic inhibitors and prevents the emergence of resistance to a variety of drugs. Overall, this study shows that SDH inhibitors are promising next-generation antimicrobials against M. tuberculosis.
Affiliation(s)
- Cara Adolph
  - Department of Microbiology and Immunology, School of Biomedical Sciences, University of Otago, Dunedin 9054, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland 1042, New Zealand
- Kiel Hards
  - Department of Microbiology and Immunology, School of Biomedical Sciences, University of Otago, Dunedin 9054, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland 1042, New Zealand
- Zoe C Williams
  - Department of Microbiology and Immunology, School of Biomedical Sciences, University of Otago, Dunedin 9054, New Zealand
  - Africa Health Research Institute, University of KwaZulu Natal, Durban 4001, South Africa
- Chen-Yi Cheung
  - Department of Microbiology and Immunology, School of Biomedical Sciences, University of Otago, Dunedin 9054, New Zealand
- Laura M Keighley
  - Department of Microbiology and Immunology, School of Biomedical Sciences, University of Otago, Dunedin 9054, New Zealand
- William J Jowsey
  - Department of Microbiology and Immunology, School of Biomedical Sciences, University of Otago, Dunedin 9054, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland 1042, New Zealand
- Matson Kyte
  - Department of Microbiology and Immunology, School of Biomedical Sciences, University of Otago, Dunedin 9054, New Zealand
- Daniel Ken Inaoka
  - School of Tropical Medicine and Global Health, Nagasaki University, Nagasaki 852-8523, Japan
  - Department of Biomedical Chemistry, Graduate School of Medicine, The University of Tokyo, Tokyo 113-0033, Japan
  - Department of Molecular Infection Dynamics, Institute of Tropical Medicine (NEKKEN), Nagasaki University, Nagasaki 852-8523, Japan
- Kiyoshi Kita
  - School of Tropical Medicine and Global Health, Nagasaki University, Nagasaki 852-8523, Japan
  - Department of Host-Defence Biochemistry, Institute of Tropical Medicine (NEKKEN), Nagasaki University, Nagasaki 852-8523, Japan
- Jared S Mackenzie
  - Africa Health Research Institute, University of KwaZulu Natal, Durban 4001, South Africa
- Adrie J C Steyn
  - Africa Health Research Institute, University of KwaZulu Natal, Durban 4001, South Africa
  - Department of Microbiology, University of Alabama at Birmingham, Birmingham, Alabama 35294, United States
  - Centres for AIDS Research and Free Radical Biology, University of Alabama at Birmingham, Birmingham, Alabama 35294, United States
- Zhengqiu Li
  - School of Pharmacy, Jinan University, Guangzhou 510632, China
- Ming Yan
  - School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, China
- Guo-Bao Tian
  - Department of Immunology, School of Medicine, Sun Yat-Sen University, Shenzhen 518107, China
  - Advanced Medical Technology Centre, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou 510080, China
  - Key Laboratory of Tropical Diseases Control, Ministry of Education, Sun Yat-Sen University, Guangzhou 510080, China
- Tianyu Zhang
  - State Key Laboratory of Respiratory Disease, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - Guangdong-Hong Kong-Macao Joint Laboratory of Respiratory Infectious Diseases, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - China-New Zealand Joint Laboratory on Biomedicine and Health, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Xiaobo Ding
  - Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland 1042, New Zealand
  - School of Chemical Sciences, University of Auckland, Auckland 1010, New Zealand
- Daniel P Furkert
  - Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland 1042, New Zealand
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Margaret A Brimble
  - Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland 1042, New Zealand
  - School of Chemical Sciences, University of Auckland, Auckland 1010, New Zealand
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Anthony J R Hickey
  - Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland 1042, New Zealand
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Matthew B McNeil
  - Department of Microbiology and Immunology, School of Biomedical Sciences, University of Otago, Dunedin 9054, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland 1042, New Zealand
- Gregory M Cook
  - Department of Microbiology and Immunology, School of Biomedical Sciences, University of Otago, Dunedin 9054, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland 1042, New Zealand
  - China-New Zealand Joint Laboratory on Biomedicine and Health, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
9
Feng C, Wei H, Xu C, Feng B, Zhu X, Liu J, Zou Q. iProps: A Comprehensive Software Tool for Protein Classification and Analysis With Automatic Machine Learning Capabilities and Model Interpretation Capabilities. IEEE J Biomed Health Inform 2024; 28:6237-6247. [PMID: 39008396 DOI: 10.1109/jbhi.2024.3425716] [Indexed: 07/17/2024]
Abstract
Protein classification is a crucial field in bioinformatics. The development of a comprehensive tool that can perform feature evaluation, visualization, automated machine learning, and model interpretation would significantly advance research in protein classification. However, there is a significant gap in the literature regarding tools that integrate all these essential functionalities. This paper presents iProps, a novel Python-based software package designed to fulfill these requirements. iProps is distinguished by its proficiency in feature extraction, evaluation, automated machine learning, and interpretation of classification models. First, iProps leverages evolutionary information and amino acid reduction information to propose or extend several numerical protein features that are independent of sequence length, including SC-PSSM, ORDip, TRC, CTDC-E, CKSAAGP-E, and so forth; it also implements the calculation of 17 other numerical features. iProps also provides feature combination operations for the aforementioned features to generate more hybrid features, and adds data-balancing sampling as well as built-in classifier settings, among other functionalities. Thus, it can discern the most effective protein class recognition feature from a multitude of candidates, utilizing three automated machine learning algorithms to identify the optimal classifiers and parameter settings. Furthermore, iProps generates a detailed explanatory report that includes 23 informative graphs derived from three interpretable models. To assess the performance of iProps, a series of numerical experiments were conducted using two well-established datasets. The results demonstrated that the software achieved superior recognition performance in every case.
Beyond its contributions to bioinformatics, iProps broadens its applicability by offering robust data analysis tools that are beneficial across various disciplines, capitalizing on its automated machine learning and model interpretation capabilities. As an open-source platform, iProps is readily accessible and features an intuitive user interface, ensuring ease of use for individuals, even those without a background in programming.
10
Bourdakou MM, Melliou E, Magiatis P, Spyrou GM. Computational investigation of the functional landscape of the protective role that extra virgin olive oil consumption may have on chronic lymphocytic leukemia. J Transl Med 2024; 22:869. [PMID: 39334178 PMCID: PMC11428436 DOI: 10.1186/s12967-024-05672-z] [Received: 02/13/2024] [Accepted: 09/04/2024] [Indexed: 09/30/2024]
Abstract
BACKGROUND: The health benefits of the Mediterranean diet are partially attributed to the polyphenols present in extra virgin olive oil (EVOO), which have been shown to have anti-cancer properties. However, the possible effect that EVOO could have on chronic lymphocytic leukemia (CLL) has not been fully explored. METHODS: This study investigates the anti-CLL activity of EVOO through a computational multi-level data analysis procedure, focusing on the identification of biological functions shared between EVOO consumption and CLL. Specifically, publicly available genomics, transcriptomics, and proteomics data related to EVOO consumption and CLL were collected from several resources and analyzed through a computational pipeline, highlighting common molecular mechanisms and biological processes. A number of the highlighted functional terms associating CLL and EVOO were also verified computationally. RESULTS: Our investigation revealed four molecular pathways and three biological processes that overlap between mechanisms associated with CLL and those impacted by the consumption of EVOO. To further investigate the common biological functions, we focused on AKT1-related terms, aiming to assess the potential importance of AKT1 in the anti-CLL effects associated with EVOO. CONCLUSIONS: Overall, the results provide valuable insights into the potential beneficial effect of EVOO in CLL and highlight EVOO's bioactive compounds as promising candidates for future investigations.
Affiliation(s)
- Marilena M Bourdakou
- Bioinformatics Department, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Eleni Melliou
- Laboratory of Pharmacognosy and Natural Products Chemistry, Department of Pharmacy, National and Kapodistrian University of Athens, Athens, Greece
| | - Prokopios Magiatis
- Laboratory of Pharmacognosy and Natural Products Chemistry, Department of Pharmacy, National and Kapodistrian University of Athens, Athens, Greece.
| | - George M Spyrou
- Bioinformatics Department, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus.
|
11
|
López-Cortés A, Cabrera-Andrade A, Echeverría-Garcés G, Echeverría-Espinoza P, Pineda-Albán M, Elsitdie N, Bueno-Miño J, Cruz-Segundo CM, Dorado J, Pazos A, Gonzáles-Díaz H, Pérez-Castillo Y, Tejera E, Munteanu CR. Unraveling druggable cancer-driving proteins and targeted drugs using artificial intelligence and multi-omics analyses. Sci Rep 2024; 14:19359. [PMID: 39169044 PMCID: PMC11339426 DOI: 10.1038/s41598-024-68565-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 07/25/2024] [Indexed: 08/23/2024] Open
Abstract
The druggable proteome refers to proteins that can bind to small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 machine learning linear and non-linear classifiers. The optimal classifier was achieved with the support vector machine method, utilizing 200 tri-amino acid composition descriptors. The high performance of the model is evident from an area under the receiver operating characteristics (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was enhanced with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, unfavorable prognostic protein analysis, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify drugs with the highest affinity capable of targeting the best predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, and 23 of them demonstrated unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth studies in clinical trials. Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins .
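As an illustration of the tri-amino acid composition descriptors mentioned in the abstract above, the following is a minimal sketch, not the authors' code. The function name `tripeptide_composition` and the example sequence are hypothetical, and the sketch computes all 8000 tripeptide frequencies rather than the 200 selected descriptors the abstract reports.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def tripeptide_composition(sequence):
    """Return the relative frequency of each of the 8000 possible tripeptides.

    Hypothetical helper for illustration; windows containing non-standard
    residues (e.g. X, B, Z) are skipped.
    """
    seq = sequence.upper()
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    return {
        "".join(tri): (counts["".join(tri)] / total if total else 0.0)
        for tri in product(AMINO_ACIDS, repeat=3)
    }


# Made-up example sequence with 10 overlapping windows.
features = tripeptide_composition("MKVLAAGMKVLA")
print(len(features), features["MKV"])
```

A feature selection step, which this sketch omits, would then reduce the vector to the subset used by the classifier.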
Affiliation(s)
- Andrés López-Cortés
- Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador.
| | - Alejandro Cabrera-Andrade
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
- Escuela de Enfermería, Facultad de Ciencias de la Salud, Universidad de Las Américas, Quito, Ecuador
| | - Gabriela Echeverría-Garcés
- Centro de Referencia Nacional de Genómica, Secuenciación y Bioinformática, Instituto Nacional de Investigación en Salud Pública "Leopoldo Izquieta Pérez", Quito, Ecuador
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Santiago, Chile
| | | | - Micaela Pineda-Albán
- Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador
| | - Nicole Elsitdie
- Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador
| | - José Bueno-Miño
- Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador
| | - Carlos M Cruz-Segundo
- RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, A Coruña, Spain
- Tecnológico de Estudios Superiores de Jocotitlán, Jocotitlán, Mexico
| | - Julian Dorado
- RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, A Coruña, Spain
- Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), University of A Coruna, A Coruña, Spain
| | - Alejandro Pazos
- RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, A Coruña, Spain
- Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), University of A Coruna, A Coruña, Spain
- Biomedical Research Institute of A Coruna (INIBIC), University Hospital Complex of A Coruna (CHUAC), A Coruña, Spain
| | - Humberto Gonzáles-Díaz
- Department of Organic Chemistry II, University of the Basque Country UPV/EHU, Biscay, Spain
- IKERBASQUE, Basque Foundation for Science, Biscay, Spain
| | | | - Eduardo Tejera
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
| | - Cristian R Munteanu
- RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, A Coruña, Spain
- Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), University of A Coruna, A Coruña, Spain
- Biomedical Research Institute of A Coruna (INIBIC), University Hospital Complex of A Coruna (CHUAC), A Coruña, Spain
|
12
|
Liao YH, Chen SZ, Bin YN, Zhao JP, Feng XL, Zheng CH. UsIL-6: An unbalanced learning strategy for identifying IL-6 inducing peptides by undersampling technique. Comput Methods Programs Biomed 2024; 250:108176. [PMID: 38677081 DOI: 10.1016/j.cmpb.2024.108176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 03/26/2024] [Accepted: 04/11/2024] [Indexed: 04/29/2024]
Abstract
BACKGROUND AND OBJECTIVE Interleukin-6 (IL-6) is a critical factor for early warning, monitoring, and prognosis in the inflammatory storm of COVID-19 cases. IL-6 inducing peptides, which can induce production of the cytokine IL-6, are very important for the development of diagnosis and immunotherapy. Although existing methods have had some success in predicting IL-6 inducing peptides, there is still room to improve the performance of these models in practical application. METHODS In this study, we proposed UsIL-6, a high-performance bioinformatics tool for identifying IL-6 inducing peptides. First, we extracted five groups of physicochemical properties and sequence structural information from IL-6 inducing peptide sequences and obtained a 636-dimensional feature vector. We also employed the NearMiss3 undersampling method and the StandardScaler normalization method to process the data. Then, a 40-dimensional optimal feature vector was obtained by the Boruta feature selection method. Finally, we combined this feature vector with an extremely randomized trees classifier to build the final model, UsIL-6. RESULTS The AUC value of UsIL-6 on the independent test dataset was 0.87, and the BACC value was 0.808, which indicated that UsIL-6 performed better than the existing methods in IL-6 inducing peptide recognition. CONCLUSIONS The performance comparison on the independent test dataset confirmed that UsIL-6 achieved the highest performance, best robustness, and best generalization ability. We hope that UsIL-6 will become a valuable method to identify, annotate and characterize new IL-6 inducing peptides.
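The abstract above reports balanced accuracy (BACC), a metric suited to unbalanced data. The sketch below shows the standard definition, the mean of sensitivity and specificity, on made-up labels; the function name and the data are illustrative assumptions and are not taken from the UsIL-6 study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Made-up unbalanced example: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bacc = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, bacc)
```

In this toy case plain accuracy is 0.8 while balanced accuracy is 0.6875, because half of the minority class is missed.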
Affiliation(s)
- Yan-Hong Liao
- School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China
| | - Shou-Zhi Chen
- School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China
| | - Yan-Nan Bin
- School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
| | - Jian-Ping Zhao
- School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China.
| | - Xin-Long Feng
- School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China.
| | - Chun-Hou Zheng
- School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China; School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
|
13
|
Han Y, Zhang H, Zeng Z, Liu Z, Lu D, Liu Z. Descriptor-augmented machine learning for enzyme-chemical interaction predictions. Synth Syst Biotechnol 2024; 9:259-268. [PMID: 38450325 PMCID: PMC10915406 DOI: 10.1016/j.synbio.2024.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/08/2024] Open
Abstract
Descriptors play a pivotal role in enzyme design for the greener synthesis of biochemicals, as they can characterize enzymes and chemicals from physicochemical and evolutionary perspectives. This study examined the effects of various descriptors on the performance of a Random Forest model used for predicting enzyme-chemical relationships. We curated activity data for seven specific enzyme families from the literature and developed a pipeline for evaluating machine learning model performance using 10-fold cross-validation. The influence of protein and chemical descriptors was assessed in three scenarios: predicting the activity of unknown relations between known enzymes and known chemicals (new relationship evaluation), predicting the activity of novel enzymes on known chemicals (new enzyme evaluation), and predicting the activity of new chemicals on known enzymes (new chemical evaluation). The results showed that protein descriptors significantly enhanced the classification performance of the model in the new enzyme evaluation in three out of the seven datasets with the greatest number of enzymes, whereas chemical descriptors appeared to have no effect. A variety of sequence-based and structure-based protein descriptors were constructed, among which the esm-2 descriptor achieved the best results. Using enzyme families as labels showed that descriptors could cluster proteins well, which could explain the contributions of descriptors to the machine learning model. Conversely, in the new chemical evaluation, chemical descriptors yielded significant improvement in four out of the seven datasets, while protein descriptors appeared to have no effect. We attempted to evaluate the generalization ability of the model by correlating the statistics of the datasets with the performance of the models. The results showed that datasets with higher sequence similarity were more likely to get better results in the new enzyme evaluation and datasets with more enzymes were more likely to benefit from the protein descriptor strategy. This work provides guidance for the development of machine learning models for specific enzyme families.
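The three evaluation scenarios described in this abstract differ in what is held out of training: individual enzyme-chemical pairs, whole enzymes, or whole chemicals. The following is a minimal sketch of that idea under assumed names; `split_by_scenario`, the scenario labels, and the toy data are hypothetical and do not reproduce the authors' pipeline or their 10-fold cross-validation.

```python
def split_by_scenario(pairs, scenario, held_out):
    """Split (enzyme, chemical, activity) records into train and test lists.

    scenario: "new_relationship" holds out specific (enzyme, chemical) pairs,
              "new_enzyme" holds out every record of the listed enzymes,
              "new_chemical" holds out every record of the listed chemicals.
    """
    if scenario == "new_relationship":
        is_test = lambda rec: (rec[0], rec[1]) in held_out
    elif scenario == "new_enzyme":
        is_test = lambda rec: rec[0] in held_out
    elif scenario == "new_chemical":
        is_test = lambda rec: rec[1] in held_out
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    train = [rec for rec in pairs if not is_test(rec)]
    test = [rec for rec in pairs if is_test(rec)]
    return train, test


# Made-up activity records for illustration only.
pairs = [
    ("E1", "C1", 1), ("E1", "C2", 0),
    ("E2", "C1", 0), ("E2", "C2", 1),
    ("E3", "C1", 1),
]

train, test = split_by_scenario(pairs, "new_enzyme", {"E3"})
print(train, test)
```

In the new enzyme split no test enzyme appears in training, so any gain there has to come from information carried by the protein descriptors rather than from memorized enzyme identities.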
Affiliation(s)
- Yilei Han
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
| | - Haoye Zhang
- Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
| | - Zheni Zeng
- Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
| | - Zhiyuan Liu
- Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
| | - Diannan Lu
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
| | - Zheng Liu
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, China
|
14
|
Pang C, Qiao J, Zeng X, Zou Q, Wei L. Deep Generative Models in De Novo Drug Molecule Generation. J Chem Inf Model 2024; 64:2174-2194. [PMID: 37934070 DOI: 10.1021/acs.jcim.3c01496] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Indexed: 11/08/2023]
Abstract
The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which are time-consuming and high-cost. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models achieve great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review the recent progress of molecule generation using deep generative models, mainly focusing on molecule representations, public databases, data processing tools, and advanced artificial intelligence based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges of molecule generation such as the need for better databases, missing 3D information in molecular representation, and the lack of high-precision evaluation metrics. We suggest future directions for molecular generation and drug discovery.
Affiliation(s)
- Chao Pang
- School of Software, Shandong University, Jinan 250100, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
| | - Jianbo Qiao
- School of Software, Shandong University, Jinan 250100, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
| | - Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, Changsha 410082, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Leyi Wei
- School of Software, Shandong University, Jinan 250100, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
|
15
|
Ma Y, Zhao Y, Ma Y. Kernel Bayesian nonlinear matrix factorization based on variational inference for human-virus protein-protein interaction prediction. Sci Rep 2024; 14:5693. [PMID: 38454139 PMCID: PMC10920681 DOI: 10.1038/s41598-024-56208-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 03/04/2024] [Indexed: 03/09/2024] Open
Abstract
Identification of potential human-virus protein-protein interactions (PPIs) contributes to the understanding of the mechanisms of viral infection and to the development of antiviral drugs. Existing computational models often have many hyperparameters that need to be adjusted manually, which limits their computational efficiency and generalization ability. To address this, this study proposes a kernel Bayesian logistic matrix decomposition model with automatic rank determination, VKBNMF, for the prediction of human-virus PPIs. VKBNMF introduces auxiliary information into the logistic matrix decomposition and sets the prior probabilities of the latent variables to build a Bayesian framework for automatic parameter search. In addition, we construct the variational inference framework of VKBNMF to ensure the solution efficiency. The experimental results show that for the scenarios of paired PPIs, VKBNMF achieves an average AUPR of 0.9101, 0.9316, 0.8727, and 0.9517 on the four benchmark datasets, respectively, and for the scenarios of new human (viral) proteins, VKBNMF still achieves a higher hit rate. The case study further demonstrated that VKBNMF can be used as an effective tool for the prediction of human-virus PPIs.
Affiliation(s)
- Yingjun Ma
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen, China
| | - Yongbiao Zhao
- School of Computer, Central China Normal University, Wuhan, China
| | - Yuanyuan Ma
- School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, China.
- Hubei Key Laboratory of Power System Design and Test for Electrical Vehicle, Hubei University of Arts and Science, Xiangyang, China.
|
16
|
McGibbon M, Shave S, Dong J, Gao Y, Houston DR, Xie J, Yang Y, Schwaller P, Blay V. From intuition to AI: evolution of small molecule representations in drug discovery. Brief Bioinform 2023; 25:bbad422. [PMID: 38033290 PMCID: PMC10689004 DOI: 10.1093/bib/bbad422] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 09/01/2023] [Revised: 10/13/2023] [Accepted: 11/01/2023] [Indexed: 12/02/2023] Open
Abstract
Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates the accurate prediction of properties that drive decision-making. Over the years, molecular representations have evolved from intuitive and human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, inversibility for generative applications and interpretability, which can be critical in informing practitioners' decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. These include the creation of molecular representations for high-value, low-data regimes, the distillation of broader biological and chemical knowledge into novel learned representations and the modeling of up-and-coming therapeutic modalities.
Affiliation(s)
- Miles McGibbon
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Steven Shave
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Jie Dong
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410013, China
| | - Yumiao Gao
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Douglas R Houston
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
| | - Jiancong Xie
- Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
| | - Yuedong Yang
- Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University, Guangzhou, 510000, China
| | - Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Vincent Blay
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, United Kingdom
|
17
|
Díaz-Rojas M, González-Andrade M, Aguayo-Ortiz R, Rodríguez-Sotres R, Pérez-Vásquez A, Madariaga-Mazón A, Mata R. Discovery of inhibitors of protein tyrosine phosphatase 1B contained in a natural products library from Mexican medicinal plants and fungi using a combination of enzymatic and in silico methods*. Front Pharmacol 2023; 14:1281045. [PMID: 38027024 PMCID: PMC10644722 DOI: 10.3389/fphar.2023.1281045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 10/13/2023] [Indexed: 12/01/2023] Open
Abstract
This work aimed to discover protein tyrosine phosphatase 1B (PTP1B) inhibitors from a small molecule library of natural products (NPs) derived from selected Mexican medicinal plants and fungi to find new hits for developing antidiabetic drugs. The products showing similar IC50 values to ursolic acid (UA) (positive control, IC50 = 26.5) were considered hits. These compounds were canophyllol (1), 5-O-(β-D-glucopyranosyl)-7-methoxy-3',4'-dihydroxy-4-phenylcoumarin (2), 3,4-dimethoxy-2,5-phenanthrenediol (3), masticadienonic acid (4), 4',5,6-trihydroxy-3',7-dimethoxyflavone (5), E/Z vermelhotin (6), tajixanthone hydrate (7), quercetin-3-O-(6″-benzoyl)-β-D-galactoside (8), lichexanthone (9), melianodiol (10), and confusarin (11). According to the double-reciprocal plots, 1 was a non-competitive inhibitor, 3 a mixed-type, and 6 competitive. The chemical space analysis of the hits (IC50 < 100 μM) and compounds possessing activity (IC50 in the range of 100-1,000 μM) with the BIOFACQUIM library indicated that the active molecules are chemically diverse, covering most of the known Mexican NPs' chemical space. Finally, a structure-activity similarity (SAS) map was built using the Tanimoto similarity index and PTP1B absolute inhibitory activity, which allows the identification of seven scaffold hops, namely, compounds 3, 5, 6, 7, 8, 9, and 11. Canophyllol (1), on the other hand, is a true analog of UA since it is an SAR continuous zone of the SAS map.
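The structure-activity similarity map in this abstract relies on the Tanimoto similarity index. As a minimal sketch of the standard definition (shared features divided by total distinct features), the code below uses made-up sets of fingerprint bit positions; the function name and the example fingerprints are assumptions for illustration and are not data from the study.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical fingerprints given as indices of set bits.
compound_x = {1, 4, 7, 9}
compound_y = {1, 4, 8}

similarity = tanimoto(compound_x, compound_y)
print(similarity)  # 2 shared bits out of 5 distinct bits
```

On a similarity map of this kind, a pair with low structural similarity but comparable activity is what is usually read as a scaffold hop.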
Affiliation(s)
- Miriam Díaz-Rojas
- Facultad de Química, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | | | - Rodrigo Aguayo-Ortiz
- Facultad de Química, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | | | | | - Abraham Madariaga-Mazón
- Instituto de Química Unidad Mérida and Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas Unidad Mérida, Universidad Nacional Autónoma de México, Mexico City, Mexico
| | - Rachel Mata
- Facultad de Química, Universidad Nacional Autónoma de México, Mexico City, Mexico
|
18
|
Ji S, Chen F, Stein P, Wang J, Zhou Z, Wang L, Zhao Q, Lin Z, Liu B, Xu K, Lai F, Xiong Z, Hu X, Kong T, Kong F, Huang B, Wang Q, Xu Q, Fan Q, Liu L, Williams CJ, Schultz RM, Xie W. OBOX regulates mouse zygotic genome activation and early development. Nature 2023; 620:1047-1053. [PMID: 37459895 PMCID: PMC10528489 DOI: 10.1038/s41586-023-06428-3] [Citation(s) in RCA: 64] [Impact Index Per Article: 32.0] [Received: 10/23/2022] [Accepted: 07/12/2023] [Indexed: 08/25/2023]
Abstract
Zygotic genome activation (ZGA) activates the quiescent genome to enable the maternal-to-zygotic transition. However, the identity of transcription factors that underlie mammalian ZGA in vivo remains elusive. Here we show that OBOX, a PRD-like homeobox domain transcription factor family (OBOX1-OBOX8), are key regulators of mouse ZGA. Mice deficient for maternally transcribed Obox1/2/5/7 and zygotically expressed Obox3/4 had a two-cell to four-cell arrest, accompanied by impaired ZGA. The Obox knockout defects could be rescued by restoring either maternal or zygotic OBOX, which suggests that maternal and zygotic OBOX redundantly support embryonic development. Chromatin-binding analysis showed that Obox knockout preferentially affected OBOX-binding targets. Mechanistically, OBOX facilitated the 'preconfiguration' of RNA polymerase II, as the polymerase relocated from the initial one-cell binding targets to ZGA gene promoters and distal enhancers. Impaired polymerase II preconfiguration in Obox mutants was accompanied by defective ZGA and chromatin accessibility transition, as well as aberrant activation of one-cell polymerase II targets. Finally, ectopic expression of OBOX activated ZGA genes and MERVL repeats in mouse embryonic stem cells. These data thus demonstrate that OBOX regulates mouse ZGA and early embryogenesis.
Affiliation(s)
- Shuyan Ji
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Fengling Chen
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Paula Stein
- Reproductive and Developmental Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
| | - Jiacheng Wang
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Ziming Zhou
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Lijuan Wang
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Qing Zhao
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Zili Lin
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
- College of Animal Science and Technology College, Beijing University of Agriculture, Beijing, China
| | - Bofeng Liu
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Kai Xu
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Fangnong Lai
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Zhuqing Xiong
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Xiaoyu Hu
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Tianxiang Kong
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Feng Kong
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Bo Huang
- Zhejiang Provincial Key Laboratory of Pancreatic Disease, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
| | - Qiujun Wang
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Qianhua Xu
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Qiang Fan
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Ling Liu
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Carmen J Williams
- Reproductive and Developmental Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA
| | - Richard M Schultz
- Department of Biology, University of Pennsylvania, Philadelphia, PA, USA.
- Department of Anatomy, Physiology and Cell Biology School of Veterinary Medicine University of California, Davis, Davis, CA, USA.
| | - Wei Xie
- Center for Stem Cell Biology and Regenerative Medicine, MOE Key Laboratory of Bioinformatics, New Cornerstone Science Laboratory, School of Life Sciences, Tsinghua University, Beijing, China.
- Tsinghua-Peking Center for Life Sciences, Beijing, China.
|
19
|
Panwar P, Yang Q, Martini A. PyL3dMD: Python LAMMPS 3D molecular descriptors package. J Cheminform 2023; 15:69. [PMID: 37507792 PMCID: PMC10385924 DOI: 10.1186/s13321-023-00737-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 07/16/2023] [Indexed: 07/30/2023] Open
Abstract
Molecular descriptors characterize the biological, physical, and chemical properties of molecules and have long been used for understanding molecular interactions and facilitating materials design. Some of the most robust descriptors are derived from geometrical representations of molecules, called 3-dimensional (3D) descriptors. When calculated from molecular dynamics (MD) simulation trajectories, 3D descriptors can also capture the effects of operating conditions such as temperature or pressure. However, extracting 3D descriptors from MD trajectories is non-trivial, which hinders their wide use by researchers developing advanced quantitative-structure-property-relationship models using machine learning. Here, we describe a suite of open-source Python-based post-processing routines, called PyL3dMD, for calculating 3D descriptors from MD simulations. PyL3dMD is compatible with the popular simulation package LAMMPS and enables users to compute more than 2000 3D molecular descriptors from atomic trajectories generated by MD simulations. PyL3dMD is freely available via GitHub and can be easily installed and used as a highly flexible Python package on all major platforms (Windows, Linux, and macOS). A performance benchmark study used descriptors calculated by PyL3dMD to develop a neural network and the results showed that PyL3dMD is fast and efficient in calculating descriptors for large and complex molecular systems with long simulation durations. PyL3dMD facilitates the calculation of 3D molecular descriptors using MD simulations, making it a valuable tool for cheminformatics studies.
Affiliation(s)
- Pawan Panwar
- Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, CA, 95343, USA.
| | - Quanpeng Yang
- Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, CA, 95343, USA
| | - Ashlie Martini
- Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, CA, 95343, USA.
|
20
|
Emonts J, Buyel J. An overview of descriptors to capture protein properties - Tools and perspectives in the context of QSAR modeling. Comput Struct Biotechnol J 2023; 21:3234-3247. [PMID: 38213891 PMCID: PMC10781719 DOI: 10.1016/j.csbj.2023.05.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/13/2023] [Revised: 05/23/2023] [Accepted: 05/23/2023] [Indexed: 01/13/2024]
Abstract
Proteins are important ingredients in food and feed, they are the active components of many pharmaceutical products, and they are necessary, in the form of enzymes, for the success of many technical processes. However, production can be challenging, especially when using heterologous host cells such as bacteria to express and assemble recombinant mammalian proteins. The manufacturability of proteins can be hindered by low solubility, a tendency to aggregate, or inefficient purification. Tools such as in silico protein engineering and models that predict separation criteria can overcome these issues but usually require the complex shape and surface properties of proteins to be represented by a small number of quantitative numeric values known as descriptors, as similarly used to capture the features of small molecules. Here, we review the current status of protein descriptors, especially for application in quantitative structure activity relationship (QSAR) models. First, we describe the complexity of proteins and the properties that descriptors must accommodate. Then we introduce descriptors of shape and surface properties that quantify the global and local features of proteins. Finally, we highlight the current limitations of protein descriptors and propose strategies for the derivation of novel protein descriptors that are more informative.
Affiliation(s)
- J. Emonts
- Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Germany
| | - J.F. Buyel
- University of Natural Resources and Life Sciences, Vienna (BOKU), Department of Biotechnology (DBT), Institute of Bioprocess Science and Engineering (IBSE), Muthgasse 18, 1190 Vienna, Austria
- Institute for Molecular Biotechnology, Worringerweg 1, RWTH Aachen University, 52074 Aachen, Germany
|
21
|
Pande A, Patiyal S, Lathwal A, Arora C, Kaur D, Dhall A, Mishra G, Kaur H, Sharma N, Jain S, Usmani SS, Agrawal P, Kumar R, Kumar V, Raghava GPS. Pfeature: A Tool for Computing Wide Range of Protein Features and Building Prediction Models. J Comput Biol 2023; 30:204-222. [PMID: 36251780 DOI: 10.1089/cmb.2022.0241] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Indexed: 01/24/2023]
Abstract
In the last three decades, a wide range of protein features have been discovered to annotate a protein. Numerous attempts have been made to integrate these features in a software package/platform so that the user may compute a wide range of features from a single source. To complement the existing methods, we developed a method, Pfeature, for computing a wide range of protein features. Pfeature allows users to compute more than 200,000 features required for predicting the overall function of a protein, residue-level annotation of a protein, and function of chemically modified peptides. It has six major modules, namely, composition, binary profiles, evolutionary information, structural features, patterns, and model building. The composition module computes most of the existing compositional features, plus novel features. The binary profile of amino acid sequences allows users to compute the fraction of each type of residue as well as its position. The evolutionary information module computes evolutionary information of a protein in the form of a position-specific scoring matrix profile generated using Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST), which is suitable for annotation of a protein and its residues. A structural module was developed for computing structural features/descriptors from the tertiary structure of a protein. These features are suitable for predicting the therapeutic potential of a protein containing non-natural or chemically modified residues. The model-building module implements various machine learning techniques for developing classification and regression models as well as feature selection. Pfeature also allows the generation of overlapping patterns and features from a protein. A user-friendly Pfeature is available as a web server, Python library, and stand-alone package.
Affiliation(s)
- Akshara Pande
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Anjali Lathwal
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Chakit Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Dilraj Kaur
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Gaurav Mishra
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Department of Electrical Engineering, Shiv Nadar University, Greater Noida, India
| | - Harpreet Kaur
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Neelam Sharma
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Shipra Jain
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Salman Sadullah Usmani
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Piyush Agrawal
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Rajesh Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Vinod Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
|
22
|
Ma Y, Zhong J. Logistic tensor decomposition with sparse subspace learning for prediction of multiple disease types of human-virus protein-protein interactions. Brief Bioinform 2023; 24:6961474. [PMID: 36573486 DOI: 10.1093/bib/bbac604] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/05/2022] [Revised: 12/04/2022] [Accepted: 12/08/2022] [Indexed: 12/28/2022]
Abstract
Viral infection involves a large number of protein-protein interactions (PPIs) between the virus and the host, and the identification of these PPIs plays an important role in revealing viral infection and pathogenesis. Existing computational models focus on predicting whether human proteins and viral proteins interact, and rarely take into account the types of diseases associated with these interactions. Although there are computational models based on a matrix and tensor decomposition for predicting multi-type biological interaction relationships, these methods cannot effectively model high-order nonlinear relationships of biological entities and are not suitable for integrating multiple features. To this end, we propose a novel computational framework, LTDSSL, to determine human-virus PPIs under different disease types. LTDSSL utilizes logistic functions to model nonlinear associations, sets importance levels to emphasize the importance of observed interactions and utilizes sparse subspace learning of multiple features to improve model performance. Experimental results show that LTDSSL has better predictive performance for both new disease types and new triples than the state-of-the-art methods. In addition, the case study further demonstrates that LTDSSL can effectively predict human-viral PPIs under various disease types.
Affiliation(s)
- Yingjun Ma
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen, 361024, China
| | - Junjiang Zhong
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen, 361024, China
|
23
|
Alizadeh AA, Jafari B, Dastmalchi S. Drug Repurposing for Identification of S1P1 Agonists with Potential Application in Multiple Sclerosis Using In Silico Drug Design Approaches. Adv Pharm Bull 2023; 13:113-122. [PMID: 36721815 PMCID: PMC9871275 DOI: 10.34172/apb.2023.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/06/2021] [Revised: 10/09/2021] [Accepted: 12/31/2021] [Indexed: 02/03/2023]
Abstract
Purpose: Drug repurposing is an approach successfully used to discover new therapeutic applications for existing drugs. The current study aimed to use a combination of in silico methods to identify FDA-approved drugs with possible S1P1 agonistic activity useful in multiple sclerosis (MS). Methods: A 3D-QSAR model for 21 known S1P1 agonists was generated and used to predict the possible S1P1 agonistic activity of FDA-approved drugs. Then, the selected compounds were screened by docking into S1P1 and S1P3 receptors to select the S1P1 potent and selective compounds. Further evaluation was carried out by molecular dynamics (MD) simulation studies where the S1P1 binding energies of selected compounds were calculated. Results: The analyses resulted in identification of cobicistat, benzonatate and brigatinib as selective and potent S1P1 agonists with binding energies of -85.93, -69.77 and -67.44 kcal mol-1, calculated using the MM-GBSA algorithm based on 50 ns MD simulation trajectories. These values are better than that of siponimod (-59.35 kcal mol-1), an FDA-approved S1P1 agonist indicated for MS treatment. Furthermore, similarity network analysis revealed that cobicistat and brigatinib are the most structurally favorable compounds to interact with S1P1. Conclusion: The findings of this study revealed that cobicistat and brigatinib can be evaluated in experimental studies as potential S1P1 agonist candidates useful in the treatment of MS.
Affiliation(s)
- Ali Akbar Alizadeh
- Biotechnology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran; Pharmaceutical Analysis Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Behzad Jafari
- Department of Medicinal Chemistry, School of Pharmacy, Urmia University of Medical Sciences, Urmia, Iran
| | - Siavoush Dastmalchi
- Biotechnology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran; School of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran; Corresponding Author: Siavoush Dastmalchi
|
24
|
Guevara-Barrientos D, Kaundal R. ProFeatX: A parallelized protein feature extraction suite for machine learning. Comput Struct Biotechnol J 2022; 21:796-801. [PMID: 36698978 PMCID: PMC9842958 DOI: 10.1016/j.csbj.2022.12.044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 12/26/2022] [Accepted: 12/27/2022] [Indexed: 12/31/2022]
Abstract
Machine learning algorithms have been successfully applied in proteomics, genomics and transcriptomics, and have helped the biological community to answer complex questions. However, most machine learning methods require lots of data, with every data point having the same vector size. Biological sequence data, such as proteins, are amino acid sequences of variable length, which makes it essential to extract a fixed number of features from every protein before it can be used as input to machine learning models. There are numerous methods to achieve this, but only a few tools let researchers encode their proteins using multiple schemes without having to use different programs or, in many cases, code these algorithms themselves, or even come up with new algorithms. In this work, we created ProFeatX, a tool that contains 50 encodings to extract protein features in an efficient and fast way, supporting desktop as well as high-performance computing environments. It can also encode concatenated features for protein-protein interactions. The tool has an easy-to-use web interface, allowing non-experts to use feature extraction techniques, as well as a stand-alone version for advanced users. ProFeatX is implemented in C++ and available on GitHub at https://github.com/usubioinfo/profeatx. The web server is available at http://bioinfo.usu.edu/profeatx/.
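The fixed-length requirement described in this abstract can be illustrated with amino acid composition, a widely used encoding that maps a protein of any length to a 20-element vector. The sketch below is only an illustration of that general idea: the function name and alphabet ordering are assumptions made here, and nothing in it reflects how ProFeatX itself is implemented.

```python
# Illustrative sketch, not ProFeatX code: amino acid composition (AAC).
# The alphabet ordering and function name are assumptions for this example.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue as a 20-element list."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols are ignored
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [count / total for count in counts]
```

Sequences of different lengths, for example `"MKV"` and `"MKVLAAGIVGLLLAQ"`, both yield vectors of length 20, which is what allows them to be stacked into a single feature matrix.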
Affiliation(s)
- David Guevara-Barrientos
- Department of Computer Science, College of Science, Utah State University, Logan, UT, USA
- Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, Logan, UT, USA
| | - Rakesh Kaundal
- Department of Computer Science, College of Science, Utah State University, Logan, UT, USA
- Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, Logan, UT, USA
- Department of Plants, Soils, and Climate, College of Agriculture and Applied Sciences, Utah State University, Logan, UT, USA
|
25
|
Kania A, Sarapata K. Multifarious aspects of the chaos game representation and its applications in biological sequence analysis. Comput Biol Med 2022; 151:106243. [PMID: 36335814 DOI: 10.1016/j.compbiomed.2022.106243] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/26/2022] [Revised: 10/18/2022] [Accepted: 10/22/2022] [Indexed: 12/27/2022]
Abstract
Chaos game representation (CGR) has been successfully applied in bioinformatics for over 30 years, and many extensions have been proposed since its introduction. Numerical encoding of biological sequences is especially convenient for visualisation, alignment-free methods and input preparation for machine learning techniques. The development and applications of CGR have mainly covered linear nucleotide sequences. However, there have also been some attempts to create a representation of proteins. The latter need to be more sophisticated, as arbitrary coordinates for amino acids do not reflect their properties, which is crucial during the encoding process. In this paper, we summarised several variations of CGR and their limitations. We began by studying the PROSITE motifs and showed the immense number of amino acid properties employed by different proteins. To this end, we used Principal Component Analysis (PCA) and studied the relation between explained variance and the number of features that describe them. It appeared that even after many reductions, about 50 features are non-redundant. This was the reason we introduced an embedding concept from natural language processing, which enables adjusting features for a given list of sequences. We presented a simple neural network architecture with one hidden layer and one neuron within it and showed that it provides satisfactory results in phylogenetic tree construction in the ND5 and SPARC protein cases. To this end, we transformed CGR representations for all considered sequences using the Discrete Fourier Transform (DFT) and applied the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) algorithm. Moreover, we indicated some similarities between CGR and Recurrent Neural Networks (RNN). In the end, we attempted to include information about the RNA secondary structure and defined some measures to validate biological significance. We studied their properties and demonstrated their usefulness on the ALMV-3 example.
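For readers unfamiliar with the basic technique, a minimal sketch of the classic nucleotide CGR follows: each base is assigned a corner of the unit square, and the current point moves halfway toward the corner of each successive base. The corner assignment below is one common convention and is an assumption of this example; it does not reproduce the protein encodings, embeddings, or DFT/UPGMA steps discussed in the abstract.

```python
# Illustrative sketch of the classic nucleotide chaos game representation.
# The corner layout is an assumed (commonly used) convention.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(sequence):
    """Return the list of (x, y) CGR coordinates for a DNA sequence."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]  # raises KeyError on non-ACGT symbols
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points
```

Because each step halves the distance to a corner, the final point encodes the whole sequence history, with the most recent bases determining the coarsest position in the square.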
Affiliation(s)
- Adrian Kania
- Department of Computational Biophysics and Bioinformatics, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, 30-387 Cracow, Poland.
| | - Krzysztof Sarapata
- Department of Computational Biophysics and Bioinformatics, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, 30-387 Cracow, Poland
|
26
|
Cai H, Zhang H, Zhao D, Wu J, Wang L. FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction. Brief Bioinform 2022; 23:6702671. [PMID: 36124766 DOI: 10.1093/bib/bbac408] [Citation(s) in RCA: 65] [Impact Index Per Article: 21.7] [Received: 05/05/2022] [Revised: 07/28/2022] [Accepted: 08/22/2022] [Indexed: 12/14/2022]
Abstract
Accurate prediction of molecular properties, such as physicochemical and bioactive properties, as well as ADME/T (absorption, distribution, metabolism, excretion and toxicity) properties, remains a fundamental challenge for molecular design, especially for drug design and discovery. In this study, we advanced a novel deep learning architecture, termed FP-GNN (fingerprints and graph neural networks), which combined and simultaneously learned information from molecular graphs and fingerprints for molecular property prediction. To evaluate the FP-GNN model, we conducted experiments on 13 public datasets, an unbiased LIT-PCBA dataset and 14 phenotypic screening datasets for breast cell lines. Extensive evaluation results showed that compared to advanced deep learning and conventional machine learning algorithms, the FP-GNN algorithm achieved state-of-the-art performance on these datasets. In addition, we analyzed the influence of different molecular fingerprints, and the effects of molecular graphs and molecular fingerprints on the performance of the FP-GNN model. Analysis of the anti-noise ability and interpretation ability also indicated that FP-GNN was competitive in real-world situations. Collectively, FP-GNN algorithm can assist chemists, biologists and pharmacists in predicting and discovering better molecules with desired functions or properties.
Affiliation(s)
- Hanxuan Cai
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| | - Huimin Zhang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| | - Duancheng Zhao
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| | - Jingxing Wu
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| | - Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
|
27
|
Ren K, Su G. Characteristic fragmentations of nitroaromatic compounds (NACs) in Orbitrap HCD and integrated strategy for recognition of NACs in environmental samples. Sci Total Environ 2022; 834:155106. [PMID: 35398140 DOI: 10.1016/j.scitotenv.2022.155106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/12/2021] [Revised: 03/28/2022] [Accepted: 04/03/2022] [Indexed: 06/14/2023]
Abstract
Nitroaromatic compounds (NACs) are of high concern due to their mutagenicity and carcinogenicity to organisms. Here, we attempted to establish a novel searching-validation-evaluation workflow that is tailored to recognize unknown NACs in environmental samples using liquid chromatography coupled with quadrupole Orbitrap high-resolution mass spectrometry (LC-Orbitrap-HRMS). We studied the fragmentation process of NAC standards in Orbitrap higher-energy collision dissociation (HCD) cells and observed that the mass loss of NO was the most prevalent among all NAC standards at both low and medium levels of collision energy. Thus, neutral loss of NO was considered a diagnostic fragment of nitro groups and was used to screen for NACs in environmental samples. This technique is mass-loss-dependent, which enhances the recognition efficiency of NACs. Candidates exported from the PubChem compound database were further evaluated to obtain possible structures. This strategy was applied to the analysis of 24 surface soil samples, and we tentatively discovered two novel NACs in the analyzed samples. The semi-quantification results demonstrated that the concentrations of the novel NACs were comparable to those of the ten targeted NACs in soil samples. This study provides an integrated strategy for the recognition of known and unknown NACs, which could be extended to other environmental matrices.
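The neutral-loss screening idea can be sketched as a simple mass-difference check: a spectrum is flagged when some fragment lies one NO unit (about 29.998 Da, monoisotopic) below the precursor. The sketch below assumes singly charged ions and a hypothetical 5 ppm tolerance; the function name and parameters are invented for illustration and are not taken from the authors' workflow.

```python
# Illustrative sketch of a neutral-loss check, assuming singly charged ions.
# Monoisotopic masses (Da) of 14N and 16O.
MASS_N = 14.003074
MASS_O = 15.994915
NO_LOSS = MASS_N + MASS_O  # about 29.99799 Da


def has_no_neutral_loss(precursor_mz, fragment_mzs, tol_ppm=5.0):
    """Return True if any fragment matches precursor m/z minus NO.

    tol_ppm is a hypothetical tolerance chosen for this example.
    """
    target = precursor_mz - NO_LOSS
    return any(
        abs(mz - target) / target * 1e6 <= tol_ppm for mz in fragment_mzs
    )
```

A call such as `has_no_neutral_loss(138.0197, [108.0217, 92.0260])` would flag the spectrum, since 108.0217 sits within the tolerance of the expected fragment position.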
Affiliation(s)
- Kefan Ren
- Jiangsu Key Laboratory of Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
| | - Guanyong Su
- Jiangsu Key Laboratory of Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China.
|
28
|
Mao J, Zeb A, Kim MS, Jeon HN, Wang J, Guan S, No KT. Development of an innovative data-driven system to generate descriptive prediction equation of dielectric constant on small sample sets. Heliyon 2022; 8:e10011. [PMID: 36016529 PMCID: PMC9396556 DOI: 10.1016/j.heliyon.2022.e10011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2022] [Revised: 04/13/2022] [Accepted: 07/15/2022] [Indexed: 11/29/2022]
Abstract
Dielectric constant (DC, ε) is a fundamental parameter in material sciences that measures the polarizability of a system. In industrial processes, its value is an important indicator of the dielectric property of a material and carries information relevant to separation, chemical equilibrium, chemical reactivity analysis, and solubility modeling. However, the available ε-prediction models are fairly primitive and frequently fail, especially when dealing with strongly polar compounds. We have therefore developed a novel data-driven system to improve the efficiency and wide-range applicability of ε prediction in material sciences. This scheme adopts the correlation distance and a genetic algorithm to discriminate between feature combinations and avoid overfitting. The prediction output of a single ML model is used as a coding to estimate the target value, simulating the layer-by-layer extraction of deep learning and enabling an instant search for the optimal combination of features. Our model achieved an improved correlation value of 0.956 with the target, compared with the previously available best traditional ML result of 0.877. Our framework showed a marked improvement, especially for material systems with ε values >50. In terms of interpretability, we have derived a conceptual computational equation from a minimum generating tree. Our data-driven system compares favorably with other methods because it can be applied to the prediction of dielectric constants as well as to the prediction of overall micro- and macro-properties of any multi-component complex.
|
29
|
Blay V, Gailiunaite S, Lee CY, Chang HY, Hupp T, Houston DR, Chi P. Comparison of ATP-binding pockets and discovery of homologous recombination inhibitors. Bioorg Med Chem 2022; 70:116923. [PMID: 35841829 DOI: 10.1016/j.bmc.2022.116923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Revised: 06/16/2022] [Accepted: 07/06/2022] [Indexed: 11/02/2022]
Abstract
The ATP binding sites of many enzymes are structurally related, which complicates their development as therapeutic targets. In this work, we explore a diverse set of ATPases and compare their ATP binding pockets in search of pockets attractive for drug discovery. We pursue different direct and indirect structural strategies, as well as ligandability assessments, to help guide target selection. The analyses indicate human RAD51, an enzyme crucial in homologous recombination, as a promising, tractable target. Inhibition of RAD51 has shown promise in the treatment of certain cancers, but more potent inhibitors are needed. Thus, we design compounds computationally against the ATP binding pocket of RAD51 with consideration of multiple criteria, including predicted specificity, drug-likeness, and toxicity. The designed molecules are evaluated experimentally using molecular and cell-based assays. Our results provide two novel hit compounds against RAD51 and illustrate a computational pipeline to design new inhibitors against ATPases.
Affiliation(s)
- Vincent Blay
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK; Department of Microbiology and Environmental Toxicology, University of California at Santa Cruz, Santa Cruz, CA 95064, USA; Institute for Integrative Systems Biology (I2Sysbio), Universitat de València and Spanish Research Council (CSIC), 46980 Valencia, Spain.
| | - Saule Gailiunaite
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
| | - Chih-Ying Lee
- Institute of Biochemical Sciences, National Taiwan University, Taipei 10617, Taiwan
| | - Hao-Yen Chang
- Institute of Biochemical Sciences, National Taiwan University, Taipei 10617, Taiwan
| | - Ted Hupp
- MRC Institute of Genetics & Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK
| | - Douglas R Houston
- Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK.
| | - Peter Chi
- Institute of Biochemical Sciences, National Taiwan University, Taipei 10617, Taiwan; Institute of Biological Chemistry, Academia Sinica, Taipei 11529, Taiwan
|
30
|
Chen Z, Liu X, Zhao P, Li C, Wang Y, Li F, Akutsu T, Bain C, Gasser RB, Li J, Yang Z, Gao X, Kurgan L, Song J. iFeatureOmega: an integrative platform for engineering, visualization and analysis of features from molecular sequences, structural and ligand data sets. Nucleic Acids Res 2022; 50:W434-W447. [PMID: 35524557 PMCID: PMC9252729 DOI: 10.1093/nar/gkac351] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 03/21/2022] [Revised: 04/22/2022] [Accepted: 04/25/2022] [Indexed: 01/07/2023]
Abstract
The rapid accumulation of molecular data motivates development of innovative approaches to computationally characterize sequences, structures and functions of biological and chemical molecules in an efficient, accessible and accurate manner. Notwithstanding several computational tools that characterize protein or nucleic acids data, there are no one-stop computational toolkits that comprehensively characterize a wide range of biomolecules. We address this vital need by developing a holistic platform that generates features from sequence and structural data for a diverse collection of molecule types. Our freely available and easy-to-use iFeatureOmega platform generates, analyzes and visualizes 189 representations for biological sequences, structures and ligands. To the best of our knowledge, iFeatureOmega provides the largest scope when directly compared to the current solutions, in terms of the number of feature extraction and analysis approaches and coverage of different molecules. We release three versions of iFeatureOmega including a webserver, command line interface and graphical interface to satisfy needs of experienced bioinformaticians and less computer-savvy biologists and biochemists. With the assistance of iFeatureOmega, users can encode their molecular data into representations that facilitate construction of predictive models and analytical studies. We highlight benefits of iFeatureOmega based on three research applications, demonstrating how it can be used to accelerate and streamline research in bioinformatics, computational biology, and cheminformatics areas. The iFeatureOmega webserver is freely available at http://ifeatureomega.erc.monash.edu and the standalone versions can be downloaded from https://github.com/Superzchen/iFeatureOmega-GUI/ and https://github.com/Superzchen/iFeatureOmega-CLI/.
Affiliation(s)
- Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
- Center for Crop Genome Engineering, Henan Agricultural University, Zhengzhou 450046, China
| | - Xuhan Liu
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden 2333 CC, The Netherlands
| | - Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
| | - Chen Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
| | - Yanan Wang
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
| | - Fuyi Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
| | - Chris Bain
- Monash Data Future Institutes, Monash University, Melbourne, Victoria 3800, Australia
| | - Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
| | - Junzhou Li
- Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
| | - Zuoren Yang
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Future Institutes, Monash University, Melbourne, Victoria 3800, Australia
| |
|
31
|
Song J, Kim D, Lee S, Jung J, Joo JWJ, Jang W. Integrative transcriptome-wide analysis of atopic dermatitis for drug repositioning. Commun Biol 2022; 5:615. [PMID: 35729261 PMCID: PMC9213508 DOI: 10.1038/s42003-022-03564-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/19/2021] [Accepted: 06/07/2022] [Indexed: 12/13/2022] Open
Abstract
Atopic dermatitis (AD) is one of the most common inflammatory skin diseases and significantly impacts quality of life. A transcriptome-wide association study (TWAS) was conducted to estimate both transcriptomic and genomic features of AD and detected significant associations between 31 expression quantitative loci and 25 genes. Our results replicated well-known genetic markers for AD and revealed 4 novel associated genes. Next, a transcriptome meta-analysis was conducted with 5 studies retrieved from public databases and identified 5 additional novel susceptibility genes for AD. Applying the connectivity map to the results from TWAS and meta-analysis, robustly enriched perturbations were identified and their chemical or functional properties were analyzed. Here, we report the first research on integrative approaches for AD, combining TWAS and transcriptome meta-analysis. Together, our findings could provide a comprehensive understanding of the pathophysiologic mechanisms of AD and suggest potential drug candidates as alternative treatment options. Integrative genomic and transcriptomic analyses on publicly available datasets together with in silico drug repositioning identify alternative therapeutic options to treat atopic dermatitis.
Affiliation(s)
- Jaeseung Song
- Department of Life Sciences, Dongguk University-Seoul, 04620, Seoul, Republic of Korea
| | - Daeun Kim
- Department of Life Sciences, Dongguk University-Seoul, 04620, Seoul, Republic of Korea
| | - Sora Lee
- Department of Life Sciences, Dongguk University-Seoul, 04620, Seoul, Republic of Korea
| | - Junghyun Jung
- Department of Life Sciences, Dongguk University-Seoul, 04620, Seoul, Republic of Korea; Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA
| | - Jong Wha J Joo
- Department of Computer Science and Engineering, Dongguk University-Seoul, 04620, Seoul, Republic of Korea
| | - Wonhee Jang
- Department of Life Sciences, Dongguk University-Seoul, 04620, Seoul, Republic of Korea.
| |
|
32
|
Sharifabad MM, Sheikhpour R, Gharaghani S. Drug-target interaction prediction using reliable negative samples and effective feature selection methods. J Pharmacol Toxicol Methods 2022; 116:107191. [PMID: 35738316 DOI: 10.1016/j.vascn.2022.107191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2021] [Revised: 06/04/2022] [Accepted: 06/14/2022] [Indexed: 11/28/2022]
Abstract
Machine learning-based approaches in the field of drug discovery have dramatically reduced the time and cost of the laboratory process of detecting potential drug-target interactions (DTIs). Standard binary classifiers require both positive and negative samples in the training and validation phases. One of the major challenges in the DTI context is the lack of access to non-interacting pairs as negative samples in the learning process. Many recent studies in this field have randomly selected negative samples from unlabeled drug-target pairs. Because unknown positive samples may be present in a set treated as negative, the model results can be affected and show a high false-positive rate. In this study, an algorithm called Reliable Non-Interacting Drug-Target Pairs (RNIDTP) is proposed to select reliable negative samples, together with an efficient algorithm to select relevant features for drug-target interaction prediction. To validate the performance of the proposed RNIDTP algorithm in the selection of negative samples, a benchmark drug-target interactions dataset is used. The results demonstrate the superiority of the proposed algorithm compared with other algorithms in most cases. The results also indicate that using an appropriate algorithm for the selection of negative samples significantly improves the performance of the learning process compared to random selection.
Affiliation(s)
- Mohammad Morovvati Sharifabad
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Razieh Sheikhpour
- Department of Computer Engineering, Faculty of Engineering, Ardakan University, P.O. Box 184, Ardakan, Iran.
| | - Sajjad Gharaghani
- Laboratory of Bioinformatics and Drug Design (LBD), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
| |
|
33
|
Charoenkwan P, Schaduangrat N, Lio' P, Moni MA, Manavalan B, Shoombuatong W. NEPTUNE: A novel computational approach for accurate and large-scale identification of tumor homing peptides. Comput Biol Med 2022; 148:105700. [PMID: 35715261 DOI: 10.1016/j.compbiomed.2022.105700] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/14/2022] [Revised: 05/31/2022] [Accepted: 06/04/2022] [Indexed: 11/16/2022]
Abstract
Tumor homing peptides (THPs) play a crucial role in recognizing and specifically binding to cancer cells. Although experimental approaches can facilitate the precise identification of THPs, they are usually time-consuming, labor-intensive, and not cost-effective. However, computational approaches can identify THPs by utilizing sequence information alone, thus highlighting their great potential for large-scale identification of THPs. Herein, we propose NEPTUNE, a novel computational approach for the accurate and large-scale identification of THPs from sequence information. Specifically, we constructed variant baseline models from multiple feature encoding schemes coupled with six popular machine learning algorithms. Subsequently, we comprehensively assessed and investigated the effects of these baseline models on THP prediction. Finally, the probabilistic information generated by the optimal baseline models is fed into a support vector machine-based classifier to construct the final meta-predictor (NEPTUNE). Cross-validation and independent tests demonstrated that NEPTUNE achieved superior performance for THP prediction compared with its constituent baseline models and the existing methods. Moreover, we employed the powerful SHapley additive exPlanations method to improve the interpretation of NEPTUNE and elucidate the most important features for identifying THPs. Finally, we implemented an online web server using NEPTUNE, which is available at http://pmlabstack.pythonanywhere.com/NEPTUNE. NEPTUNE could be beneficial for the large-scale identification of unknown THP candidates for follow-up experimental validation.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Pietro Lio'
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
| | - Mohammad Ali Moni
- Artificial Intelligence & Digital Health, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland St Lucia, QLD, 4072, Australia
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea.
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
|
34
|
Alkhadrawi AM, Xue H, Ahmad N, Akram M, Wang Y, Li C. Molecular study on the role of vacuolar transporters in glycyrrhetinic acid production in engineered Saccharomyces cerevisiae. Biochimica et Biophysica Acta. Biomembranes 2022; 1864:183890. [PMID: 35181296 DOI: 10.1016/j.bbamem.2022.183890] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/04/2021] [Revised: 02/06/2022] [Accepted: 02/09/2022] [Indexed: 12/25/2022]
Abstract
Glycyrrhetinic acid (GA) is one of the major bioactive components of the leguminous plant, Glycyrrhiza spp. (Chinese licorice). Owing to GA's complicated chemical structure, its production by chemical synthesis is challenging and requires other efficient strategies such as microbial synthesis. Earlier investigations employed numerous approaches to improve GA yield by refining the synthetic pathway and improving the metabolic flux. Nevertheless, the metabolic role of transporters in GA biosynthesis in microbial cell factories has not been studied so far. In this study, we investigated the role of yeast ATP binding cassette (ABC) vacuolar transporters in GA production. Molecular docking of GA and its precursors, β-Amyrin and 11-oxo-β-amyrin, was performed with five vacuolar ABC transporters (Bpt1p, Vmr1p, Ybt1p, Ycf1p and Nft1p). Based on docking scores, two top scoring transporters were selected (Bpt1p and Vmr1p) to investigate transporters' functions on GA production via overexpression and knockout experiments in one GA-producing yeast strain (GA166). Results revealed that GA and its precursors exhibited the highest predicted binding affinity towards BPT1 (ΔG = -10.9, -10.6, -10.9 kcal/mol for GA, β-amyrin and 11-oxo-β-amyrin, respectively). Experimental results showed that the overexpression of BPT1 and VMR1 restored the intracellular as well as extracellular GA production level under limited nutritional conditions, whereas knockout of BPT1 resulted in a total loss of GA production. These results suggest that the activity of BPT1 is required for GA production in engineered Saccharomyces cerevisiae.
Affiliation(s)
- Adham M Alkhadrawi
- Key Laboratory of Medical Molecule Science and Pharmaceutics Engineering, Ministry of Industry and Information Technology, Institute of Biochemical Engineering, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, PR China
| | - Haijie Xue
- Key Laboratory of Medical Molecule Science and Pharmaceutics Engineering, Ministry of Industry and Information Technology, Institute of Biochemical Engineering, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, PR China
| | - Nadeem Ahmad
- Key Laboratory of Medical Molecule Science and Pharmaceutics Engineering, Ministry of Industry and Information Technology, Institute of Biochemical Engineering, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, PR China; Department of Pharmacy, COMSATS University Islamabad, Abbottabad campus, Abbottabad 22060, Pakistan
| | - Muhammad Akram
- Key Laboratory of Medical Molecule Science and Pharmaceutics Engineering, Ministry of Industry and Information Technology, Institute of Biochemical Engineering, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, PR China; Department of Life Sciences, School of Science, University of Management and Technology, Lahore, 54770, Pakistan
| | - Ying Wang
- Key Laboratory of Medical Molecule Science and Pharmaceutics Engineering, Ministry of Industry and Information Technology, Institute of Biochemical Engineering, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, PR China.
| | - Chun Li
- Key Laboratory of Medical Molecule Science and Pharmaceutics Engineering, Ministry of Industry and Information Technology, Institute of Biochemical Engineering, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, PR China; Key Lab for Industrial Biocatalysis, Ministry of Education, Department of Chemical Engineering, Tsinghua University, Beijing 100084, PR China; Center for Synthetic and Systems Biology, Tsinghua University, Beijing 100084, China.
| |
|
35
|
Tie D, Fan Z, Chen D, Chen X, Chen Q, Chen J, Bo H. Mechanisms of Danggui Buxue Tang on Hematopoiesis via Multiple Targets and Multiple Components: Metabonomics Combined with Database Mining Technology. The American Journal of Chinese Medicine 2022; 50:1155-1171. [PMID: 35475977 DOI: 10.1142/s0192415x22500471] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
This study aimed to explore the mechanism of action of Danggui Buxue Tang (DBT) with its multiple components and targets in the synergistic regulation of hematopoiesis. Mouse models of hematopoiesis were established using antibiotics. Metabolomics was used to detect body metabolites and enriched pathways. The active ingredients, targets, and pathways of DBT were analyzed using system pharmacology. The results of metabolomics and system pharmacology were integrated to identify the key pathways and targets. A total of 515 metabolites were identified using metabolomics. After the action of antibiotics, 49 metabolites were markedly changed: 23 were increased, 26 were decreased, and 11 were significantly reversed after DBT administration. Pathway enrichment analysis showed that these 11 metabolites were related to bile secretion, cofactor biosynthesis, and fatty acid biosynthesis. The results of the pharmacological analysis showed that 616 targets were related to DBT-induced anemia, which were mainly enriched in biological processes, such as bile secretion, biosynthesis of cofactors, and cholesterol metabolism. Combining the results of metabolomics and system pharmacology, we found that bile acid metabolism and biotin synthesis were the key pathways for DBT. Forty-two targets of DBT were related to these two metabolic pathways. PPI analysis revealed that the top 10 targets included CYP3A4, ABCG2, and UGT1A8. Twenty-one components interacted with these 10 targets. In one case, a target corresponds to multiple components, and a component corresponds to multiple targets. DBT acts on multiple targets of ABCG2, UGT1A8, and CYP3A4 through multiple components, affecting the biosynthesis of cofactors and bile secretion pathways to regulate hematopoiesis.
Affiliation(s)
- Defu Tie
- School of Bioscience and Biopharmaceutics, Guangdong Province Key Laboratory for Biotechnology Drug Candidates, P. R. China
| | - Zhaohui Fan
- School of Bioscience and Biopharmaceutics, Guangdong Province Key Laboratory for Biotechnology Drug Candidates, P. R. China
| | - Dan Chen
- School of Bioscience and Biopharmaceutics, Guangdong Province Key Laboratory for Biotechnology Drug Candidates, P. R. China
| | - Xiao Chen
- School of Bioscience and Biopharmaceutics, Guangdong Province Key Laboratory for Biotechnology Drug Candidates, P. R. China
| | - Qizhu Chen
- School of Bioscience and Biopharmaceutics, Guangdong Province Key Laboratory for Biotechnology Drug Candidates, P. R. China
| | - Jun Chen
- College of Pharmacy, Guangdong Pharmaceutical University, 510006 Guangzhou, Guangdong, P. R. China
| | - Huaben Bo
- School of Bioscience and Biopharmaceutics, Guangdong Province Key Laboratory for Biotechnology Drug Candidates, P. R. China
| |
|
36
|
Amerifar S, Norouzi M, Ghandi M. A tool for feature extraction from biological sequences. Brief Bioinform 2022; 23:6563937. [PMID: 35383372 DOI: 10.1093/bib/bbac108] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/19/2021] [Revised: 03/01/2022] [Accepted: 03/03/2022] [Indexed: 11/12/2022] Open
Abstract
With the advances in sequencing technologies, a huge amount of biological data is extracted nowadays. Analyzing this amount of data is beyond the ability of human beings, creating a splendid opportunity for machine learning methods to grow. The methods, however, are practical only when the sequences are converted into feature vectors. Many tools target this task, including iLearnPlus, a Python-based tool which supports a rich set of features. In this paper, we propose a holistic tool that extracts features from biological sequences (i.e. DNA, RNA and protein). These features are the inputs to machine learning models that predict properties, structures or functions of the input sequences. Our tool supports not only all features in iLearnPlus but also 30 additional features from the literature. Moreover, our tool is written in R, which gives bioinformaticians an alternative for transforming sequences into feature vectors. We have compared the conversion time of our tool with that of iLearnPlus: we transform the sequences much faster. We convert small nucleotide sequences a median of 2.8X faster and large sequences a median of 6.3X faster than iLearnPlus. Finally, for amino acid sequences, our tool achieves a median speedup of 23.9X.
Affiliation(s)
- Sare Amerifar
- Bioinformatics, Tarbiat Modares University, Jalal Al Ahmad, 14115-111, Tehran, Iran
| | - Mahammad Norouzi
- Computer Science, Technical University of Darmstadt, Hochschulstr. 1, 64293, Hesse, Germany
| | - Mahmoud Ghandi
- Bioinformatics, Monte Rosa Therapeutics, Summer Street, 02210, Boston, United States
| |
|
37
|
Du BX, Qin Y, Jiang YF, Xu Y, Yiu SM, Yu H, Shi JY. Compound–protein interaction prediction by deep learning: Databases, descriptors and models. Drug Discov Today 2022; 27:1350-1366. [DOI: 10.1016/j.drudis.2022.02.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Revised: 11/19/2021] [Accepted: 02/28/2022] [Indexed: 11/24/2022]
|
38
|
Ignacz G, Szekely G. Deep learning meets quantitative structure–activity relationship (QSAR) for leveraging structure-based prediction of solute rejection in organic solvent nanofiltration. J Memb Sci 2022. [DOI: 10.1016/j.memsci.2022.120268] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/26/2022]
|
39
|
Ismail H, White C, Al-Barakati H, Newman RH, Kc DB. FEPS: A Tool for Feature Extraction from Protein Sequence. Methods Mol Biol 2022; 2499:65-104. [PMID: 35696075 DOI: 10.1007/978-1-0716-2317-6_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
Machine learning has become one of the most popular choices for developing computational approaches in protein structural bioinformatics. The ability to extract features from protein sequence/structure often becomes one of the crucial steps for the development of machine learning-based approaches. Over the years, various sequence, structural, and physicochemical descriptors have been developed for proteins and these descriptors have been used to predict/solve various bioinformatics problems. Hence, several feature extraction tools have been developed over the years to help researchers generate numeric features from protein sequences. Most of these tools have some limitations regarding the number of sequences they can handle and the subsequent preprocessing that is required for the generated features before they can be fed to machine learning methods. Here, we present Feature Extraction from Protein Sequences (FEPS), a toolkit for feature extraction. FEPS is a versatile software package for generating various descriptors from protein sequences and can handle many sequences, the number of which is limited only by the available computational resources. In addition, the features extracted by FEPS do not require subsequent processing and are ready to be fed to machine learning techniques, as FEPS provides various output formats as well as the ability to concatenate the generated features. FEPS is made freely available via an online web server as well as a stand-alone toolkit. FEPS, a comprehensive toolkit for feature extraction, will help spur the development of machine learning-based models for various bioinformatics problems.
Affiliation(s)
- Hamid Ismail
- Department of Animal Science, North Carolina A&T State University, Greensboro, NC, USA
| | - Clarence White
- Computational Science and Engineering Department, North Carolina A&T State University, Greensboro, NC, USA
| | - Hussam Al-Barakati
- Department of Computer Science, Jamoum University College, Umm Al-Qura University, Jamoum, Saudi Arabia
| | - Robert H Newman
- Department of Biology, North Carolina A&T State University, Greensboro, NC, USA
| | - Dukka B Kc
- Department of Computer Science, Michigan Technological University, Houghton, MI, USA.
| |
|
40
|
Antimalarial Drug Predictions Using Molecular Descriptors and Machine Learning against Plasmodium Falciparum. Biomolecules 2021; 11:biom11121750. [PMID: 34944394 PMCID: PMC8698534 DOI: 10.3390/biom11121750] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 10/20/2021] [Revised: 11/12/2021] [Accepted: 11/17/2021] [Indexed: 11/16/2022] Open
Abstract
Malaria remains by far one of the most threatening and dangerous illnesses caused by the Plasmodium falciparum parasite. Chloroquine (CQ) and first-line artemisinin-based combination treatment (ACT) have long been the drugs of choice for the treatment and control of malaria; however, CQ-resistant and artemisinin-resistant parasites have now emerged in most areas where malaria is endemic. In this work, we developed five machine learning models to predict the antimalarial bioactivity of a drug against Plasmodium falciparum from features (i.e., molecular descriptor values) computed by PaDEL software from the SMILES of compounds, and compared the models experimentally on our collected data of 4794 instances. We found that three of the five models, namely artificial neural network (ANN), extreme gradient boost (XGB), and random forest (RF), outperform the others in terms of accuracy, while with roughly a quarter of the promising descriptors picked by the feature selection algorithm, the five models achieved equivalent and comparable performance. In addition, the contribution of all molecular descriptors to the models was investigated by comparing their rank values from the feature selection algorithm; the most potent and relevant descriptors, which come from the ‘Autocorrelation’ module, contributed the most, while the ‘Atom type electrotopological state’ descriptors contributed the least.
|
41
|
Shityakov S, Skorb EV, Förster CY, Dandekar T. Scaffold Searching of FDA and EMA-Approved Drugs Identifies Lead Candidates for Drug Repurposing in Alzheimer's Disease. Front Chem 2021; 9:736509. [PMID: 34751244 PMCID: PMC8571023 DOI: 10.3389/fchem.2021.736509] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/05/2021] [Accepted: 09/22/2021] [Indexed: 11/24/2022] Open
Abstract
Clinical trials of novel therapeutics for Alzheimer's Disease (AD) have consumed a significant amount of time and resources with largely negative results. Repurposing drugs already approved by the Food and Drug Administration (FDA), European Medicines Agency (EMA), or Worldwide for another indication is a more rapid and less expensive option. Therefore, we apply the scaffold searching approach based on the known amyloid-beta (Aβ) inhibitor tramiprosate to screen the DrugCentral database (n = 4,642) of clinically tested drugs. As a result, menadione bisulfite and camphotamide, substances with prothrombogenic and neurostimulation/cardioprotection effects, were identified as promising Aβ inhibitors with an improved binding affinity (ΔGbind) and blood-brain barrier permeation (logBB). Finally, the data were also confirmed by molecular dynamics simulations using implicit solvation, in particular the Molecular Mechanics Generalized Born Surface Area (MM-GBSA) model. Overall, the proposed in silico pipeline can be implemented in early-stage rational drug design to nominate some lead candidates for AD, which will be further validated in vitro and in vivo, and, finally, in a clinical trial.
Affiliation(s)
- Sergey Shityakov
- Laboratory of Chemoinformatics, Infochemistry Scientific Center, ITMO University, Saint-Petersburg, Russia
| | - Ekaterina V. Skorb
- Laboratory of Chemoinformatics, Infochemistry Scientific Center, ITMO University, Saint-Petersburg, Russia
| | - Carola Y. Förster
- Department of Anaesthesiology, Intensive Care, Emergency and Pain Medicine, Würzburg University Hospital, Würzburg, Germany
| | - Thomas Dandekar
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
| |
|
42
|
Li HL, Pang YH, Liu B. BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models. Nucleic Acids Res 2021; 49:e129. [PMID: 34581805 PMCID: PMC8682797 DOI: 10.1093/nar/gkab829] [Citation(s) in RCA: 143] [Impact Index Per Article: 35.8] [Received: 07/18/2021] [Revised: 08/24/2021] [Accepted: 09/09/2021] [Indexed: 01/08/2023] Open
Abstract
In order to uncover the meanings of the ‘book of life’, 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis are discussed in this study, which are able to extract the linguistic properties of the ‘book of life’. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing the sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve comparable or even obviously better performance than the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing technologies, and contribute to the development of this very important field. To help readers use BioSeq-BLM for their own experiments, the corresponding web server and stand-alone package have been established and released, and can be freely accessed at http://bliulab.net/BioSeq-BLM/.
Affiliation(s)
- Hong-Liang Li
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Yi-He Pang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
| |
|
43
|
Li H, Tamang T, Nantasenamat C. Toward insights on antimicrobial selectivity of host defense peptides via machine learning model interpretation. Genomics 2021; 113:3851-3863. [PMID: 34480984 DOI: 10.1016/j.ygeno.2021.08.023] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/30/2020] [Revised: 08/22/2021] [Accepted: 08/25/2021] [Indexed: 10/20/2022]
Abstract
Host defense peptides are promising candidates for the development of novel antibiotics. To realize their therapeutic potential, a high level of target selectivity is essential. This study aims to identify factors governing selectivity via the use of the random forest algorithm for correlating peptide sequence information with their bioactivity data. Satisfactory predictive models were achieved from out-of-bag prediction that yielded accuracies and Matthews correlation coefficients in excess of 0.80 and 0.57, respectively. Model interpretation through the use of variable importance metrics and partial dependence plots indicated that the selectivity was heavily influenced by the composition and distribution patterns of molecular charge and solubility related parameters. Furthermore, the three investigated bacterial target species (Escherichia coli, Pseudomonas aeruginosa and Staphylococcus aureus) likely had a significant influence on how selectivity was realized, as there appears to be a similar underlying selectivity mechanism on the basis of charge-solubility properties (but one that is tailored to the target in question).
Affiliation(s)
- Hao Li
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Thinam Tamang
  - Madan Bhandari Memorial College, Institute of Science and Technology, Tribhuvan University, Kathmandu 44602, Nepal
- Chanin Nantasenamat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
|
44
|
Charoenkwan P, Chiangjong W, Hasan MM, Nantasenamat C, Shoombuatong W. Review and comparative analysis of machine learning-based predictors for predicting and analyzing of anti-angiogenic peptides. Curr Med Chem 2021; 29:849-864. [PMID: 34375178 DOI: 10.2174/0929867328666210810145806] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/22/2021] [Revised: 06/17/2021] [Accepted: 06/22/2021] [Indexed: 11/22/2022]
Abstract
Cancer is one of the leading causes of death worldwide, and underlying it is angiogenesis, one of the hallmarks of cancer. Efforts are under way to discover anti-angiogenic peptides (AAPs) as a promising therapeutic route that targets the formation of new blood vessels. As such, the identification of AAPs constitutes a viable path for understanding their mechanistic properties pertinent to the discovery of new anti-cancer drugs. In spite of the abundance of peptide sequences in public databases, experimental efforts to identify anti-angiogenic peptides have progressed very slowly owing to their high cost and laborious nature. Owing to its inherent ability to make sense of large volumes of data, machine learning (ML) represents a promising technique that can be harnessed for peptide-based drug discovery. In this review, we conducted a comprehensive and comparative analysis of ML-based AAP predictors in terms of their employed feature descriptors, ML algorithms, cross-validation methods and prediction performance. Moreover, the common framework of these AAP predictors and their inherent weaknesses are also discussed. In particular, we explore future perspectives for improving prediction accuracy and model interpretability, which represent an interesting avenue for overcoming some of the inherent weaknesses of existing AAP predictors. We anticipate that this review will assist researchers in the rapid screening and identification of promising AAPs for clinical use.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand
- Wararat Chiangjong
  - Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- Md Mehedi Hasan
  - Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, United States
- Chanin Nantasenamat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
|
45
|
Santiago Á, Guzmán-Ocampo DC, Aguayo-Ortiz R, Dominguez L. Characterizing the Chemical Space of γ-Secretase Inhibitors and Modulators. ACS Chem Neurosci 2021; 12:2765-2775. [PMID: 34291906 DOI: 10.1021/acschemneuro.1c00313] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/23/2022] Open
Abstract
γ-Secretase (GS) is one of the most attractive molecular targets for the treatment of Alzheimer's disease (AD). Its key role in the final step of amyloid-β peptides generation and its relationship in the cascade of events for disease development have caught the attention of many pharmaceutical groups. Over the past years, different inhibitors and modulators have been evaluated as promising therapeutics against AD. However, despite the great chemical diversity of the reported compounds, a global classification and visual representation of the chemical space for GS inhibitors and modulators remain unavailable. In the present work, we carried out a two-dimensional (2D) chemical space analysis from different classes and subclasses of GS inhibitors and modulators based on their structural similarity. Along with the novel structural information available for GS complexes, our analysis opens the possibility to identify compounds with high molecular similarity, critical to finding new chemical structures through the optimization of existing compounds and relating them with a potential binding site.
Affiliation(s)
- Ángel Santiago
  - Departamento de Fisicoquímica, Facultad de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Dulce C. Guzmán-Ocampo
  - Departamento de Fisicoquímica, Facultad de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Rodrigo Aguayo-Ortiz
  - Departamento de Farmacia, Facultad de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Laura Dominguez
  - Departamento de Fisicoquímica, Facultad de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
|
46
|
Xiong G, Shen C, Yang Z, Jiang D, Liu S, Lu A, Chen X, Hou T, Cao D. Featurization strategies for protein–ligand interactions and their applications in scoring function development. WIREs Computational Molecular Science 2021. [DOI: 10.1002/wcms.1567] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022]
Affiliation(s)
- Guoli Xiong
  - Xiangya School of Pharmaceutical Sciences Central South University Changsha China
- Chao Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
- Ziyi Yang
  - Xiangya School of Pharmaceutical Sciences Central South University Changsha China
- Dejun Jiang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
  - College of Computer Science and Technology Zhejiang University Hangzhou China
- Shao Liu
  - Department of Pharmacy Xiangya Hospital, Central South University Changsha China
- Aiping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong SAR China
- Xiang Chen
  - Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis Xiangya Hospital, Central South University Changsha China
- Tingjun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences Central South University Changsha China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong SAR China
|
47
|
Suratanee A, Buaboocha T, Plaimas K. Prediction of Human-Plasmodium vivax Protein Associations From Heterogeneous Network Structures Based on Machine-Learning Approach. Bioinform Biol Insights 2021; 15:11779322211013350. [PMID: 34188457 PMCID: PMC8212370 DOI: 10.1177/11779322211013350] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/13/2021] [Accepted: 04/04/2021] [Indexed: 11/24/2022] Open
Abstract
Malaria caused by Plasmodium vivax can lead to severe morbidity and death. In addition, resistance has been reported to existing drugs in treating this malaria. Therefore, the identification of new human proteins associated with malaria is urgently needed for the development of additional drugs. In this study, we established an analysis framework to predict human-P. vivax protein associations using network topological profiles from a heterogeneous network structure of human and P. vivax, machine-learning techniques and statistical analysis. Novel associations were predicted and ranked to determine the importance of human proteins associated with malaria. With the best-ranking score, 411 human proteins were identified as promising proteins. Their regulations and functions were statistically analyzed, which led to the identification of proteins involved in the regulation of membrane and vesicle formation, and proteasome complexes as potential targets for the treatment of P. vivax malaria. In conclusion, by integrating related data, our analysis was efficient in identifying potential targets providing an insight into human-parasite protein associations. Furthermore, generalizing this model could allow researchers to gain further insights into other diseases and enhance the field of biomedical science.
Affiliation(s)
- Apichat Suratanee
  - Department of Mathematics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
- Teerapong Buaboocha
  - Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
  - Omics Sciences and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Kitiporn Plaimas
  - Omics Sciences and Bioinformatics Center, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
  - Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
|
48
|
Zanganeh S, Firoozpour L, Sardari S, Afgar A, Cohan RA, Mohajel N. Novel Descriptors Derived from the Aggregation Propensity of Di- and Tripeptides Can Predict the Critical Aggregation Concentration of Longer Peptides. ACS Omega 2021; 6:13331-13340. [PMID: 34056481 PMCID: PMC8158804 DOI: 10.1021/acsomega.1c01293] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/10/2021] [Accepted: 04/28/2021] [Indexed: 05/14/2023]
Abstract
Self-assembling amphiphilic peptides have recently received special attention in medicine. Nonetheless, testing the myriad of combinations generated from at least 20 coded and several hundreds of noncoded amino acids to obtain candidate sequences for each application, if possible, is time-consuming and expensive. Therefore, rapid and accurate approaches are needed to select candidates from countless combinations. In the current study, we examined three conventional descriptor sets along with a novel descriptor set derived from the simulated aggregation propensity of di- and tripeptides to model the critical aggregation concentration (CAC) of amphiphilic peptides. In contrast to the conventional descriptors, the radial kernel model derived from the novel descriptor set accurately predicted the critical aggregation concentration of the test set with a residual standard error of 0.10. The importance of aromatic side chains, as well as neighboring amino acids in the self-assembly, was emphasized by analysis of the influential descriptors. The addition of very long peptides (70-100 residues) to the data set decreased the model accuracy and changed the influential descriptors. The developed model can be used to predict the CAC of self-assembling amphiphilic peptides and also to derive rules to apply in designing novel amphiphilic peptides with desired properties.
Affiliation(s)
- Saeed Zanganeh
  - Department of Nanobiotechnology, New Technologies Research Group, Pasteur Institute of Iran, Tehran 1316943551, Iran
  - Department of Hematology and Medical Laboratory Sciences, Faculty of Allied Medicine, Kerman University of Medical Sciences, Kerman 7616911333, Iran
- Loghman Firoozpour
  - Department of Medicinal Chemistry, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 1416753955, Iran
- Soroush Sardari
  - Drug Design and Bioinformatics Unit, Medical Biotechnology Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran 1316943551, Iran
- Ali Afgar
  - Research Center for Hydatid Disease in Iran, School of Medicine, Kerman University of Medical Sciences, Kerman 7616914115, Iran
- Reza Ahangari Cohan
  - Department of Nanobiotechnology, New Technologies Research Group, Pasteur Institute of Iran, Tehran 1316943551, Iran
- Nasir Mohajel
  - Department of Molecular Virology, Pasteur Institute of Iran, Tehran 1316943551, Iran
|
49
|
Killoran MP, Levin S, Boursier ME, Zimmerman K, Hurst R, Hall MP, Machleidt T, Kirkland TA, Friedman Ohana R. An Integrated Approach toward NanoBRET Tracers for Analysis of GPCR Ligand Engagement. Molecules 2021; 26:molecules26102857. [PMID: 34065854 PMCID: PMC8151276 DOI: 10.3390/molecules26102857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/15/2021] [Revised: 05/06/2021] [Accepted: 05/07/2021] [Indexed: 01/22/2023] Open
Abstract
Gaining insight into the pharmacology of ligand engagement with G-protein coupled receptors (GPCRs) under biologically relevant conditions is vital to both drug discovery and basic research. NanoLuc-based bioluminescence resonance energy transfer (NanoBRET) monitoring competitive binding between fluorescent tracers and unmodified test compounds has emerged as a robust and sensitive method to quantify ligand engagement with specific GPCRs genetically fused to NanoLuc luciferase or the luminogenic HiBiT peptide. However, development of fluorescent tracers is often challenging and remains the principal bottleneck for this approach. One way to alleviate the burden of developing a specific tracer for each receptor is using promiscuous tracers, which is made possible by the intrinsic specificity of BRET. Here, we devised an integrated tracer discovery workflow that couples machine learning-guided in silico screening for scaffolds displaying promiscuous binding to GPCRs with a blend of synthetic strategies to rapidly generate multiple tracer candidates. Subsequently, these candidates were evaluated for binding in a NanoBRET ligand-engagement screen across a library of HiBiT-tagged GPCRs. Employing this workflow, we generated several promiscuous fluorescent tracers that can effectively engage multiple GPCRs, demonstrating the efficiency of this approach. We believe that this workflow has the potential to accelerate discovery of NanoBRET fluorescent tracers for GPCRs and other target classes.
Affiliation(s)
- Michael P. Killoran
  - Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
- Sergiy Levin
  - Promega Biosciences LLC, 277 Granada Drive, San Luis Obispo, CA 93401, USA; (S.L.); (T.A.K.)
- Michelle E. Boursier
  - Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
- Kristopher Zimmerman
  - Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
- Robin Hurst
  - Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
- Mary P. Hall
  - Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
- Thomas Machleidt
  - Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
- Thomas A. Kirkland
  - Promega Biosciences LLC, 277 Granada Drive, San Luis Obispo, CA 93401, USA; (S.L.); (T.A.K.)
- Rachel Friedman Ohana
  - Promega Corporation, 2800 Woods Hollow, Fitchburg, WI 53711, USA; (M.P.K.); (M.E.B.); (K.Z.); (R.H.); (M.P.H.); (T.M.)
  - Correspondence: Tel.: +1-608-274-1181
|
50
|
Charoenkwan P, Chiangjong W, Nantasenamat C, Hasan MM, Manavalan B, Shoombuatong W. StackIL6: a stacking ensemble model for improving the prediction of IL-6 inducing peptides. Brief Bioinform 2021; 22:6271998. [PMID: 33963832 DOI: 10.1093/bib/bbab172] [Citation(s) in RCA: 95] [Impact Index Per Article: 23.8] [Received: 01/15/2021] [Revised: 03/30/2021] [Accepted: 04/10/2021] [Indexed: 12/13/2022] Open
Abstract
The release of interleukin (IL)-6 is stimulated by antigenic peptides from pathogens as well as by immune cells for activating aggressive inflammation. IL-6 inducing peptides are derived from pathogens and can be used as diagnostic biomarkers for predicting various stages of disease severity as well as being used as IL-6 inhibitors for the suppression of aggressive multi-signaling immune responses. Thus, the accurate identification of IL-6 inducing peptides is of great importance for investigating their mechanism of action as well as for developing diagnostic and immunotherapeutic applications. This study proposes a novel stacking ensemble model (termed StackIL6) for accurately identifying IL-6 inducing peptides. More specifically, StackIL6 was constructed from twelve different feature descriptors derived from three major groups of features (composition-based features, composition-transition-distribution-based features and physicochemical properties-based features) and five popular machine learning algorithms (extremely randomized trees, logistic regression, multi-layer perceptron, support vector machine and random forest). To enhance the utility of baseline models, they were effectively and systematically integrated through a stacking strategy to build the final meta-based model. Extensive benchmarking experiments demonstrated that StackIL6 could achieve significantly better performance than the existing method (IL6PRED) and outperformed its constituent baseline models on both training and independent test datasets, which thereby support its excellent discrimination and generalization abilities. To facilitate easy access to the StackIL6 model, it was established as a freely available web server accessible at http://camt.pythonanywhere.com/StackIL6. It is anticipated that StackIL6 can help to facilitate rapid screening of promising IL-6 inducing peptides for the development of diagnostic and immunotherapeutic applications in the future.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Wararat Chiangjong
  - Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- Chanin Nantasenamat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Md Mehedi Hasan
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
|