101
Tsugawa H, Rai A, Saito K, Nakabayashi R. Metabolomics and complementary techniques to investigate the plant phytochemical cosmos. Nat Prod Rep 2021; 38:1729-1759. [PMID: 34668509 DOI: 10.1039/d1np00014d] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Indexed: 12/14/2022]
Abstract
Covering: up to 2021
Plants and their associated microbial communities are known to produce millions of metabolites, a majority of which are still not characterized and are speculated to possess novel bioactive properties. In addition to their role in plant physiology, these metabolites are also relevant as existing and next-generation medicine candidates. Elucidation of the plant metabolite diversity is thus valuable for the successful exploitation of natural resources for humankind. Herein, we present a comprehensive review on recent metabolomics approaches to illuminate molecular networks in plants, including chemical isolation and enzymatic production as well as the modern metabolomics approaches such as stable isotope labeling, ultrahigh-resolution mass spectrometry, metabolome imaging (spatial metabolomics), single-cell analysis, cheminformatics, and computational mass spectrometry. Mass spectrometry-based strategies to characterize plant metabolomes through metabolite identification and annotation are described in detail. We also highlight the use of phytochemical genomics to mine genes associated with specialized metabolites' biosynthesis. Understanding the metabolic diversity through biotechnological advances is fundamental to elucidate the functions of the plant-derived specialized metabolome.
Affiliation(s)
- Hiroshi Tsugawa
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, 2-24-16 Nakamachi, Koganei, Tokyo 184-8588, Japan; Graduate School of Medical Life Science, Yokohama City University, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Amit Rai
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; Plant Molecular Science Center, Chiba University, 1-8-1 Inohana, Chuo-ku, Chiba 260-8675, Japan
- Kazuki Saito
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan; Plant Molecular Science Center, Chiba University, 1-8-1 Inohana, Chuo-ku, Chiba 260-8675, Japan
- Ryo Nakabayashi
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.
102
Asai A, Konno M, Taniguchi M, Vecchione A, Ishii H. Computational healthcare: Present and future perspectives (Review). Exp Ther Med 2021; 22:1351. [PMID: 34659497 PMCID: PMC8515560 DOI: 10.3892/etm.2021.10786] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/14/2021] [Accepted: 07/19/2021] [Indexed: 12/05/2022] [Open Access]
Abstract
Artificial intelligence (AI) has been developed through repeated new discoveries since around 1960. The use of AI is now becoming widespread within society and our daily lives. AI is also being introduced into healthcare, such as medicine and drug development; however, it is currently biased towards specific domains. The present review traces the history of the development of various AI-based applications in healthcare and compares AI-based healthcare with conventional healthcare to show the future prospects for this type of care. Knowledge of the past and present development of AI-based applications would be useful for the future utilization of novel AI approaches in healthcare.
Affiliation(s)
- Ayumu Asai
- Center of Medical Innovation and Translational Research, Department of Medical Data Science, Graduate School of Medicine, Osaka University, Suita, Osaka 565-0871, Japan; Artificial Intelligence Research Center, Osaka University, Ibaraki, Osaka 567-0047, Japan; The Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka 567-0047, Japan
- Masamitsu Konno
- Center of Medical Innovation and Translational Research, Department of Medical Data Science, Graduate School of Medicine, Osaka University, Suita, Osaka 565-0871, Japan
- Masateru Taniguchi
- The Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka 567-0047, Japan
- Andrea Vecchione
- Department of Clinical and Molecular Medicine, University of Rome 'Sapienza', Santo Andrea Hospital, I-1035-00189 Rome, Italy
- Hideshi Ishii
- Center of Medical Innovation and Translational Research, Department of Medical Data Science, Graduate School of Medicine, Osaka University, Suita, Osaka 565-0871, Japan
103
Shen T, Wu J, Lan H, Zheng L, Pei J, Wang S, Liu W, Huang J. When homologous sequences meet structural decoys: Accurate contact prediction by tFold in CASP14-(tFold for CASP14 contact prediction). Proteins 2021; 89:1901-1910. [PMID: 34473376 DOI: 10.1002/prot.26232] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 04/30/2021] [Revised: 08/16/2021] [Accepted: 08/20/2021] [Indexed: 12/29/2022]
Abstract
In this paper, we report our tFold framework's performance on the inter-residue contact prediction task in the 14th Critical Assessment of protein Structure Prediction (CASP14). Our tFold framework seamlessly combines both homologous sequences and structural decoys under an ultra-deep network architecture. Squeeze-excitation and axial attention mechanisms are employed to effectively capture inter-residue interactions. In CASP14, our best predictor achieves 41.78% in the averaged top-L precision for long-range contacts for all the 22 free-modeling (FM) targets, and ranked 1st among all the 60 participating teams. The tFold web server is now freely available at: https://drug.ai.tencent.com/console/en/tfold.
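The headline metric above, averaged top-L precision for long-range contacts, can be illustrated with a minimal Python sketch. It assumes the usual CASP conventions as we understand them: a residue pair (i, j) is long-range when |i - j| ≥ 24, L is the target's sequence length, and native contacts are supplied as a precomputed set (commonly defined by a Cβ-Cβ distance below 8 Å). The function name `top_l_long_range_precision` is hypothetical; this is not tFold's evaluation code.

```python
from typing import Dict, Iterable, Set, Tuple

Contact = Tuple[int, int]


def top_l_long_range_precision(
    predictions: Iterable[Tuple[int, int, float]],
    true_contacts: Set[Contact],
    length: int,
    min_separation: int = 24,
) -> float:
    """Precision of the top-L long-range predicted contacts.

    predictions: (i, j, probability) triples with 0-based residue indices.
    true_contacts: native contacts as (i, j) pairs with i < j.
    length: sequence length L, which also sets how many pairs are scored.
    min_separation: |i - j| threshold for "long-range" (assumed CASP value 24).
    """
    # Normalize each pair to (low, high), keep only long-range pairs, and
    # retain the highest probability if a pair is predicted more than once.
    best: Dict[Contact, float] = {}
    for i, j, p in predictions:
        a, b = min(i, j), max(i, j)
        if b - a >= min_separation:
            best[(a, b)] = max(p, best.get((a, b), float("-inf")))
    # Rank by predicted probability and score only the top L pairs.
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:length]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if pair in true_contacts)
    return hits / len(ranked)


if __name__ == "__main__":
    preds = [(0, 30, 0.9), (5, 40, 0.8), (2, 10, 0.99), (1, 50, 0.7)]
    native = {(0, 30), (1, 50)}
    # (2, 10) is short-range and ignored; the top 2 are (0, 30) and (5, 40).
    print(top_l_long_range_precision(preds, native, length=2))  # 0.5
```

Dividing by the number of pairs actually scored, rather than always by L, is only one of several possible conventions; the official CASP assessment may differ in such details.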
Affiliation(s)
- Wei Liu
- Tencent AI Lab, Shenzhen, China
104
Gala M, Žoldák G. Classifying Residues in Mechanically Stable and Unstable Substructures Based on a Protein Sequence: The Case Study of the DnaK Hsp70 Chaperone. Nanomaterials (Basel) 2021; 11:2198. [PMID: 34578514 PMCID: PMC8467864 DOI: 10.3390/nano11092198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/21/2021] [Revised: 08/16/2021] [Accepted: 08/24/2021] [Indexed: 12/17/2022]
Abstract
Artificial proteins can be constructed from stable substructures, whose stability is encoded in their protein sequence. Identifying stable protein substructures experimentally is the only available option at the moment because no suitable method exists to extract this information from a protein sequence. In previous research, we examined the mechanics of E. coli Hsp70 and found four mechanically stable (S class) and three unstable substructures (U class). Of the total 603 residues in the folded domains of Hsp70, 234 residues belong to one of four mechanically stable substructures, and 369 residues belong to one of three unstable substructures. Here our goal is to develop a machine learning model to categorize Hsp70 residues using sequence information. We applied three supervised methods: logistic regression (LR), random forest, and support vector machine. The LR method showed the highest accuracy, 0.925, to predict the correct class of a particular residue only when context-dependent physico-chemical features were included. The cross-validation of the LR model yielded a prediction accuracy of 0.879 and revealed that most of the misclassified residues lie at the borders between substructures. We foresee machine learning models being used to identify stable substructures as candidates for building blocks to engineer new proteins.
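As a rough illustration of the logistic-regression (LR) classifier described above, the following pure-Python sketch fits a binary model by gradient descent on the log-loss, with 1 for a residue in a stable (S-class) substructure and 0 for an unstable (U-class) one. The feature vectors, toy data, and helper names are hypothetical. The authors' context-dependent physico-chemical features and training setup are not reproduced here, and in practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    epochs: int = 2000,
) -> List[float]:
    """Fit weights by full-batch gradient descent on the mean log-loss.

    X: one feature vector per residue (hypothetical features).
    y: 1 for a residue in a stable (S) substructure, 0 for an unstable (U) one.
    Returns the weight vector with the bias term stored last.
    """
    n, n_features = len(X), len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            # zip(w, xi) stops before the bias, which is added separately.
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = sigmoid(z) - yi  # d(log-loss)/dz for this residue
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Predicted class: 1 (S) if P(S | x) >= 0.5, else 0 (U)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if sigmoid(z) >= 0.5 else 0


def accuracy(w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of residues whose predicted class matches the label."""
    return sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y)


if __name__ == "__main__":
    # Toy one-feature data set (hypothetical values, not from the paper).
    X = [[-1.0], [-0.8], [-0.6], [0.6], [0.8], [1.0]]
    y = [0, 0, 0, 1, 1, 1]
    w = train_logistic_regression(X, y, epochs=500)
    print(accuracy(w, X, y))  # 1.0 on this separable toy set
```

The accuracy figures quoted in the abstract correspond to what `accuracy` computes. A cross-validated estimate would instead train on k - 1 folds and call `accuracy` on each held-out fold.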
Affiliation(s)
- Michal Gala
- Department of Biophysics, Faculty of Science, P. J. Šafárik University, Jesena 5, 040 01 Košice, Slovakia
- Gabriel Žoldák
- Center for Interdisciplinary Biosciences, Technology and Innovation Park, P. J. Šafárik University, Trieda SNP 1, 040 11 Košice, Slovakia
105
Quadir F, Roy RS, Soltanikazemi E, Cheng J. DeepComplex: A Web Server of Predicting Protein Complex Structures by Deep Learning Inter-chain Contact Prediction and Distance-Based Modelling. Front Mol Biosci 2021; 8:716973. [PMID: 34497831 PMCID: PMC8419425 DOI: 10.3389/fmolb.2021.716973] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/29/2021] [Accepted: 08/12/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
Proteins interact to form complexes. Predicting the quaternary structure of protein complexes is useful for protein function analysis, protein engineering, and drug design. However, few user-friendly tools leveraging the latest deep learning technology for inter-chain contact prediction and the distance-based modelling to predict protein quaternary structures are available. To address this gap, we develop DeepComplex, a web server for predicting structures of dimeric protein complexes. It uses deep learning to predict inter-chain contacts in a homodimer or heterodimer. The predicted contacts are then used to construct a quaternary structure of the dimer by the distance-based modelling, which can be interactively viewed and analysed. The web server is freely accessible and requires no registration. It can be easily used by providing a job name and an email address along with the tertiary structure for one chain of a homodimer or two chains of a heterodimer. The output webpage provides the multiple sequence alignment, predicted inter-chain residue-residue contact map, and predicted quaternary structure of the dimer. DeepComplex web server is freely available at http://tulip.rnet.missouri.edu/deepcomplex/web_index.html.
Affiliation(s)
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States
106
Affiliation(s)
- Patrick Cramer
- Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany.
107
Wang L, Liu J, Xia Y, Xu J, Zhou X, Zhang G. Distance-guided protein folding based on generalized descent direction. Brief Bioinform 2021; 22:6341661. [PMID: 34355233 DOI: 10.1093/bib/bbab296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/17/2021] [Revised: 06/30/2021] [Accepted: 07/12/2021] [Indexed: 12/25/2022] [Open Access]
Abstract
Advances in the prediction of inter-residue distances from a protein sequence have increased the accuracy with which the correct folds of proteins can be predicted using distance information. Here, we propose a distance-guided protein folding algorithm based on generalized descent direction, named GDDfold, which achieves effective structural perturbation and potential minimization in two stages. In the global stage, a random-based direction is designed using evolutionary knowledge, which guides the conformation population to cross potential barriers and rapidly explore a broad region of conformational space. In the local stage, the locally rugged potential landscape can be explored with the aid of a conjugate-based direction integrated into a specific search strategy, which can improve the exploitation ability. GDDfold is tested on 347 proteins of a benchmark set, 24 template-free modeling (FM) targets of CASP13 and 20 FM targets of CASP14. Results show that GDDfold correctly folds [template modeling (TM) score ≥ 0.5] 316 out of 347 proteins, 65 of which have TM-scores greater than 0.8, and significantly outperforms Rosetta-dist (a distance-assisted fragment assembly method) and L-BFGSfold (a distance geometry optimization method). On CASP FM targets, GDDfold is comparable with five state-of-the-art full-version methods, namely QUARK, RaptorX, Rosetta, MULTICOM and trRosetta, in the CASP13 and CASP14 server groups.
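For reference, the TM-score used as the folding criterion above is usually defined as TM = (1/L_target) Σ_i 1/(1 + (d_i/d0)^2), maximized over superpositions, with d0 = 1.24·(L_target - 15)^(1/3) - 1.8 Å. The Python sketch below evaluates this sum for a single, already-fixed superposition and residue alignment, so it gives a lower bound on the true TM-score. The 0.5 Å floor on d0 for short chains reflects our recollection of the TM-score program and should be treated as an assumption. All function names are hypothetical.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Distance scale d0 (Angstrom) for a target of the given length."""
    if target_length <= 21:
        # The formula drops below 0.5 here (and is not real for L < 15),
        # so clamp to 0.5 (assumed to match the TM-score program).
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_fixed_superposition(distances: Sequence[float], target_length: int) -> float:
    """TM-score terms summed for one fixed superposition.

    distances: d_i (Angstrom) between aligned residue pairs after superposition;
    unaligned target residues contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def count_correct_folds(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Number of models counted as correctly folded (TM-score >= threshold)."""
    return sum(1 for s in scores if s >= threshold)


if __name__ == "__main__":
    # A perfect 100-residue model: every aligned distance is 0 Angstrom.
    print(tm_score_fixed_superposition([0.0] * 100, 100))  # 1.0
    print(count_correct_folds([0.49, 0.5, 0.81]))  # 2
```

Because the score is normalized by the target length, a model that aligns only half of the target's residues cannot exceed 0.5, however accurate those residues are.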
Affiliation(s)
- Liujing Wang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yuhao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jiakang Xu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Michigan, USA
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
108
Bouatta N, Sorger P, AlQuraishi M. Protein structure prediction by AlphaFold2: are attention and symmetries all you need? Acta Crystallogr D Struct Biol 2021; 77:982-991. [PMID: 34342271 PMCID: PMC8329862 DOI: 10.1107/s2059798321007531] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 07/08/2021] [Accepted: 07/21/2021] [Indexed: 11/11/2022] [Open Access]
Abstract
The functions of most proteins result from their 3D structures, but determining their structures experimentally remains a challenge, despite steady advances in crystallography, NMR and single-particle cryoEM. Computationally predicting the structure of a protein from its primary sequence has long been a grand challenge in bioinformatics, intimately connected with understanding protein chemistry and dynamics. Recent advances in deep learning, combined with the availability of genomic data for inferring co-evolutionary patterns, provide a new approach to protein structure prediction that is complementary to longstanding physics-based approaches. The outstanding performance of AlphaFold2 in the recent Critical Assessment of protein Structure Prediction (CASP14) experiment demonstrates the remarkable power of deep learning in structure prediction. In this perspective, we focus on the key features of AlphaFold2, including its use of (i) attention mechanisms and Transformers to capture long-range dependencies, (ii) symmetry principles to facilitate reasoning over protein structures in three dimensions and (iii) end-to-end differentiability as a unifying framework for learning from protein data. The rules of protein folding are ultimately encoded in the physical principles that underpin it; to conclude, the implications of having a powerful computational model for structure prediction that does not explicitly rely on those principles are discussed.
Affiliation(s)
- Nazim Bouatta
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Peter Sorger
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
109
Lento C, Wilson DJ. Subsecond Time-Resolved Mass Spectrometry in Dynamic Structural Biology. Chem Rev 2021; 122:7624-7646. [PMID: 34324314 DOI: 10.1021/acs.chemrev.1c00222] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 01/04/2023]
Abstract
Life at the molecular level is a dynamic world, where the key players (proteins, oligonucleotides, lipids, and carbohydrates) are in a perpetual state of structural flux, shifting rapidly between local minima on their conformational free energy landscapes. The classical structural biology techniques of X-ray crystallography, structural NMR, and cryo-electron microscopy (cryo-EM), while capable of extraordinary structural resolution, are innately ill-suited to characterizing biomolecules in their dynamically active states. Subsecond time-resolved mass spectrometry (MS) provides a unique window into the dynamic world of biological macromolecules, offering the capacity to directly monitor biochemical processes and conformational shifts with a structural dimension provided by the electrospray charge-state distribution, ion mobility, covalent labeling, or hydrogen-deuterium exchange. Over the past two decades, this suite of techniques has provided important insights into the inherently dynamic processes that drive function and pathogenesis in biological macromolecules, including (mis)folding, complexation, aggregation, ligand binding, and enzyme catalysis, among others. This Review provides a comprehensive account of subsecond time-resolved MS and the advances it has enabled in dynamic structural biology, with an emphasis on insights into the dynamic drivers of protein function.
Affiliation(s)
- Cristina Lento
- Department of Chemistry, York University, Toronto, Ontario M3J 1P3, Canada
- Derek J Wilson
- Department of Chemistry, York University, Toronto, Ontario M3J 1P3, Canada
110
Woolfson DN. A Brief History of De Novo Protein Design: Minimal, Rational, and Computational. J Mol Biol 2021; 433:167160. [PMID: 34298061 DOI: 10.1016/j.jmb.2021.167160] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Received: 06/06/2021] [Revised: 07/07/2021] [Accepted: 07/12/2021] [Indexed: 12/26/2022]
Abstract
Protein design has come of age, but how will it mature? In the 1980s and the 1990s, the primary motivation for de novo protein design was to test our understanding of the informational aspect of the protein-folding problem; i.e., how does protein sequence determine protein structure and function? This necessitated minimal and rational design approaches whereby the placement of each residue in a design was reasoned using chemical principles and/or biochemical knowledge. At that time, though with some notable exceptions, the use of computers to aid design was not widespread. Over the past two decades, the tables have turned and computational protein design is firmly established. Here, I illustrate this progress through a timeline of de novo protein structures that have been solved to atomic resolution and deposited in the Protein Data Bank. From this, it is clear that the impact of rational and computational design has been considerable: More-complex and more-sophisticated designs are being targeted with many being resolved to atomic resolution. Furthermore, our ability to generate and manipulate synthetic proteins has advanced to a point where they are providing realistic alternatives to natural protein functions for applications both in vitro and in cells. Also, and increasingly, computational protein design is becoming accessible to non-specialists. This all raises the questions: Is there still a place for minimal and rational design approaches? And, what challenges lie ahead for the burgeoning field of de novo protein design as a whole?
Affiliation(s)
- Derek N Woolfson
- School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK; School of Biochemistry, University of Bristol, Biomedical Sciences Building, University Walk, Bristol BS8 1TD, UK; Bristol BioDesign Institute, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK.
111
Mulnaes D, Golchin P, Koenig F, Gohlke H. TopDomain: Exhaustive Protein Domain Boundary Metaprediction Combining Multisource Information and Deep Learning. J Chem Theory Comput 2021; 17:4599-4613. [PMID: 34161735 DOI: 10.1021/acs.jctc.1c00129] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
Protein domains are independent, functional, and stable structural units of proteins. Accurate protein domain boundary prediction plays an important role in understanding protein structure and evolution, as well as for protein structure prediction. Current domain boundary prediction methods differ in terms of boundary definition, methodology, and training databases resulting in disparate performance for different proteins. We developed TopDomain, an exhaustive metapredictor, that uses deep neural networks to combine multisource information from sequence- and homology-based features of over 50 primary predictors. For this purpose, we developed a new domain boundary data set termed the TopDomain data set, in which the true annotations are informed by SCOPe annotations, structural domain parsers, human inspection, and deep learning. We benchmark TopDomain against 2484 targets with 3354 boundaries from the TopDomain test set and achieve F1 scores of 78.4% and 73.8% for multidomain boundary prediction within ±20 residues and ±10 residues of the true boundary, respectively. When examined on targets from CASP11-13 competitions, TopDomain achieves F1 scores of 47.5% and 42.8% for multidomain proteins. TopDomain significantly outperforms 15 widely used, state-of-the-art ab initio and homology-based domain boundary predictors. Finally, we implemented TopDomainTMC, which accurately predicts whether domain parsing is necessary for the target protein.
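The F1 scores above count a predicted boundary as correct when it lies within ±20 (or ±10) residues of a true boundary. A minimal Python sketch of such a tolerance-window F1 follows. The one-to-one greedy matching (closest pairs first) and the name `boundary_f1` are our assumptions, and TopDomain's own evaluation may pair boundaries differently.

```python
from typing import Sequence


def boundary_f1(predicted: Sequence[int], true: Sequence[int], tolerance: int) -> float:
    """F1 for domain-boundary prediction with a +/- tolerance window.

    predicted, true: boundary positions as residue indices.
    Each true boundary is matched by at most one predicted boundary, using
    greedy matching by smallest distance (an assumption, not TopDomain's code).
    Returns 0.0 when either list is empty.
    """
    if not predicted or not true:
        return 0.0
    # All candidate (distance, pred_idx, true_idx) pairs inside the window,
    # sorted so the closest pairs are matched first.
    candidates = sorted(
        (abs(p - t), i, j)
        for i, p in enumerate(predicted)
        for j, t in enumerate(true)
        if abs(p - t) <= tolerance
    )
    used_pred, used_true = set(), set()
    for _, i, j in candidates:
        if i not in used_pred and j not in used_true:
            used_pred.add(i)
            used_true.add(j)
    tp = len(used_pred)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Both predictions fall within +/-20 of a true boundary.
    print(boundary_f1([100, 205], [110, 200], tolerance=20))  # 1.0
    # With +/-5 only 205 matches 200, so precision = recall = 0.5.
    print(boundary_f1([100, 205], [110, 200], tolerance=5))  # 0.5
```

The example shows why the same predictor scores lower at ±10 than at ±20: tightening the window turns near misses into both a false positive and a false negative.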
Affiliation(s)
- Daniel Mulnaes
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Pegah Golchin
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Filip Koenig
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany
- Holger Gohlke
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, 40225 Düsseldorf, Germany; John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
112
Xia YH, Peng CX, Zhou XG, Zhang GJ. A Sequential Niche Multimodal Conformational Sampling Algorithm for Protein Structure Prediction. Bioinformatics 2021; 37:4357-4365. [PMID: 34245242 DOI: 10.1093/bioinformatics/btab500] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/22/2020] [Revised: 06/23/2021] [Accepted: 07/05/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
MOTIVATION: Massive local minima on the protein energy landscape often cause traditional conformational sampling algorithms to be easily trapped in local basin regions, because they find it difficult to overcome high-energy barriers. Also, the lowest energy conformation may not correspond to the native structure due to the inaccuracy of energy models. This study investigates whether these two problems can be alleviated by a sequential niche technique without loss of accuracy.
RESULTS: A sequential niche multimodal conformational sampling algorithm for protein structure prediction (SNfold) is proposed in this study. In SNfold, a derating function is designed based on the knowledge learned from the previous sampling and used to construct a series of sampling-guided energy functions. These functions then help the sampling algorithm overcome high-energy barriers and avoid the re-sampling of the explored regions. In inaccurate protein energy models, the high-energy conformation that may correspond to the native structure can be sampled with successively updated sampling-guided energy functions. The proposed SNfold is tested on 300 benchmark proteins, 24 CASP13 and 19 CASP14 FM targets. Results show that SNfold correctly folds (TM-score ≥ 0.5) 231 out of 300 proteins. In particular, compared with Rosetta restrained by distance (Rosetta-dist), SNfold achieves higher average TM-score and improves the sampling efficiency by more than 100 times. On several CASP FM targets, SNfold also shows good performance compared with four state-of-the-art servers in CASP. As a plug-in conformational sampling algorithm, SNfold can be extended to other protein structure prediction methods.
AVAILABILITY: The source code and executable versions are freely available at https://github.com/iobio-zjut/SNfold.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
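The derating idea behind SNfold can be illustrated on a toy problem. The sketch below adapts the classic sequential niche technique to energy minimization. It adds a penalty height·(1 - d/r)^α around every previously found minimum, so each new search is pushed out of basins that have already been explored. The one-dimensional energy, the penalty form and its parameters, and the grid search standing in for a conformational sampler are all hypothetical. This is not SNfold's derating function, which the abstract describes as designed from knowledge learned in previous sampling.

```python
from typing import Callable, List, Sequence


def energy(x: float) -> float:
    """Toy 1-D energy landscape with a deep basin near -1 and a shallow one near +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def derated_energy(
    x: float,
    base: Callable[[float], float],
    found: Sequence[float],
    radius: float = 0.8,
    height: float = 5.0,
    alpha: float = 2.0,
) -> float:
    """Base energy plus a penalty around every previously found minimum.

    The penalty height * (1 - d / radius) ** alpha is an additive analogue of
    the classic sequential-niche derating function (hypothetical parameters).
    """
    penalty = 0.0
    for m in found:
        d = abs(x - m)
        if d < radius:
            penalty += height * (1.0 - d / radius) ** alpha
    return base(x) + penalty


def sequential_niche_search(
    base: Callable[[float], float], n_runs: int, grid: Sequence[float]
) -> List[float]:
    """Run n_runs searches, each on the energy derated by all earlier finds."""
    found: List[float] = []
    for _ in range(n_runs):
        # Exhaustive grid search stands in for a real conformational sampler.
        best = min(grid, key=lambda x: derated_energy(x, base, found))
        found.append(best)
    return found


if __name__ == "__main__":
    grid = [i / 100 for i in range(-200, 201)]
    print(sequential_niche_search(energy, n_runs=2, grid=grid))
```

With the default parameters, the first search settles in the deeper basin near x = -1. After derating, the second search lands in the shallower basin near x = +1 instead of returning to the first one. This mirrors how a higher-energy basin, which under an inaccurate energy model might hold the native structure, can still be sampled.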
Affiliation(s)
- Yu-Hao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Chun-Xiang Peng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Xiao-Gen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109-2218, USA
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
113
Lachance J, Matteau D, Brodeur J, Lloyd CJ, Mih N, King ZA, Knight TF, Feist AM, Monk JM, Palsson BO, Jacques P, Rodrigue S. Genome-scale metabolic modeling reveals key features of a minimal gene set. Mol Syst Biol 2021; 17:e10099. [PMID: 34288418 PMCID: PMC8290834 DOI: 10.15252/msb.202010099] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/16/2020] [Revised: 06/18/2021] [Accepted: 06/22/2021] [Indexed: 12/19/2022] [Open Access]
Abstract
Mesoplasma florum, a fast-growing near-minimal organism, is a compelling model to explore rational genome designs. Using sequence and structural homology, the set of metabolic functions its genome encodes was identified, allowing the reconstruction of a metabolic network representing ~30% of its protein-coding genes. Growth medium simplification enabled substrate uptake and product secretion rate quantification which, along with experimental biomass composition, were integrated as species-specific constraints to produce the functional iJL208 genome-scale model (GEM) of metabolism. Genome-wide expression and essentiality datasets as well as growth data on various carbohydrates were used to validate and refine iJL208. Discrepancies between model predictions and observations were mechanistically explained using protein structures and network analysis. iJL208 was also used to propose an in silico reduced genome. Comparing this prediction to the minimal cell JCVI-syn3.0 and its parent JCVI-syn1.0 revealed key features of a minimal gene set. iJL208 is a stepping-stone toward model-driven whole-genome engineering.
Affiliation(s)
- Dominick Matteau
- Département de Biologie, Université de Sherbrooke, Sherbrooke, QC, Canada
- Joëlle Brodeur
- Département de Biologie, Université de Sherbrooke, Sherbrooke, QC, Canada
- Colton J Lloyd
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Nathan Mih
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Zachary A King
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Adam M Feist
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Jonathan M Monk
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Bernhard O Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
114
Zhao KL, Liu J, Zhou XG, Su JZ, Zhang Y, Zhang GJ. MMpred: a distance-assisted multimodal conformation sampling for de novo protein structure prediction. Bioinformatics 2021; 37:4350-4356. [PMID: 34185079 DOI: 10.1093/bioinformatics/btab484] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/01/2021] [Revised: 06/22/2021] [Accepted: 06/28/2021] [Indexed: 11/12/2022] [Open Access]
Abstract
MOTIVATION: The mathematically optimal solution in computational protein folding simulations does not always correspond to the native structure, due to the imperfection of the energy force fields. There is therefore a need to search for more diverse suboptimal solutions in order to identify the states close to the native. We propose a novel multimodal optimization protocol to improve the conformation sampling efficiency and modeling accuracy of de novo protein structure folding simulations.
RESULTS: A distance-assisted multimodal optimization sampling algorithm, MMpred, is proposed for de novo protein structure prediction. The protocol consists of three stages. In the first (modal exploration) stage, a structural similarity evaluation model, DMscore, is designed to control the diversity of conformations, generating a population of diverse structures in different low-energy basins. In the second (modal maintaining) stage, an adaptive clustering algorithm, MNDcluster, is proposed to divide the population and merge modes by adjusting the annealing temperature to locate the promising basins. In the last (modal exploitation) stage, a greedy search strategy is used to accelerate convergence within each mode. Distance constraint information is used to construct the conformation scoring model to guide sampling. MMpred is tested on 320 non-redundant proteins, where MMpred obtains models with TM-score ≥ 0.5 on 268 cases, which is 20.3% higher than that of Rosetta guided with the same distance constraints. In addition, on 320 benchmark proteins, the average TM-score of the enhanced version of MMpred (E-MMpred) is 0.732 on the best model, which is comparable to trRosetta (0.730).
AVAILABILITY: The source code and executable are freely available at https://github.com/iobio-zjut/MMpred.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kai-Long Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, MI 48109-2218, USA
- Jian-Zhong Su
- School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325011, Zhejiang, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, MI 48109-2218, USA
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
115
Kamacioglu A, Tuncbag N, Ozlu N. Structural analysis of mammalian protein phosphorylation at a proteome level. Structure 2021; 29:1219-1229.e3. [PMID: 34192515 DOI: 10.1016/j.str.2021.06.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/26/2021] [Revised: 04/07/2021] [Accepted: 06/04/2021] [Indexed: 10/21/2022]
Abstract
Phosphorylation is an essential post-translational modification in almost all cellular processes. Several global phosphoproteomics analyses have revealed phosphorylation profiles under different conditions. Beyond the identification of phospho-sites, protein structures add another layer of information about their functionality. In this study, we systematically characterize phospho-sites based on their 3D locations in proteins and establish a location map for phospho-sites. More than 250,000 phospho-sites were analyzed, of which 8,686 sites match at least one structure and are stratified based on their respective 3D positions. Core phospho-sites fall into two distinct groups based on their dynamicity. Dynamic core phosphorylation sites are significantly more functional than static ones. The dynamic core and interface phospho-sites are the most functional among all 3D phosphorylation groups. Our analysis provides a global characterization and stratification of phospho-sites from a structural perspective that can be used to predict functional relevance and to filter out false positives in phosphoproteomic studies.
Affiliation(s)
- Altug Kamacioglu
- Department of Molecular Biology and Genetics, Koc University, Istanbul, Turkey
- Nurcan Tuncbag
- Chemical and Biological Engineering, College of Engineering, Koc University, 34450 Istanbul, Turkey; School of Medicine, Koc University, 34450 Istanbul, Turkey; Koc University Research Center for Translational Medicine (KUTTAM), 34450 Istanbul, Turkey.
- Nurhan Ozlu
- Department of Molecular Biology and Genetics, Koc University, Istanbul, Turkey; School of Medicine, Koc University, 34450 Istanbul, Turkey; Koc University Research Center for Translational Medicine (KUTTAM), 34450 Istanbul, Turkey.
116
Jiang H, Fan X. The Two-Step Clustering Approach for Metastable States Learning. Int J Mol Sci 2021; 22:6576. [PMID: 34205252 PMCID: PMC8233889 DOI: 10.3390/ijms22126576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2021] [Revised: 06/14/2021] [Accepted: 06/14/2021] [Indexed: 01/20/2023]
Abstract
Understanding the energy landscape and conformational dynamics is crucial for studying many biological and chemical processes, such as protein-protein interaction and RNA folding. Molecular dynamics (MD) simulations have been a major source of data on dynamic structure. Although many methods have been proposed for learning metastable states from MD data, some key problems still require further investigation. Here, we give a brief review of recent progress in this field, with an emphasis on popular methods belonging to a two-step clustering framework, in the hope of drawing more researchers to contribute to this area.
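The two-step framework first clusters MD frames geometrically into many microstates and then lumps those microstates into a few kinetically metastable macrostates. A minimal NumPy sketch of that idea follows, assuming k-means for step 1 and, for step 2, a two-state split by the sign of the second eigenvector of a lag-time transition matrix (a simplified relative of Perron-cluster lumping). The toy trajectory, the function names and the parameters `n_micro` and `lag` are hypothetical and are not taken from any specific method covered by the review.

```python
import numpy as np


def kmeans(frames, k, n_iter=50, seed=0):
    """Step 1: group MD frames into k geometric microstates (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    centers = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every frame to its nearest center.
        dist = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its frames; empty clusters keep their center.
        for j in range(k):
            members = frames[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


def transition_matrix(labels, n_states, lag=1):
    """Row-stochastic microstate transition matrix estimated at a lag time."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-lag], labels[lag:]):
        counts[a, b] += 1
    counts = counts + counts.T  # symmetrize counts (assumes detailed balance)
    return counts / counts.sum(axis=1, keepdims=True)


def lump_into_two(T):
    """Step 2: split microstates by the sign of the right eigenvector belonging to
    the second-largest eigenvalue (a two-state special case of spectral lumping)."""
    evals, evecs = np.linalg.eig(T)
    second = evecs[:, np.argsort(-evals.real)[1]].real
    return (second > 0).astype(int)


def two_step_metastable(frames, n_micro=6, lag=1, seed=0):
    micro = kmeans(frames, n_micro, seed=seed)
    _, micro = np.unique(micro, return_inverse=True)  # relabel, dropping empty microstates
    T = transition_matrix(micro, int(micro.max()) + 1, lag)
    return lump_into_two(T)[micro]


# Synthetic 2D "trajectory" hopping rarely between two wells at x = -2 and x = +2.
rng = np.random.default_rng(1)
n = 2000
well = np.zeros(n, dtype=int)
for t in range(1, n):
    well[t] = 1 - well[t - 1] if rng.random() < 0.01 else well[t - 1]
frames = np.column_stack([
    np.where(well == 1, 2.0, -2.0) + rng.normal(0.0, 0.3, n),
    rng.normal(0.0, 0.3, n),
])
macro = two_step_metastable(frames)
# Macrostate labels are arbitrary (0 and 1 may be swapped), so score both matchings.
agreement = max(np.mean(macro == well), np.mean(macro != well))
print(f"agreement with the true wells: {agreement:.3f}")
```

The split works because transitions within a well are fast while hops between wells are rare, so the slowest relaxation mode of the transition matrix has opposite signs on the two wells. More than two macrostates would need a full lumping method rather than a single sign test.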
Affiliation(s)
- Hangjin Jiang
- Center for Data Science, Zhejiang University, Hangzhou 310058, China
- Xiaodan Fan
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
117
Hou T, Bian Y, McGuire T, Xie XQ. Integrated Multi-Class Classification and Prediction of GPCR Allosteric Modulators by Machine Learning Intelligence. Biomolecules 2021; 11:biom11060870. [PMID: 34208096 PMCID: PMC8230833 DOI: 10.3390/biom11060870] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/23/2021] [Revised: 05/30/2021] [Accepted: 06/08/2021] [Indexed: 01/01/2023]
Abstract
G-protein-coupled receptors (GPCRs) are the largest and most diverse group of cell surface receptors that respond to various extracellular signals. In recent years, allosteric modulation of GPCRs has emerged as a promising approach for developing target-selective therapies. Moreover, the discovery of new GPCR allosteric modulators can greatly benefit the further understanding of GPCR cell signaling mechanisms. Distinguishing modulators of different GPCR groups accurately and efficiently is critical but challenging. In this study, we focus on an 11-class classification task with 10 GPCR subtype classes and a random compounds class. We used a dataset of 34,434 compounds, comprising allosteric modulators collected from classical GPCR families A, B, and C as well as random drug-like compounds. Six types of machine learning models, namely support vector machine, naïve Bayes, decision tree, random forest, logistic regression, and multilayer perceptron, were trained using different combinations of features including molecular descriptors, Atom-pair fingerprints, MACCS fingerprints, and ECFP6 fingerprints. The performance of the trained models with different feature combinations was closely investigated and discussed. To the best of our knowledge, this is the first work on the multi-class classification of GPCR allosteric modulators. We believe that the classification models developed in this study can be used as simple and accurate tools for the discovery and development of GPCR allosteric modulators.
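Naive Bayes is one of the six model families listed above. As a hedged illustration of how such a classifier works on binary fingerprints such as MACCS keys, here is a minimal Bernoulli naive Bayes in pure Python with Laplace smoothing. Fingerprints are represented as sets of "on" bit indices; the 10-bit toy data and the class names are invented for the example and are not the study's dataset, features or implementation.

```python
import math
from collections import Counter, defaultdict


class BernoulliNaiveBayes:
    """Bernoulli naive Bayes over binary fingerprints given as sets of 'on' bit indices."""

    def __init__(self, n_bits, alpha=1.0):
        self.n_bits = n_bits
        self.alpha = alpha  # Laplace pseudocount keeps every probability strictly in (0, 1)

    def fit(self, fingerprints, labels):
        class_counts = Counter(labels)
        self.classes = sorted(class_counts)
        on_counts = defaultdict(Counter)  # class -> bit -> number of compounds with the bit set
        for fp, label in zip(fingerprints, labels):
            on_counts[label].update(set(fp))
        n = len(labels)
        self.log_prior = {c: math.log(class_counts[c] / n) for c in self.classes}
        self.p_on = {
            c: [(on_counts[c][b] + self.alpha) / (class_counts[c] + 2 * self.alpha)
                for b in range(self.n_bits)]
            for c in self.classes
        }
        return self

    def log_joint(self, fp):
        """log P(class) + sum over all bits of log P(bit state | class)."""
        fp = set(fp)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for b, p in enumerate(self.p_on[c]):
                # Bernoulli model: absent bits contribute log(1 - p), not zero.
                score += math.log(p if b in fp else 1.0 - p)
            scores[c] = score
        return scores

    def predict(self, fp):
        scores = self.log_joint(fp)
        return max(scores, key=scores.get)


# Toy 10-bit fingerprints for two hypothetical modulator classes and a "random" class.
train = [{0, 1, 2}, {0, 1}, {0, 2}, {5, 6, 7}, {6, 7}, {5, 7}, {3, 4}, {3, 4, 8}]
labels = ["class_A"] * 3 + ["class_B"] * 3 + ["random"] * 2
model = BernoulliNaiveBayes(n_bits=10).fit(train, labels)
print(model.predict({0, 1}), model.predict({5, 6, 7}), model.predict({3, 4}))
```

Working in log space avoids underflow when fingerprints have hundreds or thousands of bits, as MACCS (166 keys) and hashed ECFP fingerprints do.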
Affiliation(s)
- Tianling Hou
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen (CCGS) Center and Pharmacometrics System Pharmacology Program, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research (CDAR), University of Pittsburgh, Pittsburgh, PA 15261, USA
- Yuemin Bian
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen (CCGS) Center and Pharmacometrics System Pharmacology Program, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research (CDAR), University of Pittsburgh, Pittsburgh, PA 15261, USA
- Terence McGuire
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen (CCGS) Center and Pharmacometrics System Pharmacology Program, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- NIH National Center of Excellence for Computational Drug Abuse Research (CDAR), University of Pittsburgh, Pittsburgh, PA 15261, USA
- Xiang-Qun Xie
- Department of Pharmaceutical Sciences, Computational Chemical Genomics Screen (CCGS) Center and Pharmacometrics System Pharmacology Program, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Drug Discovery Institute, Departments of Computational Biology and of Structural Biology, University of Pittsburgh, Pittsburgh, PA 15261, USA
118
DNCON2_Inter: predicting interchain contacts for homodimeric and homomultimeric protein complexes using multiple sequence alignments of monomers and deep learning. Sci Rep 2021; 11:12295. [PMID: 34112907 PMCID: PMC8192766 DOI: 10.1038/s41598-021-91827-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/09/2021] [Accepted: 05/28/2021] [Indexed: 12/13/2022]
Abstract
Deep learning methods that have achieved great success in predicting intrachain residue-residue contacts have been applied to predict interchain contacts between proteins. However, these methods require multiple sequence alignments (MSAs) of a pair of interacting proteins (dimers) as input, which are often difficult to obtain because too few known protein complexes are available to generate MSAs of sufficient depth for a pair of proteins. Recognizing that MSAs of a monomer that forms homomultimers contain the co-evolutionary signals of both intrachain and interchain residue pairs in contact, we applied DNCON2 (a deep learning-based predictor of protein intrachain residue-residue contacts) to predict both intrachain and interchain contacts for homomultimers from the MSA and other co-evolutionary features of a single monomer, and then discriminated interchain from intrachain contacts according to the tertiary structure of the monomer. We name this tool DNCON2_Inter. When predictions within two residue shifts of a true contact were counted as true positives, the best average precision was obtained for the Top-L/10 predictions: 22.9% for homodimers and 17.0% for higher-order homomultimers. In some instances, especially where interchain contact densities are high, DNCON2_Inter predicted interchain contacts with 100% precision. We also developed Con_Complex, a complex structure reconstruction tool that uses predicted contacts to produce the structure of the complex. Using Con_Complex, we show that the predicted contacts can be used to accurately construct the structure of some complexes. Our experiment demonstrates that monomeric MSAs can be used with deep learning to predict interchain contacts of homomeric proteins.
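The Top-L/10 precision with a two-residue tolerance can be computed along the following lines. This is a minimal sketch under stated assumptions: L is the monomer length, predictions are (i, j, score) triples, and "within two residue shifts" is read as both residue indices of a prediction lying within ±2 of some true interchain contact. The function name and toy data are hypothetical; DNCON2_Inter's own evaluation code may differ in details such as sequence-separation filters.

```python
def top_l_precision(predicted, true_contacts, length, divisor=10, shift=2):
    """Precision of the top (length // divisor) predicted contacts.

    predicted: list of (i, j, score) tuples; higher score = more confident.
    true_contacts: iterable of (i, j) residue pairs in contact across chains.
    A prediction counts as correct if a true pair lies within `shift` residues
    in both positions. Pairs are treated as unordered (an assumption for homomers).
    """
    n_top = max(1, length // divisor)
    ranked = sorted(predicted, key=lambda c: c[2], reverse=True)[:n_top]
    if not ranked:
        return 0.0
    truth = {tuple(sorted(pair)) for pair in true_contacts}

    def is_hit(i, j):
        # Try every shifted version of (i, j) and compare it as an unordered pair.
        return any(
            tuple(sorted((i + di, j + dj))) in truth
            for di in range(-shift, shift + 1)
            for dj in range(-shift, shift + 1)
        )

    hits = sum(is_hit(i, j) for i, j, _ in ranked)
    return hits / len(ranked)


# Hypothetical predictions for a 50-residue monomer, so Top-L/10 keeps 5 of them.
pred = [(10, 30, 0.9), (12, 31, 0.8), (40, 45, 0.7), (5, 20, 0.6), (22, 8, 0.5), (1, 2, 0.1)]
truth = [(11, 30), (5, 21), (8, 24)]
precision = top_l_precision(pred, truth, length=50)
print(precision)
```

With `shift=0` the same function gives the strict precision, which makes it easy to see how much of a reported score depends on the tolerance.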
119
Gu Q, Kumar A, Bray S, Creason A, Khanteymoori A, Jalili V, Grüning B, Goecks J. Galaxy-ML: An accessible, reproducible, and scalable machine learning toolkit for biomedicine. PLoS Comput Biol 2021; 17:e1009014. [PMID: 34061826 PMCID: PMC8213174 DOI: 10.1371/journal.pcbi.1009014] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/17/2021] [Revised: 06/18/2021] [Accepted: 04/27/2021] [Indexed: 11/25/2022]
Abstract
Supervised machine learning is an essential but difficult-to-use approach in biomedical data analysis. The Galaxy-ML toolkit (https://galaxyproject.org/community/machine-learning/) makes supervised machine learning more accessible to biomedical scientists by enabling them to perform end-to-end reproducible machine learning analyses at large scale using only a web browser. Galaxy-ML extends Galaxy (https://galaxyproject.org), a biomedical computational workbench used by tens of thousands of scientists across the world, with a suite of tools for all aspects of supervised machine learning.
Affiliation(s)
- Qiang Gu
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, Oregon, United States of America
- The Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, United States of America
- Anup Kumar
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Simon Bray
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Allison Creason
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, Oregon, United States of America
- The Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, United States of America
- Alireza Khanteymoori
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Vahid Jalili
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, Oregon, United States of America
- The Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, United States of America
- Björn Grüning
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Jeremy Goecks
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, Oregon, United States of America
- The Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, United States of America
120
Shao J, Yan K, Liu B. FoldRec-C2C: protein fold recognition by combining cluster-to-cluster model and protein similarity network. Brief Bioinform 2021; 22:5873289. [PMID: 32685972 PMCID: PMC7454262 DOI: 10.1093/bib/bbaa144] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 04/07/2020] [Revised: 05/26/2020] [Accepted: 06/11/2020] [Indexed: 12/27/2022]
Abstract
As a key step in studying protein structures, protein fold recognition plays an important role in predicting the structures of proteins associated with COVID-19 and other important proteins. However, existing computational predictors focus only on pairwise protein similarity or on the similarity between two groups of proteins from two folds. Because the homology relationships among proteins form a hierarchical structure, a global protein similarity network should contribute to improved performance. In this study, we propose a predictor called FoldRec-C2C that globally incorporates the interactions among proteins into the prediction. In FoldRec-C2C, the protein fold recognition problem is treated as an information retrieval task, as in natural language processing. The initial ranking results are generated by a supervised ranking algorithm, Learning to Rank, and three re-ranking algorithms, namely a seq-to-seq model, a seq-to-cluster model and a cluster-to-cluster (C2C) model, are then applied to the ranking lists to adjust the results globally based on the protein similarity network. When tested on a widely used and rigorous benchmark, the LINDAHL dataset, FoldRec-C2C outperforms 34 other state-of-the-art methods in this field. The source code and data of FoldRec-C2C can be downloaded from http://bliulab.net/FoldRec-C2C/download.
Affiliation(s)
- Jiangyi Shao
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Ke Yan
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
121
Ghorbani A, Quinlan EM, Larijani M. Evolutionary Comparative Analyses of DNA-Editing Enzymes of the Immune System: From 5-Dimensional Description of Protein Structures to Immunological Insights and Applications to Protein Engineering. Front Immunol 2021; 12:642343. [PMID: 34135887 PMCID: PMC8201067 DOI: 10.3389/fimmu.2021.642343] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/15/2020] [Accepted: 04/06/2021] [Indexed: 01/02/2023]
Abstract
The immune system is unique among all biological sub-systems in its usage of DNA-editing enzymes to introduce targeted gene mutations and double-strand DNA breaks to diversify antigen receptor genes and combat viral infections. These processes, initiated by specific DNA-editing enzymes, often result in mistargeted induction of genome lesions that initiate and drive cancers. Like other molecules involved in human health and disease, the DNA-editing enzymes of the immune system have been intensively studied in humans and mice, with little attention paid (< 1% of published studies) to the same enzymes in evolutionarily distant species. Here, we present a systematic review of the literature on the characterization of one such DNA-editing enzyme, activation-induced cytidine deaminase (AID), from an evolutionary comparative perspective. The central thesis of this review is that although the evolutionary comparative approach represents a minuscule fraction of published works on this and other DNA-editing enzymes, this approach has made significant impacts across the fields of structural biology, immunology, and cancer research. Using AID as an example, we highlight the value of the evolutionary comparative approach in discoveries already made, and in the context of emerging directions in immunology and protein engineering. We introduce the concept of 5-dimensional (5D) description of protein structures, a more nuanced view of a structure that is made possible by evolutionary comparative studies. In this higher dimensional view of a protein's structure, the classical 3-dimensional (3D) structure is integrated in the context of real-time conformations and evolutionary time shifts (4th dimension) and the relevance of these dynamics to its biological function (5th dimension).
Affiliation(s)
- Atefeh Ghorbani
- Program in Immunology and Infectious Diseases, Department of Biomedical Sciences, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, NL, Canada
- Department of Molecular Biology and Biochemistry, Faculty of Science, Simon Fraser University, Burnaby, BC, Canada
- Emma M. Quinlan
- Program in Immunology and Infectious Diseases, Department of Biomedical Sciences, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, NL, Canada
- Mani Larijani
- Program in Immunology and Infectious Diseases, Department of Biomedical Sciences, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, NL, Canada
- Department of Molecular Biology and Biochemistry, Faculty of Science, Simon Fraser University, Burnaby, BC, Canada
122
Machine learning in protein structure prediction. Curr Opin Chem Biol 2021; 65:1-8. [PMID: 34015749 DOI: 10.1016/j.cbpa.2021.04.005] [Citation(s) in RCA: 121] [Impact Index Per Article: 30.3] [Received: 03/31/2021] [Accepted: 04/10/2021] [Indexed: 12/31/2022]
Abstract
Prediction of protein structure from sequence has been intensely studied for many decades, owing to the problem's importance and its uniquely well-defined physical and computational bases. While progress has historically ebbed and flowed, the past two years saw dramatic advances driven by the increasing "neuralization" of structure prediction pipelines, whereby computations previously based on energy models and sampling procedures are replaced by neural networks. The extraction of physical contacts from the evolutionary record; the distillation of sequence-structure patterns from known structures; the incorporation of templates from homologs in the Protein Data Bank; and the refinement of coarsely predicted structures into finely resolved ones have all been reformulated using neural networks. Cumulatively, this transformation has resulted in algorithms that can now predict single protein domains with a median accuracy of 2.1 Å, setting the stage for a foundational reconfiguration of the role of biomolecular modeling within the life sciences.
123
Pirklbauer G, Stieger CE, Matzinger M, Winkler S, Mechtler K, Dorfer V. MS Annika: A New Cross-Linking Search Engine. J Proteome Res 2021; 20:2560-2569. [PMID: 33852321 PMCID: PMC8155564 DOI: 10.1021/acs.jproteome.0c01000] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/11/2020] [Indexed: 11/30/2022]
Abstract
Cross-linking mass spectrometry (XL-MS) has become a powerful technique that provides insights into protein structures and protein interactions. The development of cleavable cross-linkers has further promoted XL-MS by reducing the search space, thereby allowing proteome-wide studies. These new analysis possibilities foster the development of new cross-linkers, which not every search engine can handle out of the box. In addition, some search engines for XL-MS data struggle with the validation of identified cross-linked peptides, that is, false discovery rate (FDR) estimation, because FDR calculation is complicated by the fact that two peptides, rather than one, must be correctly identified in a single spectrum. We here present our new search engine, MS Annika, which can identify cross-linked peptides in MS2 spectra from a wide variety of cleavable cross-linkers. We show that MS Annika provides realistic FDR estimates without the need for arbitrary score cutoffs, providing on average 44% more identifications at a similar or better true FDR than comparable tools. In addition, MS Annika can be used for proteome-wide studies owing to its fast, parallelized processing, and it provides a way to visualize the identified cross-links in protein 3D structures.
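FDR estimation is harder for cross-links because each spectrum match involves two peptides, each of which may come from the target or the decoy database. A widely used target-decoy estimator for cross-links counts target-target (TT), target-decoy (TD) and decoy-decoy (DD) matches above a score threshold and estimates FDR as (TD - DD) / TT. The sketch below applies that estimator and converts it into q-values; it illustrates the general technique and is not MS Annika's documented procedure. The function name and scores are hypothetical.

```python
def crosslink_qvalues(matches):
    """Target-decoy q-values for cross-link spectrum matches.

    matches: list of (score, kind) with kind "TT", "TD" or "DD" (both peptides
    target, one decoy, both decoys). Returns (score, kind, q_value) tuples
    sorted by descending score. Tied scores are not treated specially.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    counts = {"TT": 0, "TD": 0, "DD": 0}
    fdrs = []
    for _, kind in ranked:
        if kind not in counts:
            raise ValueError(f"unknown match kind: {kind!r}")
        counts[kind] += 1
        tt, td, dd = counts["TT"], counts["TD"], counts["DD"]
        # FDR estimate with the threshold at this match: (TD - DD) / TT, clipped to [0, 1].
        fdr = max(td - dd, 0) / tt if tt else 1.0
        fdrs.append(min(fdr, 1.0))
    # q-value = lowest FDR over all thresholds that still accept this match.
    qvalues, running = [0.0] * len(fdrs), 1.0
    for idx in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[idx])
        qvalues[idx] = running
    return [(score, kind, q) for (score, kind), q in zip(ranked, qvalues)]


matches = [(95, "TT"), (90, "TT"), (88, "TT"), (80, "TD"),
           (75, "TT"), (70, "DD"), (65, "TD"), (60, "TT")]
results = crosslink_qvalues(matches)
accepted = [score for score, kind, q in results if kind == "TT" and q <= 0.01]
print(accepted)
```

Subtracting DD corrects for the fact that a decoy-decoy match is also counted in the target-decoy population's expected rate, so counting TD alone would overestimate the number of false TT matches.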
Affiliation(s)
- Georg J. Pirklbauer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Christian E. Stieger
- Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Chemical Biology Department, Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP), Robert-Rössle-Strasse 10, 13125 Berlin, Germany
- Manuel Matzinger
- Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Stephan Winkler
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Karl Mechtler
- Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Institute of Molecular Biotechnology (IMBA), Austrian Academy of Sciences, Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
124
Vedithi SC, Malhotra S, Acebrón-García-de-Eulate M, Matusevicius M, Torres PHM, Blundell TL. Structure-Guided Computational Approaches to Unravel Druggable Proteomic Landscape of Mycobacterium leprae. Front Mol Biosci 2021; 8:663301. [PMID: 34026836 PMCID: PMC8138464 DOI: 10.3389/fmolb.2021.663301] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/02/2021] [Accepted: 04/12/2021] [Indexed: 02/02/2023]
Abstract
Leprosy, caused by Mycobacterium leprae (M. leprae), is treated with a multidrug regimen comprising dapsone, rifampicin, and clofazimine. These drugs exhibit bacteriostatic, bactericidal and anti-inflammatory properties, respectively, and control the dissemination of infection in the host. However, the current treatment is not cost-effective, does not favor patient compliance because of its long duration (12 months), and does not protect against the nerve damage that is a severe complication of leprosy. The chronic infectious peripheral neuropathy associated with the disease is primarily due to bacterial components infiltrating the Schwann cells that protect neuronal axons, thereby inducing a demyelinating phenotype. There is a need to discover novel or repurposed drugs that can serve as short-duration, effective alternatives to the existing treatment regimens and prevent the nerve damage and consequent disability associated with the disease. Because Mycobacterium leprae is an obligate pathogen, the bacillus cannot be cultivated in vitro, which limits drug discovery efforts to repositioning screens in mouse footpad models. The dearth of knowledge about the structural proteomics of M. leprae, coupled with emerging antimicrobial resistance to all three drugs in the multidrug therapy, calls for concerted efforts in novel drug discovery. A comprehensive understanding of the proteomic landscape of M. leprae is indispensable for uncovering druggable targets that are essential for bacterial survival and for its predilection for human Schwann cells. Of the 1,614 protein-coding genes in the genome of M. leprae, only 17 protein structures are available in the Protein Data Bank. In this review, we discuss efforts to model the proteome of M. leprae using a suite of protein modeling software developed in the Blundell laboratory.
Precise template selection by employing sequence-structure homology recognition software, multi-template modeling of the monomeric models and accurate quality assessment are the hallmarks of the modeling process. Tools that map interfaces and enable building of homo-oligomers are discussed in the context of interface stability. Other software is described to determine the druggable proteome by using information related to the chokepoint analysis of the metabolic pathways, gene essentiality, homology to human proteins, functional sites, druggable pockets and fragment hotspot maps.
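Chokepoint analysis, listed above as one input for defining the druggable proteome, is commonly defined as flagging reactions that are the only consumer or the only producer of some metabolite, since inhibiting such a reaction should block that metabolic step. A minimal sketch follows, assuming reactions are given as substrate and product lists and ignoring reversibility and currency metabolites (ATP, water and so on) that real analyses usually filter out. The function name and toy pathway are hypothetical, not M. leprae data.

```python
from collections import defaultdict


def find_chokepoints(reactions):
    """reactions: dict of reaction id -> (substrates, products).

    Returns a dict mapping each chokepoint reaction to the reasons it qualifies.
    """
    consumers, producers = defaultdict(set), defaultdict(set)
    for rxn, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(rxn)
        for metabolite in products:
            producers[metabolite].add(rxn)

    chokepoints = defaultdict(set)
    for metabolite, rxns in consumers.items():
        if len(rxns) == 1:  # only one reaction uses this metabolite
            chokepoints[next(iter(rxns))].add(f"sole consumer of {metabolite}")
    for metabolite, rxns in producers.items():
        if len(rxns) == 1:  # only one reaction makes this metabolite
            chokepoints[next(iter(rxns))].add(f"sole producer of {metabolite}")
    return dict(chokepoints)


# Hypothetical linear pathway with a redundant middle step (R2 and R3 both make C).
pathway = {
    "R1": (["A"], ["B"]),
    "R2": (["B"], ["C"]),
    "R3": (["B"], ["C"]),
    "R4": (["C"], ["D"]),
}
for rxn, reasons in sorted(find_chokepoints(pathway).items()):
    print(rxn, sorted(reasons))
```

R2 and R3 are not flagged because either can compensate for the other. In a full pipeline, chokepoint enzymes would then be cross-checked against gene essentiality, homology to human proteins and the presence of druggable pockets, as the abstract describes.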
Affiliation(s)
- Sundeep Chaitanya Vedithi
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Sony Malhotra
- Rutherford Appleton Laboratory, Science and Technology Facilities Council, Oxon, United Kingdom
- Pedro Henrique Monteiro Torres
- Laboratório de Modelagem e Dinâmica Molecular, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Tom L. Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
125
Wang ZZ, Shi XX, Huang GY, Hao GF, Yang GF. Fragment-based drug design facilitates selective kinase inhibitor discovery. Trends Pharmacol Sci 2021; 42:551-565. [PMID: 33958239 DOI: 10.1016/j.tips.2021.04.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/30/2020] [Revised: 03/30/2021] [Accepted: 04/07/2021] [Indexed: 12/16/2022]
Abstract
Protein kinases (PKs) are important drug targets, but achieving kinase selectivity is a challenge in the design of protein kinase inhibitors (PKIs). Fragment-based drug discovery (FBDD) has achieved great success in the discovery of highly specific PKIs. It makes full use of kinase-fragment interactions in target kinase subpockets to obtain promising selectivity. However, the complicated kinase-fragment interaction space is difficult to understand, and a systematic discussion of these interactions is still lacking. Herein, we introduce the advantages of the FBDD strategy in PKI design. Key features of the selectivity of kinase-fragment interactions are summarized and analyzed. Several promising PKIs are presented as case studies to help explain the fragment-to-lead (F2L) optimization process. An outlook on novel strategies and technologies for FBDD in PKI discovery is also provided.
Affiliation(s)
- Zhi-Zheng Wang
- Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, 430079, China; International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
- Xing-Xing Shi
- Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, 430079, China; International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
- Guang-Yi Huang
- Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, 430079, China; International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
- Ge-Fei Hao
- Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, 430079, China; International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China; State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China.
- Guang-Fu Yang
- Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, 430079, China; International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
126
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 158] [Impact Index Per Article: 39.5] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, with long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as indispensable tools for drawing meaningful insights and improving decision making in drug discovery. Thanks to advances in AI and ML algorithms, AI/ML-driven solutions now have unprecedented potential to accelerate CNS drug discovery and improve its success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state of the art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and its implementation in therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may help resolve these difficulties.
Affiliation(s)
- Sezen Vatansever
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Daniel Wacker
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- H. Ümit Kaniskan
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jian Jin
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ming‐Ming Zhou
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
127
Gaur NK, Goyal VD, Kulkarni K, Makde RD. Machine learning classifiers aid virtual screening for efficient design of mini-protein therapeutics. Bioorg Med Chem Lett 2021; 38:127852. [PMID: 33609660 DOI: 10.1016/j.bmcl.2021.127852] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/12/2020] [Revised: 02/01/2021] [Accepted: 02/05/2021] [Indexed: 11/15/2022]
Abstract
De novo design of mini-proteins (4-12 kDa) has recently been shown to produce new candidates for protein therapeutics. These temperature-stable molecules bind to the drug target with high affinity, thereby inhibiting its interactions. Developing mini-protein binders requires laboratory screening of tens of thousands of molecules for effective target binding. In this study we trained machine learning classifiers that distinguish, with 90% accuracy and 80% precision, mini-protein binders from non-binding molecules designed for a particular target; this significantly reduces the number of mini-protein candidates for experimental screening. Further, on the basis of our results we propose a multi-stage protocol in which a small dataset (a few hundred experimentally verified target-specific mini-proteins) can be used to train classifiers that improve the efficiency of mini-protein design for any specific target.
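The accuracy and precision figures quoted above come from a binary confusion matrix. The minimal Python sketch below shows how each metric is computed and why a precise classifier shrinks the experimental screen; the counts are hypothetical, chosen only to illustrate the arithmetic, and are not data from the study.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all designs whose binder/non-binder label is predicted correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of designs predicted to bind that actually bind."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of true binders that the classifier recovers."""
    return tp / (tp + fn)


# Hypothetical screen of 1,000 designs, 160 of which truly bind the target.
tp, fp, tn, fn = 80, 20, 820, 80

print(f"accuracy  = {accuracy(tp, fp, tn, fn):.2f}")  # accuracy  = 0.90
print(f"precision = {precision(tp, fp):.2f}")         # precision = 0.80
print(f"recall    = {recall(tp, fn):.2f}")            # recall    = 0.50
print(f"designs sent to the lab: {tp + fp} of {tp + fp + tn + fn}")
```

Only the predicted binders (100 of 1,000 in this toy case) would go on to laboratory screening. The recall line is included as a reminder that a classifier with high precision can still miss true binders, a trade-off worth checking alongside headline accuracy and precision.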
Affiliation(s)
- Neeraj K Gaur
- Beamline Development and Application Section, Bhabha Atomic Research Centre, Mumbai 400085, India; Division of Biochemical Sciences, CSIR-National Chemical Laboratory, Pune 411008, India.
- Venuka Durani Goyal
- Beamline Development and Application Section, Bhabha Atomic Research Centre, Mumbai 400085, India
- Kiran Kulkarni
- Division of Biochemical Sciences, CSIR-National Chemical Laboratory, Pune 411008, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Ravindra D Makde
- Beamline Development and Application Section, Bhabha Atomic Research Centre, Mumbai 400085, India
128
Werner M, Gapsys V, de Groot BL. One Plus One Makes Three: Triangular Coupling of Correlated Amino Acid Mutations. J Phys Chem Lett 2021; 12:3195-3201. [PMID: 33760609 PMCID: PMC8041375 DOI: 10.1021/acs.jpclett.1c00380] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/02/2021] [Accepted: 03/17/2021] [Indexed: 06/12/2023]
Abstract
Correlated mutations have played a pivotal role in the recent success of protein fold prediction. Understanding the nonadditive effects of mutations is crucial for altering protein structure, as mutations of multiple residues may change protein stability or binding affinity in a manner unforeseen from the investigation of single mutants. While couplings between amino acids can be inferred from homologous protein sequences, the physical mechanisms underlying these correlations remain elusive. In this work we demonstrate that calculations based on the first principles of statistical mechanics are capable of capturing the effects of nonadditivity in protein mutations. The identified thermodynamic couplings cover short-range as well as previously unknown long-range correlations. We further explore a set of mutations in staphylococcal nuclease to unravel an intricate interaction pathway underlying the correlations between amino acid mutations.
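Nonadditivity between two mutations is commonly quantified with a double-mutant cycle. The block below gives the standard textbook definition for orientation only; it is not necessarily the exact estimator used in this study. Here ΔG is the free energy of the relevant process (for example folding or binding) for the wild type (WT), the single mutants A and B, and the double mutant AB:

```latex
\begin{aligned}
\Delta\Delta G_{X} &= \Delta G_{X} - \Delta G_{\mathrm{WT}}, \qquad X \in \{A,\ B,\ AB\} \\
\Delta\Delta\Delta G_{AB} &= \Delta\Delta G_{AB} - \left(\Delta\Delta G_{A} + \Delta\Delta G_{B}\right) \\
&= \Delta G_{AB} - \Delta G_{A} - \Delta G_{B} + \Delta G_{\mathrm{WT}}
\end{aligned}
```

A coupling term ΔΔΔG_AB close to zero means the two mutations act independently (their effects add), while a nonzero value signals thermodynamic coupling between the two sites, which single-mutant measurements alone cannot reveal.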
Affiliation(s)
- Martin Werner
- Computational Biomolecular Dynamics Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- Vytautas Gapsys
- Computational Biomolecular Dynamics Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
- Bert L. de Groot
- Computational Biomolecular Dynamics Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
129
Waman VP, Sen N, Varadi M, Daina A, Wodak SJ, Zoete V, Velankar S, Orengo C. The impact of structural bioinformatics tools and resources on SARS-CoV-2 research and therapeutic strategies. Brief Bioinform 2021; 22:742-768. [PMID: 33348379 PMCID: PMC7799268 DOI: 10.1093/bib/bbaa362] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/10/2020] [Revised: 11/06/2020] [Accepted: 11/09/2020] [Indexed: 01/18/2023]
Abstract
SARS-CoV-2 is the causative agent of COVID-19, the ongoing global pandemic. It has posed a worldwide challenge to human health, as no effective treatment is currently available to combat the disease. Its severity has led to unprecedented collaborative initiatives for therapeutic solutions against COVID-19. Studies using structure-based drug design for COVID-19 are plentiful and show good promise. Structural biology provides key insights into the 3D structures of SARS-CoV-2 proteins and into critical residues/mutations implicated in infectivity, molecular recognition, and susceptibility of a broad range of host species. A detailed understanding of viral proteins and their complexes with host receptors and candidate epitopes/lead compounds is key to structure-guided therapeutic design. Since the discovery of SARS-CoV-2, structures of several of its proteins have been determined experimentally at unprecedented speed and deposited in the Protein Data Bank. Further, specialized structural bioinformatics tools and resources have been developed for theoretical models, protein dynamics data from computer simulations, the impact of variants/mutations, and molecular therapeutics. Here, we provide an overview of ongoing efforts to develop structural bioinformatics tools and resources for COVID-19 research. We also discuss the impact of these resources and structure-based studies on understanding various aspects of SARS-CoV-2 infection and therapeutic development. These include (i) understanding the differences between SARS-CoV-2 and SARS-CoV that lead to the increased infectivity of SARS-CoV-2, (ii) deciphering key residues in SARS-CoV-2 involved in receptor-antibody recognition, (iii) analysis of variants in host proteins that affect host susceptibility to infection and (iv) analyses facilitating structure-based drug and vaccine design against SARS-CoV-2.
Affiliation(s)
- Antoine Daina
- Molecular Modeling Group at SIB, Swiss Institute of Bioinformatics
- Vincent Zoete
- Department of Fundamental Oncology at the University of Lausanne and Group leader at SIB
130
Zhou H, Cao H, Skolnick J. FRAGSITE: A Fragment-Based Approach for Virtual Ligand Screening. J Chem Inf Model 2021; 61:2074-2089. [PMID: 33724022 DOI: 10.1021/acs.jcim.0c01160] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 01/09/2023]
Abstract
To reduce time and cost, virtual ligand screening (VLS) often precedes experimental ligand screening in modern drug discovery. Traditionally, high-resolution structure-based docking approaches rely on experimental structures, while ligand-based approaches need known binders to the target protein and only explore their nearby chemical space. In contrast, our structure-based FINDSITEcomb2.0 approach takes advantage of predicted, low-resolution structures and information from ligands that bind distantly related proteins whose binding sites are similar to that of the target protein. Using a boosted tree regression machine learning framework, we significantly improved FINDSITEcomb2.0 by integrating ligand fragment scores, as encoded by molecular fingerprints, with the global ligand similarity scores of FINDSITEcomb2.0. The new approach, FRAGSITE, exploits our observation that ligand fragments, e.g., rings, tend to interact with stereochemically conserved protein subpockets that also occur in evolutionarily unrelated proteins. FRAGSITE was benchmarked on the 102-protein DUD-E set, excluding any template protein with >30% sequence identity to the target. Within the top 100 ranked molecules, FRAGSITE improves VLS precision and recall by 14.3 and 18.5%, respectively, relative to FINDSITEcomb2.0. Moreover, the mean top 1% enrichment factor increases from 25.2 to 30.2. On average, both outperform state-of-the-art deep learning-based methods such as AtomNet. On the more challenging unbiased set LIT-PCBA, FRAGSITE also performs better than ligand similarity-based and docking approaches such as two-dimensional ECFP4 and Surflex-Dock v.3066. On a subset of 23 targets from DEKOIS 2.0, FRAGSITE performs much better than the boosted tree regression-based vScreenML scoring function. Experimental testing of FRAGSITE's predictions shows that it finds more hits and covers a more diverse region of chemical space than FINDSITEcomb2.0.
For the two proteins that were experimentally tested, DHFR, a well-studied protein that catalyzes the conversion of dihydrofolate to tetrahydrofolate, and the kinase ACVR1, FRAGSITE identified new small-molecule nanomolar binders. Interestingly, one new binder of DHFR is a kinase inhibitor predicted to bind in a new subpocket. For ACVR1, FRAGSITE identified new molecules that have diverse scaffolds and estimated nanomolar to micromolar affinities. Thus, FRAGSITE shows significant improvement over prior state-of-the-art ligand virtual screening approaches. A web server is freely available for academic users at http://sites.gatech.edu/cssb/FRAGSITE.
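The benchmark figures above (precision and recall within the top 100 ranked molecules, and the top 1% enrichment factor) are standard virtual-screening metrics. The Python sketch below uses one common definition, EF at x% = (actives in the top x% / size of the top x%) / (all actives / library size). Published benchmarks differ in rounding and tie handling, so this is an illustration under those assumptions, not the evaluation code used for FRAGSITE; the toy library is hypothetical.

```python
from typing import List, Sequence, Tuple


def ranked_indices(scores: Sequence[float]) -> List[int]:
    """Indices of compounds ordered from best (highest score) to worst."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """Hit rate in the top `fraction` of the ranking divided by the overall hit rate."""
    n = len(scores)
    n_top = max(1, round(n * fraction))
    top = ranked_indices(scores)[:n_top]
    hits_top = sum(is_active[i] for i in top)
    return (hits_top / n_top) / (sum(is_active) / n)


def precision_recall_at_k(scores: Sequence[float], is_active: Sequence[bool],
                          k: int = 100) -> Tuple[float, float]:
    """Precision and recall restricted to the k best-ranked compounds."""
    top = ranked_indices(scores)[:k]
    hits = sum(is_active[i] for i in top)
    return hits / k, hits / sum(is_active)


# Hypothetical library: 1,000 compounds, 10 actives; half the actives rank at the very top.
scores = [1000.0 - i for i in range(1000)]
is_active = [i < 5 or 500 <= i < 505 for i in range(1000)]

print(enrichment_factor(scores, is_active))      # 50.0
print(precision_recall_at_k(scores, is_active))  # (0.05, 0.5)
```

An enrichment factor of 1 corresponds to random ranking; in this toy case the top 1% holds half of all actives, giving a 50-fold enrichment, while precision within the top 100 stays low because the library contains only 10 actives.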
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States
- Hongnan Cao
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, NW, Atlanta, Georgia 30332-2000, United States
131
Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021; 22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022]
Abstract
The exponential growth of biomedical data in recent years has driven the application of numerous machine learning techniques to emerging problems in biology and clinical research. By enabling automatic feature extraction, feature selection, and generation of predictive models, these methods can be used to study complex biological systems efficiently. Machine learning techniques are frequently integrated with bioinformatic methods, as well as with curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework as techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and in particular deep learning, in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Affiliation(s)
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
132
Lim S, Lu Y, Cho CY, Sung I, Kim J, Kim Y, Park S, Kim S. A review on compound-protein interaction prediction methods: Data, format, representation and model. Comput Struct Biotechnol J 2021; 19:1541-1556. [PMID: 33841755 PMCID: PMC8008185 DOI: 10.1016/j.csbj.2021.03.004] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Received: 08/31/2020] [Revised: 02/28/2021] [Accepted: 03/01/2021] [Indexed: 01/27/2023]
Abstract
There has recently been rapid progress in computational methods for determining the protein targets of small-molecule drugs, a task termed compound-protein interaction (CPI) prediction. Here, we comprehensively review topics related to the computational prediction of CPI. Data for CPI have accumulated and been curated significantly, in both quantity and quality. Computational methods have become more powerful than ever for analyzing such complex data. Thus, recent improvements in the quality of CPI prediction are due to the use of both sophisticated computational techniques and higher-quality information in databases. The goal of this article is to review topics related to CPI, from data, formats, and representations to computational models, so that researchers can take full advantage of these resources to develop novel prediction methods. Chemical compound and protein data from various resources are discussed in terms of data formats and encoding schemes. For CPI methods, we group prediction methods into five categories, ranging from traditional machine learning techniques to state-of-the-art deep learning techniques. In closing, we discuss emerging machine learning topics to help both experimental and computational scientists leverage current knowledge and strategies to develop more powerful and accurate CPI prediction methods.
Affiliation(s)
- Sangsoo Lim
- Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea
- Yijingxiu Lu
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
- Chang Yun Cho
- Institute of Engineering Research, Seoul National University, Seoul, Republic of Korea
- Inyoung Sung
- Institute of Engineering Research, Seoul National University, Seoul, Republic of Korea
- Jungwoo Kim
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
- Youngkuk Kim
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
- Sungjoon Park
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
- Sun Kim
- Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea
- Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
- Institute of Engineering Research, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, College of Natural Sciences, Seoul National University, Seoul, Republic of Korea
133
Wang B, Su Z, Wu Y. Characterizing the function of domain linkers in regulating the dynamics of multi-domain fusion proteins by microsecond molecular dynamics simulations and artificial intelligence. Proteins 2021; 89:884-895. [PMID: 33620752 DOI: 10.1002/prot.26066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/21/2020] [Revised: 01/20/2021] [Accepted: 02/20/2021] [Indexed: 11/12/2022]
Abstract
Multi-domain proteins are not only formed through natural evolution but can also be generated by recombinant DNA technology. Because many fusion proteins can enhance the selectivity of cell targeting, these artificially produced molecules, called multi-specific biologics, are promising drug candidates, especially for immunotherapy. Moreover, the rational design of domain linkers in fusion proteins is becoming an essential step toward a quantitative understanding of the dynamics of these biopharmaceuticals. We developed a computational framework to characterize the impact of peptide linkers on the dynamics of multi-specific biologics. Specifically, we first constructed a benchmark containing six types of linkers representing various lengths and degrees of flexibility and used them to connect two natural proteins as a test system. We then projected the microsecond dynamics of these proteins, generated on Anton, onto a coarse-grained conformational space. We further analyzed the similarity of dynamics among different proteins in this low-dimensional space with a neural-network-based classification model. Finally, we applied hierarchical clustering to place linkers into different subgroups based on the classification results. The clustering results suggest that linker length, which spatially separates the functional modules, plays the most important role in regulating the dynamics of this fusion protein. For linkers with the same number of amino acids, linker flexibility acts as a regulator of protein dynamics. In summary, we illustrated that a new computational strategy combining long-timescale molecular dynamics simulation, coarse-grained feature extraction, and artificial intelligence can be used to study the dynamics of multi-domain fusion proteins.
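The last two steps of this framework, turning classifier output into linker subgroups, can be sketched in Python as average-linkage agglomerative clustering. The conversion from a confusion matrix to a distance (two linkers whose trajectories the classifier often confuses are treated as dynamically similar), the linker labels, and all numbers are assumptions made for illustration; the study's actual similarity measure and linkage criterion may differ.

```python
from itertools import combinations
from typing import List, Sequence


def confusion_to_distance(conf: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical distance: 1 minus the mean symmetric confusion rate between classes.

    conf[i][j] is the fraction of class-i frames that the classifier assigned to class j.
    """
    n = len(conf)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = 1.0 - (conf[i][j] + conf[j][i]) / 2.0
        dist[i][j] = dist[j][i] = d
    return dist


def average_linkage(dist: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Merge the closest pair of clusters (mean pairwise distance) until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters


# Hypothetical row-normalized confusion matrix for four linkers L1..L4.
conf = [
    [0.50, 0.40, 0.05, 0.05],
    [0.40, 0.50, 0.05, 0.05],
    [0.05, 0.05, 0.60, 0.30],
    [0.05, 0.05, 0.30, 0.60],
]
labels = ["L1", "L2", "L3", "L4"]
groups = average_linkage(confusion_to_distance(conf), n_clusters=2)
print([[labels[i] for i in g] for g in groups])  # [['L1', 'L2'], ['L3', 'L4']]
```

Average linkage is used here because it is less sensitive to a single outlying pair than single linkage; a library routine such as SciPy's hierarchical clustering would normally replace this hand-written loop for larger sets.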
Affiliation(s)
- Bo Wang
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- Zhaoqian Su
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- Yinghao Wu
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
134
Ayoub R, Lee Y. Protein structure search to support the development of protein structure prediction methods. Proteins 2021; 89:648-658. [PMID: 33458852 DOI: 10.1002/prot.26048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/12/2020] [Revised: 12/14/2020] [Accepted: 12/31/2020] [Indexed: 11/06/2022]
Abstract
Protein structure prediction is a long-standing unsolved problem in molecular biology that has seen renewed interest with the recent success of deep learning with AlphaFold at CASP13. While developing and evaluating protein structure prediction methods, researchers may want to identify the known structures most similar to their predicted structures. These predicted structures often have low sequence and structure similarity to known structures. We show how RUPEE, a purely geometric protein structure search, is able to identify the structures most similar to structure predictions, regardless of how they vary from known structures, a task that existing protein structure searches struggle with. RUPEE accomplishes this through a novel linear encoding of protein structures as a sequence of residue descriptors. Using a fast Needleman-Wunsch algorithm, RUPEE performs alignments on the sequences of residue descriptors for every available structure. This is followed by a series of increasingly accurate structure alignments, from TM-align alignments initialized with the Needleman-Wunsch residue descriptor alignments to standard TM-align alignments of the final results. By using alignment normalization effectively at each stage, RUPEE can also execute containment searches, in addition to full-length searches, to identify structural motifs within proteins. We compare the results of RUPEE to the protein structure searches mTM-align, SSM, CATHEDRAL, and VAST using a benchmark derived from the protein structure predictions submitted to CASP13. On average, RUPEE identifies better alignments with respect to TM-score as well as to the scores specific to SSM and CATHEDRAL, Q-score and SSAP-score, respectively.
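The first RUPEE stage, a global Needleman-Wunsch alignment over sequences of residue descriptors rather than amino acids, can be sketched in Python as below. The descriptor alphabet (short strings standing in for per-residue geometric states) and the match/mismatch/gap scores are placeholders chosen for illustration; RUPEE's actual descriptors, scoring scheme, and speed optimizations are not reproduced, and only the alignment score is returned (no traceback).

```python
from typing import Hashable, Sequence


def needleman_wunsch_score(a: Sequence[Hashable], b: Sequence[Hashable],
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of two descriptor sequences (linear gap penalty)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]


# Hypothetical descriptor sequences for a query and two database structures.
query = ["H1", "H1", "T3", "E2", "E2"]
db = {
    "struct_A": ["H1", "H1", "T3", "E2", "E2"],
    "struct_B": ["E2", "T3", "H1"],
}
ranking = sorted(db, key=lambda k: needleman_wunsch_score(query, db[k]), reverse=True)
print(ranking)  # ['struct_A', 'struct_B']
```

Because this alignment works on one-dimensional strings, it is cheap enough to run against every structure in a database; in a pipeline like the one described above, only the best-scoring candidates would then be passed on to the slower TM-align stages.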
Affiliation(s)
- Ronald Ayoub
- School of Computing and Engineering, University of Missouri at Kansas City, Kansas City, Missouri, USA
- Yugyung Lee
- School of Computing and Engineering, University of Missouri at Kansas City, Kansas City, Missouri, USA
135
Yudenko A, Smolentseva A, Maslov I, Semenov O, Goncharov IM, Nazarenko VV, Maliar NL, Borshchevskiy V, Gordeliy V, Remeeva A, Gushchin I. Rational Design of a Split Flavin-Based Fluorescent Reporter. ACS Synth Biol 2021; 10:72-83. [PMID: 33325704 DOI: 10.1021/acssynbio.0c00454] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/12/2022]
Abstract
Protein-fragment complementation assays are used ubiquitously for probing protein-protein interactions. Most commonly, the reporter protein is split into two parts, which are then fused to the proteins of interest and can reassemble and provide a readout if the proteins of interest interact with each other. The currently known split fluorescent proteins either can be used only under aerobic conditions and assemble irreversibly, or require the addition of exogenous chromophores, which complicates experimental design. In recent years, light-oxygen-voltage (LOV) domains of several photoreceptor proteins have been developed into flavin-based fluorescent proteins (FbFPs) that, under some circumstances, can outperform commonly used fluorescent proteins such as GFP. Here, we show that CagFbFP, a small thermostable FbFP based on a LOV domain-containing protein from Chloroflexus aggregans, can serve as a split fluorescent reporter. We use the available genetic and structural information to identify three loops between the conserved secondary structure elements, Aβ-Bβ, Eα-Fα, and Hβ-Iβ, that tolerate the insertion of flexible poly-Gly/Ser segments and, ultimately, splitting. We demonstrate that the designed split pairs, when fused to interacting proteins, are fluorescent in vivo in E. coli and human cells and have low background fluorescence. Our results enable probing protein-protein interactions under anaerobic conditions without exogenous fluorophores and provide a basis for further development of LOV and PAS (Per-Arnt-Sim) domain-based fluorescent reporters and optogenetic tools.
Affiliation(s)
- Anna Yudenko
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Anastasia Smolentseva
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Ivan Maslov
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Oleg Semenov
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Ivan M. Goncharov
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Vera V. Nazarenko
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Nina L. Maliar
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Valentin Borshchevskiy
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Valentin Gordeliy
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Institut de Biologie Structurale J.-P. Ebel, Université Grenoble Alpes-CEA-CNRS, 38044 Grenoble, France
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry), Forschungszentrum Jülich, 52425 Jülich, Germany
- JuStruct: Jülich Center for Structural Biology, Forschungszentrum Jülich, 52425 Jülich, Germany
- Alina Remeeva
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Ivan Gushchin
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
136
Neuwald AF. Reflections on the quest to obtain biological information from genomic data. QUANTITATIVE BIOLOGY 2021. [DOI: 10.15302/j-qb-021-0254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
137
Gao W, Mahajan SP, Sulam J, Gray JJ. Deep Learning in Protein Structural Modeling and Design. PATTERNS (NEW YORK, N.Y.) 2020; 1:100142. [PMID: 33336200 PMCID: PMC7733882 DOI: 10.1016/j.patter.2020.100142] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Indexed: 12/13/2022]
Abstract
Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understand and engineer biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the "sequence → structure → function" paradigm. This review is directed to help both computational biologists to gain familiarity with the deep learning methods applied in protein modeling, and computer scientists to gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.
Affiliation(s)
- Wenhao Gao
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Sai Pooja Mahajan
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Jeremias Sulam
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
138
Ziegler SJ, Mallinson SJ, St. John PC, Bomble YJ. Advances in integrative structural biology: Towards understanding protein complexes in their cellular context. Comput Struct Biotechnol J 2020; 19:214-225. [PMID: 33425253 PMCID: PMC7772369 DOI: 10.1016/j.csbj.2020.11.052] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 09/13/2020] [Revised: 11/25/2020] [Accepted: 11/28/2020] [Indexed: 01/26/2023]
Abstract
Microorganisms rely on protein interactions to transmit signals, react to stimuli, and grow. One of the best ways to understand these protein interactions is through structural characterization. However, in the past, structural knowledge was limited to stable, high-affinity complexes that could be crystallized. Recent developments in structural biology have revolutionized how protein interactions are characterized. The combination of multiple techniques, known as integrative structural biology, has provided insight into how large protein complexes interact in their native environment. In this mini-review, we describe the past, present, and potential future of integrative structural biology as a tool for characterizing protein interactions in their cellular context.
Key Words
- CLEM, correlated light and electron microscopy
- Crosslinking mass spectrometry
- Cryo-electron microscopy
- Cryo-electron tomography
- EPR, electron paramagnetic resonance
- FRET, Förster resonance energy transfer
- ISB, Integrative structural biology
- Integrative structural biology
- ML, machine learning
- MR, molecular replacement
- MSAs, multiple sequence alignments
- MX, macromolecular crystallography
- NMR, nuclear magnetic resonance
- PDB, Protein Data Bank
- Protein docking
- Protein structure prediction
- Quinary interactions
- SAD, single-wavelength anomalous dispersion
- SANS, small angle neutron scattering
- SAXS, small angle X-ray scattering
- X-ray crystallography
- XL-MS, cross-linking mass spectrometry
- cryo-EM SPA, cryo-EM single particle analysis
- cryo-EM, cryo-electron microscopy
- cryo-ET, cryo-electron tomography
Affiliation(s)
- Samantha J. Ziegler
- Biosciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, CO 80401, USA
- Sam J.B. Mallinson
- Biosciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, CO 80401, USA
- Peter C. St. John
- Biosciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, CO 80401, USA
- Yannick J. Bomble
- Biosciences Center, National Renewable Energy Laboratory, 15013 Denver West Parkway, Golden, CO 80401, USA
139
Chen M, Chen X, Jin S, Lu W, Lin X, Wolynes PG. Protein Structure Refinement Guided by Atomic Packing Frustration Analysis. J Phys Chem B 2020; 124:10889-10898. [PMID: 32931278 DOI: 10.1021/acs.jpcb.0c06719] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
Recent advances in machine learning, bioinformatics, and the understanding of the folding problem have enabled efficient predictions of protein structures with moderate accuracy, even for targets where there is little information from templates. All-atom molecular dynamics simulations provide a route to refine such predicted structures, but unguided atomistic simulations, even when lengthy, often fail to eliminate incorrect structural features that prevent the structure from becoming more energetically favorable, because doing so requires large-scale motions and overcoming energy barriers to side-chain repacking. In this study, we show that localizing packing frustration at atomic resolution, by examining the statistics of the energetic changes that occur when the local environment of a site is changed, allows one to identify the most likely locations of incorrect contacts. The global statistics of atomic-resolution frustration in structures predicted using various algorithms provide strong indicators of structural quality when tested over a database of 20 targets from previous CASP experiments. Residues that are more accurately positioned turn out to be more minimally frustrated than poorly positioned sites. These observations provide a diagnosis of both the global and local quality of predicted structures and thus can be used as guidance in all-atom refinement simulations of the 20 targets. Refinement simulations guided by atomic packing frustration turn out to be quite efficient and significantly improve the quality of the structures.
Affiliation(s)
- Mingchen Chen
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Xun Chen
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States; Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Shikai Jin
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States; Department of Biosciences, Rice University, Houston, Texas 77005, United States
- Wei Lu
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States; Department of Physics and Astronomy, Rice University, Houston, Texas 77030, United States
- Xingcheng Lin
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Peter G Wolynes
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States; Department of Chemistry, Rice University, Houston, Texas 77005, United States; Department of Biosciences, Rice University, Houston, Texas 77005, United States; Department of Physics and Astronomy, Rice University, Houston, Texas 77030, United States
140
Buckner C. Understanding adversarial examples requires a theory of artefacts for deep learning. Nat Mach Intell 2020. [DOI: 10.1038/s42256-020-00266-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/22/2023]
141
Lawson CE, Martí JM, Radivojevic T, Jonnalagadda SVR, Gentz R, Hillson NJ, Peisert S, Kim J, Simmons BA, Petzold CJ, Singer SW, Mukhopadhyay A, Tanjore D, Dunn JG, Garcia Martin H. Machine learning for metabolic engineering: A review. Metab Eng 2020; 63:34-60. [PMID: 33221420 DOI: 10.1016/j.ymben.2020.10.005] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Received: 08/10/2020] [Revised: 10/22/2020] [Accepted: 10/31/2020] [Indexed: 12/14/2022]
Abstract
Machine learning provides researchers a unique opportunity to make metabolic engineering more predictable. In this review, we offer an introduction to this discipline in terms that are relatable to metabolic engineers, as well as providing in-depth illustrative examples leveraging omics data and improving production. We also include practical advice for the practitioner in terms of data management, algorithm libraries, computational resources, and important non-technical issues. A variety of applications ranging from pathway construction and optimization, to genetic editing optimization, cell factory testing, and production scale-up are discussed. Moreover, the promising relationship between machine learning and mechanistic models is thoroughly reviewed. Finally, the future perspectives and most promising directions for this combination of disciplines are examined.
Affiliation(s)
- Christopher E Lawson
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA
- Jose Manuel Martí
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA
- Tijana Radivojevic
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA
- Sai Vamshi R Jonnalagadda
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA
- Reinhard Gentz
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Nathan J Hillson
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA
- Sean Peisert
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; University of California Davis, Davis, CA, 95616, USA
- Joonhoon Kim
- Joint BioEnergy Institute, Emeryville, CA, 94608, USA; Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Blake A Simmons
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA
- Christopher J Petzold
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA
- Steven W Singer
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA
- Aindrila Mukhopadhyay
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, USA
- Deepti Tanjore
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Advanced Biofuels and Bioproducts Process Development Unit, Emeryville, CA, 94608, USA
- Hector Garcia Martin
- Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; Joint BioEnergy Institute, Emeryville, CA, 94608, USA; DOE Agile BioFoundry, Emeryville, CA, 94608, USA; Basque Center for Applied Mathematics, 48009, Bilbao, Spain; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, USA.
142
Hameduh T, Haddad Y, Adam V, Heger Z. Homology modeling in the time of collective and artificial intelligence. Comput Struct Biotechnol J 2020; 18:3494-3506. [PMID: 33304450 PMCID: PMC7695898 DOI: 10.1016/j.csbj.2020.11.007] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 08/14/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/12/2022]
Abstract
Homology modeling is a method for building protein 3D structures from the primary sequence, using prior knowledge gained from structural similarities with other proteins. The process proceeds in sequential steps: the sequence/structure alignment is optimized, a backbone is built, and side chains are then added. Once the low-homology loops are modeled, the whole 3D structure is optimized and validated. Over the past three decades, a few collective and collaborative initiatives have enabled continuous progress in both homology and ab initio modeling. Critical Assessment of protein Structure Prediction (CASP) is a worldwide community experiment that has historically recorded the progress in this field. Folding@Home and Rosetta@Home are examples of crowd-sourcing initiatives in which the community shares computational resources, whereas RosettaCommons is an example of an initiative in which a community shares a codebase for the development of computational algorithms. Foldit is another initiative, in which participants compete with each other in a protein-folding video game to predict 3D structure. In the past few years, deep machine learning of contact maps has been introduced into the 3D structure prediction process, adding information and significantly increasing the accuracy of models. In this review, we take the reader on a journey of exploration from the beginnings of the field to the most recent turnabouts that have revolutionized homology modeling. Moreover, we discuss the new trends emerging in this rapidly growing field.
Affiliation(s)
- Tareq Hameduh
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Yazan Haddad
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
- Vojtech Adam
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
- Zbynek Heger
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, CZ-613 00 Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Purkynova 656/123, 612 00 Brno, Czech Republic
143
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen-Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Sara R. Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
144
Neelamraju S, Wales DJ, Gosavi S. Protein energy landscape exploration with structure-based models. Curr Opin Struct Biol 2020; 64:145-151. [DOI: 10.1016/j.sbi.2020.07.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/02/2020] [Revised: 06/30/2020] [Accepted: 07/15/2020] [Indexed: 12/11/2022]
145
Rashid MBMA. Artificial Intelligence Effecting a Paradigm Shift in Drug Development. SLAS Technol 2020; 26:3-15. [PMID: 32940124 DOI: 10.1177/2472630320956931] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022]
Abstract
The inverse relationship between the cost of drug development and the successful integration of drugs into the market has resulted in the need for innovative solutions to overcome this burgeoning problem. This problem could be attributed to several factors, including the premature termination of clinical trials, regulatory factors, or decisions made in the earlier drug development processes. The introduction of artificial intelligence (AI) to accelerate and assist drug development has resulted in cheaper and more efficient processes, ultimately improving the success rates of clinical trials. This review aims to showcase and compare the different applications of AI technology that aid automation and improve success in drug development, particularly in novel drug target identification and design, drug repositioning, biomarker identification, and effective patient stratification, through exploration of different disease landscapes. In addition, it will also highlight how these technologies are translated into the clinic. This paradigm shift will lead to even greater advancements in the integration of AI in automating processes within drug development and discovery, enabling the probability and reality of attaining future precision and personalized medicine.
146
Xu S, Yang K, Li R, Zhang L. mRNA Vaccine Era-Mechanisms, Drug Platform and Clinical Prospection. Int J Mol Sci 2020; 21:E6582. [PMID: 32916818 PMCID: PMC7554980 DOI: 10.3390/ijms21186582] [Citation(s) in RCA: 203] [Impact Index Per Article: 40.6] [Received: 07/30/2020] [Revised: 08/26/2020] [Accepted: 08/30/2020] [Indexed: 12/14/2022]
Abstract
Messenger ribonucleic acid (mRNA)-based drugs, notably mRNA vaccines, have been widely shown to be a promising treatment strategy in immunotherapeutics. The extraordinary advantages associated with mRNA vaccines, including their high efficacy, relatively mild side effects, and low costs, have made them prevalent in pre-clinical and clinical trials against various infectious diseases and cancers. Recent technological advances have alleviated some issues that hinder mRNA vaccine development, such as the low efficiency of both gene translation and in vivo delivery. mRNA immunogenicity can also be substantially tuned as a result of upgraded technologies. In this review, we summarize details regarding the optimization of mRNA vaccines and the underlying biological mechanisms of this form of vaccine. Applications of mRNA vaccines in some infectious diseases and cancers are introduced. We also give our prospects for mRNA vaccine applications in diseases caused by bacterial pathogens, such as tuberculosis. Finally, we offer some suggestions for future mRNA vaccine development regarding storage methods, safety concerns, and personalized vaccine synthesis.
Affiliation(s)
- Shuqin Xu
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Science, Fudan University, Shanghai 200438, China
- Kunpeng Yang
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Science, Fudan University, Shanghai 200438, China
- Rose Li
- M.B.B.S., School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Lu Zhang
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Science, Fudan University, Shanghai 200438, China
- Shanghai Engineering Research Center of Industrial Microorganisms, Shanghai 200438, China
147
Griffin AC, Topaloglu U, Davis S, Chung AE. From Patient Engagement to Precision Oncology: Leveraging Informatics to Advance Cancer Care. Yearb Med Inform 2020; 29:235-242. [PMID: 32823322 PMCID: PMC7442514 DOI: 10.1055/s-0040-1701983] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022]
Abstract
OBJECTIVES Conduct a survey of the literature for advancements in cancer informatics over the last three years in three specific areas where there has been unprecedented growth: 1) digital health; 2) machine learning; and 3) precision oncology. We also highlight the ethical implications and future opportunities within each area. METHODS A search was conducted over a three-year period in two electronic databases (PubMed, Google Scholar) to identify peer-reviewed articles and conference proceedings. Search terms included variations of the following: neoplasms[MeSH], informatics[MeSH], cancer, oncology, clinical cancer informatics, medical cancer informatics. The search returned too many articles for practical review (23,994 from PubMed and 23,100 from Google Scholar). Thus, we conducted searches of key PubMed-indexed informatics journals and proceedings. We further limited our search to manuscripts that demonstrated a clear focus on clinical or translational cancer informatics. Manuscripts were then selected based on their methodological rigor, scientific impact, innovation, and contribution towards cancer informatics as a field or on their impact on cancer care and research. RESULTS Key developments and opportunities in cancer informatics research in the areas of digital health, machine learning, and precision oncology were summarized. CONCLUSION While there are numerous innovations in the field of cancer informatics to advance prevention and clinical care, considerable challenges remain related to data sharing and privacy, digital accessibility, and algorithm biases and interpretation. The implementation and application of these findings in cancer care necessitates further consideration and research.
Affiliation(s)
- Umit Topaloglu
- Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Sean Davis
- National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Arlene E. Chung
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC, USA
- UNC Lineberger Comprehensive Cancer Center, Chapel Hill, NC, USA
148
Poot Velez AH, Fontove F, Del Rio G. Protein-Protein Interactions Efficiently Modeled by Residue Cluster Classes. Int J Mol Sci 2020; 21:E4787. [PMID: 32640745 PMCID: PMC7370293 DOI: 10.3390/ijms21134787] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/19/2020] [Revised: 06/20/2020] [Accepted: 06/28/2020] [Indexed: 01/22/2023]
Abstract
Predicting protein-protein interactions (PPI) represents an important challenge in structural bioinformatics. Current computational methods display different degrees of accuracy when predicting these interactions. Different factors were proposed to help improve these predictions, including choosing the proper descriptors of proteins to represent these interactions, among others. In the current work, we provide a representative protein structure that is amenable to PPI classification using machine learning approaches, referred to as residue cluster classes. Through sampling and optimization, we identified the best algorithm-parameter pair to classify PPI from more than 360 different training sets. We tested these classifiers against PPI datasets that were not included in the training set but shared sequence similarity with proteins in the training set to reproduce the situation of most proteins sharing sequence similarity with others. We identified a model with almost no PPI error (96-99% of correctly classified instances) and showed that residue cluster classes of protein pairs displayed a distinct pattern between positive and negative protein interactions. Our results indicated that residue cluster classes are structural features relevant to model PPI and provide a novel tool to mathematically model the protein structure/function relationship.
Affiliation(s)
- Albros Hermes Poot Velez
- Department of Biochemistry and Structural Biology, Instituto de Fisiología Celular, UNAM, Mexico City 04510, Mexico
- Gabriel Del Rio
- Department of Biochemistry and Structural Biology, Instituto de Fisiología Celular, UNAM, Mexico City 04510, Mexico
149
Saravanan KM, Zhang H, Zhang H, Xi W, Wei Y. On the Conformational Dynamics of β-Amyloid Forming Peptides: A Computational Perspective. Front Bioeng Biotechnol 2020; 8:532. [PMID: 32656188 PMCID: PMC7325929 DOI: 10.3389/fbioe.2020.00532] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/15/2019] [Accepted: 05/04/2020] [Indexed: 12/12/2022]
Abstract
Understanding the conformational dynamics of proteins and peptides involved in important functions is still a difficult task in computational structural biology. Because such conformational transitions in β-amyloid (Aβ)-forming peptides play a crucial role in many neurological disorders, researchers from different scientific fields have been working together to address issues related to the folding of Aβ-forming peptides. Many theoretical models have been proposed in recent years for studying Aβ peptides using mathematical, physicochemical, molecular dynamics simulation, and machine learning approaches. In this article, we have comprehensively reviewed the developmental advances in theoretical models for Aβ peptide folding and interactions, particularly in the context of neurological disorders. Furthermore, we have extensively reviewed the advances in molecular dynamics simulation as a tool for studying the conversions between polymorphic amyloid forms, and the application of machine learning approaches to predicting Aβ peptides and aggregation-prone regions in proteins. We have also provided details on the theoretical advances in the study of Aβ peptides, which would enhance our understanding of these peptides at the molecular level and eventually lead to the development of targeted therapies for certain acute neurological disorders such as Alzheimer's disease in the future.
Affiliation(s)
- Wenhui Xi
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yanjie Wei
- Center for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
150
Liu ZL, Hu JH, Jiang F, Wu YD. CRiSP: accurate structure prediction of disulfide-rich peptides with cystine-specific sequence alignment and machine learning. Bioinformatics 2020; 36:3385-3392. [PMID: 32215567 DOI: 10.1093/bioinformatics/btaa193] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/16/2019] [Revised: 02/06/2020] [Accepted: 03/22/2020] [Indexed: 12/19/2022]
Abstract
MOTIVATION High-throughput sequencing has revealed many naturally occurring disulfide-rich peptides, or cystine-rich peptides (CRPs), with diversified bioactivities. However, information on their structures, which is very important to peptide drug discovery, is still very limited. RESULTS We have developed a CRP-specific structure prediction method called Cystine-Rich peptide Structure Prediction (CRiSP), based on a customized template database with cystine-specific sequence alignment and three machine-learning predictors. The modeling accuracy is significantly better than that of several popular general-purpose structure modeling methods, and CRiSP can provide useful model quality estimations. AVAILABILITY AND IMPLEMENTATION The CRiSP server is freely available at http://wulab.com.cn/CRISP. CONTACT wuyd@pkusz.edu.cn or jiangfan@pku.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zi-Lin Liu
- Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Jing-Hao Hu
- Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China; College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Fan Jiang
- Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China; NanoAI Biotech Co., Ltd, Shenzhen 518118, China
- Yun-Dong Wu
- Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China; College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China; Shenzhen Bay Laboratory, Shenzhen 518055, China