1
Najar Najafi N, Karbassian R, Hajihassani H, Azimzadeh Irani M. Unveiling the influence of fastest Nobel Prize winner discovery: AlphaFold's algorithmic intelligence in medical sciences. J Mol Model 2025; 31:163. [PMID: 40387957 DOI: 10.1007/s00894-025-06392-x] [Received: 12/01/2024] [Accepted: 05/06/2025] [Indexed: 05/20/2025]
Abstract
CONTEXT: AlphaFold's advanced AI technology has transformed protein structure interpretation. By predicting three-dimensional protein structures from amino acid sequences, AlphaFold has solved the complex protein-folding problem, previously challenging for experimental methods due to numerous possible conformations. Since its inception, AlphaFold has been released in several versions and related tools, including AlphaFold2, AlphaFold DB, AlphaFold-Multimer, AlphaMissense, and AlphaFold3, each further enhancing protein structure prediction. Remarkably, AlphaFold is recognized as the fastest discovery to win a Nobel Prize in the history of science. This technology has extensive applications, potentially transforming treatment and diagnosis in medical sciences by reducing drug design costs and time, while elucidating structural pathways of human body systems. Numerous studies have demonstrated how AlphaFold aids in understanding health conditions by providing critical information about protein mutations, abnormal protein-protein interactions, and changes in protein dynamics. Researchers have also developed new technologies and pipelines using different versions of AlphaFold to amplify its potential. However, addressing existing limitations is crucial to maximizing AlphaFold's capacity to redefine medical research. This article reviews AlphaFold's impact on five key aspects of medical sciences: protein mutation, protein-protein interaction, molecular dynamics, drug design, and immunotherapy.
METHODS: This review examines the contributions of various AlphaFold versions (AlphaFold2, AlphaFold DB, AlphaFold-Multimer, AlphaMissense, and AlphaFold3) to protein structure prediction. The methods include an extensive analysis of computational techniques and software used in interpreting and predicting protein structures, emphasizing advances in AI technology and its applications in medical research.
Affiliation(s)
- Niki Najar Najafi
- Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
- Reyhaneh Karbassian
- Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
- Helia Hajihassani
- Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
2
Sutherland CA, Stevens DM, Seong K, Wei W, Krasileva KV. The resistance awakens: Diversity at the DNA, RNA, and protein levels informs engineering of plant immune receptors from Arabidopsis to crops. The Plant Cell 2025; 37:koaf109. [PMID: 40344182 PMCID: PMC12118082 DOI: 10.1093/plcell/koaf109] [Received: 03/19/2025] [Revised: 04/17/2025] [Accepted: 04/21/2025] [Indexed: 05/11/2025]
Abstract
Plants rely on germline-encoded, innate immune receptors to sense pathogens and initiate the defense response. The exponential increase in quality and quantity of genomes, RNA-seq datasets, and protein structures has underscored the incredible biodiversity of plant immunity. Arabidopsis continues to serve as a valuable model and theoretical foundation of our understanding of wild plant diversity of immune receptors, while expansion of study into agricultural crops has also revealed distinct evolutionary trajectories and challenges. Here, we provide the classical context for study of both intracellular nucleotide-binding, leucine-rich repeat receptors and surface-localized pattern recognition receptors at the levels of DNA sequences, transcriptional regulation, and protein structures. We then examine how recent technology has shaped our understanding of immune receptor evolution and informed our ability to efficiently engineer resistance. We summarize current literature and provide an outlook on how researchers take inspiration from natural diversity in bioengineering efforts for disease resistance from Arabidopsis and other model systems to crops.
Affiliation(s)
- Chandler A Sutherland
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Danielle M Stevens
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Kyungyong Seong
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Wei Wei
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Ksenia V Krasileva
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, USA
3
Balasco N, Esposito L, Vitagliano L. Structural Biology in the AlphaFold Era: How Far Is Artificial Intelligence from Deciphering the Protein Folding Code? Biomolecules 2025; 15:674. [PMID: 40427567 PMCID: PMC12109453 DOI: 10.3390/biom15050674] [Received: 03/31/2025] [Revised: 04/24/2025] [Accepted: 05/02/2025] [Indexed: 05/29/2025]
Abstract
Proteins are biomolecules characterized by uncommon chemical and physicochemical complexities coupled with extreme responsiveness to even minor chemical modifications or environmental variations. Since the shape that proteins assume is fundamental for their function, understanding the chemical and structural bases that drive their three-dimensional structures represents the central problem for an atomic-level interpretation of biology. Not surprisingly, this question has progressively become the Holy Grail of structural biology (the folding problem). From this perspective, we initially describe and discuss the different formulations of the folding problem. In the present manuscript, the folding problem is framed from a historical perspective, effectively highlighting the progress made in the last lustrum. We chronologically summarize the major contributions that traditional methodologies provide in approaching this multifaceted problem. We then describe the recent advent and evolution of predictive approaches based on machine learning techniques that are revolutionizing the field by pointing out the potentialities and limitations of this approach. In the final part of the perspective, we illustrate the contribution that computational approaches will make in current structural biology to overcome the limitations of the reductionist approach of studying individual molecules to afford the atomic-level characterization of entire cellular compartments.
Affiliation(s)
- Nicole Balasco
- Institute of Molecular Biology and Pathology, National Research Council (CNR), c/o Department Chemistry, Sapienza University of Rome, 00185 Rome, Italy
- Luciana Esposito
- Institute of Biostructure and Bioimaging, Department of Biomedical Sciences, National Research Council (CNR), 80131 Naples, Italy
- Luigi Vitagliano
- Institute of Biostructure and Bioimaging, Department of Biomedical Sciences, National Research Council (CNR), 80131 Naples, Italy
4
Abdizadeh T, Rezaei S, Emadi Z, Sadeghi R, Saffari-Chaleshtori J, Sadeghi M. Investigation of bioremediation for glyphosate and its metabolite in soil using arbuscular mycorrhizal GmHsp60 protein: a molecular docking and molecular dynamics simulations approach. J Biomol Struct Dyn 2024:1-25. [PMID: 39829398 DOI: 10.1080/07391102.2024.2445767] [Received: 12/21/2023] [Accepted: 06/18/2024] [Indexed: 01/22/2025]
Abstract
The widespread use of glyphosate and the high dependence of the agricultural industry on this herbicide cause environmental pollution and pose a threat to living organisms. One of the appropriate solutions in sustainable agriculture to deal with pollution caused by glyphosate and its metabolites is creating a symbiotic relationship between plants and mycorrhizal fungi. Glomalin-related soil protein is a key protein for the bioremediation of glyphosate and its metabolite aminomethyl phosphonic acid in soil. This study uses homology modeling, molecular docking, and molecular dynamics simulation approaches to investigate the binding mechanism of glomalin-related soil protein from arbuscular mycorrhiza (GmHsp60) with glyphosate and its metabolite and the role of soil protein in the removal and sequestering of common agricultural soil pollutants. GmHsp60 protein structure was predicted by homology modeling, and the quality of the generated model was assessed. Then, the interaction between glyphosate and aminomethyl phosphonic acid and the modeled GmHsp60 protein was explored by molecular docking. Based on docking results, GmHsp60 has an efficient role in the bioremediation of glyphosate and aminomethyl phosphonic acid (docking scores of -6.03 and -5.34 kcal/mol, respectively). Glyphosate forms three hydrogen bonds with Lys258, Gly262, and Glu58 of GmHsp60, and aminomethyl phosphonic acid forms three hydrogen bonds with Lys258, Gly261, and Gly262 of GmHsp60. In addition, the stability of glyphosate and its metabolite bound to GmHsp60 was confirmed by molecular dynamics simulations and binding free energy calculations using MM/PBSA analysis. This study provides a molecular-level understanding of GmHsp60 expression and function for glyphosate bioremediation.
Affiliation(s)
- Tooba Abdizadeh
- Clinical Biochemistry Research Center, Basic Health Sciences Institute, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Somayeh Rezaei
- Department of Environmental Health Engineering, School of Health, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Zahra Emadi
- Department of Environmental Health Engineering, School of Health, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Student Research Committee, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Ramin Sadeghi
- Chemical Engineering Department, Iran University of Science & Technology, Narmak, Tehran, Iran
- Javad Saffari-Chaleshtori
- Clinical Biochemistry Research Center, Basic Health Sciences Institute, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Mehraban Sadeghi
- Clinical Biochemistry Research Center, Basic Health Sciences Institute, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Department of Environmental Health Engineering, School of Health, Shahrekord University of Medical Sciences, Shahrekord, Iran
5
Sleator RD. Solving the protein folding problem…. FEBS Lett 2024; 598:2831-2835. [PMID: 39428256 DOI: 10.1002/1873-3468.15043] [Received: 09/19/2024] [Revised: 10/01/2024] [Accepted: 10/04/2024] [Indexed: 10/22/2024]
Abstract
The protein folding problem was, to paraphrase Churchill, 'A riddle wrapped in a mystery inside an enigma'. The riddle, in this context, was the folding code: what interactions at the amino acid level are driving the folding process? The mystery was the kinetic question (Levinthal's paradox): how does the folding process occur so quickly (typically in timescales ranging from μs to ms)? Finally, the enigma represents the computational problem of developing approaches to predict the final folded state of a protein given only its amino acid sequence. Herein, I trace the path to solving this riddle wrapped in a mystery inside an enigma.
Affiliation(s)
- Roy D Sleator
- Department of Biological Sciences, Munster Technological University, Cork, Ireland
6
Molotkov I, Mardis ER, Artomov M. Making sense of missense: challenges and opportunities in variant pathogenicity prediction. Dis Model Mech 2024; 17:dmm052218. [PMID: 39676521 PMCID: PMC11683568 DOI: 10.1242/dmm.052218] [Indexed: 12/17/2024]
Abstract
Computational tools for predicting variant pathogenicity are widely used to support clinical variant interpretation. Recently, several models, which do not rely on known variant classifications during training, have been developed. These approaches can potentially overcome biases of current clinical databases, such as misclassifications, and can potentially better generalize to novel, unclassified variants. AlphaMissense is one such model, built on the highly successful protein structure prediction model, AlphaFold. AlphaMissense has shown great performance in benchmarks of functional and clinical data, outperforming many supervised models that were trained on similar data. However, like other in silico predictors, AlphaMissense has notable limitations. As a large deep learning model, it lacks interpretability, does not assess the functional impact of variants, and provides pathogenicity scores that are not disease specific. Improving interpretability and precision in computational tools for variant interpretation remains a promising area for advancing clinical genetics.
Affiliation(s)
- Ivan Molotkov
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, OH 43215, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43215, USA
- Elaine R. Mardis
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, OH 43215, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43215, USA
- Mykyta Artomov
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, OH 43215, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43215, USA
7
Haque N, Wagenknecht JB, Ratnasinghe BD, Zimmermann MT. Systematic analysis of the relationship between fold-dependent flexibility and artificial intelligence protein structure prediction. PLoS One 2024; 19:e0313308. [PMID: 39591473 PMCID: PMC11594405 DOI: 10.1371/journal.pone.0313308] [Received: 07/25/2024] [Accepted: 10/23/2024] [Indexed: 11/28/2024]
Abstract
Artificial Intelligence (AI)-based deep learning methods for predicting protein structures are reshaping knowledge development and scientific discovery. Recent large-scale application of AI models for protein structure prediction has changed perceptions about complicated biological problems and empowered a new generation of structure-based hypothesis testing. It is well-recognized that proteins have a modular organization according to archetypal folds. However, it is yet to be determined if predicted structures are tuned to one conformation of flexible proteins or if they represent average conformations. Further, it is unclear whether the answer is protein fold-dependent. Therefore, in this study, we analyzed 2878 proteins with at least ten distinct experimental structures available, from which we can estimate protein topological rigidity versus heterogeneity from experimental measurements. We found that AlphaFold v2 (AF2) predictions consistently return one specific form to high accuracy, with 99.68% of distinct folds (n = 623 out of 628) having an experimental structure within 2.5Å RMSD from a predicted structure. Yet, 27.70% and 10.82% of folds (174 and 68 out of 628 folds) have at least one experimental structure over 2.5Å and 5Å RMSD, respectively, from their AI-predicted structure. This information is important for how researchers apply and interpret the output of AF2 and similar tools. Additionally, it enabled us to score fold types according to how homogeneous versus heterogeneous their conformations are. Importantly, folds with high heterogeneity are enriched among proteins which regulate vital biological processes including immune cell differentiation, immune activation, and metabolism. This result demonstrates that a large amount of protein fold flexibility has already been experimentally measured, is vital for critical cellular processes, and is currently unaccounted for in structure prediction databases. Therefore, the structure-prediction revolution begets the protein dynamics revolution!
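The RMSD cutoffs quoted in this abstract (2.5Å and 5Å) can be made concrete with a minimal sketch. It assumes the coordinate sets are already superposed and matched atom by atom; the function names `rmsd` and `summarize`, and the toy coordinates, are hypothetical illustrations and not the authors' pipeline.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized coordinate sets.

    Assumption: the structures are already superposed and atoms are paired
    by index; no alignment step is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equally sized")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def summarize(predicted, experimental_structures, cutoff=2.5):
    """Compare one predicted structure against several experimental ones."""
    values = [rmsd(predicted, exp) for exp in experimental_structures]
    return {
        "closest": min(values),
        "farthest": max(values),
        # At least one experimental structure lies within the cutoff.
        "any_within": min(values) <= cutoff,
        # At least one experimental structure lies beyond the cutoff.
        "any_beyond": max(values) > cutoff,
    }


# Toy data (hypothetical): three atoms on a line, one identical copy and
# one copy displaced by 3 Å along y.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
same = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0), (2.0, 3.0, 0.0)]
report = summarize(predicted, [same, shifted], cutoff=2.5)
```

In this toy case a single prediction is both "within 2.5Å" of one experimental structure and "over 2.5Å" from another, which is the situation the abstract describes for heterogeneous folds.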
Affiliation(s)
- Neshatul Haque
- Computational Structural Genomics Unit, Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States of America
- Jessica B. Wagenknecht
- Computational Structural Genomics Unit, Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States of America
- Brian D. Ratnasinghe
- Computational Structural Genomics Unit, Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States of America
- Michael T. Zimmermann
- Computational Structural Genomics Unit, Linda T. and John A. Mellowes Center for Genomic Sciences and Precision Medicine, Medical College of Wisconsin, Milwaukee, WI, United States of America
- Clinical and Translational Sciences Institute, Medical College of Wisconsin, Milwaukee, WI, United States of America
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI, United States of America
8
Olanders G, Testa G, Tibo A, Nittinger E, Tyrchan C. Challenge for Deep Learning: Protein Structure Prediction of Ligand-Induced Conformational Changes at Allosteric and Orthosteric Sites. J Chem Inf Model 2024; 64:8481-8494. [PMID: 39484820 DOI: 10.1021/acs.jcim.4c01475] [Indexed: 11/03/2024]
Abstract
In the realm of biomedical research, understanding the intricate structure of proteins is crucial, as these structures determine how proteins function within our bodies and interact with potential drugs. Traditionally, methods like X-ray crystallography and cryo-electron microscopy have been used to unravel these structures, but they are often challenging, time-consuming and costly. Recently, a breakthrough in computational biology has emerged with the development of deep learning algorithms capable of predicting protein structures based on their amino acid sequences (Jumper, J., et al. Nature 2021, 596, 583. Lane, T. J. Nature Methods 2023, 20, 170. Kryshtafovych, A., et al. Proteins: Structure, Function and Bioinformatics 2021, 89, 1607). This study focuses on predicting the dynamic changes that proteins undergo upon ligand binding, specifically when they bind to allosteric sites, i.e. a pocket different from the active site. Allosteric modulators are particularly important for drug discovery, as they open new avenues for designing drugs that can target proteins more effectively and with fewer side effects (Nussinov, R.; Tsai, C. J. Cell 2013, 153, 293). To study this, we curated a data set of 578 X-ray structures comprised of proteins displaying orthosteric and allosteric binding as well as a general framework to evaluate deep learning-based structure prediction methods. Our findings demonstrate the potential and current limitations of deep learning methods, such as AlphaFold2 (Jumper, J., et al. Nature 2021, 596, 583), NeuralPLexer (Qiao, Z., et al. Nat Mach Intell 2024, 6, 195), and RoseTTAFold All-Atom (Krishna, R., et al. Science 2024, 384, eadl2528) to predict not just static protein structures but also the dynamic conformational changes. Herein we show that predicting the allosteric induced-fit conformation still poses a challenge to deep learning methods, as they more accurately predict the orthosteric bound conformation compared to the allosteric induced-fit conformation. For AlphaFold2, we observed that conformational diversity and sampling between the apo and holo state could be increased by modifying the MSA depth, but this did not enhance the ability to generate conformations close to the allosteric induced-fit conformation. To further support advancements in the protein structure prediction field, the curated data set and evaluation framework are made publicly available.
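The idea of "modifying the MSA depth" mentioned in this abstract can be sketched as random subsampling of alignment rows before prediction. This is only an illustration under stated assumptions: the helper `subsample_msa`, its parameters, and the toy alignment are hypothetical, the abstract does not specify how the authors reduced depth, and no AlphaFold2 API is called here.

```python
import random


def subsample_msa(sequences, max_depth, seed=0):
    """Return a shallower MSA: the query row plus a random subset of the rest.

    Assumptions (not from the source): the first row is the query and must
    always be kept; the remaining rows are sampled uniformly without
    replacement and returned in their original order.
    """
    if not sequences:
        raise ValueError("MSA must contain at least the query sequence")
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    query, others = sequences[0], sequences[1:]
    rng = random.Random(seed)
    keep = min(max_depth - 1, len(others))
    chosen = sorted(rng.sample(range(len(others)), keep))
    return [query] + [others[i] for i in chosen]


# Toy alignment (hypothetical): five aligned rows of equal length.
msa = ["MKTAY", "MKSAY", "MRTAY", "MKTAF", "MQTAY"]
shallow = subsample_msa(msa, max_depth=3, seed=42)
```

Running a predictor on several such shallow alignments, each drawn with a different `seed`, is one way the increased conformational sampling described above could be explored; whether it reaches a given induced-fit state is an empirical question, as the abstract notes.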
Affiliation(s)
- Gustav Olanders
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183 Gothenburg, Sweden
- Giulia Testa
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183 Gothenburg, Sweden
- Alessandro Tibo
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 43183 Gothenburg, Sweden
- Eva Nittinger
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183 Gothenburg, Sweden
- Christian Tyrchan
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, 43183 Gothenburg, Sweden
9
Ngo K, Yang PC, Yarov-Yarovoy V, Clancy CE, Vorobyov I. Harnessing AlphaFold to reveal hERG channel conformational state secrets. bioRxiv 2024:2024.01.27.577468. [PMID: 38352360 PMCID: PMC10862728 DOI: 10.1101/2024.01.27.577468] [Indexed: 02/20/2024]
Abstract
To design safe, selective, and effective new therapies, there must be a deep understanding of the structure and function of the drug target. One of the most difficult problems to solve has been resolution of discrete conformational states of transmembrane ion channel proteins. An example is KV11.1 (hERG), which carries the primary cardiac repolarizing current, IKr. hERG is a notorious drug anti-target against which all promising drugs are screened to determine potential for arrhythmia. Drug interactions with the hERG inactivated state are linked to elevated arrhythmia risk, and drugs may become trapped during channel closure. However, the structural details of multiple conformational states have remained elusive. Here, we guided AlphaFold2 to predict plausible hERG inactivated and closed conformations, obtaining results consistent with multiple available experimental data. Drug docking simulations demonstrated hERG state-specific drug interactions in good agreement with experimental results, revealing that most drugs bind more effectively in the inactivated state and are trapped in the closed state. Molecular dynamics simulations demonstrated ion conduction for the open state but not for the AlphaFold2-predicted inactivated state, in line with earlier studies. Finally, we identified key molecular determinants of state transitions by analyzing interaction networks across closed, open, and inactivated states in agreement with earlier mutagenesis studies. Here, we demonstrate a readily generalizable application of AlphaFold2 as an effective and robust method to predict discrete protein conformations, reconcile seemingly disparate data and identify novel linkages from structure to function.
Affiliation(s)
- Khoa Ngo
- Center for Precision Medicine and Data Science, University of California, Davis, California
- Department of Physiology and Membrane Biology, University of California, Davis, California
- Pei-Chi Yang
- Center for Precision Medicine and Data Science, University of California, Davis, California
- Department of Physiology and Membrane Biology, University of California, Davis, California
- Vladimir Yarov-Yarovoy
- Center for Precision Medicine and Data Science, University of California, Davis, California
- Department of Physiology and Membrane Biology, University of California, Davis, California
- Department of Anesthesiology and Pain Medicine, University of California, Davis, California
- Colleen E. Clancy
- Center for Precision Medicine and Data Science, University of California, Davis, California
- Department of Physiology and Membrane Biology, University of California, Davis, California
- Department of Pharmacology, University of California, Davis, California
- Igor Vorobyov
- Department of Physiology and Membrane Biology, University of California, Davis, California
- Department of Pharmacology, University of California, Davis, California
10
Evseev P, Gutnik D, Evpak A, Kasimova A, Miroshnikov K. Origin, Evolution and Diversity of φ29-like Phages-Review and Bioinformatic Analysis. Int J Mol Sci 2024; 25:10838. [PMID: 39409167 PMCID: PMC11476376 DOI: 10.3390/ijms251910838] [Received: 09/11/2024] [Revised: 10/04/2024] [Accepted: 10/07/2024] [Indexed: 10/20/2024]
Abstract
Phage φ29 and related bacteriophages are currently the smallest known tailed viruses infecting various representatives of both Gram-positive and Gram-negative bacteria. They are characterised by genomic content features and distinctive properties that are unique among known tailed phages; their characteristics include protein primer-driven replication and a packaging process characteristic of this group. Searches conducted using public genomic databases revealed in excess of 2000 entries, including bacteriophages, phage plasmids and sequences identified as being archaeal that share the characteristic features of phage φ29. An analysis of predicted proteins, however, indicated that the metagenomic sequences attributed as archaeal appear to be misclassified and belong to bacteriophages. An analysis of the translated polypeptides of major capsid proteins (MCPs) of φ29-related phages indicated the dissimilarity of MCP sequences to those of almost all other known Caudoviricetes groups and a possible distant relationship to MCPs of T7-like (Autographiviridae) phages. Sequence searches conducted using HMM revealed the relatedness between the main structural proteins of φ29-like phages and an unusual lactococcal phage, KSY1 (Chopinvirus KSY1), whose genome contains two genes of RNA polymerase that are similar to the RNA polymerases of phages of the Autographiviridae and Schitoviridae (N4-like) families. An analysis of the tail tube proteins of φ29-like phages indicated the dissimilarity of the lower collar protein to tail proteins of all other viral groups, but revealed its possible distant relatedness with proteins of toxin translocation complexes. The combination of the unique features and distinctive origin of φ29-related phages suggests the categorisation of this vast group in a new order or as a new taxon of a higher rank.
Affiliation(s)
- Peter Evseev
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Street 16/10, 117997 Moscow, Russia
- Laboratory of Molecular Microbiology, Pirogov Russian National Research Medical University, Ostrovityanova Street 1, 117997 Moscow, Russia
- Daria Gutnik
- Limnological Institute, Siberian Branch of the Russian Academy of Sciences, Ulan-Batorskaya Street, 3, 664033 Irkutsk, Russia
- Alena Evpak
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Street 16/10, 117997 Moscow, Russia
- Anastasia Kasimova
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky Prospekt, 47, 119991 Moscow, Russia
- Konstantin Miroshnikov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Street 16/10, 117997 Moscow, Russia
11
Wei G, Wu N, Zhao K, Yang S, Wang L, Liu Y. DeepCheck: multitask learning aids in assessing microbial genome quality. Brief Bioinform 2024; 25:bbae539. [PMID: 39438078 PMCID: PMC11495869 DOI: 10.1093/bib/bbae539] [Received: 05/31/2024] [Revised: 08/26/2024] [Accepted: 10/09/2024] [Indexed: 10/25/2024]
Abstract
Metagenomic analyses facilitate the exploration of the microbial world, advancing our understanding of microbial roles in ecological and biological processes. A pivotal aspect of metagenomic analysis involves assessing the quality of metagenome-assembled genomes (MAGs), crucial for accurate biological insights. Current machine learning-based methods often treat completeness and contamination prediction as separate tasks, overlooking their inherent relationship and limiting models' generalization. In this study, we present DeepCheck, a multitasking deep learning framework for simultaneous prediction of MAG completeness and contamination. DeepCheck consistently outperforms existing tools in accuracy across various experimental settings and demonstrates comparable speed while maintaining high predictive accuracy even for new lineages. Additionally, we employ interpretable machine learning techniques to identify specific genes and pathways that drive the model's predictions, enabling independent investigation and assessment of these biological elements for deeper insights.
Affiliation(s)
- Guo Wei
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 163 Xianlin Avenue, Qixia District, Nanjing 210000, China
- Nannan Wu
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 163 Xianlin Avenue, Qixia District, Nanjing 210000, China
- Kunyang Zhao
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 163 Xianlin Avenue, Qixia District, Nanjing 210000, China
- Sihai Yang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 163 Xianlin Avenue, Qixia District, Nanjing 210000, China
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, 159 Panlong road, Xuanwu District, Nanjing 210000, China
- Long Wang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 163 Xianlin Avenue, Qixia District, Nanjing 210000, China
- Yan Liu
- Department of Computer Science, Yangzhou University, 196 Huaxi Road, Hanjiang District, Yangzhou 225100, China
12
Zhang S, Li J, Chen SJ. Machine learning in RNA structure prediction: Advances and challenges. Biophys J 2024; 123:2647-2657. [PMID: 38297836 PMCID: PMC11393687 DOI: 10.1016/j.bpj.2024.01.026] [Received: 11/29/2023] [Revised: 01/08/2024] [Accepted: 01/24/2024] [Indexed: 02/02/2024]
Abstract
RNA molecules play a crucial role in various biological processes, with their functionality closely tied to their structures. The remarkable advancements in machine learning techniques for protein structure prediction have shown promise in the field of RNA structure prediction. In this perspective, we discuss the advances and challenges encountered in constructing machine learning-based models for RNA structure prediction. We explore topics including model building strategies, specific challenges involved in predicting RNA secondary (2D) and tertiary (3D) structures, and approaches to these challenges. In addition, we highlight the advantages and challenges of constructing RNA language models. Given the rapid advances of machine learning techniques, we anticipate that machine learning-based models will serve as important tools for predicting RNA structures, thereby enriching our understanding of RNA structures and their corresponding functions.
Affiliation(s)
- Sicheng Zhang
  - Department of Physics and Institute of Data Science and Informatics, University of Missouri, Columbia, Missouri
- Jun Li
  - Department of Physics and Institute of Data Science and Informatics, University of Missouri, Columbia, Missouri
- Shi-Jie Chen
  - Department of Physics and Institute of Data Science and Informatics, University of Missouri, Columbia, Missouri
  - Department of Biochemistry, University of Missouri, Columbia, Missouri
|
13
|
Capponi S, Wang S. AI in cellular engineering and reprogramming. Biophys J 2024; 123:2658-2670. [PMID: 38576162 PMCID: PMC11393708 DOI: 10.1016/j.bpj.2024.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 03/19/2024] [Accepted: 04/01/2024] [Indexed: 04/06/2024] Open
Abstract
During the last decade, artificial intelligence (AI) has increasingly been applied in biophysics and related fields, including cellular engineering and reprogramming, offering novel approaches to understand, manipulate, and control cellular function. The potential of AI lies in its ability to analyze complex datasets and generate predictive models. AI algorithms can process large amounts of data from single-cell genomics and multiomic technologies, allowing researchers to gain mechanistic insights into the control of cell identity and function. By integrating and interpreting these complex datasets, AI can help identify key molecular events and regulatory pathways involved in cellular reprogramming. This knowledge can inform the design of precision engineering strategies, such as the development of new transcription factor and signaling molecule cocktails, to manipulate cell identity and drive authentic cell fate across lineage boundaries. Furthermore, when used in combination with computational methods, AI can accelerate and improve the analysis and understanding of the intricate relationships between genes, proteins, and cellular processes. In this review article, we explore the current state of AI applications in biophysics with a specific focus on cellular engineering and reprogramming. Then, we showcase a couple of recent applications where we combined machine learning with experimental and computational techniques. Finally, we briefly discuss the challenges and prospects of AI in cellular engineering and reprogramming, emphasizing the potential of these technologies to revolutionize our ability to engineer cells for a variety of applications, from disease modeling and drug discovery to regenerative medicine and biomanufacturing.
Affiliation(s)
- Sara Capponi
  - IBM Almaden Research Center, San Jose, California
  - Center for Cellular Construction, San Francisco, California
- Shangying Wang
  - Bay Area Institute of Science, Altos Labs, Redwood City, California
|
14
|
Zhou J, Huang M. Navigating the landscape of enzyme design: from molecular simulations to machine learning. Chem Soc Rev 2024; 53:8202-8239. [PMID: 38990263 DOI: 10.1039/d4cs00196f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Global environmental issues and sustainable development call for new technologies for fine chemical synthesis and waste valorization. Biocatalysis has attracted great attention as the alternative to the traditional organic synthesis. However, it is challenging to navigate the vast sequence space to identify those proteins with admirable biocatalytic functions. The recent development of deep-learning based structure prediction methods such as AlphaFold2 reinforced by different computational simulations or multiscale calculations has largely expanded the 3D structure databases and enabled structure-based design. While structure-based approaches shed light on site-specific enzyme engineering, they are not suitable for large-scale screening of potential biocatalysts. Effective utilization of big data using machine learning techniques opens up a new era for accelerated predictions. Here, we review the approaches and applications of structure-based and machine-learning guided enzyme design. We also provide our view on the challenges and perspectives on effectively employing enzyme design approaches integrating traditional molecular simulations and machine learning, and the importance of database construction and algorithm development in attaining predictive ML models to explore the sequence fitness landscape for the design of admirable biocatalysts.
Affiliation(s)
- Jiahui Zhou
  - School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK
- Meilan Huang
  - School of Chemistry and Chemical Engineering, Queen's University, David Keir Building, Stranmillis Road, Belfast BT9 5AG, Northern Ireland, UK
|
15
|
Chen L, Li Q, Nasif KFA, Xie Y, Deng B, Niu S, Pouriyeh S, Dai Z, Chen J, Xie CY. AI-Driven Deep Learning Techniques in Protein Structure Prediction. Int J Mol Sci 2024; 25:8426. [PMID: 39125995 PMCID: PMC11313475 DOI: 10.3390/ijms25158426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 07/29/2024] [Accepted: 07/29/2024] [Indexed: 08/12/2024] Open
Abstract
Protein structure prediction is important for understanding protein function and behavior. This review presents a comprehensive survey of the computational models used in predicting protein structure. It covers the progression from established protein modeling to state-of-the-art artificial intelligence (AI) frameworks. The paper starts with a brief introduction to protein structures, protein modeling, and AI. The section on established protein modeling discusses homology modeling, ab initio modeling, and threading. The next section covers deep learning-based models. It introduces some state-of-the-art AI models, such as AlphaFold (AlphaFold, AlphaFold2, AlphaFold3), RoseTTAFold, ProteinBERT, etc. This section also discusses how AI techniques have been integrated into established frameworks like Swiss-Model, Rosetta, and I-TASSER. Model performance is compared using the rankings of CASP14 (Critical Assessment of Structure Prediction) and CASP15. CASP16 is ongoing, and its results are not included in this review. Continuous Automated Model EvaluatiOn (CAMEO) complements the biennial CASP experiment. The template modeling score (TM-score), global distance test total score (GDT_TS), and Local Distance Difference Test (lDDT) score are also discussed. The paper then acknowledges the ongoing difficulties in predicting protein structure and emphasizes the need for further research into areas such as dynamic protein behavior, conformational changes, and protein-protein interactions. In the application section, the paper introduces applications in fields such as drug design, industry, education, and novel protein development. In summary, this paper provides a comprehensive overview of the latest advancements in established protein modeling and deep learning-based models for protein structure prediction. It emphasizes the significant advancements achieved by AI and identifies potential areas for further investigation.
Affiliation(s)
- Lingtao Chen
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Qiaomu Li
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Kazi Fahim Ahmad Nasif
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Ying Xie
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Bobin Deng
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Shuteng Niu
  - Department of Computer Science, Bowling Green State University, Bowling Green, OH 43403, USA
- Seyedamin Pouriyeh
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Zhiyu Dai
  - Division of Pulmonary and Critical Care Medicine, John T. Milliken Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
- Jiawei Chen
  - College of Computing, Data Science and Society, University of California, Berkeley, CA 94720, USA
- Chloe Yixin Xie
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
|
16
|
Dahlström KM, Salminen TA. Apprehensions and emerging solutions in ML-based protein structure prediction. Curr Opin Struct Biol 2024; 86:102819. [PMID: 38631107 DOI: 10.1016/j.sbi.2024.102819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/05/2024] [Accepted: 03/31/2024] [Indexed: 04/19/2024]
Abstract
The three-dimensional structure of proteins determines their function in vital biological processes. Thus, when the structure is known, the molecular mechanism of protein function can be understood in more detail and the obtained information utilized in biotechnological, diagnostic, and therapeutic applications. Over the past five years, machine learning (ML)-based modeling has pushed protein structure prediction to the next level with AlphaFold in the front line, predicting the structure for hundreds of millions of proteins. Recent advances report promising ML-based approaches for solving remaining challenges by incorporating functionally important metals, co-factors, post-translational modifications, structural dynamics, and interdomain and multimer interactions in the structure prediction process.
Affiliation(s)
- Käthe M Dahlström
  - Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Tykistökatu 6A, 20520 Turku, Finland
  - InFLAMES Research Flagship Center, Åbo Akademi University, 20520 Turku, Finland
- Tiina A Salminen
  - Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Tykistökatu 6A, 20520 Turku, Finland
  - InFLAMES Research Flagship Center, Åbo Akademi University, 20520 Turku, Finland
|
17
|
Wang L, Wen Z, Liu SW, Zhang L, Finley C, Lee HJ, Fan HJS. Overview of AlphaFold2 and breakthroughs in overcoming its limitations. Comput Biol Med 2024; 176:108620. [PMID: 38761500 DOI: 10.1016/j.compbiomed.2024.108620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 05/01/2024] [Accepted: 05/14/2024] [Indexed: 05/20/2024]
Abstract
Predicting three-dimensional (3D) protein structures has been challenging for decades. The emergence of AlphaFold2 (AF2), a deep learning-based method developed by DeepMind, became a game changer in the protein folding community. AF2 can predict a protein's three-dimensional structure with high confidence based on its amino acid sequence. Accurate prediction of protein structures can dramatically accelerate our understanding of biological mechanisms and provide a solid foundation for reliable drug design. Although AF2 breaks through the barriers in predicting protein structures, much room remains for further study. This review provides a brief historical overview of the development of protein structure prediction, covering template-based, template-free, and machine learning-based methods. In addition to reviewing the potential benefits (Pros) and considerations (Cons) of using AF2, this review summarizes its diverse applications, including protein structure predictions, dynamic changes, point mutation, integration of language model and experimental data, protein complex, and protein-peptide interaction. It underscores recent advancements in efficiency, reliability, and broad application of AF2. This comprehensive review offers valuable insights into the applications of AF2 and AF2-inspired AI methods in structural biology and its potential for clinically significant drug target discovery.
Affiliation(s)
- Lei Wang
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Zehua Wen
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Shi-Wei Liu
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
- Lihong Zhang
  - Digestive Department, Binhai New Area Hospital of TCM Tianjin, Tianjin, 300451, China
- Cierra Finley
  - Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA
- Ho-Jin Lee
  - Department of Natural Sciences, Southwest Tennessee Community College, Memphis, TN, 38015, USA
  - Division of Natural & Mathematical Sciences, LeMoyne-Owen College, Memphis, TN, 38126, USA
- Hua-Jun Shawn Fan
  - College of Chemical Engineering, Sichuan University of Science and Engineering, Zigong City, Sichuan Province, 64300, China
|
18
|
Evseev PV, Sukhova AS, Tkachenko NA, Skryabin YP, Popova AV. Lytic Capsule-Specific Acinetobacter Bacteriophages Encoding Polysaccharide-Degrading Enzymes. Viruses 2024; 16:771. [PMID: 38793652 PMCID: PMC11126041 DOI: 10.3390/v16050771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 05/09/2024] [Accepted: 05/10/2024] [Indexed: 05/26/2024] Open
Abstract
The genus Acinetobacter comprises both environmental and clinically relevant species associated with hospital-acquired infections. Among them, Acinetobacter baumannii is a critical priority bacterial pathogen, for which the research and development of new strategies for antimicrobial treatment are urgently needed. Acinetobacter spp. produce a variety of structurally diverse capsular polysaccharides (CPSs), which surround the bacterial cells with a thick protective layer. These surface structures are primary receptors for capsule-specific bacteriophages, that is, phages carrying tailspikes with CPS-depolymerizing/modifying activities. Phage tailspike proteins (TSPs) exhibit hydrolase, lyase, or esterase activities toward the corresponding CPSs of a certain structure. In this study, the data on all lytic capsule-specific phages infecting Acinetobacter spp. with genomes deposited in the NCBI GenBank database by January 2024 were summarized. Among the 149 identified TSPs encoded in the genomes of 143 phages, the capsular specificity (K specificity) of 46 proteins has been experimentally determined or predicted previously. The specificity of 63 TSPs toward CPSs, produced by various Acinetobacter K types, was predicted in this study using a bioinformatic analysis. A comprehensive phylogenetic analysis confirmed the prediction and revealed the possibility of the genetic exchange of gene regions corresponding to the CPS-recognizing/degrading parts of different TSPs between morphologically and taxonomically distant groups of capsule-specific Acinetobacter phages.
Affiliation(s)
- Peter V. Evseev
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997 Moscow, Russia
  - State Research Center for Applied Microbiology and Biotechnology, City District Serpukhov, Moscow Region, 142279 Obolensk, Russia
  - Pirogov Russian National Research Medical University, 117997 Moscow, Russia
- Anastasia S. Sukhova
  - State Research Center for Applied Microbiology and Biotechnology, City District Serpukhov, Moscow Region, 142279 Obolensk, Russia
- Nikolay A. Tkachenko
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 117997 Moscow, Russia
- Yuriy P. Skryabin
  - State Research Center for Applied Microbiology and Biotechnology, City District Serpukhov, Moscow Region, 142279 Obolensk, Russia
- Anastasia V. Popova
  - State Research Center for Applied Microbiology and Biotechnology, City District Serpukhov, Moscow Region, 142279 Obolensk, Russia
|
19
|
Gündüz HA, Mreches R, Moosbauer J, Robertson G, To XY, Franzosa EA, Huttenhower C, Rezaei M, McHardy AC, Bischl B, Münch PC, Binder M. Optimized model architectures for deep learning on genomic data. Commun Biol 2024; 7:516. [PMID: 38693292 PMCID: PMC11063068 DOI: 10.1038/s42003-024-06161-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 04/08/2024] [Indexed: 05/03/2024] Open
Abstract
The success of deep learning in various applications depends on task-specific architecture design choices, including the types, hyperparameters, and number of layers. In computational biology, there is no consensus on the optimal architecture design, and decisions are often made using insights from more well-established fields such as computer vision. These may not consider the domain-specific characteristics of genome sequences, potentially limiting performance. Here, we present GenomeNet-Architect, a neural architecture design framework that automatically optimizes deep learning models for genome sequence data. It optimizes the overall layout of the architecture, with a search space specifically designed for genomics. Additionally, it optimizes hyperparameters of individual layers and the model training procedure. On a viral classification task, GenomeNet-Architect reduced the read-level misclassification rate by 19%, with 67% faster inference and 83% fewer parameters, and achieved similar contig-level accuracy with ~100 times fewer parameters compared to the best-performing deep learning baselines.
Affiliation(s)
- Hüseyin Anil Gündüz
  - Department of Statistics, LMU Munich, Munich, Germany
  - Munich Center for Machine Learning, Munich, Germany
- René Mreches
  - Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Julia Moosbauer
  - Department of Statistics, LMU Munich, Munich, Germany
  - Munich Center for Machine Learning, Munich, Germany
- Gary Robertson
  - Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Xiao-Yin To
  - Department of Statistics, LMU Munich, Munich, Germany
  - Munich Center for Machine Learning, Munich, Germany
  - Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Eric A Franzosa
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Curtis Huttenhower
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Mina Rezaei
  - Department of Statistics, LMU Munich, Munich, Germany
  - Munich Center for Machine Learning, Munich, Germany
- Alice C McHardy
  - Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  - German Centre for Infection Research (DZIF), partner site Hannover Braunschweig, Braunschweig, Germany
- Bernd Bischl
  - Department of Statistics, LMU Munich, Munich, Germany
  - Munich Center for Machine Learning, Munich, Germany
- Philipp C Münch
  - Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
  - German Centre for Infection Research (DZIF), partner site Hannover Braunschweig, Braunschweig, Germany
- Martin Binder
  - Department of Statistics, LMU Munich, Munich, Germany
  - Munich Center for Machine Learning, Munich, Germany
|
20
|
Wang C, Gu C, Lv Y, Liu H, Wang Y, Zuo Y, Jiang G, Liu L, Liu J. AlphaFold2 assists in providing novel mechanistic insights into the interactions among the LUBAC subunits. Acta Biochim Biophys Sin (Shanghai) 2024; 56:1034-1043. [PMID: 38655618 PMCID: PMC11322871 DOI: 10.3724/abbs.2024047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 01/31/2024] [Indexed: 04/26/2024] Open
Abstract
The linear ubiquitin chain assembly complex (LUBAC) is the only known E3 ligase complex in which the ubiquitin-like (UBL) domains of SHARPIN and HOIL-1L interact with HOIP to determine the structural stability of LUBAC. The interactions between subunits within LUBAC have been a topic of extensive research. However, the impact of the LTM motif on the interaction between the UBL domains of SHARPIN and HOIL-1L with HOIP remains unclear. Here, we discover that the absence of the LTM motif in the AlphaFold2-predicted LUBAC structure alters the HOIP-UBA structure. We employ GeoPPI to calculate the changes in binding free energy (ΔG) caused by single-point mutations between subunits, simulating their protein-protein interactions. The results reveal that the presence of the LTM motif decreases the interaction between the UBL domains of SHARPIN and HOIL-1L with HOIP, leading to a decrease in the structural stability of LUBAC. Furthermore, using the AlphaFold2-predicted results, we find that HOIP (629‒695) and HOIP-UBA bind to both sides of HOIL-1L-UBL, respectively. GROMACS molecular dynamics simulations, together with SPR and ITC experiments, demonstrate that the elongated domain formed by HOIP (629‒695) and HOIP-UBA, hereafter referred to as the HOIP (466‒695) structure, interacts with HOIL-1L-UBL to form a structurally stable complex. These findings illustrate the collaborative interaction between HOIP-UBA and HOIP (629‒695) with HOIL-1L-UBL, which influences the structural stability of LUBAC.
Affiliation(s)
- Chenchen Wang
  - College of Veterinary Medicine, Northeast Agricultural University, Harbin 150030, China
- Chunying Gu
  - Department of Medical Laboratory Science and Technology, Harbin Medical University-Daqing, Daqing 163319, China
- Ying Lv
  - College of Life Sciences, Northeast Agricultural University, Harbin 150030, China
- Hongyu Liu
  - Preventive and Control Center for Animal Disease of Heilongjiang Province, Harbin 150069, China
- Yanan Wang
  - College of Basic Medical Sciences, Harbin Medical University-Daqing, Daqing 163319, China
- Yongmei Zuo
  - Heilongjiang Institute of Animal Health Inspection, Harbin 150006, China
- Guangyu Jiang
  - College of Basic Medical Sciences, Harbin Medical University-Daqing, Daqing 163319, China
- Lili Liu
  - College of Basic Medical Sciences, Harbin Medical University-Daqing, Daqing 163319, China
- Jiafu Liu
  - College of Basic Medical Sciences, Harbin Medical University-Daqing, Daqing 163319, China
|
21
|
Grassmann G, Miotto M, Desantis F, Di Rienzo L, Tartaglia GG, Pastore A, Ruocco G, Monti M, Milanetti E. Computational Approaches to Predict Protein-Protein Interactions in Crowded Cellular Environments. Chem Rev 2024; 124:3932-3977. [PMID: 38535831 PMCID: PMC11009965 DOI: 10.1021/acs.chemrev.3c00550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 02/20/2024] [Accepted: 02/21/2024] [Indexed: 04/11/2024]
Abstract
Investigating protein-protein interactions is crucial for understanding cellular biological processes because proteins often function within molecular complexes rather than in isolation. While experimental and computational methods have provided valuable insights into these interactions, they often overlook a critical factor: the crowded cellular environment. This environment significantly impacts protein behavior, including structural stability, diffusion, and ultimately the nature of binding. In this review, we discuss theoretical and computational approaches that allow the modeling of biological systems to guide and complement experiments and can thus significantly advance the investigation, and possibly the predictions, of protein-protein interactions in the crowded environment of cell cytoplasm. We explore topics such as statistical mechanics for lattice simulations, hydrodynamic interactions, diffusion processes in high-viscosity environments, and several methods based on molecular dynamics simulations. By synergistically leveraging methods from biophysics and computational biology, we review the state of the art of computational methods to study the impact of molecular crowding on protein-protein interactions and discuss its potential revolutionizing effects on the characterization of the human interactome.
Affiliation(s)
- Greta Grassmann
  - Department of Biochemical Sciences “Alessandro Rossi Fanelli”, Sapienza University of Rome, Rome 00185, Italy
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Mattia Miotto
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Fausta Desantis
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - The Open University Affiliated Research Centre at Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Lorenzo Di Rienzo
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
- Gian Gaetano Tartaglia
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
  - Center for Human Technologies, Genoa 16152, Italy
- Annalisa Pastore
  - Experiment Division, European Synchrotron Radiation Facility, Grenoble 38043, France
- Giancarlo Ruocco
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Physics, Sapienza University, Rome 00185, Italy
- Michele Monti
  - RNA System Biology Lab, Department of Neuroscience and Brain Technologies, Istituto Italiano di Tecnologia, Genoa 16163, Italy
- Edoardo Milanetti
  - Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Rome 00161, Italy
  - Department of Physics, Sapienza University, Rome 00185, Italy
|
22
|
Chidambara Thanu V, Jabeen A, Ranganathan S. iBio-GATS-A Semi-Automated Workflow for Structural Modelling of Insect Odorant Receptors. Int J Mol Sci 2024; 25:3055. [PMID: 38474300 DOI: 10.3390/ijms25053055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 02/26/2024] [Accepted: 03/04/2024] [Indexed: 03/14/2024] Open
Abstract
Insects utilize seven transmembrane (7TM) odorant receptor (iOR) proteins, with an inverted topology compared to G-protein coupled receptors (GPCRs), to detect chemical cues in the environment. For pest biocontrol, chemical attractants are used to trap insect pests. However, with the influx of invasive insect pests, novel odorants are urgently needed, specifically designed to match 3D iOR structures. Experimental structural determination of these membrane receptors remains challenging and only four experimental iOR structures from two evolutionarily distant organisms have been solved. Template-based modelling (TBM) is a complementary approach, to generate model structures, selecting templates based on sequence identity. As the iOR family is highly divergent, a different template selection approach than sequence identity is needed. Bio-GATS template selection for GPCRs, based on hydrophobicity correspondence, has been morphed into iBio-GATS, for template selection from available experimental iOR structures. This easy-to-use semi-automated workflow has been extended to generate high-quality models from any iOR sequence from the selected template, using Python and shell scripting. This workflow was successfully validated on Apocrypta bakeri Orco and Machilis hrabei OR5 structures. iBio-GATS models generated for the fruit fly iOR, OR59b and Orco, yielded functional ligand binding results concordant with experimental mutagenesis findings, compared to AlphaFold2 models.
Affiliation(s)
- Amara Jabeen
  - Applied Biosciences, Macquarie University, Sydney 2109, Australia
|
23
|
Tarakanov RI, Evseev PV, Vo HTN, Troshin KS, Gutnik DI, Ignatov AN, Toshchakov SV, Miroshnikov KA, Jafarov IH, Dzhalilov FSU. Xanthomonas Phage PBR31: Classifying the Unclassifiable. Viruses 2024; 16:406. [PMID: 38543771 PMCID: PMC10975493 DOI: 10.3390/v16030406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 02/24/2024] [Accepted: 03/04/2024] [Indexed: 05/23/2024] Open
Abstract
The ability of bacteriophages to destroy bacteria has made them the subject of extensive research. Interest in bacteriophages has recently increased due to the spread of drug-resistant bacteria, although genomic research has not kept pace with the growth of genomic data. Genomic analysis and, especially, the taxonomic description of bacteriophages are often difficult due to the peculiarities of the evolution of bacteriophages, which often includes the horizontal transfer of genes and genomic modules. The latter is particularly pronounced for temperate bacteriophages, which are capable of integration into the bacterial chromosome. Xanthomonas phage PBR31 is a temperate bacteriophage, which has been neither described nor classified previously, that infects the plant pathogen Xanthomonas campestris pv. campestris. Genomic analysis, including phylogenetic studies, indicated the separation of phage PBR31 from known classified bacteriophages, as well as its distant relationship with other temperate bacteriophages, including the Lederbervirus group. Bioinformatic analysis of proteins revealed distinctive features of PBR31, including the presence of a protein similar to the small subunit of D-family DNA polymerase and advanced lysis machinery. Taxonomic analysis showed the possibility of assigning phage PBR31 to a new taxon, although the complete taxonomic description of Xanthomonas phage PBR31 and other related bacteriophages is complicated by the complex evolutionary history of the formation of its genome. The general biological features of the PBR31 phage were analysed for the first time. Due to its presumably temperate lifestyle, there is doubt as to whether the PBR31 phage is appropriate for phage control purposes. Bioinformatics analysis, however, revealed the presence of cell wall-degrading enzymes that can be utilised for the treatment of bacterial infections.
Affiliation(s)
- Rashit I. Tarakanov
- Department of Plant Protection, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, Timiryazevskaya Str. 49, 127434 Moscow, Russia; (R.I.T.); (K.S.T.)
| | - Peter V. Evseev
- Department of Plant Protection, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, Timiryazevskaya Str. 49, 127434 Moscow, Russia; (R.I.T.); (K.S.T.)
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Str., 16/10, 117997 Moscow, Russia
- Laboratory of Molecular Microbiology, Pirogov Russian National Research Medical University, Ostrovityanova 1, 117997 Moscow, Russia
| | - Ha T. N. Vo
- Faculty of Agronomy, Nong Lam University, Quarter 6, Thu Duc District, Ho Chi Minh City 721400, Vietnam
| | - Konstantin S. Troshin
- Department of Plant Protection, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, Timiryazevskaya Str. 49, 127434 Moscow, Russia; (R.I.T.); (K.S.T.)
| | - Daria I. Gutnik
- Limnological Institute, Siberian Branch of the Russian Academy of Sciences, 664033 Irkutsk, Russia;
| | - Aleksandr N. Ignatov
- Agrobiotechnology Department, Agrarian and Technological Institute, RUDN University, Miklukho-Maklaya Str. 6, 117198 Moscow, Russia;
| | - Stepan V. Toshchakov
- Center for Genome Research, National Research Center “Kurchatov Institute”, Kurchatov Sq., 1, 123098 Moscow, Russia
| | - Konstantin A. Miroshnikov
- Department of Plant Protection, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, Timiryazevskaya Str. 49, 127434 Moscow, Russia; (R.I.T.); (K.S.T.)
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Str., 16/10, 117997 Moscow, Russia
| | - Ibrahim H. Jafarov
- Azerbaijan Scientific Research Institute for Plant Protection and Industrial Crops, AZ 4200 Ganja, Azerbaijan
| | - Fevzi S.-U. Dzhalilov
- Department of Plant Protection, Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, Timiryazevskaya Str. 49, 127434 Moscow, Russia; (R.I.T.); (K.S.T.)
|
24
|
Messeri L, Crockett MJ. Artificial intelligence and illusions of understanding in scientific research. Nature 2024; 627:49-58. [PMID: 38448693 DOI: 10.1038/s41586-024-07146-0] [Citation(s) in RCA: 49] [Impact Index Per Article: 49.0] [Received: 07/31/2023] [Accepted: 01/31/2024] [Indexed: 03/08/2024]
Abstract
Scientists are enthusiastically imagining ways in which artificial intelligence (AI) tools might improve research. Why are AI tools so attractive and what are the risks of implementing them across the research pipeline? Here we develop a taxonomy of scientists' visions for AI, observing that their appeal comes from promises to improve productivity and objectivity by overcoming human shortcomings. But proposed AI solutions can also exploit our cognitive limitations, making us vulnerable to illusions of understanding in which we believe we understand more about the world than we actually do. Such illusions obscure the scientific community's ability to see the formation of scientific monocultures, in which some types of methods, questions and viewpoints come to dominate alternative approaches, making science less innovative and more vulnerable to errors. The proliferation of AI tools in science risks introducing a phase of scientific enquiry in which we produce more but understand less. By analysing the appeal of these tools, we provide a framework for advancing discussions of responsible knowledge production in the age of AI.
Affiliation(s)
- Lisa Messeri
- Department of Anthropology, Yale University, New Haven, CT, USA.
| | - M J Crockett
- Department of Psychology, Princeton University, Princeton, NJ, USA.
- University Center for Human Values, Princeton University, Princeton, NJ, USA.
|
25
|
Ohdate K, Sakata M, Maeda K, Sakamaki Y, Nimura-Matsune K, Ohbayashi R, Hess WR, Watanabe S. Discovery of novel replication proteins for large plasmids in cyanobacteria and their potential applications in genetic engineering. Front Microbiol 2024; 15:1311290. [PMID: 38419637 PMCID: PMC10899382 DOI: 10.3389/fmicb.2024.1311290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 01/31/2024] [Indexed: 03/02/2024] Open
Abstract
Numerous cyanobacteria capable of oxygenic photosynthesis possess multiple large plasmids exceeding 100 kbp in size. These plasmids are believed to have distinct replication and distribution mechanisms, as they coexist within cells without causing incompatibilities between plasmids. However, information on plasmid replication proteins (Rep) in cyanobacteria is limited. Synechocystis sp. PCC 6803 hosts four large plasmids, pSYSM, pSYSX, pSYSA, and pSYSG, but Rep proteins for these plasmids, except for CyRepA1 on pSYSA, are unknown. Using Autonomous Replication sequencing (AR-seq), we identified two potential Rep genes in Synechocystis 6803, slr6031 and slr6090, both located on pSYSX. The corresponding Rep candidates, Slr6031 and Slr6090, share structural similarities with Rep-associated proteins of other bacteria and homologs were also identified in various cyanobacteria. We observed autonomous replication activity for Slr6031 and Slr6090 in Synechococcus elongatus PCC 7942 by fusing their genes with a construct expressing GFP and introducing them via transformation. The slr6031/slr6090-containing plasmids exhibited lower copy numbers and instability in Synechococcus 7942 cells compared to the expression vector pYS. While recombination occurred in the case of slr6090, the engineered plasmid with slr6031 coexisted with plasmids encoding CyRepA1 or Slr6090 in Synechococcus 7942 cells, indicating the compatibility of Slr6031 and Slr6090 with CyRepA1. Based on these results, we designated Slr6031 and Slr6090 as CyRepX1 (Cyanobacterial Rep-related protein encoded on pSYSX) and CyRepX2, respectively, demonstrating that pSYSX is a plasmid with "two Reps in one plasmid." Furthermore, we determined the copy number and stability of plasmids with cyanobacterial Reps in Synechococcus 7942 and Synechocystis 6803 to elucidate their potential applications. 
The novel properties of CyRepX1 and 2, as revealed by this study, hold promise for the development of innovative genetic engineering tools in cyanobacteria.
Affiliation(s)
- Kazuma Ohdate
- Department of Bioscience, Faculty of Life Science, Tokyo University of Agriculture, Tokyo, Japan
| | - Minori Sakata
- Department of Bioscience, Faculty of Life Science, Tokyo University of Agriculture, Tokyo, Japan
| | - Kaisei Maeda
- Laboratory for Chemistry and Life Science, Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan
| | - Yutaka Sakamaki
- Department of Bioscience, Faculty of Life Science, Tokyo University of Agriculture, Tokyo, Japan
| | - Kaori Nimura-Matsune
- Department of Bioscience, Faculty of Life Science, Tokyo University of Agriculture, Tokyo, Japan
| | - Ryudo Ohbayashi
- Department of Biological Science, Faculty of Science, Shizuoka University, Shizuoka, Japan
| | - Wolfgang R. Hess
- Genetics and Experimental Bioinformatics Group, Faculty of Biology, University of Freiburg, Freiburg, Germany
| | - Satoru Watanabe
- Department of Bioscience, Faculty of Life Science, Tokyo University of Agriculture, Tokyo, Japan
|
26
|
Evseev PV, Tarakanov RI, Vo HTN, Suzina NE, Vasilyeva AA, Ignatov AN, Miroshnikov KA, Dzhalilov FSU. Characterisation of New Foxunavirus Phage Murka with the Potential of Xanthomonas campestris pv. campestris Control. Viruses 2024; 16:198. [PMID: 38399973 PMCID: PMC10892653 DOI: 10.3390/v16020198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 01/25/2024] [Accepted: 01/25/2024] [Indexed: 02/25/2024] Open
Abstract
Phages of phytopathogenic bacteria are considered to be promising agents for the biological control of bacterial diseases in plants. This paper reports on the isolation and characterisation of a new Xanthomonas campestris pv. campestris phage, Murka. Phage morphology and basic kinetic characteristics of the infection were determined, and a phylogenomic analysis was performed. The phage was able to lyse a reasonably broad range of circulating strains of the cabbage black rot pathogen (64%, 9 of the 14 Xanthomonas campestris pv. campestris strains used in the study). This lytic myovirus has a DNA genome of 44,044 bp and contains 83 predicted genes. Taxonomically, it belongs to the genus Foxunavirus. This bacteriophage is promising for use as a possible means of biological control of cabbage black rot.
Affiliation(s)
- Peter V. Evseev
- Department of Plant Protection, Russian State Agrarian University—Moscow Timiryazev Agricultural Academy, Timiryazevskaya Str. 49, 127434 Moscow, Russia; (P.V.E.); (A.A.V.); (A.N.I.); (K.A.M.); (F.S.-U.D.)
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Str. 16/10, 117997 Moscow, Russia
- Laboratory of Molecular Microbiology, Pirogov Russian National Research Medical University, Ostrovityanova 1, 117997 Moscow, Russia
| | - Rashit I. Tarakanov
- Department of Plant Protection, Russian State Agrarian University—Moscow Timiryazev Agricultural Academy, Timiryazevskaya Str. 49, 127434 Moscow, Russia; (P.V.E.); (A.A.V.); (A.N.I.); (K.A.M.); (F.S.-U.D.)
| | - Ha T. N. Vo
- Faculty of Agronomy, Nong Lam University, Quarter 6, Thu Duc District, Ho Chi Minh City 721400, Vietnam;
| | - Natalia E. Suzina
- Skryabin Institute of Biochemistry and Physiology of Microorganisms, Federal Research Center “Pushchino Center for Biological Research of the Russian Academy of Sciences”, Prosp. Nauki, 5, 142290 Pushchino, Russia;
| | - Anna A. Vasilyeva
- Department of Plant Protection, Russian State Agrarian University—Moscow Timiryazev Agricultural Academy, Timiryazevskaya Str. 49, 127434 Moscow, Russia; (P.V.E.); (A.A.V.); (A.N.I.); (K.A.M.); (F.S.-U.D.)
| | - Alexander N. Ignatov
- Department of Plant Protection, Russian State Agrarian University—Moscow Timiryazev Agricultural Academy, Timiryazevskaya Str. 49, 127434 Moscow, Russia; (P.V.E.); (A.A.V.); (A.N.I.); (K.A.M.); (F.S.-U.D.)
- Agrobiotechnology Department, Agrarian and Technological Institute, RUDN University, Miklukho-Maklaya Str., 6, 117198 Moscow, Russia
| | - Konstantin A. Miroshnikov
- Department of Plant Protection, Russian State Agrarian University—Moscow Timiryazev Agricultural Academy, Timiryazevskaya Str. 49, 127434 Moscow, Russia; (P.V.E.); (A.A.V.); (A.N.I.); (K.A.M.); (F.S.-U.D.)
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Str. 16/10, 117997 Moscow, Russia
| | - Fevzi S.-U. Dzhalilov
- Department of Plant Protection, Russian State Agrarian University—Moscow Timiryazev Agricultural Academy, Timiryazevskaya Str. 49, 127434 Moscow, Russia; (P.V.E.); (A.A.V.); (A.N.I.); (K.A.M.); (F.S.-U.D.)
|
27
|
Erten M. MehNet: a vigesimal-based model by amino acid melting points generates unique ID numbers for protein sequences. J Biomol Struct Dyn 2024:1-7. [PMID: 38230442 DOI: 10.1080/07391102.2024.2302937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 01/02/2024] [Indexed: 01/18/2024]
Abstract
The amino acid encoding plays a pivotal role in machine learning-based methods for predicting protein structure and function, as well as in protein mapping techniques. Additionally, the classification of protein sequences presents its own challenges. The current study aims to assign a constant value to each amino acid, thereby creating distinctions among protein sequences. The datasets used in this study were obtained from the UniProt Knowledgebase. Subsequently, these datasets underwent preprocessing steps, and identical sequences were categorized under the same headings. Each amino acid was ranked based on its respective melting point and was assigned a vigesimal digit. These generated vigesimal digits were subsequently converted to decimal values. The centerpiece of this methodology was the melting point hashing table, which was given the name 'MehNet'. Ultimately, each protein sequence was assigned a unique identification number. This approach successfully digitized protein sequences. Notably, experiments involving randomly distributed vigesimal digits for amino acids did not yield results as promising as those achieved with MehNet. The model's classification phase, which utilizes a k-nearest neighbors (kNN) classifier, demonstrates exceptional performance in miscellaneous viral sequences. It achieves high accuracy rates, with an overall accuracy of 99.75%. Notably, it achieves an outstanding accuracy of 99.92% for the Influenza C class, highlighting its ability to distinguish closely related viral sequences. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Mehmet Erten
- Department of Medical Biochemistry, Fethi Sekin City Hospital, Elazığ, Turkey
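The digit-assignment and conversion steps described in the MehNet abstract above can be sketched as follows. This is a minimal illustration, not the author's implementation: the abstract does not list the melting-point ranking, so `PLACEHOLDER_RANKING` (the one-letter codes in alphabetical order) and the function name `sequence_to_id` are assumptions made only for this example.

```python
# Hypothetical stand-in for the melting-point ranking. The abstract does not
# give the actual order, so the 20 standard amino acids are taken in
# alphabetical order of their one-letter codes here.
PLACEHOLDER_RANKING = "ACDEFGHIKLMNPQRSTVWY"

# Map each amino acid to a vigesimal (base-20) digit, 0 through 19.
DIGIT = {aa: rank for rank, aa in enumerate(PLACEHOLDER_RANKING)}


def sequence_to_id(seq: str) -> int:
    """Read a protein sequence as a base-20 number and return it as an int."""
    value = 0
    for aa in seq.upper():
        # Raises KeyError for residues outside the 20 standard amino acids.
        value = value * 20 + DIGIT[aa]
    return value


# Python integers are arbitrary precision, so long sequences do not overflow.
example_id = sequence_to_id("MKV")
```

One design point worth noting: with plain positional base-20, a residue mapped to digit 0 contributes nothing at the start of a sequence, so "AC" and "C" receive the same number under this placeholder ranking. How MehNet handles that case is not stated in the abstract.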
|
28
|
Ge F, Arif M, Yan Z, Alahmadi H, Worachartcheewan A, Shoombuatong W. Review of Computational Methods and Database Sources for Predicting the Effects of Coding Frameshift Small Insertion and Deletion Variations. ACS Omega 2024; 9:2032-2047. [PMID: 38250421 PMCID: PMC10795160 DOI: 10.1021/acsomega.3c07662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 11/30/2023] [Accepted: 12/04/2023] [Indexed: 01/23/2024]
Abstract
Genetic variations (including substitutions, insertions, and deletions) exert a profound influence on DNA sequences. These variations are systematically classified as synonymous, nonsynonymous, and nonsense, each manifesting distinct effects on proteins. The implementation of high-throughput sequencing has significantly augmented our comprehension of the intricate interplay between gene variations and protein structure and function, as well as their ramifications in the context of diseases. Frameshift variations, particularly small insertions and deletions (indels), disrupt protein coding and are instrumental in disease pathogenesis. This review presents a succinct overview of computational methods, databases, current challenges, and future directions in predicting the consequences of coding frameshift small indels variations. We analyzed the predictive efficacy, reliability, and utilization of computational methods and variant account, reliability, and utilization of databases. Besides, we also compared the prediction methodologies on GOF/LOF pathogenic variation data. Addressing the challenges pertaining to prediction accuracy and cross-species generalizability, nascent technologies such as AI and deep learning harbor immense potential to enhance predictive capabilities. The importance of interdisciplinary research and collaboration cannot be overstated for devising effective diagnosis, treatment, and prevention strategies concerning diseases associated with coding frameshift indels variations.
Affiliation(s)
- Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, China
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
| | - Zihao Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
| | - Hanin Alahmadi
- College of Computer Science and Engineering, Taibah University, Madinah 344, Saudi Arabia
| | - Apilak Worachartcheewan
- Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
|
29
|
Peng CX, Liang F, Xia YH, Zhao KL, Hou MH, Zhang GJ. Recent Advances and Challenges in Protein Structure Prediction. J Chem Inf Model 2024; 64:76-95. [PMID: 38109487 DOI: 10.1021/acs.jcim.3c01324] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 12/20/2023]
Abstract
Artificial intelligence has made significant advances in the field of protein structure prediction in recent years. In particular, DeepMind's end-to-end model, AlphaFold2, has demonstrated the capability to predict three-dimensional structures of numerous unknown proteins with accuracy levels comparable to those of experimental methods. This breakthrough has opened up new possibilities for understanding protein structure and function as well as accelerating drug discovery and other applications in the field of biology and medicine. Despite the remarkable achievements of artificial intelligence in the field, there are still some challenges and limitations. In this Review, we discuss the recent progress and some of the challenges in protein structure prediction. These challenges include predicting multidomain protein structures, protein complex structures, multiple conformational states of proteins, and protein folding pathways. Furthermore, we highlight directions in which further improvements can be conducted.
Affiliation(s)
- Chun-Xiang Peng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Fang Liang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Yu-Hao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Kai-Long Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Ming-Hua Hou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
|
30
|
Beam K, Sharma P, Levy P, Beam AL. Artificial intelligence in the neonatal intensive care unit: the time is now. J Perinatol 2024; 44:131-135. [PMID: 37443271 DOI: 10.1038/s41372-023-01719-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 04/17/2023] [Revised: 06/24/2023] [Accepted: 07/03/2023] [Indexed: 07/15/2023]
Abstract
Artificial intelligence (AI) has the potential to revolutionize neonatal intensive care unit (NICU) care by leveraging the large-scale, high-dimensional data that are generated by NICU patients. There is an emerging recognition that the confluence of technological progress, commercialization pathways, and rich data sets provides a unique opportunity for AI to make a lasting impact on the NICU. In this perspective article, we discuss four broad categories of AI applications in the NICU: imaging interpretation, prediction modeling of electronic health record data, integration of real-time monitoring data, and documentation and billing. By enhancing decision-making, streamlining processes, and improving patient outcomes, AI holds the potential to transform the quality of care for vulnerable newborns, making the excitement surrounding AI advancements well-founded and the potential for significant positive change stronger than ever before.
Affiliation(s)
- Kristyn Beam
- Department of Neonatology, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Puneet Sharma
- Division of Newborn Medicine, Department of Pediatrics, Boston Children's Hospital, Boston, MA, USA
| | - Phil Levy
- Division of Newborn Medicine, Department of Pediatrics, Boston Children's Hospital, Boston, MA, USA
| | - Andrew L Beam
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
|
31
|
Zhou H, Skolnick J. FRAGSITE2: A structure and fragment-based approach for virtual ligand screening. Protein Sci 2024; 33:e4869. [PMID: 38100293 PMCID: PMC10751727 DOI: 10.1002/pro.4869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 12/06/2023] [Accepted: 12/09/2023] [Indexed: 12/17/2023]
Abstract
Protein function annotation and drug discovery often involve finding small molecule binders. In the early stages of drug discovery, virtual ligand screening (VLS) is frequently applied to identify possible hits before experimental testing. While our recent ligand homology modeling (LHM)-machine learning VLS method FRAGSITE outperformed approaches that combined traditional docking to generate protein-ligand poses and deep learning scoring functions to rank ligands, a more robust approach that could identify a more diverse set of binding ligands is needed. Here, we describe FRAGSITE2, which shows significant improvement on protein targets lacking known small molecule binders and having no confident LHM-identified template ligands when benchmarked on two commonly used VLS datasets: for both the DUD-E set and the DEKOIS2.0 set, and for ligands having a Tanimoto coefficient (TC) < 0.7 to the template ligands, the 1% enrichment factor (EF1%) of FRAGSITE2 is significantly better than that of FINDSITEcomb2.0, an earlier LHM algorithm. For the DUD-E set, FRAGSITE2 also shows better ROC enrichment factor and AUPR (area under the precision-recall curve) than the deep learning DenseFS scoring function. In a comparison with RF-score-VS on the 76-target subset of DEKOIS2.0, with a TC < 0.99 to training DUD-E ligands, FRAGSITE2 has double the EF1%. Its boosted tree regression method provides more robust performance than a deep learning multiple layer perceptron method. When compared with the pretrained language model for protein target features, FRAGSITE2 also shows much better performance. Thus, FRAGSITE2 is a promising approach that can discover novel hits for protein targets. FRAGSITE2's web service is freely available to academic users at http://sites.gatech.edu/cssb/FRAGSITE2.
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
| | - Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
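The 1% enrichment factor (EF1%) quoted in the FRAGSITE2 abstract above is a standard virtual-screening metric: the hit rate among the top-ranked 1% of a library divided by the hit rate of the whole library. The sketch below shows that common definition only. It is not taken from the FRAGSITE2 code, and the function name `enrichment_factor`, its parameters, and the toy data are assumptions made for illustration.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor for the top `fraction` of a score-ranked library.

    scores    -- predicted score per compound (higher means more likely to bind)
    is_active -- True/False (or 1/0) label per compound, same order as scores
    """
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and at least one active")

    # Size of the top slice, always at least one compound.
    n_top = max(1, round(n_total * fraction))

    # Rank compounds from best to worst predicted score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(active for _, active in ranked[:n_top])

    # Hit rate in the top slice relative to the hit rate of the whole library.
    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy library (made-up data): 200 compounds, 10 actives, two of them ranked best.
scores = [float(i) for i in range(200)]
is_active = [i >= 198 or i < 8 for i in range(200)]
ef1 = enrichment_factor(scores, is_active)
```

In this toy case both compounds in the top 1% are active while only 5% of the library is, so the enrichment factor is 20. A ranking no better than random would give a value near 1.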
|
32
|
Yousef M, Allmer J. Deep learning in bioinformatics. Turk J Biol 2023; 47:366-382. [PMID: 38681776 PMCID: PMC11045206 DOI: 10.55730/1300-0152.2671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/28/2023] [Accepted: 12/18/2023] [Indexed: 05/01/2024] Open
Abstract
Deep learning is a powerful machine learning technique that can learn from large amounts of data using multiple layers of artificial neural networks. This paper reviews some applications of deep learning in bioinformatics, a field that deals with analyzing and interpreting biological data. We first introduce the basic concepts of deep learning and then survey the recent advances and challenges of applying deep learning to various bioinformatics problems, such as genome sequencing, gene expression analysis, protein structure prediction, drug discovery, and disease diagnosis. We also discuss future directions and opportunities for deep learning in bioinformatics. We aim to provide an overview of deep learning so that bioinformaticians applying deep learning models can consider all critical technical and ethical aspects. Thus, our target audience is biomedical informatics researchers who use deep learning models for inference. This review will inspire more bioinformatics researchers to adopt deep-learning methods for their research questions while considering fairness, potential biases, explainability, and accountability.
Affiliation(s)
- Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, Israel
| | - Jens Allmer
- Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Hochschule Ruhr West, University of Applied Sciences, Mülheim an der Ruhr, Germany
|
33
|
Lukianova AA, Shneider MM, Evseev PV, Egorov MV, Kasimova AA, Shpirt AM, Shashkov AS, Knirel YA, Kostryukova ES, Miroshnikov KA. Depolymerisation of the Klebsiella pneumoniae Capsular Polysaccharide K21 by Klebsiella Phage K5. Int J Mol Sci 2023; 24:17288. [PMID: 38139119 PMCID: PMC10743669 DOI: 10.3390/ijms242417288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 12/05/2023] [Accepted: 12/07/2023] [Indexed: 12/24/2023] Open
Abstract
Klebsiella pneumoniae is a pathogen associated with various infection types, which often exhibits multiple antibiotic resistance. Phages, or bacterial viruses, have an ability to specifically target and destroy K. pneumoniae, offering a potential means of combatting multidrug-resistant infections. Phage enzymes are another promising therapeutic agent that can break down bacterial capsular polysaccharide, which shields K. pneumoniae from the immune response and external factors. In this study, Klebsiella phage K5 was isolated; this phage is active against Klebsiella pneumoniae with the capsular type K21. It was demonstrated that the phage can effectively lyse the host culture. The adsorption apparatus of the phage has revealed two receptor-binding proteins (RBPs) with predicted polysaccharide depolymerising activity. A recombinant form of both RBPs was obtained and experiments showed that one of them depolymerised the capsular polysaccharide K21. The structure of this polysaccharide and its degradation fragments were analysed. The second receptor-binding protein showed no activity on capsular polysaccharide of any of the 31 capsule types tested, so the substrate for this enzyme remains to be determined in the future. Klebsiella phage K5 may be considered a useful agent against Klebsiella infections.
Affiliation(s)
- Anna A. Lukianova
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Str. 16/10, 117997 Moscow, Russia; (P.V.E.); (M.V.E.); (K.A.M.)
| | - Mikhail M. Shneider
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Str. 16/10, 117997 Moscow, Russia; (P.V.E.); (M.V.E.); (K.A.M.)
| | - Peter V. Evseev
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Str. 16/10, 117997 Moscow, Russia; (P.V.E.); (M.V.E.); (K.A.M.)
| | - Mikhail V. Egorov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Str. 16/10, 117997 Moscow, Russia; (P.V.E.); (M.V.E.); (K.A.M.)
| | - Anastasiya A. Kasimova
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky prosp. 47, 119991 Moscow, Russia; (A.A.K.); (A.M.S.); (A.S.S.); (Y.A.K.)
| | - Anna M. Shpirt
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky prosp. 47, 119991 Moscow, Russia; (A.A.K.); (A.M.S.); (A.S.S.); (Y.A.K.)
| | - Alexander S. Shashkov
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky prosp. 47, 119991 Moscow, Russia; (A.A.K.); (A.M.S.); (A.S.S.); (Y.A.K.)
| | - Yuriy A. Knirel
- Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky prosp. 47, 119991 Moscow, Russia; (A.A.K.); (A.M.S.); (A.S.S.); (Y.A.K.)
| | - Elena S. Kostryukova
- Lopukhin Federal Research and Clinical Center of Physical-Chemical Medicine, Federal Medical Biological Agency, Malaya Pirogovskaya Str. 1, 119435 Moscow, Russia;
| | - Konstantin A. Miroshnikov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Miklukho-Maklaya Str. 16/10, 117997 Moscow, Russia; (P.V.E.); (M.V.E.); (K.A.M.)
|
34
|
Lv Q, Zhou F, Liu X, Zhi L. Artificial intelligence in small molecule drug discovery from 2018 to 2023: Does it really work? Bioorg Chem 2023; 141:106894. [PMID: 37776682 DOI: 10.1016/j.bioorg.2023.106894] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/15/2023] [Revised: 09/24/2023] [Accepted: 09/25/2023] [Indexed: 10/02/2023]
Abstract
Utilizing artificial intelligence (AI) in drug design represents an advanced approach for identifying targets and developing new drugs. Integrating AI techniques significantly reduces the workload involved in drug development and enhances the efficiency of early-stage drug discovery. This review aims to present a comprehensive overview of the utilization of AI methods in the field of small drug design, with a specific focus on four key areas: protein structure prediction, molecular virtual screening, molecular design, and absorption, distribution, metabolism, excretion, and toxicity (ADMET) prediction. Additionally, the role and limitations of AI in drug development are explored, and the impact of AI on decision-making processes is studied. It is important to note that while AI can bring numerous benefits to the early stage of drug development, the direction and quality of decision-making should still be emphasized, as AI should be considered as a tool rather than a decisive factor.
Affiliation(s)
- Qi Lv
- School of Pharmacy, Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, PR China
| | - Feilong Zhou
- School of Pharmacy, Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, PR China
| | - Xinhua Liu
- School of Pharmacy, Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Hefei 230032, PR China.
| | - Liping Zhi
- School of Health Management, Anhui Medical University, Hefei 230032, PR China.
|
35
|
Roterman I, Stapor K, Konieczny L. Role of environmental specificity in CASP results. BMC Bioinformatics 2023; 24:425. [PMID: 37950210 PMCID: PMC10638730 DOI: 10.1186/s12859-023-05559-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 11/06/2023] [Indexed: 11/12/2023] Open
Abstract
BACKGROUND Recently, significant progress has been made in the field of protein structure prediction by the application of artificial intelligence techniques, as shown by the results of the CASP13 and CASP14 (Critical Assessment of Structure Prediction) competition. However, the question of the mechanism behind the protein folding process itself remains unanswered. Correctly predicting the structure also does not solve the problem of, for example, amyloid proteins, where a polypeptide chain with an unaltered sequence adopts a different 3D structure. RESULTS This work was an attempt at explaining the structural variation by considering the contribution of the environment to protein structuring. The application of the fuzzy oil drop (FOD) model to assess the validity of the selected models provided in the CASP13, CASP14 and CASP15 projects reveals the need for an environmental factor to determine the 3D structure of proteins. Consideration of the external force field in the form of polar water (Fuzzy Oil Drop) and a version modified by the presence of the hydrophobic compounds, FOD-M (FOD-Modified) reveals that the protein folding process is environmentally dependent. An analysis of selected models from the CASP competitions indicates the need for structure prediction as dependent on the consideration of the protein folding environment. CONCLUSIONS The conditions governed by the environment direct the protein folding process occurring in a certain environment. Therefore, the variation of the external force field should be taken into account in the models used in protein structure prediction.
Affiliation(s)
- Irena Roterman
- Department of Bioinformatics and Telemedicine, Jagiellonian University - Medical College, Medyczna 7, 30-688, Krakow, Poland
- Katarzyna Stapor
- Faculty of Automatic, Electronics and Computer Science, Department of Applied Informatics, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
- Leszek Konieczny
- Jagiellonian University - Medical College, Kopernika 7, 31-034, Krakow, Poland
|
36
|
Elizalde MJ, Gorelick DA. Mechanistic toxicology in light of genetic compensation. Toxicol Sci 2023; 197:kfad113. [PMID: 37941503 PMCID: PMC10823772 DOI: 10.1093/toxsci/kfad113] [Indexed: 11/10/2023]
Abstract
Mechanistic toxicology seeks to identify the molecular and cellular mechanisms by which toxicants exert their deleterious effects. One powerful approach is to generate mutations in genes that respond to a particular toxicant, and then test how such mutations change the effects of the toxicant. CRISPR is a rapid and versatile approach to generate mutations in cultured cells and in animal models. Many studies use CRISPR to generate short insertions or deletions in a target gene and then assume that the resulting mutation, such as a premature termination codon, causes a loss of functional protein. However, recent studies demonstrate that this assumption is flawed. Cells can compensate for short insertion and deletion mutations, leading toxicologists to draw erroneous conclusions from mutant studies. In this review, we will discuss mechanisms by which a mutation in one gene may be rescued by compensatory activity. We will discuss how CRISPR insertion and deletion mutations are susceptible to compensation by transcriptional adaptation, alternative splicing, and rescue by maternally derived gene products. We will review evidence that measuring levels of messenger RNA transcribed from a mutated gene is an unreliable indicator of the severity of the mutation. Finally, we provide guidelines for using CRISPR to generate mutations that avoid compensation.
Affiliation(s)
- Mary Jane Elizalde
- Department of Molecular & Cellular Biology, Center for Precision Environmental Health, Baylor College of Medicine, Houston, TX 77030, United States
- Daniel A Gorelick
- Department of Molecular & Cellular Biology, Center for Precision Environmental Health, Baylor College of Medicine, Houston, TX 77030, United States
|
37
|
Larrea-Sebal A, Jebari-Benslaiman S, Galicia-Garcia U, Jose-Urteaga AS, Uribe KB, Benito-Vicente A, Martín C. Predictive Modeling and Structure Analysis of Genetic Variants in Familial Hypercholesterolemia: Implications for Diagnosis and Protein Interaction Studies. Curr Atheroscler Rep 2023; 25:839-859. [PMID: 37847331 PMCID: PMC10618353 DOI: 10.1007/s11883-023-01154-7] [Accepted: 09/15/2023] [Indexed: 10/18/2023]
Abstract
PURPOSE OF REVIEW Familial hypercholesterolemia (FH) is a hereditary condition characterized by elevated levels of low-density lipoprotein cholesterol (LDL-C), which increases the risk of cardiovascular disease if left untreated. This review aims to discuss the role of bioinformatics tools in evaluating the pathogenicity of missense variants associated with FH. Specifically, it highlights the use of predictive models based on protein sequence, structure, evolutionary conservation, and other relevant features in identifying genetic variants within LDLR, APOB, and PCSK9 genes that contribute to FH. RECENT FINDINGS In recent years, various bioinformatics tools have emerged as valuable resources for analyzing missense variants in FH-related genes. Tools such as REVEL, Varity, and CADD use diverse computational approaches to predict the impact of genetic variants on protein function. These tools consider factors such as sequence conservation, structural alterations, and receptor binding to aid in interpreting the pathogenicity of identified missense variants. While these predictive models offer valuable insights, the accuracy of predictions can vary, especially for proteins with unique characteristics that might not be well represented in the databases used for training. This review emphasizes the significance of utilizing bioinformatics tools for assessing the pathogenicity of FH-associated missense variants. Despite their contributions, a definitive diagnosis of a genetic variant necessitates functional validation through in vitro characterization or cascade screening. This step ensures the precise identification of FH-related variants, leading to more accurate diagnoses. Integrating genetic data with reliable bioinformatics predictions and functional validation can enhance our understanding of the genetic basis of FH, enabling improved diagnosis, risk stratification, and personalized treatment for affected individuals. 
The comprehensive approach outlined in this review promises to advance the management of this inherited disorder, potentially leading to better health outcomes for those affected by FH.
Affiliation(s)
- Asier Larrea-Sebal
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Fundación Biofisika Bizkaia, 48940, Leioa, Spain
- Shifa Jebari-Benslaiman
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Unai Galicia-Garcia
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Ane San Jose-Urteaga
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Kepa B Uribe
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Asier Benito-Vicente
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- César Martín
- Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
|
38
|
Boulos I, Jabbour J, Khoury S, Mikhael N, Tishkova V, Candoni N, Ghadieh HE, Veesler S, Bassim Y, Azar S, Harb F. Exploring the World of Membrane Proteins: Techniques and Methods for Understanding Structure, Function, and Dynamics. Molecules 2023; 28:7176. [PMID: 37894653 PMCID: PMC10608922 DOI: 10.3390/molecules28207176] [Received: 07/25/2023] [Revised: 09/13/2023] [Accepted: 10/04/2023] [Indexed: 10/29/2023]
Abstract
In eukaryotic cells, membrane proteins play a crucial role. They fall into three categories: intrinsic proteins, extrinsic proteins, and proteins that are essential to the human genome (30% of which is devoted to encoding them). Hydrophobic interactions inside the membrane serve to stabilize integral proteins, which span the lipid bilayer. This review investigates a number of computational and experimental methods used to study membrane proteins. It encompasses a variety of technologies, including electrophoresis, X-ray crystallography, cryogenic electron microscopy (cryo-EM), nuclear magnetic resonance spectroscopy (NMR), biophysical methods, computational methods, and artificial intelligence. The link between structure and function of membrane proteins has been better understood thanks to these approaches, which also hold great promise for future study in the field. The significance of fusing artificial intelligence with experimental data to improve our comprehension of membrane protein biology is also covered in this paper. This effort aims to shed light on the complexity of membrane protein biology by investigating a variety of experimental and computational methods. Overall, the goal of this review is to emphasize how crucial it is to understand the functions of membrane proteins in eukaryotic cells. It gives a general review of the numerous methods used to look into these crucial elements and highlights the demand for multidisciplinary approaches to advance our understanding.
Affiliation(s)
- Imad Boulos
- Faculty of Medicine and Medical Sciences, University of Balamand, Tripoli P.O. Box 100, Lebanon; (I.B.); (J.J.); (S.K.); (N.M.); (H.E.G.); (Y.B.); (S.A.)
- Joy Jabbour
- Faculty of Medicine and Medical Sciences, University of Balamand, Tripoli P.O. Box 100, Lebanon; (I.B.); (J.J.); (S.K.); (N.M.); (H.E.G.); (Y.B.); (S.A.)
- Serena Khoury
- Faculty of Medicine and Medical Sciences, University of Balamand, Tripoli P.O. Box 100, Lebanon; (I.B.); (J.J.); (S.K.); (N.M.); (H.E.G.); (Y.B.); (S.A.)
- Nehme Mikhael
- Faculty of Medicine and Medical Sciences, University of Balamand, Tripoli P.O. Box 100, Lebanon; (I.B.); (J.J.); (S.K.); (N.M.); (H.E.G.); (Y.B.); (S.A.)
- Victoria Tishkova
- CNRS, CINaM (Centre Interdisciplinaire de Nanosciences de Marseille), Campus de Luminy, Case 913, Aix-Marseille University, CEDEX 09, F-13288 Marseille, France; (V.T.); (N.C.); (S.V.)
- Nadine Candoni
- CNRS, CINaM (Centre Interdisciplinaire de Nanosciences de Marseille), Campus de Luminy, Case 913, Aix-Marseille University, CEDEX 09, F-13288 Marseille, France; (V.T.); (N.C.); (S.V.)
- Hilda E. Ghadieh
- Faculty of Medicine and Medical Sciences, University of Balamand, Tripoli P.O. Box 100, Lebanon; (I.B.); (J.J.); (S.K.); (N.M.); (H.E.G.); (Y.B.); (S.A.)
- Stéphane Veesler
- CNRS, CINaM (Centre Interdisciplinaire de Nanosciences de Marseille), Campus de Luminy, Case 913, Aix-Marseille University, CEDEX 09, F-13288 Marseille, France; (V.T.); (N.C.); (S.V.)
- Youssef Bassim
- Faculty of Medicine and Medical Sciences, University of Balamand, Tripoli P.O. Box 100, Lebanon; (I.B.); (J.J.); (S.K.); (N.M.); (H.E.G.); (Y.B.); (S.A.)
- Sami Azar
- Faculty of Medicine and Medical Sciences, University of Balamand, Tripoli P.O. Box 100, Lebanon; (I.B.); (J.J.); (S.K.); (N.M.); (H.E.G.); (Y.B.); (S.A.)
- Frédéric Harb
- Faculty of Medicine and Medical Sciences, University of Balamand, Tripoli P.O. Box 100, Lebanon; (I.B.); (J.J.); (S.K.); (N.M.); (H.E.G.); (Y.B.); (S.A.)
|
39
|
Schneider B, Sweeney BA, Bateman A, Cerny J, Zok T, Szachniuk M. When will RNA get its AlphaFold moment? Nucleic Acids Res 2023; 51:9522-9532. [PMID: 37702120 PMCID: PMC10570031 DOI: 10.1093/nar/gkad726] [Received: 05/10/2023] [Revised: 08/13/2023] [Accepted: 08/22/2023] [Indexed: 09/14/2023]
Abstract
The protein structure prediction problem has been solved for many types of proteins by AlphaFold. Recently, there has been considerable excitement to build off the success of AlphaFold and predict the 3D structures of RNAs. RNA prediction methods use a variety of techniques, from physics-based to machine learning approaches. We believe that there are challenges preventing the successful development of deep learning-based methods like AlphaFold for RNA in the short term. Broadly speaking, the challenges are the limited number of structures and alignments making data-hungry deep learning methods unlikely to succeed. Additionally, there are several issues with the existing structure and sequence data, as they are often of insufficient quality, highly biased and missing key information. Here, we discuss these challenges in detail and suggest some steps to remedy the situation. We believe that it is possible to create an accurate RNA structure prediction method, but it will require solving several data quality and volume issues, usage of data beyond simple sequence alignments, or the development of new less data-hungry machine learning methods.
Affiliation(s)
- Bohdan Schneider
- Institute of Biotechnology of the Czech Academy of Sciences, Prumyslova 595, CZ-252 50 Vestec, Czech Republic
- Blake Alexander Sweeney
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Jiri Cerny
- Institute of Biotechnology of the Czech Academy of Sciences, Prumyslova 595, CZ-252 50 Vestec, Czech Republic
- Tomasz Zok
- Institute of Computing Science and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Marta Szachniuk
- Institute of Computing Science and European Centre for Bioinformatics and Genomics, Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
|
40
|
Broz M, Jukič M, Bren U. Naive Prediction of Protein Backbone Phi and Psi Dihedral Angles Using Deep Learning. Molecules 2023; 28:7046. [PMID: 37894526 PMCID: PMC10609058 DOI: 10.3390/molecules28207046] [Received: 09/01/2023] [Revised: 10/06/2023] [Accepted: 10/09/2023] [Indexed: 10/29/2023]
Abstract
Protein structure prediction represents a significant challenge in the field of bioinformatics, with the prediction of protein structures using backbone dihedral angles recently achieving significant progress due to the rise of deep neural network research. However, there is a trend in protein structure prediction research to employ increasingly complex neural networks and contributions from multiple models. This study, on the other hand, explores how a single model transparently behaves using sequence data only and what can be expected from the predicted angles. To this end, the current paper presents data acquisition, deep learning model definition, and training toward the final protein backbone angle prediction. The method applies a simple fully connected neural network (FCNN) model that takes only the primary structure of the protein with a sliding window of size 21 as input to predict protein backbone ϕ and ψ dihedral angles. Despite its simplicity, the model shows surprising accuracy for the ϕ angle prediction and somewhat lower accuracy for the ψ angle prediction. Moreover, this study demonstrates that protein secondary structure prediction is also possible with simple neural networks that take in only the protein amino-acid residue sequence, but more complex models are required for higher accuracies.
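The sliding-window input described in this abstract can be illustrated with a minimal sketch. It assumes one-hot encoding over the 20 standard amino acids and zero padding beyond the chain termini; neither detail is stated in the abstract, and the names `one_hot` and `window_features` are hypothetical rather than taken from the authors' code.

```python
# Sketch only: the encoding scheme and the padding are assumptions,
# not the authors' published implementation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
WINDOW = 21  # window size stated in the abstract


def one_hot(residue):
    """Return a 20-element one-hot vector; padding or unknown residues map to all zeros."""
    vec = [0.0] * len(AMINO_ACIDS)
    idx = AA_INDEX.get(residue)
    if idx is not None:
        vec[idx] = 1.0
    return vec


def window_features(sequence, window=WINDOW):
    """Build one flat feature vector per residue from a window centered on that residue."""
    half = window // 2
    padded = "-" * half + sequence + "-" * half  # '-' marks positions beyond the termini
    features = []
    for i in range(len(sequence)):
        flat = []
        for residue in padded[i:i + window]:
            flat.extend(one_hot(residue))
        features.append(flat)
    return features


feats = window_features("MKTAYIAKQR")
print(len(feats), len(feats[0]))  # one vector per residue, each of length 21 * 20
```

Each vector produced this way could serve as the input to a fully connected network with two regression outputs, one for ϕ and one for ψ; the network itself is omitted here because the abstract does not describe its layers.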
Affiliation(s)
- Matic Broz
- Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, SI-2000 Maribor, Slovenia
- Marko Jukič
- Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, SI-2000 Maribor, Slovenia
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška ulica 8, SI-6000 Koper, Slovenia
- Institute of Environmental Protection and Sensors, Beloruska ulica 7, SI-2000 Maribor, Slovenia
- Urban Bren
- Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova ulica 17, SI-2000 Maribor, Slovenia
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška ulica 8, SI-6000 Koper, Slovenia
- Institute of Environmental Protection and Sensors, Beloruska ulica 7, SI-2000 Maribor, Slovenia
|
41
|
Stuart DD, Guzman-Perez A, Brooijmans N, Jackson EL, Kryukov GV, Friedman AA, Hoos A. Precision Oncology Comes of Age: Designing Best-in-Class Small Molecules by Integrating Two Decades of Advances in Chemistry, Target Biology, and Data Science. Cancer Discov 2023; 13:2131-2149. [PMID: 37712571 PMCID: PMC10551669 DOI: 10.1158/2159-8290.cd-23-0280] [Received: 03/06/2023] [Revised: 04/27/2023] [Accepted: 07/28/2023] [Indexed: 09/16/2023]
Abstract
Small-molecule drugs have enabled the practice of precision oncology for genetically defined patient populations since the first approval of imatinib in 2001. Scientific and technology advances over this 20-year period have driven the evolution of cancer biology, medicinal chemistry, and data science. Collectively, these advances provide tools to more consistently design best-in-class small-molecule drugs against known, previously undruggable, and novel cancer targets. The integration of these tools and their customization in the hands of skilled drug hunters will be necessary to enable the discovery of transformational therapies for patients across a wider spectrum of cancers. SIGNIFICANCE Target-centric small-molecule drug discovery necessitates the consideration of multiple approaches to identify chemical matter that can be optimized into drug candidates. To do this successfully and consistently, drug hunters require a comprehensive toolbox to avoid following the "law of instrument" or Maslow's hammer concept where only one tool is applied regardless of the requirements of the task. Combining our ever-increasing understanding of cancer and cancer targets with the technological advances in drug discovery described below will accelerate the next generation of small-molecule drugs in oncology.
Affiliation(s)
- Axel Hoos
- Scorpion Therapeutics, Boston, Massachusetts
|
42
|
Bauer J, Rajagopal N, Gupta P, Gupta P, Nixon AE, Kumar S. How can we discover developable antibody-based biotherapeutics? Front Mol Biosci 2023; 10:1221626. [PMID: 37609373 PMCID: PMC10441133 DOI: 10.3389/fmolb.2023.1221626] [Received: 05/12/2023] [Accepted: 07/10/2023] [Indexed: 08/24/2023]
Abstract
Antibody-based biotherapeutics have emerged as a successful class of pharmaceuticals despite significant challenges and risks to their discovery and development. This review discusses the most frequently encountered hurdles in the research and development (R&D) of antibody-based biotherapeutics and proposes a conceptual framework called biopharmaceutical informatics. Our vision advocates for the syncretic use of computation and experimentation at every stage of biologic drug discovery, considering developability (manufacturability, safety, efficacy, and pharmacology) of potential drug candidates from the earliest stages of the drug discovery phase. The computational advances in recent years allow for more precise formulation of disease concepts, rapid identification, and validation of targets suitable for therapeutic intervention and discovery of potential biotherapeutics that can agonize or antagonize them. Furthermore, computational methods for de novo and epitope-specific antibody design are increasingly being developed, opening novel computationally driven opportunities for biologic drug discovery. Here, we review the opportunities and limitations of emerging computational approaches for optimizing antigens to generate robust immune responses, in silico generation of antibody sequences, discovery of potential antibody binders through virtual screening, assessment of hits, identification of lead drug candidates and their affinity maturation, and optimization for developability. The adoption of biopharmaceutical informatics across all aspects of drug discovery and development cycles should help bring affordable and effective biotherapeutics to patients more quickly.
Affiliation(s)
- Joschka Bauer
- Early Stage Pharmaceutical Development Biologicals, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach/Riss, Germany
- In Silico Team, Boehringer Ingelheim, Hannover, Germany
- Nandhini Rajagopal
- In Silico Team, Boehringer Ingelheim, Hannover, Germany
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Priyanka Gupta
- In Silico Team, Boehringer Ingelheim, Hannover, Germany
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Pankaj Gupta
- In Silico Team, Boehringer Ingelheim, Hannover, Germany
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Andrew E. Nixon
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
- Sandeep Kumar
- In Silico Team, Boehringer Ingelheim, Hannover, Germany
- Biotherapeutics Discovery, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, United States
|
43
|
Chklovski A, Parks DH, Woodcroft BJ, Tyson GW. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat Methods 2023; 20:1203-1212. [PMID: 37500759 DOI: 10.1038/s41592-023-01940-w] [Received: 07/06/2022] [Accepted: 06/14/2023] [Indexed: 07/29/2023]
Abstract
Advances in sequencing technologies and bioinformatics tools have dramatically increased the recovery rate of microbial genomes from metagenomic data. Assessing the quality of metagenome-assembled genomes (MAGs) is a critical step before downstream analysis. Here, we present CheckM2, an improved method of predicting genome quality of MAGs using machine learning. Using synthetic and experimental data, we demonstrate that CheckM2 outperforms existing tools in both accuracy and computational speed. In addition, CheckM2's database can be rapidly updated with new high-quality reference genomes, including taxa represented only by a single genome. We also show that CheckM2 accurately predicts genome quality for MAGs from novel lineages, even for those with reduced genome size (for example, Patescibacteria and the DPANN superphylum). CheckM2 provides accurate genome quality predictions across bacterial and archaeal lineages, giving increased confidence when inferring biological conclusions from MAGs.
Affiliation(s)
- Alex Chklovski
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Donovan H Parks
- Donovan Parks, Bioinformatic Consultant, Castlegar, British Columbia, Canada
- Ben J Woodcroft
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Gene W Tyson
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
|
44
|
Zhang XE, Liu C, Dai J, Yuan Y, Gao C, Feng Y, Wu B, Wei P, You C, Wang X, Si T. Enabling technology and core theory of synthetic biology. Sci China Life Sci 2023; 66:1742-1785. [PMID: 36753021 PMCID: PMC9907219 DOI: 10.1007/s11427-022-2214-2] [Received: 08/08/2022] [Accepted: 10/04/2022] [Indexed: 02/09/2023]
Abstract
Synthetic biology provides a new paradigm for life science research ("build to learn") and opens the future journey of biotechnology ("build to use"). Here, we discuss advances in the principles and technologies that make up the mainstream enabling technology of synthetic biology, including genome synthesis and assembly, DNA storage, gene editing, molecular evolution and de novo design of functional proteins, cell and gene circuit engineering, cell-free synthetic biology, artificial intelligence (AI)-aided synthetic biology, and biofoundries. We also introduce the concept of quantitative synthetic biology, which is guiding synthetic biology towards increased accuracy and predictability, that is, truly rational design. We conclude that synthetic biology will establish its disciplinary system through the iterative development of enabling technologies and the maturation of its core theory.
Affiliation(s)
- Xian-En Zhang
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Chenli Liu
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- Junbiao Dai
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- Yingjin Yuan
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China.
- Caixia Gao
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China.
- Yan Feng
- State Key Laboratory of Microbial Metabolism, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Bian Wu
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101, China.
- Ping Wei
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
- Chun You
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China.
- Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing, 100084, China.
- Tong Si
- Faculty of Synthetic Biology, Shenzhen Institute of Advanced Technology, Shenzhen, 518055, China.
- Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China.
|
45
|
Liu Q, Feng NN, Chen LJ. Genetic analysis of a child with SATB2‑associated syndrome and literature study. Exp Ther Med 2023; 26:372. [PMID: 37415841 PMCID: PMC10320656 DOI: 10.3892/etm.2023.12071] [Received: 07/02/2022] [Accepted: 02/23/2023] [Indexed: 07/08/2023]
Abstract
The present study aimed to investigate the clinical phenotype and genotype characteristics of a male child with SATB2-associated syndrome (SAS) and to analyze the relationship between these characteristics and the possible underlying genetic mechanism. His clinical phenotype was analyzed. Using a high-throughput sequencing platform, his DNA samples were subjected to medical exome sequencing, screened for suspected variant loci and analyzed for chromosomal copy number variations. The suspected pathogenic loci were verified by Sanger sequencing. He presented with delayed growth, delayed speech and mental development, facial dysmorphism typical of SAS, and motor retardation. Sequencing revealed a de novo heterozygous single-nucleotide duplication in the SATB2 gene (NM_015265.3), c.771dupT (p.Met258Tyrfs*46), which causes a frameshift that replaces methionine with tyrosine at amino acid position 258 and, under the standard reading of this notation, ends the new reading frame at a premature stop codon at position 46 counted from the first changed residue, yielding a truncated protein. The parents showed no mutation at this locus. This mutation was identified as the cause of the syndrome in the child. To the best of the authors' knowledge, this is the first report of this mutation. The clinical manifestations and gene variation characteristics of 39 previously reported SAS cases were analyzed together with this case. The findings of the present study suggest severely impaired language development, facial dysmorphism and varying degrees of delayed intellectual development as the characteristic clinical manifestations of SAS.
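The variant in this abstract is written in HGVS protein notation. Under that convention, `p.Met258Tyrfs*46` reads as: Met258 is the first changed residue, it becomes Tyr, and the shifted reading frame reaches a stop codon at position 46, counting the first changed residue as position 1. A minimal sketch of decoding such a string follows; it handles only this simple frameshift pattern, the function name `parse_frameshift` is hypothetical, and it is not a general HGVS parser.

```python
import re

# Matches simple protein-level frameshift descriptions such as "p.Met258Tyrfs*46".
# Illustration only: real HGVS syntax has many more forms than this pattern covers.
FRAMESHIFT = re.compile(
    r"^p\.(?P<ref>[A-Z][a-z]{2})(?P<pos>\d+)(?P<alt>[A-Z][a-z]{2})fs(?:\*|Ter)(?P<stop>\d+)$"
)


def parse_frameshift(hgvs_p):
    """Split a frameshift description into its parts, or raise ValueError."""
    match = FRAMESHIFT.match(hgvs_p)
    if match is None:
        raise ValueError(f"not a simple frameshift description: {hgvs_p!r}")
    pos = int(match["pos"])
    stop_offset = int(match["stop"])
    return {
        "ref": match["ref"],
        "position": pos,
        "alt": match["alt"],
        "stop_offset": stop_offset,
        # The first changed residue counts as position 1 of the new frame,
        # so the stop codon falls at pos + stop_offset - 1.
        "stop_position": pos + stop_offset - 1,
    }


print(parse_frameshift("p.Met258Tyrfs*46"))
```

Keeping the raw offset and the derived stop position as separate fields makes it harder to confuse the two, which is a common slip when such notation is paraphrased in prose.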
Affiliation(s)
- Qian Liu
- Center for Reproductive Medicine, Center for Prenatal Genetics, First Hospital of Jilin University, Changchun, Jilin 130021, P.R. China
- Nan-Nan Feng
- Center for Reproductive Medicine, Center for Prenatal Genetics, First Hospital of Jilin University, Changchun, Jilin 130021, P.R. China
- Lin-Jiao Chen
- Center for Reproductive Medicine, Center for Prenatal Genetics, First Hospital of Jilin University, Changchun, Jilin 130021, P.R. China
|
46
|
Abdul-Khalek N, Wimmer R, Overgaard MT, Gregersen Echers S. Insight on physicochemical properties governing peptide MS1 response in HPLC-ESI-MS/MS: A deep learning approach. Comput Struct Biotechnol J 2023; 21:3715-3727. [PMID: 37560124 PMCID: PMC10407266 DOI: 10.1016/j.csbj.2023.07.027] [Received: 03/06/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 08/11/2023]
Abstract
Accurate and absolute quantification of peptides in complex mixtures using quantitative mass spectrometry (MS)-based methods requires foreground knowledge and isotopically labeled standards, thereby increasing analytical expenses, time consumption, and labor, thus limiting the number of peptides that can be accurately quantified. This originates from differential ionization efficiency between peptides and thus, understanding the physicochemical properties that influence the ionization and response in MS analysis is essential for developing less restrictive label-free quantitative methods. Here, we used equimolar peptide pool repository data to develop a deep learning model capable of identifying amino acids influencing the MS1 response. By using an encoder-decoder with an attention mechanism and correlating attention weights with amino acid physicochemical properties, we obtain insight on properties governing the peptide-level MS1 response within the datasets. While the problem cannot be described by one single set of amino acids and properties, distinct patterns were reproducibly obtained. Properties are grouped in three main categories related to peptide hydrophobicity, charge, and structural propensities. Moreover, our model can predict MS1 intensity output under defined conditions based solely on peptide sequence input. Using a refined training dataset, the model predicted log-transformed peptide MS1 intensities with an average error of 9.7 ± 0.5% based on 5-fold cross validation, and outperformed random forest and ridge regression models on both log-transformed and real scale data. This work demonstrates how deep learning can facilitate identification of physicochemical properties influencing peptide MS1 responses, but also illustrates how sequence-based response prediction and label-free peptide-level quantification may impact future workflows within quantitative proteomics.
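The "average error ± spread over 5-fold cross validation" reported in this abstract can be illustrated with a minimal bookkeeping sketch. Everything specific here is an assumption: a mean-value baseline stands in for the authors' encoder-decoder, the values are synthetic, and mean absolute percentage error is used because the abstract does not state which error definition was applied. All function names are hypothetical.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_abs_pct_error(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no true value is zero)."""
    return 100.0 * statistics.mean(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))


def cross_validate(targets, k=5):
    """Return (mean, standard deviation) of the per-fold error for a mean-value baseline."""
    folds = k_fold_indices(len(targets), k)
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [targets[i] for i in range(len(targets)) if i not in held]
        baseline = statistics.mean(train)  # stand-in "model": always predict the training mean
        test = [targets[i] for i in held_out]
        errors.append(mean_abs_pct_error(test, [baseline] * len(test)))
    return statistics.mean(errors), statistics.stdev(errors)


# Synthetic log-intensity values, for illustration only.
rng = random.Random(1)
log_intensities = [rng.uniform(6.0, 10.0) for _ in range(100)]
mean_err, sd_err = cross_validate(log_intensities)
print(f"{mean_err:.1f} ± {sd_err:.1f} %")
```

Swapping the baseline for a trained model changes only the two lines that fit and predict; the fold construction and the mean ± standard deviation summary stay the same.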
Affiliation(s)
- Naim Abdul-Khalek
  - Department of Chemistry and Bioscience, Aalborg University, Aalborg 9220, Denmark
- Reinhard Wimmer
  - Department of Chemistry and Bioscience, Aalborg University, Aalborg 9220, Denmark
|
47
|
Gutnik D, Evseev P, Miroshnikov K, Shneider M. Using AlphaFold Predictions in Viral Research. Curr Issues Mol Biol 2023; 45:3705-3732. [PMID: 37185764 PMCID: PMC10136805 DOI: 10.3390/cimb45040240] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/30/2023] [Revised: 04/19/2023] [Accepted: 04/20/2023] [Indexed: 05/17/2023] Open
Abstract
Elucidation of the tertiary structure of proteins is an important task for biological and medical studies. AlphaFold, a modern deep-learning algorithm, enables the prediction of protein structure to a high level of accuracy. It has been applied in numerous studies in various areas of biology and medicine. Viruses are biological entities infecting eukaryotic and procaryotic organisms. They can pose a danger for humans and economically significant animals and plants, but they can also be useful for biological control, suppressing populations of pests and pathogens. AlphaFold can be used for studies of molecular mechanisms of viral infection to facilitate several activities, including drug design. Computational prediction and analysis of the structure of bacteriophage receptor-binding proteins can contribute to more efficient phage therapy. In addition, AlphaFold predictions can be used for the discovery of enzymes of bacteriophage origin that are able to degrade the cell wall of bacterial pathogens. The use of AlphaFold can assist fundamental viral research, including evolutionary studies. The ongoing development and improvement of AlphaFold can ensure that its contribution to the study of viral proteins will be significant in the future.
Affiliation(s)
- Daria Gutnik
  - Limnological Institute of the Siberian Branch of the Russian Academy of Sciences, 3 Ulan-Batorskaya Str., 664033 Irkutsk, Russia
- Peter Evseev
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 16/10 Miklukho-Maklaya Str., GSP-7, 117997 Moscow, Russia
- Konstantin Miroshnikov
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 16/10 Miklukho-Maklaya Str., GSP-7, 117997 Moscow, Russia
- Mikhail Shneider
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 16/10 Miklukho-Maklaya Str., GSP-7, 117997 Moscow, Russia
|
48
|
Durham J, Zhang J, Humphreys IR, Pei J, Cong Q. Recent advances in predicting and modeling protein-protein interactions. Trends Biochem Sci 2023; 48:527-538. [PMID: 37061423 DOI: 10.1016/j.tibs.2023.03.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/01/2022] [Revised: 03/03/2023] [Accepted: 03/17/2023] [Indexed: 04/17/2023]
Abstract
Protein-protein interactions (PPIs) drive biological processes, and disruption of PPIs can cause disease. With recent breakthroughs in structure prediction and a deluge of genomic sequence data, computational methods to predict PPIs and model spatial structures of protein complexes are now approaching the accuracy of experimental approaches for permanent interactions and show promise for elucidating transient interactions. As we describe here, the key to this success is rich evolutionary information deciphered from thousands of homologous sequences that coevolve in interacting partners. This covariation signal, revealed by sophisticated statistical and machine learning (ML) algorithms, predicts physiological interactions. Accurate artificial intelligence (AI)-based modeling of protein structures promises to provide accurate 3D models of PPIs at a proteome-wide scale.
Affiliation(s)
- Jesse Durham
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jing Zhang
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Ian R Humphreys
  - Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA
- Jimin Pei
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Qian Cong
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
|
49
|
Fierro F, Peri L, Hübner H, Tabor-Schkade A, Waterloo L, Löber S, Pfeiffer T, Weikert D, Dingjan T, Margulis E, Gmeiner P, Niv MY. Inhibiting a promiscuous GPCR: iterative discovery of bitter taste receptor ligands. Cell Mol Life Sci 2023; 80:114. [PMID: 37012410 PMCID: PMC11072104 DOI: 10.1007/s00018-023-04765-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/06/2022] [Revised: 03/09/2023] [Accepted: 03/21/2023] [Indexed: 04/05/2023]
Abstract
The human GPCR family comprises circa 800 members, activated by hundreds of thousands of compounds. Bitter taste receptors, TAS2Rs, constitute a large and distinct subfamily, expressed orally and extra-orally and involved in physiological and pathological conditions. TAS2R14 is the most promiscuous member, with over 150 agonists and 3 antagonists known prior to this study. Due to the scarcity of inhibitors and to the importance of chemical probes for exploring TAS2R14 functions, we aimed to discover new ligands for this receptor, with emphasis on antagonists. To cope with the lack of an experimental structure of the receptor, we used a mixed experimental/computational methodology which iteratively improved the performance of the predicted structure. The increasing number of active compounds, obtained here through experimental screening of an FDA-approved drug library and through chemically synthesized flufenamic acid derivatives, enabled the refinement of the binding pocket, which in turn improved the reliability of structure-based virtual screening. This mixed approach led to the identification of 10 new antagonists and 200 new agonists of TAS2R14, illustrating the untapped potential of rigorous medicinal chemistry for TAS2Rs. 9% of the ~1800 pharmaceutical drugs tested here activate TAS2R14, nine of them at sub-micromolar concentrations. The iterative framework suggested residues involved in the activation process, is suitable for expanding bitter and bitter-masking chemical space, and is applicable to other promiscuous GPCRs lacking experimental structures.
Affiliation(s)
- Fabrizio Fierro
  - The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Lior Peri
  - The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Harald Hübner
  - Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany
- Alina Tabor-Schkade
  - Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany
- Lukas Waterloo
  - Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany
- Stefan Löber
  - Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany
- Tara Pfeiffer
  - Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany
- Dorothee Weikert
  - Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany
- Tamir Dingjan
  - The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Eitan Margulis
  - The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Peter Gmeiner
  - Department of Chemistry and Pharmacy, Medicinal Chemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Str. 10, 91058, Erlangen, Germany.
- Masha Y Niv
  - The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel.
|
50
|
de Brevern AG. An agnostic analysis of the human AlphaFold2 proteome using local protein conformations. Biochimie 2023; 207:11-19. [PMID: 36417962 DOI: 10.1016/j.biochi.2022.11.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/26/2022] [Revised: 10/14/2022] [Accepted: 11/17/2022] [Indexed: 11/21/2022]
Abstract
Knowledge of the 3D structure of proteins is a valuable asset for understanding their precise biological mechanisms. However, the cost of producing 3D structures and experimental difficulties limit their availability. The proposal of 3D structural models is consequently an appealing alternative. The release of the AlphaFold Deep Learning approach has revolutionized the field. The recent near-complete human proteome proposal makes it possible to analyse large amounts of data and evaluate the results of the approach in greater depth. The 3D human proteome was thus analysed in light of the classic secondary structures and many less-used protein local conformations (PolyProline II helices, types of γ-turns, β-turns, and β-bulges, curvature of the helices, and a structural alphabet). Without questioning the global quality of the approach, this analysis highlights certain local conformations that may be poorly predicted and could therefore be better addressed.
Affiliation(s)
- Alexandre G de Brevern
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM UMR_S 1134, BIGR, DSIMB Bioinformatics team, F-75014, Paris, France.
|