1
Gainza P, Bunker RD, Townson SA, Castle JC. Machine learning to predict de novo protein-protein interactions. Trends Biotechnol 2025:S0167-7799(25)00158-1. [PMID: 40425414 DOI: 10.1016/j.tibtech.2025.04.013] [Received: 11/27/2024] [Revised: 04/23/2025] [Accepted: 04/23/2025] [Indexed: 05/29/2025]
Abstract
Advances in machine learning for structural biology have dramatically enhanced our capacity to predict protein-protein interactions (PPIs). Here, we review recent developments in the computational prediction of PPIs, focusing particularly on innovations that enable predictions of interactions that have no precedent in nature, termed de novo. We discuss novel machine learning algorithms for PPI prediction, including approaches based on co-folding and atomic graphs. We further highlight methods that learn from molecular surfaces, which can predict PPIs not found in nature, including interactions induced by small molecules. Finally, we explore the emerging biotechnological applications enabled by these predictive capabilities, including the prediction of antibody-antigen complexes and molecular glue-induced PPIs, and discuss their potential to empower drug discovery and protein engineering.
Affiliation(s)
- Pablo Gainza
- Monte Rosa Therapeutics, Klybeckstrasse 191, 4057 Basel, Switzerland
- Richard D Bunker
- Monte Rosa Therapeutics, Klybeckstrasse 191, 4057 Basel, Switzerland
- Sharon A Townson
- Monte Rosa Therapeutics, Klybeckstrasse 191, 4057 Basel, Switzerland
- John C Castle
- Monte Rosa Therapeutics, Klybeckstrasse 191, 4057 Basel, Switzerland
2
Yuan R, Zhang J, Zhou J, Cong Q. Recent progress and future challenges in structure-based protein-protein interaction prediction. Mol Ther 2025; 33:2252-2268. [PMID: 40195117 DOI: 10.1016/j.ymthe.2025.04.003] [Received: 02/07/2025] [Revised: 03/05/2025] [Accepted: 04/02/2025] [Indexed: 04/09/2025]
Abstract
Protein-protein interactions (PPIs) play a fundamental role in cellular processes, and understanding these interactions is crucial for advances in both basic biological science and biomedical applications. This review presents an overview of recent progress in computational methods for modeling protein complexes and predicting PPIs based on 3D structures, focusing on the transformative role of artificial intelligence-based approaches. We further discuss the expanding biomedical applications of PPI research, including the elucidation of disease mechanisms, drug discovery, and therapeutic design. Despite these advances, significant challenges remain in predicting host-pathogen interactions, interactions between intrinsically disordered regions, and interactions related to immune responses. These challenges merit future exploration and represent the frontier of research in this field.
Affiliation(s)
- Rongqing Yuan
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jing Zhang
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jian Zhou
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
3
Xiong S, Cai J, Shi H, Cui F, Zhang Z, Wei L. UMPPI: Unveiling Multilevel Protein-Peptide Interaction Prediction via Language Models. J Chem Inf Model 2025; 65:3789-3799. [PMID: 40077987 DOI: 10.1021/acs.jcim.4c02365] [Indexed: 03/14/2025]
Abstract
Protein-peptide interactions are essential to cellular processes and disease mechanisms. Identifying protein-peptide binding residues is critical for understanding peptide function and advancing drug discovery. However, experimental methods are costly and time-intensive, while existing computational approaches often predict interactions or binding residues separately, lack effective feature integration, or rely heavily on limited high-quality structural data. To address these challenges, we propose UMPPI (Unveiling Multilevel Protein-Peptide Interaction), a multiobjective framework based on the pretrained protein language model ESM2. UMPPI simultaneously predicts binary protein-peptide interactions and binding residues on both peptides and proteins through a multiobjective optimization strategy. By integrating ESM2 to encode sequences and extract latent structural information, UMPPI bridges the gap between sequence-based and structure-based methods. Extensive experiments demonstrated that UMPPI successfully captured binary interactions between peptides and proteins and identified the binding residues on peptides and proteins. UMPPI can serve as a useful tool for protein-peptide interaction prediction and identification of critical binding residues, thereby facilitating the peptide drug discovery process.
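The abstract does not say how UMPPI combines its objectives. As a hedged illustration only, one common way to train a single model on several related tasks is to scalarize the task losses into a weighted sum. The sketch below assumes three binary cross-entropy terms (pair-level interaction, peptide binding residues, protein binding residues) and hypothetical weights `w_pair`, `w_pep`, and `w_prot`; it is not UMPPI's published loss and omits the ESM2 encoder entirely.

```python
import math
from typing import Sequence


def bce(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def multiobjective_loss(pair_prob: float, pair_label: int,
                        pep_probs: Sequence[float], pep_labels: Sequence[int],
                        prot_probs: Sequence[float], prot_labels: Sequence[int],
                        w_pair: float = 1.0, w_pep: float = 1.0,
                        w_prot: float = 1.0) -> float:
    """Weighted sum of one pair-level loss and two residue-level losses.

    The weights are hypothetical placeholders, not values from the paper.
    """
    l_pair = bce([pair_prob], [pair_label])  # does this pair interact?
    l_pep = bce(pep_probs, pep_labels)       # which peptide residues bind?
    l_prot = bce(prot_probs, prot_labels)    # which protein residues bind?
    return w_pair * l_pair + w_pep * l_pep + w_prot * l_prot
```

For example, `multiobjective_loss(0.9, 1, [0.8, 0.2], [1, 0], [0.7, 0.1, 0.6], [1, 0, 1])` mixes one pair prediction with two peptide and three protein residue predictions. A fixed weighted sum is the simplest scalarization; the paper's actual multiobjective strategy may differ.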
Affiliation(s)
- Shuwen Xiong
- Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- Jiajie Cai
- School of Software, Shandong University, Jinan 250101, China
- Hua Shi
- School of Optoelectronic and Communication Engineering, Xiamen University of Technology, Xiamen 361005, China
- Feifei Cui
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Zilong Zhang
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Leyi Wei
- Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- School of Software, Shandong University, Jinan 250101, China
4
Fang A, Zhang Z, Zhou A, Zitnik M. ATOMICA: Learning Universal Representations of Intermolecular Interactions. bioRxiv [Preprint] 2025:2025.04.02.646906. [PMID: 40291688 PMCID: PMC12026499 DOI: 10.1101/2025.04.02.646906] [Indexed: 04/30/2025]
Abstract
Molecular interactions underlie nearly all biological processes, but most machine learning models treat molecules in isolation or specialize in a single type of interaction, such as protein-ligand or protein-protein binding. This siloed approach prevents generalization across biomolecular classes and limits the ability to model interaction interfaces systematically. We introduce ATOMICA, a geometric deep learning model that learns atomic-scale representations of intermolecular interfaces across diverse biomolecular modalities, including small molecules, metal ions, amino acids, and nucleic acids. ATOMICA uses a self-supervised denoising and masking objective to train on 2,037,972 interaction complexes and generate hierarchical embeddings at the levels of atoms, chemical blocks, and molecular interfaces. The model generalizes across molecular classes and recovers shared physicochemical features without supervision. Its latent space captures compositional and chemical similarities across interaction types and follows scaling laws, with representation quality improving as more biomolecular data modalities are added. We apply ATOMICA to construct five modality-specific interfaceome networks, termed ATOMICANets, which connect proteins based on interaction similarity with ions, small molecules, nucleic acids, lipids, and proteins. These networks identify disease pathways across 27 conditions and predict disease-associated proteins in autoimmune neuropathies and lymphoma. Finally, we use ATOMICA to annotate the dark proteome (proteins lacking known structure or function) by predicting 2,646 previously uncharacterized ligand-binding sites. These include putative zinc finger motifs and transmembrane cytochrome subunits, demonstrating that ATOMICA enables systematic annotation of molecular interactions across the proteome.
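The abstract names a "denoising and masking" self-supervised objective without giving its form. As a generic, hedged sketch of how such training examples are usually built: perturb atom coordinates with Gaussian noise, replace a fraction of element tokens with a mask token, and ask the model to recover the noise and the hidden identities. The noise scale `sigma`, `mask_rate`, and the `MASK` token are hypothetical, the model itself is omitted, and nothing here reproduces ATOMICA's actual block-level hierarchy.

```python
import random
from typing import List, Tuple

Coord = Tuple[float, float, float]
MASK = "<mask>"  # hypothetical mask token


def corrupt(atoms: List[str], coords: List[Coord], sigma: float = 0.5,
            mask_rate: float = 0.15, seed: int = 0):
    """Build one denoising + masking example from an interface's atoms.

    Returns noisy coordinates, the per-atom noise a model would learn to
    predict, the token list with some elements masked, and the masked
    indices with their true element labels. At least one atom is masked
    whenever the input is non-empty.
    """
    if len(atoms) != len(coords):
        raise ValueError("atoms and coords must have the same length")
    rng = random.Random(seed)
    noise = [tuple(rng.gauss(0.0, sigma) for _ in range(3)) for _ in coords]
    noisy = [tuple(c + d for c, d in zip(xyz, delta))
             for xyz, delta in zip(coords, noise)]
    n_mask = max(1, round(mask_rate * len(atoms))) if atoms else 0
    masked_idx = sorted(rng.sample(range(len(atoms)), n_mask))
    tokens = list(atoms)
    for i in masked_idx:
        tokens[i] = MASK
    labels = [atoms[i] for i in masked_idx]  # targets for the masking head
    return noisy, noise, tokens, masked_idx, labels


def denoising_mse(pred_noise: List[Coord], true_noise: List[Coord]) -> float:
    """Mean squared error between predicted and true per-atom noise vectors."""
    sq = [(p - t) ** 2 for pv, tv in zip(pred_noise, true_noise)
          for p, t in zip(pv, tv)]
    return sum(sq) / len(sq) if sq else 0.0
```

A full objective would add the denoising error to a classification loss over `labels`; the relative weighting of the two terms is left open here.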
5
Zhang M, Deng Y, Zhou Q, Gao J, Zhang D, Pan X. Advancing micro-nano supramolecular assembly mechanisms of natural organic matter by machine learning for unveiling environmental geochemical processes. Environmental Science: Processes & Impacts 2025; 27:24-45. [PMID: 39745028 DOI: 10.1039/d4em00662c] [Indexed: 01/23/2025]
Abstract
The nano-self-assembly of natural organic matter (NOM) profoundly influences the occurrence and fate of NOM and pollutants in large-scale complex environments. Machine learning (ML) offers a promising and robust tool for interpreting and predicting the processes, structures, and environmental effects of NOM self-assembly. This review provides a tutorial-like compilation covering data source determination, algorithm selection, model construction, interpretability analyses, applications, and challenges for big-data-based ML aimed at elucidating NOM self-assembly mechanisms in the environment. Results from advanced nano- to submicron-scale spatial chemical analytical technologies are suggested as input data, since they combine information on molecular interactions with structural visualization. Existing ML algorithms need to handle multi-scale and multi-modal data, necessitating the development of new algorithmic frameworks. Interpretable supervised models are crucial owing to their strong capacity for quantifying structure-property-effect relationships and for bridging the gap between purely data-driven ML and the complexity of NOM assembly in practice. The necessity and challenges of adopting ML to understand the geochemical behavior and bioavailability of pollutants, as well as the elemental cycling processes resulting from NOM self-assembly patterns, are then discussed. Finally, a research framework integrating ML, experiments, and theoretical simulation is proposed for comprehensively and efficiently understanding environmental issues involving NOM self-assembly.
Affiliation(s)
- Ming Zhang
- College of Geoinformatics, Zhejiang University of Technology, Hangzhou, 310014, P. R. China
- Yihui Deng
- College of Environment, Zhejiang University of Technology, Hangzhou, 310014, P. R. China
- Qianwei Zhou
- College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, P. R. China
- Jing Gao
- College of Environment, Zhejiang University of Technology, Hangzhou, 310014, P. R. China
- Daoyong Zhang
- College of Geoinformatics, Zhejiang University of Technology, Hangzhou, 310014, P. R. China
- Xiangliang Pan
- College of Environment, Zhejiang University of Technology, Hangzhou, 310014, P. R. China
6
Jin X, Chen Z, Yu D, Jiang Q, Chen Z, Yan B, Qin J, Liu Y, Wang J. TPepPro: a deep learning model for predicting peptide-protein interactions. Bioinformatics 2024; 41:btae708. [PMID: 39585721 PMCID: PMC11681936 DOI: 10.1093/bioinformatics/btae708] [Received: 05/29/2024] [Revised: 10/23/2024] [Accepted: 11/24/2024] [Indexed: 11/26/2024]
Abstract
MOTIVATION Peptides and their derivatives hold potential as therapeutic agents. Rising interest in developing peptide drugs is evidenced by increasing approval rates by the US Food and Drug Administration (FDA). To identify the most promising peptides, studying peptide-protein interactions (PepPIs) is a very important approach, but it poses considerable technical challenges. Experimentally, the transient nature of PepPIs and the high flexibility of peptides lead to elevated costs and inefficiency. Traditional docking and molecular dynamics simulation methods require substantial computational resources, and their predictive accuracy remains unsatisfactory. RESULTS To address this gap, we proposed TPepPro, a Transformer-based model for PepPI prediction. We trained TPepPro on a dataset of 19,187 pairs of peptide-protein complexes with both sequential and structural features. TPepPro combines local protein sequence feature extraction with global protein structure feature extraction. Moreover, TPepPro uses a BN-ReLU arrangement in its structural feature-extraction neural network, which notably reduced the computing resources required for PepPI prediction. In comparative analysis, TPepPro reached an accuracy of 0.855, an 8.1% improvement over the second-best model, TAGPPI. TPepPro achieved an AUC of 0.922, surpassing TAGPPI (0.844). Moreover, TPepPro identified certain PepPIs that are supported by previous experimental evidence, indicating that it can efficiently detect high-potential PepPIs that would be helpful for amino acid drug applications. AVAILABILITY AND IMPLEMENTATION The source code of TPepPro is available at https://github.com/wanglabhku/TPepPro.
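The abstract credits a BN-ReLU arrangement in the structural feature network but gives no layer details. As a hedged, framework-free illustration of what that ordering usually means, the sketch applies training-mode batch normalization (per-feature standardization over the batch, then a learnable scale `gamma` and shift `beta`) followed by a ReLU. The `gamma`, `beta`, and `eps` values are placeholders, and TPepPro's actual layers, sizes, and graph structure are not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are samples, columns are features


def batch_norm(batch: Matrix, gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> Matrix:
    """Training-mode batch normalization applied to each feature column."""
    if not batch or not batch[0]:
        raise ValueError("batch must be a non-empty 2D list")
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n  # biased variance, as in BN training
        for i in range(n):
            x_hat = (batch[i][j] - mean) / math.sqrt(var + eps)
            out[i][j] = gamma[j] * x_hat + beta[j]  # learnable scale and shift
    return out


def relu(batch: Matrix) -> Matrix:
    """Element-wise max(0, x)."""
    return [[max(0.0, x) for x in row] for row in batch]


def bn_relu(batch: Matrix, gamma: List[float], beta: List[float],
            eps: float = 1e-5) -> Matrix:
    """Normalize pre-activations first, then apply the nonlinearity."""
    return relu(batch_norm(batch, gamma, beta, eps))
```

For instance, `bn_relu([[1.0, 10.0], [3.0, 10.0]], [1.0, 1.0], [0.0, 0.0])` zeroes the constant second feature and maps the first to roughly 0 and 1. Whether this ordering is the source of TPepPro's reported resource savings is not explained in the abstract.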
Affiliation(s)
- Xiaohong Jin
- School of Electronic Information, Guangxi University for Nationalities, Nanning 530000, China
- Zimeng Chen
- Division of Applied Oral Sciences and Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- Dan Yu
- Division of Applied Oral Sciences and Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- Qianhui Jiang
- Division of Applied Oral Sciences and Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- Zhuobin Chen
- School of Pharmaceutical Sciences (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
- Bin Yan
- Division of Applied Oral Sciences and Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- Jing Qin
- School of Pharmaceutical Sciences (Shenzhen), Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong 518107, China
- Yong Liu
- School of Artificial Intelligence, Guangxi University for Nationalities, Nanning 530000, China
- Junwen Wang
- Division of Applied Oral Sciences and Community Dental Care, Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- State Key Laboratory of Pharmaceutical Biotechnology, The University of Hong Kong, Hong Kong SAR, China
- HKU Shenzhen Hospital, Shenzhen 518000, China
7
Bunne C, Roohani Y, Rosen Y, Gupta A, Zhang X, Roed M, Alexandrov T, AlQuraishi M, Brennan P, Burkhardt DB, Califano A, Cool J, Dernburg AF, Ewing K, Fox EB, Haury M, Herr AE, Horvitz E, Hsu PD, Jain V, Johnson GR, Kalil T, Kelley DR, Kelley SO, Kreshuk A, Mitchison T, Otte S, Shendure J, Sofroniew NJ, Theis F, Theodoris CV, Upadhyayula S, Valer M, Wang B, Xing E, Yeung-Levy S, Zitnik M, Karaletsos T, Regev A, Lundberg E, Leskovec J, Quake SR. How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell 2024; 187:7045-7063. [PMID: 39672099 DOI: 10.1016/j.cell.2024.11.015] [Received: 10/14/2024] [Revised: 11/02/2024] [Accepted: 11/12/2024] [Indexed: 12/15/2024]
Abstract
Cells are essential to understanding health and disease, yet traditional models fall short of modeling and simulating their function and behavior. Advances in AI and omics offer groundbreaking opportunities to create an AI virtual cell (AIVC), a multi-scale, multi-modal large-neural-network-based model that can represent and simulate the behavior of molecules, cells, and tissues across diverse states. This Perspective provides a vision on their design and how collaborative efforts to build AIVCs will transform biological research by allowing high-fidelity simulations, accelerating discoveries, and guiding experimental studies, offering new opportunities for understanding cellular functions and fostering interdisciplinary collaborations in open science.
Affiliation(s)
- Charlotte Bunne
- Department of Computer Science, Stanford University, Stanford, CA, USA; Genentech, South San Francisco, CA, USA; Chan Zuckerberg Initiative, Redwood City, CA, USA; School of Computer and Communication Sciences and School of Life Sciences, EPFL, Lausanne, Switzerland
- Yusuf Roohani
- Department of Computer Science, Stanford University, Stanford, CA, USA; Chan Zuckerberg Initiative, Redwood City, CA, USA; Arc Institute, Palo Alto, CA, USA
- Yanay Rosen
- Department of Computer Science, Stanford University, Stanford, CA, USA; Chan Zuckerberg Initiative, Redwood City, CA, USA
- Ankit Gupta
- Chan Zuckerberg Initiative, Redwood City, CA, USA; Department of Protein Science, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden
- Xikun Zhang
- Department of Computer Science, Stanford University, Stanford, CA, USA; Chan Zuckerberg Initiative, Redwood City, CA, USA; Department of Bioengineering, Stanford University, Stanford, CA, USA
- Marcel Roed
- Department of Computer Science, Stanford University, Stanford, CA, USA; Chan Zuckerberg Initiative, Redwood City, CA, USA
- Theo Alexandrov
- Department of Pharmacology, University of California, San Diego, San Diego, CA, USA; Department of Bioengineering, University of California, San Diego, San Diego, CA, USA
- Mohammed AlQuraishi
- Department of Bioengineering, University of California, San Diego, San Diego, CA, USA
- Andrea Califano
- Department of Systems Biology, Columbia University, New York, NY, USA; Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, USA; Chan Zuckerberg Biohub, New York, NY, USA
- Jonah Cool
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- Abby F Dernburg
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Kirsty Ewing
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- Emily B Fox
- Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Statistics, Stanford University, Stanford, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA
- Matthias Haury
- Chan Zuckerberg Institute for Advanced Biological Imaging, Redwood City, CA, USA
- Amy E Herr
- Chan Zuckerberg Biohub, San Francisco, CA, USA; Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Patrick D Hsu
- Arc Institute, Palo Alto, CA, USA; Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA; Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Shana O Kelley
- Chan Zuckerberg Biohub, Chicago, IL, USA; Northwestern University, Evanston, IL, USA
- Anna Kreshuk
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Tim Mitchison
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Stephani Otte
- Chan Zuckerberg Institute for Advanced Biological Imaging, Redwood City, CA, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA, USA; Seattle Hub for Synthetic Biology, Seattle, WA, USA; Howard Hughes Medical Institute, Seattle, WA, USA
- Fabian Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; School of Computing, Information and Technology, Technical University of Munich, Munich, Germany; TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
- Christina V Theodoris
- Gladstone Institute of Cardiovascular Disease, Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA; Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA
- Srigokul Upadhyayula
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA; Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Marc Valer
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- Bo Wang
- Department of Computer Science, University of Toronto, Toronto, ON, Canada; Vector Institute, Toronto, ON, Canada
- Eric Xing
- Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, USA; Mohamed Bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Serena Yeung-Levy
- Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Aviv Regev
- Genentech, South San Francisco, CA, USA
- Emma Lundberg
- Chan Zuckerberg Initiative, Redwood City, CA, USA; Department of Protein Science, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden; Department of Bioengineering, Stanford University, Stanford, CA, USA; Department of Pathology, Stanford University, Stanford, CA, USA
- Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA; Chan Zuckerberg Initiative, Redwood City, CA, USA
- Stephen R Quake
- Chan Zuckerberg Initiative, Redwood City, CA, USA; Department of Bioengineering, Stanford University, Stanford, CA, USA; Department of Applied Physics, Stanford University, Stanford, CA, USA
8
Luo Y, Zhao C, Chen F. Multiomics Research: Principles and Challenges in Integrated Analysis. BioDesign Research 2024; 6:0059. [PMID: 39990095 PMCID: PMC11844812 DOI: 10.34133/bdr.0059] [Received: 08/02/2024] [Revised: 10/24/2024] [Accepted: 10/28/2024] [Indexed: 02/25/2025]
Abstract
Multiomics research is a transformative approach in the biological sciences that integrates data from genomics, transcriptomics, proteomics, metabolomics, and other omics technologies to provide a comprehensive understanding of biological systems. This review elucidates the fundamental principles of multiomics, emphasizing the necessity of data integration to uncover the complex interactions and regulatory mechanisms underlying various biological processes. We explore the latest advances in computational methodologies, including deep learning, graph neural networks (GNNs), and generative adversarial networks (GANs), which facilitate the effective synthesis and interpretation of multiomics data. Additionally, this review addresses the critical challenges in this field, such as data heterogeneity, scalability, and the need for robust, interpretable models. We highlight the potential of large language models to enhance multiomics analysis through automated feature extraction, natural language generation, and knowledge integration. Despite the important promise of multiomics, the review acknowledges the substantial computational resources required and the complexity of model tuning, underscoring the need for ongoing innovation and collaboration in the field. This comprehensive analysis aims to guide researchers in navigating the principles and challenges of multiomics research to foster advances in integrative biological analysis.
Affiliation(s)
- Yunqing Luo
- National Key Laboratory for Tropical Crop Breeding, College of Breeding and Multiplication, Sanya Institute of Breeding and Multiplication, Hainan University, Sanya 572025, China
- College of Tropical Agriculture and Forestry, Hainan University, Danzhou 571700, China
- Chengjun Zhao
- National Key Laboratory for Tropical Crop Breeding, College of Breeding and Multiplication, Sanya Institute of Breeding and Multiplication, Hainan University, Sanya 572025, China
- College of Tropical Agriculture and Forestry, Hainan University, Danzhou 571700, China
- Fei Chen
- National Key Laboratory for Tropical Crop Breeding, College of Breeding and Multiplication, Sanya Institute of Breeding and Multiplication, Hainan University, Sanya 572025, China
- College of Tropical Agriculture and Forestry, Hainan University, Danzhou 571700, China
9
Bunne C, Roohani Y, Rosen Y, Gupta A, Zhang X, Roed M, Alexandrov T, AlQuraishi M, Brennan P, Burkhardt DB, Califano A, Cool J, Dernburg AF, Ewing K, Fox EB, Haury M, Herr AE, Horvitz E, Hsu PD, Jain V, Johnson GR, Kalil T, Kelley DR, Kelley SO, Kreshuk A, Mitchison T, Otte S, Shendure J, Sofroniew NJ, Theis F, Theodoris CV, Upadhyayula S, Valer M, Wang B, Xing E, Yeung-Levy S, Zitnik M, Karaletsos T, Regev A, Lundberg E, Leskovec J, Quake SR. How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities. arXiv 2024:arXiv:2409.11654v2. [PMID: 39398201 PMCID: PMC11468656] [Indexed: 10/15/2024]
Abstract
The cell is arguably the most fundamental unit of life and is central to understanding biology. Accurate modeling of cells is important for this understanding as well as for determining the root causes of disease. Recent advances in artificial intelligence (AI), combined with the ability to generate large-scale experimental data, present novel opportunities to model cells. Here we propose a vision of leveraging advances in AI to construct virtual cells, high-fidelity simulations of cells and cellular systems under different conditions that are directly learned from biological data across measurements and scales. We discuss desired capabilities of such AI Virtual Cells, including generating universal representations of biological entities across scales, and facilitating interpretable in silico experiments to predict and understand their behavior using Virtual Instruments. We further address the challenges, opportunities, and requirements to realize this vision, including data needs, evaluation strategies, and community standards and engagement to ensure biological accuracy and broad utility. We envision a future where AI Virtual Cells help identify new drug targets, predict cellular responses to perturbations, and scale hypothesis exploration. With open science collaborations across the biomedical ecosystem that includes academia, philanthropy, and the biopharma and AI industries, a comprehensive predictive understanding of cell mechanisms and interactions has come within reach.
Collapse
Affiliation(s)
- Charlotte Bunne
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Genentech, South San Francisco, CA, USA
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- School of Computer and Communication Sciences and School of Life Sciences, EPFL, Lausanne, Switzerland
| | - Yusuf Roohani
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- Arc Institute, Palo Alto, CA, USA
| | - Yanay Rosen
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Initiative, Redwood City, CA, USA
| | - Ankit Gupta
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- KTH Royal Institute of Technology, Science for Life Laboratory, Department of Protein Science, Stockholm, Sweden
| | - Xikun Zhang
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | - Marcel Roed
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Initiative, Redwood City, CA, USA
| | - Theo Alexandrov
- Department of Pharmacology, University of California, San Diego, CA, USA
- Department of Bioengineering, University of California, San Diego, CA, USA
| | | | | | | | - Andrea Califano
- Department of Systems Biology, Columbia University, New York, NY, USA
- Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, USA
- Chan Zuckerberg Biohub New York, NY, USA
| | - Jonah Cool
- Chan Zuckerberg Initiative, Redwood City, CA, USA
| | - Abby F Dernburg
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Kirsty Ewing
- Chan Zuckerberg Initiative, Redwood City, CA, USA
| | - Emily B Fox
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Statistics, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub San Francisco, CA, USA
| | - Matthias Haury
- Chan Zuckerberg Institute for Advanced Biological Imaging, Redwood City, CA, USA
| | - Amy E Herr
- Chan Zuckerberg Biohub San Francisco, CA, USA
- Department of Bioengineering, University of California, Berkeley, CA, USA
| | | | - Patrick D Hsu
- Arc Institute, Palo Alto, CA, USA
- Department of Bioengineering, University of California, Berkeley, CA, USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
| | | | | | | | | | - Shana O Kelley
- Chan Zuckerberg Biohub Chicago, IL, USA
- Northwestern University, Evanston, IL, USA
| | - Anna Kreshuk
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Tim Mitchison
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Stephani Otte
- Chan Zuckerberg Institute for Advanced Biological Imaging, Redwood City, CA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
| | | | - Fabian Theis
- Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany
- School of Computing, Information and Technology, Technical University of Munich, Munich, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
| | - Christina V Theodoris
- Gladstone Institute of Cardiovascular Disease, Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Department of Pediatrics, University of California, San Francisco, CA, USA
- Srigokul Upadhyayula
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub San Francisco, CA, USA
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Marc Valer
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- Bo Wang
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Eric Xing
- Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, USA
- Mohamed Bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Serena Yeung-Levy
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Emma Lundberg
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- KTH Royal Institute of Technology, Science for Life Laboratory, Department of Protein Science, Stockholm, Sweden
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
- Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- Stephen R Quake
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Department of Applied Physics, Stanford University, Stanford, CA, USA
10
Reys V, Pons JL, Labesse G. SLiMAn 2.0: meaningful navigation through peptide-protein interaction networks. Nucleic Acids Res 2024; 52:W313-W317. [PMID: 38783158 PMCID: PMC11223867 DOI: 10.1093/nar/gkae398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 04/17/2024] [Accepted: 04/30/2024] [Indexed: 05/25/2024]
Abstract
Among the myriad protein-protein interactions occurring in living organisms, a substantial number involve small linear motifs (SLiMs) recognized by structured domains. However, predicting SLiM-based networks is tedious, owing to the abundance of such motifs and a high proportion of false-positive hits. For this reason, the SLiMAn (Short Linear Motif Analysis) webserver was developed to focus the search on the most relevant SLiMs. Using SLiMAn, one can navigate a given (meta-)interactome and tune a variety of parameters associated with each type of SLiM in an attempt to identify functional ELM motifs and their recognition domains. The IntAct and BioGRID databases bring experimental information, while IUPred and AlphaFold provide boundaries of folded and disordered regions. Post-translational modifications listed in PhosphoSite+ are highlighted. Links to PubMed accelerate scrutiny of the literature to support (or not) putative pairings. Dedicated visualization features are also incorporated, such as Cytoscape for macromolecular networks and BINANA for intermolecular contacts within structural models generated by SCWRL 3.0. The use of SLiMAn 2.0 is illustrated on a simple example. It is freely available at https://sliman2.cbs.cnrs.fr.
Affiliation(s)
- Victor Reys
- Centre de Biologie Structurale, CNRS, INSERM, Univ. Montpellier, Montpellier, France
- Jean-Luc Pons
- Centre de Biologie Structurale, CNRS, INSERM, Univ. Montpellier, Montpellier, France
- Gilles Labesse
- Centre de Biologie Structurale, CNRS, INSERM, Univ. Montpellier, Montpellier, France
11
Travassos R, Martins SA, Fernandes A, Correia JDG, Melo R. Tailored Viral-like Particles as Drivers of Medical Breakthroughs. Int J Mol Sci 2024; 25:6699. [PMID: 38928403 PMCID: PMC11204272 DOI: 10.3390/ijms25126699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 06/11/2024] [Accepted: 06/12/2024] [Indexed: 06/28/2024]
Abstract
Despite the recognized potential of nanoparticles, only a few formulations have progressed to clinical trials, and even fewer have been approved by regulatory authorities and marketed. Virus-like particles (VLPs) have emerged as promising alternatives to conventional nanoparticles due to their safety, biocompatibility, immunogenicity, structural stability, scalability, and versatility. Furthermore, VLPs can be surface-functionalized with small molecules to improve circulation half-life and target specificity. Through the functionalization and coating of VLPs, it is possible to optimize their response to a given stimulus, such as heat, pH, an alternating magnetic field, or even enzymes. Surface functionalization can also modulate other properties, such as biocompatibility, stability, and specificity, making VLPs potential vaccine candidates or delivery systems. This review addresses the different types of surface functionalization of VLPs, highlighting the most recent cutting-edge technologies explored for the design of tailored VLPs, their importance, and their applicability in the medical field.
Affiliation(s)
- Rafael Travassos
- Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, Estrada Nacional 10 (km 139.7), 2695-066 Bobadela, Portugal
- Sofia A. Martins
- Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, Estrada Nacional 10 (km 139.7), 2695-066 Bobadela, Portugal
- Ana Fernandes
- Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, Estrada Nacional 10 (km 139.7), 2695-066 Bobadela, Portugal
- João D. G. Correia
- Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, Estrada Nacional 10 (km 139.7), 2695-066 Bobadela, Portugal
- Departamento de Engenharia e Ciências Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, Estrada Nacional 10 (km 139.7), 2695-066 Bobadela, Portugal
- Rita Melo
- Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, Estrada Nacional 10 (km 139.7), 2695-066 Bobadela, Portugal
12
Rosenberger G, Li W, Turunen M, He J, Subramaniam PS, Pampou S, Griffin AT, Karan C, Kerwin P, Murray D, Honig B, Liu Y, Califano A. Network-based elucidation of colon cancer drug resistance mechanisms by phosphoproteomic time-series analysis. Nat Commun 2024; 15:3909. [PMID: 38724493 PMCID: PMC11082183 DOI: 10.1038/s41467-024-47957-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2023] [Accepted: 04/16/2024] [Indexed: 05/12/2024]
Abstract
Aberrant signaling pathway activity is a hallmark of tumorigenesis and progression, which has guided targeted inhibitor design for over 30 years. Yet, adaptive resistance mechanisms, induced by rapid, context-specific signaling network rewiring, continue to challenge therapeutic efficacy. Leveraging progress in proteomic technologies and network-based methodologies, we introduce Virtual Enrichment-based Signaling Protein-activity Analysis (VESPA)-an algorithm designed to elucidate mechanisms of cell response and adaptation to drug perturbations-and use it to analyze 7-point phosphoproteomic time series from colorectal cancer cells treated with clinically-relevant inhibitors and control media. Interrogating tumor-specific enzyme/substrate interactions accurately infers kinase and phosphatase activity, based on their substrate phosphorylation state, effectively accounting for signal crosstalk and sparse phosphoproteome coverage. The analysis elucidates time-dependent signaling pathway response to each drug perturbation and, more importantly, cell adaptive response and rewiring, experimentally confirmed by CRISPR knock-out assays, suggesting broad applicability to cancer and other diseases.
Affiliation(s)
- George Rosenberger
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Wenxue Li
- Yale Cancer Biology Institute, Yale University, West Haven, CT, USA
- Mikko Turunen
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Jing He
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Regeneron Genetics Center, Tarrytown, NY, USA
- Prem S Subramaniam
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Sergey Pampou
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- J.P. Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA
- Aaron T Griffin
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Medical Scientist Training Program, Columbia University Irving Medical Center, New York, NY, USA
- Charles Karan
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- J.P. Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA
- Patrick Kerwin
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Diana Murray
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Barry Honig
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Biochemistry & Molecular Biophysics, Columbia University Irving Medical Center, New York, NY, USA
- Zuckerman Mind Brain and Behavior Institute, Columbia University, New York, NY, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
- Yansheng Liu
- Yale Cancer Biology Institute, Yale University, West Haven, CT, USA
- Department of Pharmacology, Yale University School of Medicine, New Haven, CT, USA
- Andrea Califano
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Department of Biochemistry & Molecular Biophysics, Columbia University Irving Medical Center, New York, NY, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Chan Zuckerberg Biohub New York, New York, NY, USA
13
Yin S, Mi X, Shukla D. Leveraging machine learning models for peptide-protein interaction prediction. RSC Chem Biol 2024; 5:401-417. [PMID: 38725911 PMCID: PMC11078210 DOI: 10.1039/d3cb00208j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 02/07/2024] [Indexed: 05/12/2024]
Abstract
Peptides play a pivotal role in a wide range of biological activities, participating in up to 40% of protein-protein interactions in cellular processes. They also demonstrate remarkable specificity and efficacy, making them promising candidates for drug development. However, predicting peptide-protein complexes by traditional computational approaches, such as docking and molecular dynamics simulations, remains a challenge owing to high computational cost, the flexible nature of peptides, and limited structural information on peptide-protein complexes. In recent years, the surge of available biological data has given rise to an increasing number of machine learning models for predicting peptide-protein interactions. These models offer efficient solutions to the challenges associated with traditional computational approaches. Furthermore, they offer enhanced accuracy, robustness, and interpretability in their predictive outcomes. This review presents a comprehensive overview of machine learning and deep learning models that have emerged in recent years for the prediction of peptide-protein interactions.
Affiliation(s)
- Song Yin
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Xuenan Mi
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
14
Liu JX, Zhang X, Huang YQ, Hao GF, Yang GF. Multi-level bioinformatics resources support drug target discovery of protein-protein interactions. Drug Discov Today 2024; 29:103979. [PMID: 38608830 DOI: 10.1016/j.drudis.2024.103979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 03/14/2024] [Accepted: 04/05/2024] [Indexed: 04/14/2024]
Abstract
Drug discovery often begins with a new target. Protein-protein interactions (PPIs) are crucial to numerous cellular processes and offer a promising avenue for drug-target discovery. PPIs are characterized by multi-level complexity: at the protein level, interaction networks can be used to identify potential targets, whereas at the residue level, the details of individual PPIs can be used to examine a target's druggability. Great progress has been made in target discovery through multi-level PPI-related computational approaches, but these resources have not been fully discussed. Here, we systematically survey bioinformatics tools for identifying and assessing potential drug targets, examining their characteristics, limitations and applications. This work will aid the integration of the broader protein-to-network context with the analysis of detailed binding mechanisms to support the discovery of drug targets.
Affiliation(s)
- Jia-Xin Liu
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- Xiao Zhang
- State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Yuan-Qin Huang
- State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Ge-Fei Hao
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
- State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals, Guizhou University, Guiyang 550025, PR China
- Guang-Fu Yang
- National Key Laboratory of Green Pesticide, Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, PR China
15
Wan F, Wong F, Collins JJ, de la Fuente-Nunez C. Machine learning for antimicrobial peptide identification and design. Nat Rev Bioeng 2024; 2:392-407. [PMID: 39850516 PMCID: PMC11756916 DOI: 10.1038/s44222-024-00152-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Indexed: 01/25/2025]
Abstract
Artificial intelligence (AI) and machine learning (ML) models are being deployed in many domains of society and have recently reached the field of drug discovery. Given the increasing prevalence of antimicrobial resistance, as well as the challenges intrinsic to antibiotic development, there is an urgent need to accelerate the design of new antimicrobial therapies. Antimicrobial peptides (AMPs) are therapeutic agents for treating bacterial infections, but their translation into the clinic has been slow owing to toxicity, poor stability, limited cellular penetration and high cost, among other issues. Recent advances in AI and ML have led to breakthroughs in our abilities to predict biomolecular properties and structures and to generate new molecules. The ML-based modelling of peptides may overcome some of the disadvantages associated with traditional drug discovery and aid the rapid development and translation of AMPs. Here, we provide an introduction to this emerging field and survey ML approaches that can be used to address issues currently hindering AMP development. We also outline important limitations that can be addressed for the broader adoption of AMPs in clinical practice, as well as new opportunities in data-driven peptide design.
Affiliation(s)
- Fangping Wan
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- These authors contributed equally: Fangping Wan, Felix Wong
- Felix Wong
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- These authors contributed equally: Fangping Wan, Felix Wong
- James J. Collins
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
- These authors jointly supervised this work: James J. Collins, Cesar de la Fuente-Nunez
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- These authors jointly supervised this work: James J. Collins, Cesar de la Fuente-Nunez
16
Yin S, Mi X, Shukla D. Leveraging Machine Learning Models for Peptide-Protein Interaction Prediction. arXiv 2024:arXiv:2310.18249v2. [PMID: 37961736 PMCID: PMC10635286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Peptides play a pivotal role in a wide range of biological activities, participating in up to 40% of protein-protein interactions in cellular processes. They also demonstrate remarkable specificity and efficacy, making them promising candidates for drug development. However, predicting peptide-protein complexes by traditional computational approaches, such as docking and molecular dynamics simulations, remains a challenge owing to high computational cost, the flexible nature of peptides, and limited structural information on peptide-protein complexes. In recent years, the surge of available biological data has given rise to an increasing number of machine learning models for predicting peptide-protein interactions. These models offer efficient solutions to the challenges associated with traditional computational approaches. Furthermore, they offer enhanced accuracy, robustness, and interpretability in their predictive outcomes. This review presents a comprehensive overview of machine learning and deep learning models that have emerged in recent years for the prediction of peptide-protein interactions.
Affiliation(s)
- Song Yin
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- These authors contributed to the work equally
- Xuenan Mi
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- These authors contributed to the work equally
- Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
17
Mezghrani A, Simon J, Reys V, Labesse G. Detection and Analysis of Short Linear Motif-Based Protein-Protein Interactions with SLiMAn2 Web Server. Methods Mol Biol 2024; 2836:253-281. [PMID: 38995545 DOI: 10.1007/978-1-0716-4007-4_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
Interactomics is bringing a deluge of data on protein-protein interactions (PPIs), which are involved in various molecular processes in all types of cells. However, this information does not easily translate into direct and precise molecular interfaces. This limits our understanding of each interaction network and prevents its efficient modulation. Many of the detected interactions involve recognition of short linear motifs (SLiMs) by a folded domain, while others rely on domain-domain interactions. Functional SLiMs hide among many spurious ones, making deeper analysis of interactomes tedious. Hence, actual contacts and direct interactions are difficult to identify. Consequently, there is a need for user-friendly bioinformatic tools enabling rapid molecular and structural analysis of SLiM-based PPIs in a protein network. In this chapter, we describe how to use the new webserver SLiMAn to dig into SLiM-based PPIs interactively.
Affiliation(s)
- Alexandre Mezghrani
- Centre de Biologie Structurale (CBS), CNRS, INSERM, University of Montpellier, Montpellier, France
- Juliette Simon
- Centre de Biologie Structurale (CBS), CNRS, INSERM, University of Montpellier, Montpellier, France
- Victor Reys
- Centre de Biologie Structurale (CBS), CNRS, INSERM, University of Montpellier, Montpellier, France
- Gilles Labesse
- Centre de Biologie Structurale (CBS), CNRS, INSERM, University of Montpellier, Montpellier, France
18
Cheng H, Wang GG, Chen L, Wang R. A dual-population multi-objective evolutionary algorithm driven by generative adversarial networks for benchmarking and protein-peptide docking. Comput Biol Med 2024; 168:107727. [PMID: 38029532 DOI: 10.1016/j.compbiomed.2023.107727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 09/28/2023] [Accepted: 11/15/2023] [Indexed: 12/01/2023]
Abstract
Multi-objective optimization problems (MOPs) are optimization problems in which multiple conflicting objective functions are optimized simultaneously. To solve MOPs, some algorithms use machine learning models to drive evolutionary algorithms, leading to a variety of model-based evolutionary algorithms. However, model collapse occurs during the generation of candidate solutions, which results in local optima and poor diversity in model-based evolutionary algorithms. To address this problem, we propose a dual-population multi-objective evolutionary algorithm driven by a Wasserstein generative adversarial network with gradient penalty (DGMOEA), in which the two populations coordinate and cooperate to generate high-quality solutions, thus improving the performance of the evolutionary algorithm. We compare the proposed algorithm with 7 state-of-the-art algorithms on 20 multi-objective benchmark functions. Experimental results indicate that DGMOEA achieves significant results in solving MOPs: it outperforms the other algorithms on the inverted generational distance (IGD) and hypervolume (HV) metrics on 15 and 18 of the 20 benchmarks, respectively. Our algorithm is also evaluated on the LEADS-PEP dataset containing 53 protein-peptide complexes, and the results on the protein-peptide docking problem indicate that DGMOEA can effectively reduce the RMSD between the generated and the original 3D peptide poses and achieve more competitive results.
Affiliation(s)
- Honglei Cheng
- School of Computer Science and Technology, Ocean University of China, Qingdao, China
- Gai-Ge Wang
- School of Computer Science and Technology, Ocean University of China, Qingdao, China
- Liyan Chen
- Institute of Big Data and Information Technology, Wenzhou University, Wenzhou, China
- Rui Wang
- College of Systems Engineering, National University of Defense Technology, Changsha, China
- Xiangjiang Laboratory, Changsha, China
19
Toussaint PA, Leiser F, Thiebes S, Schlesner M, Brors B, Sunyaev A. Explainable artificial intelligence for omics data: a systematic mapping study. Brief Bioinform 2023; 25:bbad453. [PMID: 38113073 PMCID: PMC10729786 DOI: 10.1093/bib/bbad453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 07/28/2023] [Accepted: 11/08/2023] [Indexed: 12/21/2023]
Abstract
Researchers increasingly turn to explainable artificial intelligence (XAI) to analyze omics data and gain insights into the underlying biological processes. Yet, given the interdisciplinary nature of the field, many findings have only been shared within their respective research communities. An overview of XAI for omics data is needed to highlight promising approaches and help detect common issues. Toward this end, we conducted a systematic mapping study. To identify relevant literature, we queried Scopus, PubMed, Web of Science, BioRxiv, MedRxiv and arXiv. Based on keywording, we developed a coding scheme with 10 facets regarding the studies' AI methods, explainability methods and omics data. Our mapping study resulted in 405 included papers published between 2010 and 2023. The inspected papers analyze DNA-based (mostly genomic), transcriptomic, proteomic or metabolomic data by means of neural networks, tree-based methods, statistical methods and further AI methods. The preferred post-hoc explainability methods are feature relevance (n = 166) and visual explanation (n = 52), while papers using interpretable approaches often resort to the use of transparent models (n = 83) or architecture modifications (n = 72). With many research gaps still apparent for XAI for omics data, we deduced eight research directions and discuss their potential for the field. We also provide exemplary research questions for each direction. Many problems with the adoption of XAI for omics data in clinical practice are yet to be resolved. This systematic mapping study outlines extant research on the topic and provides research directions for researchers and practitioners.
Affiliation(s)
- Philipp A Toussaint
- Department of Economics and Management, Karlsruhe Institute of Technology, Karlsruhe, Germany
- HIDSS4Health – Helmholtz Information and Data Science School for Health, Karlsruhe, Heidelberg, Germany
- Florian Leiser
- Department of Economics and Management, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Scott Thiebes
- Department of Economics and Management, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Matthias Schlesner
- Biomedical Informatics, Data Mining and Data Analytics, Faculty of Applied Computer Science and Medical Faculty, University of Augsburg, Augsburg, Germany
- Benedikt Brors
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Translational Oncology, National Center for Tumor Diseases, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Ali Sunyaev
- Department of Economics and Management, Karlsruhe Institute of Technology, Karlsruhe, Germany
20
Feifei W, Wenrou S, Sining K, Siyu Z, Xiaolei F, Junxiang L, Congfen H, Xuhui L. A novel functional peptide, named EQ-9 (ESETRILLQ), identified by virtual screening from regenerative cell secretome and its potential anti-aging and restoration effects in topical applications. Peptides 2023; 169:171078. [PMID: 37579838 DOI: 10.1016/j.peptides.2023.171078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 08/10/2023] [Accepted: 08/11/2023] [Indexed: 08/16/2023]
Abstract
Skin aging refers to a degenerative process that can be affected and regulated by intrinsic and extrinsic factors. The mesenchymal stem cell secretome covers a considerable number of regenerative molecules with anti-aging effects in a wide variety of circumstances. However, it is complex, time-consuming, and costly to identify specific compounds from thousands of natural molecules using conventional methods. With the development of computational biology and machine learning, an efficient workflow was generated to identify novel peptides with anti-aging and skin restoration potential. One of the candidate peptides was discovered and subsequently truncated to a novel peptide named EQ-9, whose promising anti-aging effects for topical applications at a concentration of 10 ppm were validated experimentally. The above-described paradigm is expected to be further applied to the virtual screening of novel peptide molecules targeting specific biological functions from a wide variety of natural resources.
Affiliation(s)
- Wang Feifei
- Yunnan Botanee Bio-technology Group Co., Ltd., Yunnan, China; Yunnan Yunke Characteristic Plant Extraction Laboratory Co., Ltd., Yunnan, China
- Su Wenrou
- Yunnan Botanee Bio-technology Group Co., Ltd., Yunnan, China; Yunnan Yunke Characteristic Plant Extraction Laboratory Co., Ltd., Yunnan, China
- Kang Sining
- AGECODE R&D Center, Yangtze Delta Region Institute of Tsinghua University, Zhejiang, China; Harvest Biotech (Zhejiang) Co., Ltd., Zhejiang, China
- Zhu Siyu
- AGECODE R&D Center, Yangtze Delta Region Institute of Tsinghua University, Zhejiang, China; Harvest Biotech (Zhejiang) Co., Ltd., Zhejiang, China
- Fu Xiaolei
- AGECODE R&D Center, Yangtze Delta Region Institute of Tsinghua University, Zhejiang, China; Harvest Biotech (Zhejiang) Co., Ltd., Zhejiang, China
- Li Junxiang
- AGECODE R&D Center, Yangtze Delta Region Institute of Tsinghua University, Zhejiang, China; Harvest Biotech (Zhejiang) Co., Ltd., Zhejiang, China
- He Congfen
- Beijing Technology and Business University, Beijing Key Lab of Plant Resources Research and Development, Beijing, China
- Li Xuhui
- AGECODE R&D Center, Yangtze Delta Region Institute of Tsinghua University, Zhejiang, China; Zhejiang Provincial Key Laboratory of Applied Enzymology, Yangtze Delta Region Institute of Tsinghua University, Zhejiang, China
21
Ye J, Li A, Zheng H, Yang B, Lu Y. Machine Learning Advances in Predicting Peptide/Protein-Protein Interactions Based on Sequence Information for Lead Peptides Discovery. Adv Biol (Weinh) 2023; 7:e2200232. [PMID: 36775876 DOI: 10.1002/adbi.202200232] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/24/2022] [Revised: 12/30/2022] [Indexed: 02/14/2023]
Abstract
Peptides have shown increasing advantages and significant clinical value in drug discovery and development. With the development of high-throughput technologies and artificial intelligence (AI), machine learning (ML) methods for discovering new lead peptides have been expanded and incorporated into rational drug design. Predictions of peptide-protein interactions (PepPIs) and protein-protein interactions (PPIs) are both opportunities and challenges in computational biology, which will help to better understand the mechanisms of disease and provide the impetus for the discovery of lead peptides. This paper comprehensively reviews computational models for PepPI and PPI predictions. It begins with an introduction of various databases of peptide ligands and target proteins. Then it discusses data formats and feature representations for proteins and peptides. Furthermore, classical ML methods and emerging deep learning (DL) methods that can be used to train prediction models of PepPI and PPI are classified into four categories, and their advantages and disadvantages are analyzed. To assess the relative performance of different models, different validation protocols and evaluation indexes are discussed. The goal of this review is to help researchers quickly get started to develop computational frameworks using these integrated resources and eventually promote the discovery of lead peptides.
Affiliation(s)
- Jiahao Ye
- School of Medicine, Shanghai University, Shanghai, 200444, China
- An Li
- Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Department of Biochemical Pharmacy, School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
- Hao Zheng
- School of Medicine, Shanghai University, Shanghai, 200444, China
- Banghua Yang
- School of Medicine, Shanghai University, Shanghai, 200444, China
- Yiming Lu
- School of Medicine, Shanghai University, Shanghai, 200444, China
- Department of Critical Care Medicine, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Department of Biochemical Pharmacy, School of Pharmacy, Second Military Medical University, Shanghai, 200433, China
22
Jiang J, Li J, Li J, Pei H, Li M, Zou Q, Lv Z. A Machine Learning Method to Identify Umami Peptide Sequences by Using Multiplicative LSTM Embedded Features. Foods 2023; 12:foods12071498. [PMID: 37048319 PMCID: PMC10094688 DOI: 10.3390/foods12071498] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/26/2023] [Revised: 03/24/2023] [Accepted: 03/30/2023] [Indexed: 04/05/2023]
Abstract
Umami peptides enhance the umami taste of food and have good food processing properties, nutritional value, and numerous potential applications. Wet testing for the identification of umami peptides is a time-consuming and expensive process. Here, we report the iUmami-DRLF that uses a logistic regression (LR) method solely based on the deep learning pre-trained neural network feature extraction method, unified representation (UniRep based on multiplicative LSTM), for feature extraction from the peptide sequences. The findings demonstrate that deep learning representation learning significantly enhanced the capability of models in identifying umami peptides and predictive precision solely based on peptide sequence information. The newly validated taste sequences were also used to test the iUmami-DRLF and other predictors, and the result indicates that the iUmami-DRLF has better robustness and accuracy and remains valid at higher probability thresholds. The iUmami-DRLF method can aid further studies on enhancing the umami flavor of food for satisfying the need for an umami-flavored diet.
Affiliation(s)
- Jici Jiang
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jiayu Li
- College of Life Science, Sichuan University, Chengdu 610065, China
- Junxian Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Hongdi Pei
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Wu Yuzhang Honors College, Sichuan University, Chengdu 610065, China
- Mingxin Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
23
Brink KR, Hunt MG, Mu AM, Groszman K, Hoang KV, Lorch KP, Pogostin BH, Gunn JS, Tabor JJ. An E. coli display method for characterization of peptide-sensor kinase interactions. Nat Chem Biol 2023; 19:451-459. [PMID: 36482094 PMCID: PMC10065900 DOI: 10.1038/s41589-022-01207-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2021] [Accepted: 10/10/2022] [Indexed: 12/13/2022]
Abstract
Bacteria use two-component system (TCS) signaling pathways to sense and respond to peptides involved in host defense, quorum sensing and inter-bacterial warfare. However, little is known about the broad peptide-sensing capabilities of TCSs. In this study, we developed an Escherichia coli display method to characterize the effects of human antimicrobial peptides (AMPs) on the pathogenesis-regulating TCS PhoPQ of Salmonella Typhimurium with much higher throughput than previously possible. We found that PhoPQ senses AMPs with diverse sequences, structures and biological functions. We further combined thousands of displayed AMP variants with machine learning to identify peptide sub-domains and biophysical features linked to PhoPQ activation. Most of the newfound AMP activators induce PhoPQ in S. Typhimurium, suggesting possible roles in virulence regulation. Finally, we present evidence that PhoPQ peptide-sensing specificity has evolved across commensal and pathogenic bacteria. Our method enables new insights into the specificities, mechanisms and evolutionary dynamics of TCS-mediated peptide sensing in bacteria.
Affiliation(s)
- Kathryn R Brink
- Ph.D. Program in Systems, Synthetic, and Physical Biology, Rice University, Houston, TX, USA
- Maxwell G Hunt
- Ph.D. Program in Systems, Synthetic, and Physical Biology, Rice University, Houston, TX, USA
- Andrew M Mu
- Department of Biosciences, Rice University, Houston, TX, USA
- Ken Groszman
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA, USA
- Ky V Hoang
- Center for Microbial Pathogenesis, Nationwide Children's Hospital, Columbus, OH, USA
- Infectious Diseases Institute, The Ohio State University, Columbus, OH, USA
- Kevin P Lorch
- Department of Bioengineering, Rice University, Houston, TX, USA
- John S Gunn
- Center for Microbial Pathogenesis, Nationwide Children's Hospital, Columbus, OH, USA
- Infectious Diseases Institute, The Ohio State University, Columbus, OH, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
- Jeffrey J Tabor
- Ph.D. Program in Systems, Synthetic, and Physical Biology, Rice University, Houston, TX, USA
- Department of Biosciences, Rice University, Houston, TX, USA
- Department of Bioengineering, Rice University, Houston, TX, USA
24
An B, Wang Y, Huang Y, Wang X, Liu Y, Xun D, Church GM, Dai Z, Yi X, Tang TC, Zhong C. Engineered Living Materials For Sustainability. Chem Rev 2023; 123:2349-2419. [PMID: 36512650 DOI: 10.1021/acs.chemrev.2c00512] [Citation(s) in RCA: 70] [Impact Index Per Article: 35.0] [Indexed: 12/15/2022]
Abstract
Recent advances in synthetic biology and materials science have given rise to a new form of materials, namely engineered living materials (ELMs), which are composed of living matter or cell communities embedded in self-regenerating matrices of their own or artificial scaffolds. Like natural materials such as bone, wood, and skin, ELMs, which possess the functional capabilities of living organisms, can grow, self-organize, and self-repair when needed. They also spontaneously perform programmed biological functions upon sensing external cues. Currently, ELMs show promise for green energy production, bioremediation, disease treatment, and fabricating advanced smart materials. This review first introduces the dynamic features of natural living systems and their potential for developing novel materials. We then summarize the recent research progress on living materials and emerging design strategies from both synthetic biology and materials science perspectives. Finally, we discuss the positive impacts of living materials on promoting sustainability and key future research directions.
Affiliation(s)
- Bolin An
- Center for Materials Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yanyi Wang
- Center for Materials Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yuanyuan Huang
- Center for Materials Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Xinyu Wang
- Center for Materials Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yuzhu Liu
- Center for Materials Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Dongmin Xun
- Center for Materials Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- George M Church
- Center for Materials Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston 02115, Massachusetts, United States; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston 02115, Massachusetts, United States
- Zhuojun Dai
- Center for Materials Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Xiao Yi
- Center for Materials Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Tzu-Chieh Tang
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston 02115, Massachusetts, United States; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston 02115, Massachusetts, United States
- Chao Zhong
- Center for Materials Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
25
Lin J, Wang S, Wen L, Ye H, Shang S, Li J, Shu J, Zhou P. Targeting peptide-mediated interactions in omics. Proteomics 2023; 23:e2200175. [PMID: 36461811 DOI: 10.1002/pmic.202200175] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/07/2022] [Revised: 11/28/2022] [Accepted: 11/28/2022] [Indexed: 12/05/2022]
Abstract
Peptide-mediated interactions (PMIs) play a crucial role in cell signaling networks, are responsible for about half of cellular protein-protein associations in the human interactome, and have recently been recognized as a promising new kind of druggable target for drug development and disease therapy. In this article, we give a systematic review regarding the proteome-wide discovery of PMIs and targeting druggable PMIs (dPMIs) with chemical drugs, self-inhibitory peptides (SIPs) and protein agents, particularly focusing on their implications and applications for therapeutic purposes in omics. We also introduce computational peptidology strategies used to model, analyze, and design PMI-targeted molecular entities and further extend the concepts of protein context, direct/indirect readout, and enthalpy/entropy effect involved in PMIs. Current issues and future perspectives on this topic are discussed. There is still a long way to go before efficient therapeutic strategies to target PMIs on the omics scale are established.
Affiliation(s)
- Jing Lin
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
- Shaozhou Wang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
- Li Wen
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
- Haiyang Ye
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
- Shuyong Shang
- Institute of Ecological Environment Protection, Chengdu Normal University, Chengdu, China
- Juelin Li
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
- Jianping Shu
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
- Peng Zhou
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
26
Motmaen A, Dauparas J, Baek M, Abedi MH, Baker D, Bradley P. Peptide-binding specificity prediction using fine-tuned protein structure prediction networks. Proc Natl Acad Sci U S A 2023; 120:e2216697120. [PMID: 36802421 PMCID: PMC9992841 DOI: 10.1073/pnas.2216697120] [Citation(s) in RCA: 53] [Impact Index Per Article: 26.5] [Received: 10/05/2022] [Accepted: 01/09/2023] [Indexed: 02/23/2023]
Abstract
Peptide-binding proteins play key roles in biology, and predicting their binding specificity is a long-standing challenge. While considerable protein structural information is available, the most successful current methods use sequence information alone, in part because it has been a challenge to model the subtle structural changes accompanying sequence substitutions. Protein structure prediction networks such as AlphaFold model sequence-structure relationships very accurately, and we reasoned that if it were possible to specifically train such networks on binding data, more generalizable models could be created. We show that placing a classifier on top of the AlphaFold network and fine-tuning the combined network parameters for both classification and structure prediction accuracy leads to a model with strong generalizable performance on a wide range of Class I and Class II peptide-MHC interactions that approaches the overall performance of the state-of-the-art NetMHCpan sequence-based method. The peptide-MHC optimized model shows excellent performance in distinguishing binding and non-binding peptides to SH3 and PDZ domains. This ability to generalize well beyond the training set far exceeds that of sequence-only models and should be particularly powerful for systems where less experimental data are available.
Affiliation(s)
- Amir Motmaen
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
- Bioengineering Graduate Program, University of Washington, Seattle, WA 98195
- Justas Dauparas
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
- Minkyung Baek
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
- Mohamad H. Abedi
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195
- Philip Bradley
- Department of Biochemistry, University of Washington, Seattle, WA 98195
- Institute for Protein Design, University of Washington, Seattle, WA 98195
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109
27
Rosenberger G, Li W, Turunen M, He J, Subramaniam PS, Pampou S, Griffin AT, Karan C, Kerwin P, Murray D, Honig B, Liu Y, Califano A. Network-based elucidation of colon cancer drug resistance by phosphoproteomic time-series analysis. bioRxiv 2023:2023.02.15.528736. [PMID: 36824919 PMCID: PMC9949144 DOI: 10.1101/2023.02.15.528736] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/18/2023]
Abstract
Aberrant signaling pathway activity is a hallmark of tumorigenesis and progression, which has guided targeted inhibitor design for over 30 years. Yet, adaptive resistance mechanisms, induced by rapid, context-specific signaling network rewiring, continue to challenge therapeutic efficacy. By leveraging progress in proteomic technologies and network-based methodologies over the past decade, we developed VESPA, an algorithm designed to elucidate mechanisms of cell response and adaptation to drug perturbations, and used it to analyze 7-point phosphoproteomic time series from colorectal cancer cells treated with clinically relevant inhibitors and control media. Interrogation of tumor-specific enzyme/substrate interactions accurately inferred kinase and phosphatase activity, based on their inferred substrate phosphorylation state, effectively accounting for signal cross-talk and sparse phosphoproteome coverage. The analysis elucidated time-dependent signaling pathway response to each drug perturbation and, more importantly, cell adaptive response and rewiring that was experimentally confirmed by CRISPRko assays, suggesting broad applicability to cancer and other diseases.
Affiliation(s)
- George Rosenberger
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Wenxue Li
- Yale Cancer Biology Institute, Yale University, West Haven, CT, USA
- Mikko Turunen
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Jing He
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Present address: Regeneron Genetics Center, Tarrytown, NY, USA
- Prem S Subramaniam
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Sergey Pampou
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- J.P. Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA
- Aaron T Griffin
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Medical Scientist Training Program, Columbia University Irving Medical Center, New York, NY, USA
- Charles Karan
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- J.P. Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA
- Patrick Kerwin
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Diana Murray
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Barry Honig
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Yansheng Liu
- Yale Cancer Biology Institute, Yale University, West Haven, CT, USA
- Department of Pharmacology, Yale University School of Medicine, New Haven, CT, USA
- Andrea Califano
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- J.P. Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA
- Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
- Department of Biochemistry & Molecular Biophysics, Columbia University Irving Medical Center, New York, NY, USA
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
28
Dixit R, Khambhati K, Supraja KV, Singh V, Lederer F, Show PL, Awasthi MK, Sharma A, Jain R. Application of machine learning on understanding biomolecule interactions in cellular machinery. Bioresour Technol 2023; 370:128522. [PMID: 36565819 DOI: 10.1016/j.biortech.2022.128522] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/29/2022] [Revised: 12/17/2022] [Accepted: 12/20/2022] [Indexed: 06/17/2023]
Abstract
Machine learning (ML) applications have become ubiquitous in all fields of research, including protein science and engineering. Apart from protein structure and mutation prediction, scientists are focusing on knowledge gaps with respect to the molecular mechanisms involved in protein binding and interactions with other components in experimental setups or the human body. Researchers are working on several wet-lab techniques and generating data for a better understanding of the concepts and mechanics involved. Information such as biomolecular structure, binding affinities, and structural fluctuations and movements is enormous in volume, and it can be handled and analyzed by ML. Therefore, this review highlights the significance of ML in understanding biomolecular interactions while assisting in various fields of research such as drug discovery, nanomedicine, nanotoxicity and material science. Hence, the way ahead is for laboratory work and computational techniques to go hand in hand.
Affiliation(s)
- Rewati Dixit
- Waste Treatment Laboratory, Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Haus-khas, New Delhi 110016, India
- Khushal Khambhati
- Department of Biosciences, School of Science, Indrashil University, Rajpur, Mehsana 382715, Gujarat, India
- Kolli Venkata Supraja
- Waste Treatment Laboratory, Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Haus-khas, New Delhi 110016, India
- Vijai Singh
- Department of Biosciences, School of Science, Indrashil University, Rajpur, Mehsana 382715, Gujarat, India
- Franziska Lederer
- Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Bautzner Landstrasse 400, 01328 Dresden, Germany
- Pau-Loke Show
- Zhejiang Provincial Key Laboratory for Subtropical Water Environment and Marine Biological Resources Protection, Wenzhou University, Wenzhou 325035, China; Department of Sustainable Engineering, Saveetha School of Engineering, SIMATS, Chennai 602105, India; Department of Chemical and Environmental Engineering, University of Nottingham, Malaysia, 43500 Semenyih, Selangor Darul Ehsan, Malaysia
- Mukesh Kumar Awasthi
- College of Natural Resources and Environment, Northwest A&F University, Yangling 712100, China
- Abhinav Sharma
- Institute Theory of Polymers, Leibniz Institute for Polymer Research, Hohe Strasse 6, 01069 Dresden, Germany
- Rohan Jain
- Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Bautzner Landstrasse 400, 01328 Dresden, Germany
29
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023]
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged, however, by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Affiliation(s)
- Julia R Rogers
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Gergő Nikolényi
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
30
Kumar K, Kumar P, Deb D, Unguresan ML, Muresan V. Artificial Intelligence and Machine Learning Based Intervention in Medical Infrastructure: A Review and Future Trends. Healthcare (Basel) 2023; 11:healthcare11020207. [PMID: 36673575 PMCID: PMC9859198 DOI: 10.3390/healthcare11020207] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/17/2022] [Revised: 01/01/2023] [Accepted: 01/04/2023] [Indexed: 01/13/2023]
Abstract
People in the life sciences who work with Artificial Intelligence (AI) and Machine Learning (ML) are under increased pressure to develop algorithms faster than ever. The possibility of revealing innovative insights and speeding breakthroughs lies in using large datasets integrated on several levels. However, even if there is more data at our disposal than ever, only a meager portion is being filtered, interpreted, integrated, and analyzed. The subject of this technology is the study of how computers may learn from data and imitate human mental processes. Both an increase in the learning capacity and the provision of a decision support system at a size that is redefining the future of healthcare are enabled by AI and ML. This article offers a survey of the uses of AI and ML in the healthcare industry, with a particular emphasis on clinical, developmental, administrative, and global health implementations to support the healthcare infrastructure as a whole, along with the impact and expectations of each component of healthcare. Additionally, possible future trends and scopes of the utilization of this technology in medical infrastructure have also been discussed.
Affiliation(s)
- Kamlesh Kumar
- Department of Electrical and Computer Science Engineering, Institute of Infrastructure Technology Research And Management, Ahmedabad 380026, India
| | - Prince Kumar
- Department of Electrical and Computer Science Engineering, Institute of Infrastructure Technology Research And Management, Ahmedabad 380026, India
| | - Dipankar Deb
- Department of Electrical and Computer Science Engineering, Institute of Infrastructure Technology Research And Management, Ahmedabad 380026, India
- Vlad Muresan
- Department of Automation, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania

31
Rechciński T, Kasprzak JD. A systematic review of nonsynonymous single nucleotide polymorphisms in the renin-angiotensin-aldosterone system. Cardiol J 2022; 29:1020-1027. [PMID: 34060646 PMCID: PMC9788732 DOI: 10.5603/cj.a2021.0055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/30/2020] [Accepted: 05/05/2021] [Indexed: 01/04/2023]
Abstract
In this review of recent publications, the authors aimed to collect evidence of the impact of nonsynonymous single nucleotide polymorphisms (nsSNPs) in the renin-angiotensin-aldosterone system (RAAS) on patients' phenotype, not only regarding arterial hypertension and its complications but also regarding other diseases of interest outside the field of cardiovascular medicine. PubMed database records published between 2017 and 2020 were searched, and all positive case-control studies or positive studies with human DNA were selected. The search identified 104 articles, of which 22 met the inclusion criteria. This paper presents the impact of 44 nsSNPs in panels for the genes of renin, angiotensinogen, angiotensin-converting enzyme, angiotensin receptor and aldosterone on the clinical picture of the investigated cohorts or on peptide-protein interactions as a consequence of nsSNPs. Genetic variability in nsSNPs of the RAAS is involved in the pathogenesis of arterial hypertension and its complications, and surprisingly also in the pathogenesis of conditions not associated with elevated blood pressure, such as neoplasms or inflammatory diseases.

32
Chang L, Mondal A, Perez A. Towards rational computational peptide design. Frontiers in Bioinformatics 2022; 2:1046493. [PMID: 36338806 PMCID: PMC9634169 DOI: 10.3389/fbinf.2022.1046493] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/16/2022] [Accepted: 10/11/2022] [Indexed: 11/16/2022]
Abstract
Peptides are prevalent in biology, mediating as many as 40% of protein-protein interactions, and are involved in other cellular functions such as transport and signaling. Their ability to bind with high specificity makes them promising therapeutic agents with properties intermediate between small molecules and large biologics. Beyond their biological role, peptides can be programmed to self-assemble, and they are already being used for functions as diverse as oligonucleotide delivery and tissue regeneration, and as drugs. However, the transient nature of their interactions has limited the number of structures and the knowledge of binding affinities available, and their flexible nature has limited the success of computational pipelines that predict the structures and affinities of these molecules. Fortunately, recent advances in experimental and computational pipelines are creating new opportunities for this field. We are starting to see promising predictions of complex structures and of thermodynamic and kinetic properties. We believe that in the following years this will lead to robust rational peptide design pipelines with success similar to that of the pipelines used in small molecule drug discovery.
Affiliation(s)
- Liwei Chang
- Department of Chemistry, University of Florida, Gainesville, FL, United States
- Quantum Theory Project, University of Florida, Gainesville, FL, United States
- Arup Mondal
- Department of Chemistry, University of Florida, Gainesville, FL, United States
- Quantum Theory Project, University of Florida, Gainesville, FL, United States
- Alberto Perez
- Department of Chemistry, University of Florida, Gainesville, FL, United States
- Quantum Theory Project, University of Florida, Gainesville, FL, United States

33
Protein-protein interaction and non-interaction predictions using gene sequence natural vector. Commun Biol 2022; 5:652. [PMID: 35780196 PMCID: PMC9250521 DOI: 10.1038/s42003-022-03617-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/04/2022] [Accepted: 06/21/2022] [Indexed: 12/02/2022]
Abstract
Predicting protein–protein interaction and non-interaction are two important different aspects of multi-body structure predictions, which provide vital information about protein function. Some computational methods have recently been developed to complement experimental methods, but still cannot effectively detect real non-interacting protein pairs. We proposed a gene sequence-based method, named NVDT (Natural Vector combine with Dinucleotide and Triplet nucleotide), for the prediction of interaction and non-interaction. For protein–protein non-interactions (PPNIs), the proposed method obtained accuracies of 86.23% for Homo sapiens and 85.34% for Mus musculus, and it performed well on three types of non-interaction networks. For protein-protein interactions (PPIs), we obtained accuracies of 99.20, 94.94, 98.56, 95.41, and 94.83% for Saccharomyces cerevisiae, Drosophila melanogaster, Helicobacter pylori, Homo sapiens, and Mus musculus, respectively. Furthermore, NVDT outperformed established sequence-based methods and demonstrated high prediction results for cross-species interactions. NVDT is expected to be an effective approach for predicting PPIs and PPNIs. Protein-protein non-interactions and interactions are distinguished and predicted by gene sequence using single nucleotide and contiguous nucleotides combined with machine learning models.
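The abstract does not spell out the feature construction, so the following Python sketch rests on an assumption: it computes the standard 12-dimensional natural vector of a nucleotide sequence (for each of A, C, G and T, the count, the mean position, and the normalized second central moment of positions) and appends dinucleotide and trinucleotide frequencies, which is one plausible reading of "Natural Vector combine with Dinucleotide and Triplet nucleotide". The function names are hypothetical, and the published NVDT feature set may differ in detail.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def natural_vector(seq: str) -> list[float]:
    """Standard 12-dim natural vector: [n_A..n_T, mu_A..mu_T, D2_A..D2_T]."""
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in NUCLEOTIDES:
        # 1-based positions at which this nucleotide occurs
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n_k = len(positions)
        if n_k == 0:
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        # second central moment, normalized by n_k * n as in the natural-vector definition
        d2_k = sum((t - mu_k) ** 2 for t in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Relative frequencies of all 4**k k-mers over overlapping windows.

    Windows containing non-ACGT characters count toward the total but match no k-mer.
    """
    seq = seq.upper()
    words = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(words, 0)
    total = max(len(seq) - k + 1, 0)
    for i in range(total):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    return [counts[w] / total if total else 0.0 for w in words]


def nvdt_like_features(seq: str) -> list[float]:
    """Hypothetical 92-dim vector: 12 natural-vector + 16 dinucleotide + 64 trinucleotide."""
    return natural_vector(seq) + kmer_frequencies(seq, 2) + kmer_frequencies(seq, 3)
```

To score a gene pair, one would typically concatenate the two genes' feature vectors and pass them to a trained classifier; that pairing step and the classifier are not shown here.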

34
Rickert CA, Lieleg O. Machine learning approaches for biomolecular, biophysical, and biomaterials research. Biophysics Reviews 2022; 3:021306. [PMID: 38505413 PMCID: PMC10914139 DOI: 10.1063/5.0082179] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/13/2021] [Accepted: 05/12/2022] [Indexed: 03/21/2024]
Abstract
A fluent conversation with a virtual assistant, person-tailored news feeds, and deep-fake images created within seconds: all of these were unthinkable for a long time and are now part of our everyday lives. What these examples have in common is that they are realized by different means of machine learning (ML), a technology that has fundamentally changed many aspects of the modern world. The ability to process enormous amounts of data in multi-hierarchical, digital constructs has paved the way not only for creating intelligent systems but also for obtaining surprising new insights into many scientific problems. However, in the different areas of the biosciences, which typically rely heavily on the collection of time-consuming experimental data, applying ML methods is more challenging: here, difficulties can arise from small datasets and from the inherent, broad variability and complexity associated with studying biological objects and phenomena. In this Review, we give an overview of commonly used ML algorithms (which are often referred to as "machines") and learning strategies as well as their applications in different bio-disciplines such as molecular biology, drug development, biophysics, and biomaterials science. We highlight how selected research questions from those fields were successfully translated into machine-readable formats, discuss typical problems that can arise in this context, and provide an overview of how to resolve those difficulties.

35
Misiura M, Shroff R, Thyer R, Kolomeisky AB. DLPacker: Deep learning for prediction of amino acid side chain conformations in proteins. Proteins 2022; 90:1278-1290. [PMID: 35122328 DOI: 10.1002/prot.26311] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 08/04/2021] [Revised: 12/03/2021] [Accepted: 12/07/2021] [Indexed: 12/20/2022]
Abstract
Prediction of side chain conformations of amino acids in proteins (also termed "packing") is an important and challenging part of protein structure prediction with many interesting applications in protein design. A variety of methods for packing have been developed, but more accurate ones are still needed. Machine learning (ML) methods have recently become a powerful tool for solving various problems in diverse areas of science, including structural biology. In this study, we evaluate the potential of deep neural networks (DNNs) for prediction of amino acid side chain conformations. We formulate the problem as an image-to-image transformation and train a U-net style DNN to solve it. We show that our method outperforms physics-based methods by a significant margin: reconstruction RMSDs for most amino acids are about 20% smaller than those of SCWRL4 and Rosetta Packer, with RMSDs for the bulky hydrophobic amino acids Phe, Tyr, and Trp being up to 50% smaller.
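A hedged sketch of the metric behind those comparisons: a side-chain RMSD over atoms matched by name, computed without superposition on the assumption that predicted and reference side chains share the fixed backbone frame, plus the relative reduction used to express "20% smaller". The helper names are illustrative, not DLPacker's code, and the sketch ignores symmetric side-chain atoms (for example, the equivalent ring carbons of Phe and Tyr), which careful evaluations usually handle by taking the lower RMSD over swapped atom names.

```python
import math

Coord = tuple[float, float, float]


def side_chain_rmsd(predicted: dict[str, Coord], reference: dict[str, Coord]) -> float:
    """RMSD over side-chain atoms present in both structures, matched by atom name.

    No superposition is performed: we assume (as in side-chain packing, where the
    backbone is held fixed) that both coordinate sets are already in the same frame.
    """
    common = sorted(set(predicted) & set(reference))
    if not common:
        raise ValueError("no side-chain atoms in common")
    squared = 0.0
    for name in common:
        squared += sum((p - r) ** 2 for p, r in zip(predicted[name], reference[name]))
    return math.sqrt(squared / len(common))


def relative_reduction(rmsd_new: float, rmsd_baseline: float) -> float:
    """Fractional reduction of a new RMSD relative to a baseline (0.2 means 20% smaller)."""
    return (rmsd_baseline - rmsd_new) / rmsd_baseline
```

For example, a model RMSD of 0.8 Å against a baseline of 1.0 Å gives a relative reduction of about 0.2, i.e. an RMSD roughly 20% smaller.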
Affiliation(s)
- Mikita Misiura
- Department of Chemistry, Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA
- Ross Thyer
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
- Anatoly B Kolomeisky
- Department of Chemistry, Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA; Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA; Department of Physics and Astronomy, Center for Theoretical Biological Physics, Rice University, Houston, Texas, USA

36
Abdin O, Nim S, Wen H, Kim PM. PepNN: a deep attention model for the identification of peptide binding sites. Commun Biol 2022; 5:503. [PMID: 35618814 PMCID: PMC9135736 DOI: 10.1038/s42003-022-03445-2] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 02/14/2022] [Accepted: 05/03/2022] [Indexed: 11/09/2022]
Abstract
Protein-peptide interactions play a fundamental role in many cellular processes, but remain underexplored experimentally and difficult to model computationally. Here, we present PepNN-Struct and PepNN-Seq, structure and sequence-based approaches for the prediction of peptide binding sites on a protein. A main difficulty for the prediction of peptide-protein interactions is the flexibility of peptides and their tendency to undergo conformational changes upon binding. Motivated by this, we developed reciprocal attention to simultaneously update the encodings of peptide and protein residues while enforcing symmetry, allowing for information flow between the two inputs. PepNN integrates this module with modern graph neural network layers and a series of transfer learning steps are used during training to compensate for the scarcity of peptide-protein complex information. We show that PepNN-Struct achieves consistently high performance across different benchmark datasets. We also show that PepNN makes reasonable peptide-agnostic predictions, allowing for the identification of novel peptide binding proteins.
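Reciprocal attention is described only at a high level in this abstract, so the sketch below is an assumption rather than PepNN's implementation: a single scaled score matrix between peptide and protein residue encodings is normalized along its rows for the peptide update and along its columns for the protein update, so the same pairwise score drives information flow in both directions (one way to realize the symmetry the abstract mentions). Learned query/key/value projections, multiple heads and residual connections, which a real layer would include, are omitted, and all function names are hypothetical.

```python
import math

Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def reciprocal_attention(pep: Matrix, prot: Matrix) -> tuple[Matrix, Matrix]:
    """pep: m x d peptide encodings; prot: n x d protein encodings (same d)."""
    d = len(pep[0])
    # One shared m x n score matrix: entry (i, j) scores peptide residue i against protein residue j.
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(pep, transpose(prot))]
    # Peptide residues attend over protein residues (softmax along each row of S).
    pep_update = matmul(softmax_rows(scores), prot)
    # Protein residues attend over peptide residues with the same scores (rows of S transposed).
    prot_update = matmul(softmax_rows(transpose(scores)), pep)
    return pep_update, prot_update
```

The shared score matrix is the design point being illustrated: computing two independent attention maps would let the peptide-to-protein and protein-to-peptide weights disagree about the same residue pair.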
Affiliation(s)
- Osama Abdin
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Satra Nim
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Han Wen
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Philip M Kim
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada.
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, M5S 3E1, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 3E1, Canada.

37
Dionne U, Percival LJ, Chartier FJM, Landry CR, Bisson N. SRC homology 3 domains: multifaceted binding modules. Trends Biochem Sci 2022; 47:772-784. [PMID: 35562294 DOI: 10.1016/j.tibs.2022.04.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/31/2022] [Revised: 03/30/2022] [Accepted: 04/11/2022] [Indexed: 12/15/2022]
Abstract
The assembly of complexes following the detection of extracellular signals is often controlled by signaling proteins comprising multiple peptide binding modules. The SRC homology (SH)3 family represents the archetypical modular protein interaction module, with ~300 annotated SH3 domains in humans that regulate an impressive array of signaling processes. We review recent findings regarding the allosteric contributions of SH3 domains within their host protein context, their phosphoregulation, and their roles in phase separation that challenge the simple model in which SH3s are considered to be portable domains binding to specific proline-rich peptide motifs.
Affiliation(s)
- Ugo Dionne
- Centre de recherche sur le cancer et Centre de recherche du CHU de Québec - Université Laval, QC, Canada; Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), QC, Canada
- Lily J Percival
- Centre de recherche sur le cancer et Centre de recherche du CHU de Québec - Université Laval, QC, Canada; Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), QC, Canada; School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, The Michael Smith Building, Manchester, UK
- François J M Chartier
- Centre de recherche sur le cancer et Centre de recherche du CHU de Québec - Université Laval, QC, Canada; Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), QC, Canada
- Christian R Landry
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), QC, Canada; Institute of Integrative and Systems Biology, Université Laval, Quebec, QC, Canada; Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Quebec, QC, Canada; Department of Biology, Université Laval, Quebec, QC, Canada.
- Nicolas Bisson
- Centre de recherche sur le cancer et Centre de recherche du CHU de Québec - Université Laval, QC, Canada; Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), QC, Canada; Department of Molecular Biology, Medical Biochemistry and Pathology, Université Laval, Quebec, QC, Canada.

38
Xia W, Zheng L, Fang J, Li F, Zhou Y, Zeng Z, Zhang B, Li Z, Li H, Zhu F. PFmulDL: a novel strategy enabling multi-class and multi-label protein function annotation by integrating diverse deep learning methods. Comput Biol Med 2022; 145:105465. [PMID: 35366467 DOI: 10.1016/j.compbiomed.2022.105465] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 03/04/2022] [Revised: 03/22/2022] [Accepted: 03/25/2022] [Indexed: 02/06/2023]
Abstract
Bioinformatic annotation of protein function is essential but extremely complex, and it requires extensive effort to develop effective prediction methods. However, existing methods tend to amplify the representativeness of families with large numbers of proteins by misclassifying proteins in families with small numbers of proteins. That is to say, the ability of existing methods to annotate proteins in 'rare classes' remains limited. Herein, a new protein function annotation strategy, PFmulDL, integrating multiple deep learning methods, was constructed. First, a recurrent neural network was integrated, for the first time, with a convolutional neural network to facilitate function annotation. Second, a transfer learning method was introduced into model construction to further improve prediction performance. Third, based on the latest Gene Ontology data, the newly constructed model could annotate the largest number of protein families compared with existing methods. Finally, this new model was found to significantly improve prediction performance for the 'rare classes' without sacrificing that for the 'major classes'. Given the emerging need to improve prediction performance for proteins in 'rare classes', this new strategy would become an essential complement to existing methods for protein function prediction. All models and source code are freely available at: https://github.com/idrblab/PFmulDL.
Affiliation(s)
- Weiqi Xia
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Lingyan Zheng
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
- Jiebin Fang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Fengcheng Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Ying Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Zhenyu Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
- Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
- Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
- Honglin Li
- School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.

39
Generation of functional oligopeptides that promote osteogenesis based on unsupervised deep learning of protein IDRs. Bone Res 2022; 10:23. [PMID: 35228528 PMCID: PMC8885677 DOI: 10.1038/s41413-022-00193-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/08/2021] [Accepted: 12/21/2021] [Indexed: 12/30/2022]
Abstract
Deep learning (DL) is currently revolutionizing peptide drug development due to both computational advances and the substantial recent expansion of digitized biological data. However, progress in oligopeptide drug development has been limited, likely due to the lack of suitable datasets and difficulty in identifying informative features to use as inputs for DL models. Here, we utilized an unsupervised deep learning model to learn a semantic pattern based on the intrinsically disordered regions of ~171 known osteogenic proteins. Subsequently, oligopeptides were generated from this semantic pattern based on Monte Carlo simulation, followed by in vivo functional characterization. A five amino acid oligopeptide (AIB5P) had strong bone-formation-promoting effects, as determined in multiple mouse models (e.g., osteoporosis, fracture, and osseointegration of implants). Mechanistically, we showed that AIB5P promotes osteogenesis by binding to the integrin α5 subunit and thereby activating FAK signaling. In summary, we successfully established an oligopeptide discovery strategy based on a DL model and demonstrated its utility from cytological screening to animal experimental verification.

40
Sharifi Tabar M, Francis H, Yeo D, Bailey CG, Rasko JEJ. Mapping oncogenic protein interactions for precision medicine. Int J Cancer 2022; 151:7-19. [PMID: 35113472 PMCID: PMC9306658 DOI: 10.1002/ijc.33954] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/27/2021] [Revised: 01/25/2022] [Accepted: 01/26/2022] [Indexed: 11/10/2022]
Abstract
Normal protein‐protein interactions (normPPIs) occur with high fidelity to regulate almost every physiological process. In cancer, this highly organised and precisely regulated network is disrupted, hijacked or reprogrammed resulting in oncogenic protein‐protein interactions (oncoPPIs). OncoPPIs, which can result from genomic alterations, are a hallmark of many types of cancers. Recent technological advances in the field of mass spectrometry (MS)‐based interactomics, structural biology and drug discovery have prompted scientists to identify and characterise oncoPPIs. Disruption of oncoPPI interfaces has become a major focus of drug discovery programs and has resulted in the use of PPI‐specific drugs clinically. However, due to several technical hurdles, studies to build a reference oncoPPI map for various cancer types have not been undertaken. Therefore, there is an urgent need for experimental workflows to overcome the existing challenges in studying oncoPPIs in various cancers and to build comprehensive reference maps. Here, we discuss the important hurdles for characterising oncoPPIs and propose a three‐phase multidisciplinary workflow to identify and characterise oncoPPIs. Systematic identification of cancer‐type‐specific oncogenic interactions will spur new opportunities for PPI‐focused drug discovery projects and precision medicine.
Affiliation(s)
- Mehdi Sharifi Tabar
- Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, NSW, Australia; Cancer & Gene Regulation Laboratory Centenary Institute, The University of Sydney, Camperdown, NSW, Australia; Faculty of Medicine & Health, The University of Sydney, Sydney, NSW, Australia
- Habib Francis
- Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, NSW, Australia; Cancer & Gene Regulation Laboratory Centenary Institute, The University of Sydney, Camperdown, NSW, Australia; Faculty of Medicine & Health, The University of Sydney, Sydney, NSW, Australia
- Dannel Yeo
- Faculty of Medicine & Health, The University of Sydney, Sydney, NSW, Australia; Li Ka Shing Cell & Gene Therapy Program, The University of Sydney, Camperdown, NSW, Australia; Cell & Molecular Therapies, Royal Prince Alfred Hospital, Sydney Local Health District, Camperdown, NSW, Australia
- Charles G Bailey
- Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, NSW, Australia; Cancer & Gene Regulation Laboratory Centenary Institute, The University of Sydney, Camperdown, NSW, Australia; Faculty of Medicine & Health, The University of Sydney, Sydney, NSW, Australia
- John E J Rasko
- Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, NSW, Australia; Faculty of Medicine & Health, The University of Sydney, Sydney, NSW, Australia; Li Ka Shing Cell & Gene Therapy Program, The University of Sydney, Camperdown, NSW, Australia; Cell & Molecular Therapies, Royal Prince Alfred Hospital, Sydney Local Health District, Camperdown, NSW, Australia

41
Li R, Li L, Xu Y, Yang J. Machine learning meets omics: applications and perspectives. Brief Bioinform 2021; 23:6425809. [PMID: 34791021 DOI: 10.1093/bib/bbab460] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 08/23/2021] [Revised: 09/29/2021] [Accepted: 10/07/2021] [Indexed: 02/07/2023]
Abstract
The innovation of biotechnologies has allowed the accumulation of omics data at an alarming rate, thus introducing the era of 'big data'. Extracting inherent valuable knowledge from various omics data remains a daunting problem in bioinformatics. Better solutions often require more innovative methods for efficient handling and effective results. Recent advancements in integrated analysis and computational modeling of multi-omics data have helped address such needs in an increasingly harmonious manner. The development and application of machine learning have largely advanced our insights into biology and biomedicine and greatly promoted the development of therapeutic strategies, especially for precision medicine. Here, we present a comprehensive survey and discussion of what happened, is happening and will happen when machine learning meets omics. Specifically, we describe how artificial intelligence can be applied to omics studies and review recent advancements at the interface between machine learning and the ever-widening range of omics, including genomics, transcriptomics, proteomics, metabolomics and radiomics, as well as those at single-cell resolution. We also discuss and provide a synthesis of ideas, new insights, current challenges and perspectives of machine learning in omics.
Affiliation(s)
- Rufeng Li
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
- Lixin Li
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
- Yungang Xu
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an, 710129, China
- Juan Yang
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China; Key Laboratory of Environment and Genes Related to Diseases (Xi'an Jiaotong University), Ministry of Education of China, Xi'an 710061, P. R. China

42
AlQuraishi M, Sorger PK. Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms. Nat Methods 2021; 18:1169-1180. [PMID: 34608321 PMCID: PMC8793939 DOI: 10.1038/s41592-021-01283-4] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 01/13/2021] [Accepted: 08/27/2021] [Indexed: 02/08/2023]
Abstract
Deep learning using neural networks relies on a class of machine-learnable models constructed using 'differentiable programs'. These programs can combine mathematical equations specific to a particular domain of natural science with general-purpose, machine-learnable components trained on experimental data. Such programs are having a growing impact on molecular and cellular biology. In this Perspective, we describe an emerging 'differentiable biology' in which phenomena ranging from the small and specific (for example, one experimental assay) to the broad and complex (for example, protein folding) can be modeled effectively and efficiently, often by exploiting knowledge about basic natural phenomena to overcome the limitations of sparse, incomplete and noisy data. By distilling differentiable biology into a small set of conceptual primitives and illustrative vignettes, we show how it can help to address long-standing challenges in integrating multimodal data from diverse experiments across biological scales. This promises to benefit fields as diverse as biophysics and functional genomics.
Affiliation(s)
- Mohammed AlQuraishi
- Department of Systems Biology, Columbia University, New York, NY, USA.
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
- Peter K Sorger
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Medical School, Boston, MA, USA.

43
Kim S, Jana B, Go EM, Lee JE, Jin S, An EK, Hwang J, Sim Y, Son S, Kim D, Kim C, Jin JO, Kwak SK, Ryu JH. Intramitochondrial Disulfide Polymerization Controls Cancer Cell Fate. ACS Nano 2021; 15:14492-14508. [PMID: 34478266 DOI: 10.1021/acsnano.1c04015] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 05/28/2023]
Abstract
Recent advances in supramolecular chemistry research have led to the development of artificial chemical systems that can form self-assembled structures that imitate proteins involved in the regulation of cellular function. However, intracellular polymerization systems that operate inside living cells have seldom been reported. In this study, we developed an intramitochondrial polymerization-induced self-assembly system for regulating the fate of cancer cells. We showed that polymeric disulfide formation inside cells occurred due to the high reactive oxygen species (ROS) concentration of cancer mitochondria. This polymerization barely occurs elsewhere in the cell owing to the reductive intracellular environment. The polymerization of the thiol-containing monomers further increases the ROS level inside the mitochondria, thereby autocatalyzing the polymerization process and creating fibrous polymeric structures. This process induces dysfunction of the mitochondria, which in turn activates cell necroptosis. Thus, this in situ polymerization system shows great potential for cancer treatment, including that of drug-resistant cancers.
Affiliation(s)
- Sangpil Kim
- Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Batakrishna Jana
- Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Eun Min Go
- Department of Energy Engineering, School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Ji Eun Lee
- Department of Energy Engineering, School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
| | - Seongeon Jin
- Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
| | - Eun-Koung An
- Department of Medical Biotechnology, Yeungnam University, Gyeongsan 38541, South Korea
- Research Institute of Cell Culture, Yeungnam University, Gyeongsan 38541, South Korea
| | - Juyoung Hwang
- Department of Medical Biotechnology, Yeungnam University, Gyeongsan 38541, South Korea
- Research Institute of Cell Culture, Yeungnam University, Gyeongsan 38541, South Korea
| | - Youjung Sim
- Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
| | - Sehee Son
- Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
| | - Dohyun Kim
- Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
| | - Chaekyu Kim
- Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
| | - Jun-O Jin
- Department of Medical Biotechnology, Yeungnam University, Gyeongsan 38541, South Korea
| | - Sang Kyu Kwak
- Department of Energy Engineering, School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
| | - Ja-Hyoung Ryu
- Department of Chemistry, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
| |
Collapse
|
44
Lei Y, Li S, Liu Z, Wan F, Tian T, Li S, Zhao D, Zeng J. A deep-learning framework for multi-level peptide-protein interaction prediction. Nat Commun 2021; 12:5465. [PMID: 34526500 PMCID: PMC8443569 DOI: 10.1038/s41467-021-25772-4] [Citation(s) in RCA: 97] [Impact Index Per Article: 24.3] [Received: 01/12/2021] [Accepted: 08/27/2021] [Indexed: 12/12/2022]
Abstract
Peptide-protein interactions are involved in various fundamental cellular functions, and identifying them is crucial for designing efficacious peptide therapeutics. A number of computational methods have recently been developed to predict peptide-protein interactions; however, most existing approaches depend heavily on high-resolution structural data. Here, we present CAMP, a deep learning framework for multi-level peptide-protein interaction prediction that covers both binary peptide-protein interaction prediction and identification of the corresponding peptide binding residues. Comprehensive evaluation demonstrated that CAMP can capture the binary interactions between peptides and proteins and identify the peptide residues involved in those interactions. In addition, CAMP outperformed other state-of-the-art methods on binary peptide-protein interaction prediction. CAMP can therefore serve as a useful tool for predicting peptide-protein interactions and identifying important binding residues in peptides, which may facilitate peptide drug discovery.
Affiliation(s)
- Yipin Lei
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Shuya Li
- Machine Learning Department, Silexon AI Technology Co., Ltd., Nanjing, China
- Ziyi Liu
- Machine Learning Department, Silexon AI Technology Co., Ltd., Nanjing, China
- Fangping Wan
- Machine Learning Department, Silexon AI Technology Co., Ltd., Nanjing, China
- Tingzhong Tian
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Shao Li
- Institute of TCM-X, MOE Key Laboratory of Bioinformatics, Bioinformatics Division, BNRist, Department of Automation, Tsinghua University, Beijing, 100084, China
- Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
45
Jandova Z, Vargiu AV, Bonvin AMJJ. Native or Non-Native Protein-Protein Docking Models? Molecular Dynamics to the Rescue. J Chem Theory Comput 2021; 17:5944-5954. [PMID: 34342983 PMCID: PMC8444332 DOI: 10.1021/acs.jctc.1c00336] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/06/2021] [Indexed: 11/29/2022]
Abstract
Molecular docking excels at generating a plethora of candidate models of protein-protein complexes. Correctly distinguishing the favorable, native-like models from the rest, however, remains a challenge. Here we assessed whether a protocol based on molecular dynamics (MD) simulations could distinguish native from non-native models and thus complement the scoring functions used in docking. To this end, models for 25 protein-protein complexes were first generated using HADDOCK. Next, MD simulations combined with machine learning were used to discriminate between native and non-native complexes based on a combination of metrics reporting on the stability of the initial models. Native models were more stable in almost all measured properties, including the key metrics used for scoring in the Critical Assessment of PRedicted Interactions (CAPRI) competition, namely the positional root-mean-square deviation and the fraction of native contacts relative to the initial docked model. A random forest classifier was trained and reached an accuracy of 0.85 in distinguishing native from non-native complexes. Modest simulation lengths, on the order of 50-100 ns, suffice to reach this accuracy, which makes the approach applicable in practice.
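The fraction of native contacts (Fnat) referred to above is a simple set ratio: the share of reference interface contacts that survive in a model. The Python sketch below is a minimal illustration, not the authors' protocol. It assumes a hypothetical in-memory representation (each chain as a mapping from residue ID to heavy-atom coordinates) and the 5 Å residue-contact cutoff commonly used in CAPRI assessments; the function names are illustrative. The same calculation applies whether the reference is an experimental structure or, as in this study, the initial docked model compared with later MD snapshots.

```python
import math
from itertools import product

# Hypothetical representation: residue ID -> list of heavy-atom (x, y, z) coordinates in Å.
Chain = dict[str, list[tuple[float, float, float]]]


def interface_contacts(chain_a: Chain, chain_b: Chain, cutoff: float = 5.0) -> set[tuple[str, str]]:
    """Return residue pairs (one residue from each chain) with any atom pair closer than `cutoff`.

    Assumption: a 5 Å heavy-atom cutoff, as commonly used in CAPRI evaluations.
    """
    contacts = set()
    for (res_a, atoms_a), (res_b, atoms_b) in product(chain_a.items(), chain_b.items()):
        if any(math.dist(p, q) < cutoff for p in atoms_a for q in atoms_b):
            contacts.add((res_a, res_b))
    return contacts


def fnat(reference: tuple[Chain, Chain], model: tuple[Chain, Chain], cutoff: float = 5.0) -> float:
    """Fraction of the reference interface contacts that are also present in the model."""
    ref_contacts = interface_contacts(*reference, cutoff=cutoff)
    if not ref_contacts:
        raise ValueError("reference has no interface contacts")
    model_contacts = interface_contacts(*model, cutoff=cutoff)
    return len(ref_contacts & model_contacts) / len(ref_contacts)


# Toy example: the model keeps one of the two reference contacts.
chain_a = {"A1": [(0.0, 0.0, 0.0)], "A2": [(10.0, 0.0, 0.0)]}
ref_b = {"B1": [(3.0, 0.0, 0.0)], "B2": [(13.0, 0.0, 0.0)]}
model_b = {"B1": [(3.0, 0.0, 0.0)], "B2": [(30.0, 0.0, 0.0)]}
print(fnat((chain_a, ref_b), (chain_a, model_b)))  # 0.5
```

Contacts are stored as ordered (chain A residue, chain B residue) pairs, so the reference and the model must list their chains in the same order for the set intersection to be meaningful.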
Affiliation(s)
- Zuzana Jandova
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, the Netherlands
- Attilio Vittorio Vargiu
- Physics Department, University of Cagliari, Cittadella Universitaria, S.P. 8 km 0.700, 09042 Monserrato, Italy
- Alexandre M. J. J. Bonvin
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, the Netherlands
46
Lindorff-Larsen K, Kragelund BB. On the potential of machine learning to examine the relationship between sequence, structure, dynamics and function of intrinsically disordered proteins. J Mol Biol 2021; 433:167196. [PMID: 34390736 DOI: 10.1016/j.jmb.2021.167196] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 06/02/2021] [Revised: 08/03/2021] [Accepted: 08/04/2021] [Indexed: 11/29/2022]
Abstract
Intrinsically disordered proteins (IDPs) constitute a broad set of proteins with few uniting and many diverging properties. IDPs, and intrinsically disordered regions (IDRs) interspersed between folded domains, are generally characterized as having no persistent tertiary structure; instead, they interconvert between a large number of different and often expanded structures. IDPs and IDRs are involved in an enormously wide range of biological functions and reveal novel mechanisms of interaction; although they defy the common structure-function paradigm of folded proteins, their structural preferences and dynamics are important for their function. Here we discuss open questions in the field of IDPs and IDRs, focusing on areas where machine learning and other computational methods play a role. We discuss computational methods aimed at predicting transiently formed local and long-range structure, including methods for integrative structural biology. We discuss the many different ways in which IDPs and IDRs can bind to other molecules, both via short linear motifs and through the formation of larger dynamic complexes such as biomolecular condensates. We discuss how experiments are providing insight into such complexes and may enable more accurate predictions. Finally, we discuss the role of IDPs in disease and the need for new methods to interpret the mechanistic effects of genomic variants in IDPs.
Affiliation(s)
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
- Birthe B Kragelund
- Structural Biology and NMR Laboratory & Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
47
Schistocins: Novel antimicrobial peptides encrypted in the Schistosoma mansoni Kunitz Inhibitor SmKI-1. Biochim Biophys Acta Gen Subj 2021; 1865:129989. [PMID: 34389467 DOI: 10.1016/j.bbagen.2021.129989] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2021] [Revised: 07/30/2021] [Accepted: 08/06/2021] [Indexed: 02/06/2023]
Abstract
BACKGROUND Here we describe a new class of cryptides (peptides encrypted within a larger protein) with antimicrobial properties, named schistocins, derived from SmKI-1, a key protein for Schistosoma mansoni survival. SmKI-1 is a multifunctional protein with potential biotechnological use as a therapeutic molecule in inflammatory diseases and for controlling schistosomiasis. METHODS We used our algorithm, enCrypted, to perform in silico proteolysis of SmKI-1 and to screen for potential antimicrobial activity. The selected peptides were chemically synthesized, tested in vitro, and evaluated by structural (CD, NMR) and biophysical (ITC) studies to assess their structure-function relationships. RESULTS enCrypted was able to predict antimicrobial peptides (AMPs) in SmKI-1. Our biophysical analyses revealed a membrane-induced conformational change from random coil to α-helix and a peptide-membrane equilibrium for all schistocins. Our structural data suggest a well-known mode of peptide-membrane interaction in which electrostatic attraction between the cationic peptides and anionic membranes results in bilayer disordering. Moreover, NMR H/D exchange data, together with the higher entropic contribution observed for the peptide-membrane interaction, showed that schistocins adopt different orientations on the membrane. CONCLUSIONS This work demonstrates the robustness of using the physicochemical features of predicted peptides to identify new bioactive cryptides, as well as the value of combining these analyses with biophysical methods to understand peptide-membrane affinity and improve future algorithms. GENERAL SIGNIFICANCE Cryptides can be bioprospected through data mining of protein databases, as the success of our strategy demonstrates. Peptide-based agents derived from SmKI-1 could have a high impact on systems biology and biotechnology.
48
Gupta R, Srivastava D, Sahu M, Tiwari S, Ambasta RK, Kumar P. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Divers 2021; 25:1315-1360. [PMID: 33844136 PMCID: PMC8040371 DOI: 10.1007/s11030-021-10217-3] [Citation(s) in RCA: 426] [Impact Index Per Article: 106.5] [Received: 01/29/2021] [Accepted: 03/22/2021] [Indexed: 02/06/2023]
Abstract
Drug design and development is an important area of research for pharmaceutical companies and chemical scientists. However, low efficacy, off-target delivery, long timelines, and high cost pose hurdles that impede drug design and discovery. Complex, large-scale data from genomics, proteomics, microarrays, and clinical trials are a further obstacle in the drug discovery pipeline. Artificial intelligence and machine learning technologies play a crucial role in drug discovery and development; in particular, artificial neural networks and deep learning algorithms have modernized the field. Machine learning and deep learning algorithms have been implemented in several drug discovery processes, such as peptide synthesis, structure-based virtual screening, ligand-based virtual screening, toxicity prediction, drug monitoring and release, pharmacophore modeling, quantitative structure-activity relationship (QSAR) modeling, drug repositioning, polypharmacology, and physicochemical activity. Past evidence supports the use of artificial intelligence and deep learning in this field. Moreover, novel data mining, curation, and management techniques have provided critical support to recently developed modeling algorithms. In summary, advances in artificial intelligence and deep learning provide an excellent opportunity for rational drug design and discovery, which will eventually benefit humankind. The primary concerns associated with drug design and development are time and production cost. Inefficiency, inaccurate target delivery, and inappropriate dosage are further hurdles that hinder drug delivery and development. With advances in technology, computer-aided drug design integrating artificial intelligence algorithms can address the challenges and hurdles of traditional drug design and development.
Artificial intelligence is a superset that includes machine learning, which in turn comprises supervised learning, unsupervised learning, and reinforcement learning. Deep learning, a subset of machine learning, has been extensively implemented in drug design and development. Artificial neural networks, deep neural networks, support vector machines, classification and regression, generative adversarial networks, symbolic learning, and meta-learning are examples of algorithms applied to drug design and discovery. Artificial intelligence has been applied to many areas of the drug design and development process: from peptide synthesis to molecule design, virtual screening to molecular docking, quantitative structure-activity relationships to drug repositioning, protein misfolding to protein-protein interactions, and molecular pathway identification to polypharmacology. Artificial intelligence principles have been applied to the classification of active and inactive compounds, monitoring of drug release, pre-clinical and clinical development, primary and secondary drug screening, biomarker development, pharmaceutical manufacturing, identification of bioactivity and physicochemical properties, toxicity prediction, and identification of mode of action.
Affiliation(s)
- Rohan Gupta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Devesh Srivastava
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Mehar Sahu
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Swati Tiwari
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Rashmi K Ambasta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
- Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University (Formerly DCE), Shahbad Daulatpur, Bawana Road, Delhi, 110042, India
49
A face recognition software framework based on principal component analysis. PLoS One 2021; 16:e0254965. [PMID: 34293012 PMCID: PMC8384131 DOI: 10.1371/journal.pone.0254965] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/16/2020] [Accepted: 07/07/2021] [Indexed: 12/01/2022]
Abstract
Face recognition, one of the major biometric identification methods, has been applied in fields including economics, the military, e-commerce, and security. Its contactless identification process, which does not compel users to cooperate, cannot be replaced by other approaches such as iris or fingerprint recognition. Among face recognition techniques, principal component analysis (PCA), one of the earliest to be proposed, still attracts researchers because it reduces data dimensionality without losing important information. Nevertheless, building a PCA-based face recognition system remains time-consuming, because practical applications must handle problems such as illumination, facial expression, and shooting angle. Furthermore, integrating toolkit implementations into applications still requires considerable effort from software developers. This paper provides a software framework for PCA-based face recognition aimed at helping developers customize their applications efficiently. The framework describes the complete PCA-based face recognition process and offers multiple variations at each step for different requirements. Some variations within the same step can work together, and some steps can be omitted in specific situations; thus, the total number of variations exceeds 150. Implementations of all approaches presented in the framework are provided.
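The core PCA step behind such systems (the classic "eigenfaces" approach) fits in a few lines of Python with NumPy. This is an illustrative sketch, not the paper's framework: it assumes pre-aligned, equally sized grayscale images flattened into rows, computes principal axes with a singular value decomposition, and identifies a probe face by nearest-neighbour search in the reduced space. The function names are hypothetical, and none of the framework's illumination, expression, or pose-handling variations are included.

```python
import numpy as np


def fit_eigenfaces(faces: np.ndarray, n_components: int):
    """Fit PCA to `faces`, an (n_images, n_pixels) array of flattened grayscale images."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of vt are orthonormal principal axes ("eigenfaces"), sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    gallery = centered @ components.T  # low-dimensional weights of each training face
    return mean_face, components, gallery


def identify(probe: np.ndarray, mean_face, components, gallery, labels):
    """Return the label of the training face closest to `probe` in PCA space."""
    weights = (probe - mean_face) @ components.T
    distances = np.linalg.norm(gallery - weights, axis=1)
    return labels[int(np.argmin(distances))]


# Toy data: six random "images" of 50 pixels each, one per person.
rng = np.random.default_rng(0)
faces = rng.normal(size=(6, 50))
labels = ["a", "b", "c", "d", "e", "f"]
mean_face, components, gallery = fit_eigenfaces(faces, n_components=5)
probe = faces[2] + rng.normal(scale=0.01, size=50)  # slightly perturbed copy of person "c"
print(identify(probe, mean_face, components, gallery, labels))  # c
```

Keeping fewer components than training images discards the lowest-variance directions, so `n_components` trades compression against retained detail.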
50
Siedhoff NE, Illig AM, Schwaneberg U, Davari MD. PyPEF: An Integrated Framework for Data-Driven Protein Engineering. J Chem Inf Model 2021; 61:3463-3476. [PMID: 34260225 DOI: 10.1021/acs.jcim.1c00099] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Abstract
Data-driven strategies are gaining attention in protein engineering owing to recent advances in access to large experimental protein databanks, next-generation sequencing (NGS), high-throughput screening (HTS) methods, and artificial intelligence algorithms. However, reliably predicting beneficial amino acid substitutions, their combinations, and their effects on functional properties remains the most significant challenge in protein engineering, which is used to develop proteins and enzymes for biocatalysis, biomedicine, and the life sciences. Here, we present PyPEF (pythonic protein engineering framework), a general-purpose framework for data-driven protein engineering that combines machine learning methods with techniques from signal processing and statistical physics. PyPEF guides the identification and selection of beneficial proteins within a defined sequence space by systematically or randomly exploring the fitness of variants and by sampling random evolution pathways. PyPEF's predictive accuracy and throughput were evaluated on four public protein and enzyme data sets using common regression models. The program efficiently predicted the fitness of protein sequences for different target properties (predictive models had coefficients of determination, R², ranging from 0.58 to 0.92). By combining machine learning and protein evolution, PyPEF enabled the screening of proteins with various functions, reaching a screening capacity of more than 500,000 protein sequence variants within a few minutes on a personal computer.
PyPEF achieved high accuracy on four public data sets (covering different proteins and properties) and underlined the potential of integrating data-driven technologies to serve two different aims: either predicting variant fitness as accurately as possible while accounting for epistatic effects, or capturing the general trend of introduced mutations on fitness in directed protein evolution campaigns. In essence, PyPEF can provide a powerful solution to current sequence exploration and combinatorial problems in protein engineering through exhaustive in silico screening of the sequence space.
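The coefficient of determination quoted above is R² = 1 - SS_res/SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the measured values from their mean. A minimal, dependency-free Python sketch of that formula follows; it is not PyPEF's own code, and the fitness values are made-up toy numbers.

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot


# Toy fitness values for four variants (illustrative numbers, not data from the paper).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r_squared(measured, predicted))  # 0.8
```

Predicting the mean of the measured values for every variant gives R² = 0, and on held-out data R² can even be negative when a model does worse than that baseline.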
Affiliation(s)
- Niklas E Siedhoff
- Institute of Biotechnology, RWTH Aachen University, Worringer Weg 3, 52074 Aachen, Germany
- Ulrich Schwaneberg
- Institute of Biotechnology, RWTH Aachen University, Worringer Weg 3, 52074 Aachen, Germany
- DWI-Leibniz Institute for Interactive Materials, Forckenbeckstraße 50, 52074 Aachen, Germany
- Mehdi D Davari
- Institute of Biotechnology, RWTH Aachen University, Worringer Weg 3, 52074 Aachen, Germany