1
Kong D, Qian J, Gao C, Wang Y, Shi T, Ye C. Machine Learning Empowering Microbial Cell Factory: A Comprehensive Review. Appl Biochem Biotechnol 2025:10.1007/s12010-025-05260-x. [PMID: 40397295 DOI: 10.1007/s12010-025-05260-x] [Accepted: 05/02/2025] [Indexed: 05/22/2025]
Abstract
The wide application of machine learning has provided more possibilities for biological manufacturing, and the combination of machine learning and synthetic biology technology has ignited even more brilliant sparks, which has created an unpredictable value for the upgrading of microbial cell factories. The review delves into the synergies between machine learning and synthetic biology to create research worth investigating in biotechnology. We explore relevant databases, toolboxes, and machine learning-derived models. Furthermore, we examine specific applications of this combined approach in chemical production, human health, and environmental remediation. By elucidating these successful integrations, this review aims to provide valuable guidance for future research at the intersection of biomanufacturing and artificial intelligence.
Affiliation(s)
- Dechun Kong
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing, 210023, People's Republic of China
- Jinyi Qian
  - Ministry of Education Key Laboratory of NSLSCS, Nanjing Normal University, Nanjing, 210023, People's Republic of China
- Cong Gao
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi, 214122, People's Republic of China
- Yuetong Wang
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing, 210023, People's Republic of China
- Tianqiong Shi
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing, 210023, People's Republic of China
  - State Key Laboratory of Microbial Technology, Nanjing Normal University, Nanjing, 210023, People's Republic of China
- Chao Ye
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing, 210023, People's Republic of China
  - Ministry of Education Key Laboratory of NSLSCS, Nanjing Normal University, Nanjing, 210023, People's Republic of China
2
Agoni C, Fernández-Díaz R, Timmons PB, Adelfio A, Gómez H, Shields DC. Molecular Modelling in Bioactive Peptide Discovery and Characterisation. Biomolecules 2025; 15:524. [PMID: 40305228 PMCID: PMC12025251 DOI: 10.3390/biom15040524] [Received: 12/21/2024] [Revised: 03/12/2025] [Accepted: 04/01/2025] [Indexed: 05/02/2025]
Abstract
Molecular modelling is a vital tool in the discovery and characterisation of bioactive peptides, providing insights into their structural properties and interactions with biological targets. Many models predicting bioactive peptide function or structure rely on their intrinsic properties, including the influence of amino acid composition, sequence, and chain length, which impact stability, folding, aggregation, and target interaction. Homology modelling predicts peptide structures based on known templates. Peptide-protein interactions can be explored using molecular docking techniques, but there are challenges related to the inherent flexibility of peptides, which can be addressed by more computationally intensive approaches that consider their movement over time, called molecular dynamics (MD). Virtual screening of many peptides, usually against a single target, enables rapid identification of potential bioactive peptides from large libraries, typically using docking approaches. The integration of artificial intelligence (AI) has transformed peptide discovery by leveraging large amounts of data. AlphaFold is a general protein structure prediction tool based on deep learning that has greatly improved the predictions of peptide conformations and interactions, in addition to providing estimates of model accuracy at each residue which greatly guide interpretation. Peptide function and structure prediction are being further enhanced using Protein Language Models (PLMs), which are large deep-learning-derived statistical models that learn computer representations useful to identify fundamental patterns of proteins. Recent methodological developments are discussed in the context of canonical peptides, as well as those with modifications and cyclisations. In designing potential peptide therapeutics, the main outstanding challenge for these methods is the incorporation of diverse non-canonical amino acids and cyclisations.
Affiliation(s)
- Clement Agoni
  - School of Medicine, University College Dublin, D04 C1P1 Dublin, Ireland
  - Conway Institute of Biomolecular and Biomedical Science, University College Dublin, D04 C1P Dublin, Ireland
  - Discipline of Pharmaceutical Sciences, School of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Raúl Fernández-Díaz
  - School of Medicine, University College Dublin, D04 C1P1 Dublin, Ireland
  - IBM Research, D15 HN66 Dublin, Ireland
- Alessandro Adelfio
  - Nuritas Ltd., Joshua Dawson House, D02 RY95 Dublin, Ireland
- Hansel Gómez
  - Nuritas Ltd., Joshua Dawson House, D02 RY95 Dublin, Ireland
- Denis C. Shields
  - School of Medicine, University College Dublin, D04 C1P1 Dublin, Ireland
  - Conway Institute of Biomolecular and Biomedical Science, University College Dublin, D04 C1P Dublin, Ireland
3
Grover A, Singh S, Sindhu S, Lath A, Kumar S. Advances in cyclotide research: bioactivity to cyclotide-based therapeutics. Mol Divers 2025:10.1007/s11030-025-11113-w. [PMID: 39862350 DOI: 10.1007/s11030-025-11113-w] [Received: 11/19/2024] [Accepted: 01/07/2025] [Indexed: 01/27/2025]
Abstract
Cyclotides are a class of plant-derived cyclic peptides having a distinctive structure with a cyclic cystine knot (CCK) motif. They are stable molecules that naturally play a role in plant defense. To date, more than 750 cyclotides have been reported among diverse plant taxa belonging to Cucurbitaceae, Violaceae, Rubiaceae, Solanaceae, and Fabaceae. These native cyclotides exhibit several bioactivities, such as anti-bacterial, anti-HIV, anti-fungal, pesticidal, cytotoxic, and hemolytic activities, which have immense significance in agriculture and therapeutics. The general mode of action of cyclotides is related to their structure, where their hydrophobic face penetrates the cell membrane and disrupts it to exhibit anti-microbial, cytotoxic, or hemolytic activities. Thus, the structure-activity relationship is of significance in cyclotides. Further, owing to their small size, stability, and potential to interact with and cross the membrane barrier of cells, they make promising choices for developing peptide-based biologics. However, challenges such as production complexity, pharmacokinetic limitations, and off-target effects hinder their development. Advancements in cyclotide engineering, such as peptide grafting, ligand conjugation, nanocarrier integration, and heterologous production, along with computational design optimization, can help overcome these challenges. Given the potential of these cyclic peptides, the present review focuses on the diversity, bioactivities, and structure-activity relationships of cyclotides, and on advancements in cyclotide engineering, emphasizing their unique attributes for diverse medical and biotechnological applications.
Affiliation(s)
- Ankita Grover
  - Department of Microbiology, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Sawraj Singh
  - Department of Microbiology, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Sonal Sindhu
  - Department of Medical Biotechnology, Maharshi Dayanand University, Rohtak, Haryana, India
- Amit Lath
  - Department of Biotechnology, Maharshi Dayanand University, Rohtak, Haryana, India
- Sanjay Kumar
  - Department of Microbiology, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
4
de Azevedo ALK, Gomig THB, Batista M, de Oliveira JC, Cavalli IJ, Gradia DF, Ribeiro EMDSF. Peptidomics and Machine Learning-based Evaluation of Noncoding RNA-Derived Micropeptides in Breast Cancer: Expression Patterns and Functional/Therapeutic Insights. Lab Invest 2024; 104:102150. [PMID: 39393531 DOI: 10.1016/j.labinv.2024.102150] [Received: 05/23/2024] [Revised: 09/20/2024] [Accepted: 10/03/2024] [Indexed: 10/13/2024]
Abstract
Breast cancer is a highly heterogeneous disease characterized by different subtypes arising from molecular alterations that give the disease different phenotypes, clinical behaviors, and prognoses. The noncoding RNA (ncRNA)-derived micropeptides (MPs) represent a novel layer of complexity in cancer study, since they can be biologically active and have potential as biomarkers and in therapeutics. However, few large-scale studies address the expression of these peptides at the peptidomics level or evaluate their functions and potential in peptide-based therapeutics for breast cancer. In this study, we aim to deepen the landscape of ncRNA-derived MPs in breast cancer subtypes and advance the comprehension of the relevance of these molecules to the disease. First, we constructed a data set of 16,349 unique putative MP sequences by integrating 2 previously published lists of predicted ncRNA-derived MPs. We evaluated their expression on high-throughput mass spectrometry data of breast tumor samples from different subtypes. Next, we applied several machine and deep learning tools, such as AntiCP 2.0, MULocDeep, PEPstrMOD, Peptipedia, and PreAIP, to predict their functions, cellular localization, tertiary structure, physicochemical features, and other properties related to therapeutics. We identified 58 peptides expressed on breast tissue, including 27 differentially expressed MPs in tumor compared with nontumor samples and MPs exhibiting tumor or subtype specificity. These peptides presented physicochemical features compatible with the canonical proteome and were predicted to influence the tumor immune environment and participate in cell communication, metabolism, and signaling processes. In addition, some MPs presented potential as anticancer, antiinflammatory, and antiangiogenic molecules.
Our data demonstrate that MPs derived from ncRNAs have expression patterns associated with specific breast cancer subtypes and tumor specificity, thus highlighting their potential as biomarkers for molecular classification. We also reinforce the relevance of MPs as biologically active molecules that play a role in breast tumorigenesis, besides their potential in peptide-based therapeutics.
Affiliation(s)
- Michel Batista
  - Laboratory of Applied Sciences and Technologies in Health, Carlos Chagas Institute, Fiocruz, Curitiba, Brazil; Mass Spectrometry Facility-RPT02H, Carlos Chagas Institute, Fiocruz, Curitiba, Brazil
- Iglenir João Cavalli
  - Genetics Post-Graduation Program, Genetics Department, Federal University of Paraná, Curitiba, Brazil
- Daniela Fiori Gradia
  - Genetics Post-Graduation Program, Genetics Department, Federal University of Paraná, Curitiba, Brazil
5
Hashemi S, Vosough P, Taghizadeh S, Savardashtaki A. Therapeutic peptide development revolutionized: Harnessing the power of artificial intelligence for drug discovery. Heliyon 2024; 10:e40265. [PMID: 39605829 PMCID: PMC11600032 DOI: 10.1016/j.heliyon.2024.e40265] [Received: 02/20/2024] [Revised: 10/07/2024] [Accepted: 11/07/2024] [Indexed: 11/29/2024]
Abstract
Due to the spread of antibiotic resistance, global attention is focused on its inhibition and the expansion of effective medicinal compounds. The novel functional properties of peptides have opened up new horizons in personalized medicine. With artificial intelligence methods combined with therapeutic peptide products, pharmaceuticals and biotechnology advance drug development rapidly and reduce costs. Short-chain peptides inhibit a wide range of pathogens and have great potential for targeting diseases. To address the challenges of synthesis and sustainability, artificial intelligence methods, namely machine learning, must be integrated into their production. Learning methods can use complicated computations to select the active and toxic compounds of the drug and its metabolic activity. Through this comprehensive review, we investigated the artificial intelligence method as a potential tool for finding peptide-based drugs and providing a more accurate analysis of peptides through the introduction of predictable databases for effective selection and development.
Affiliation(s)
- Samaneh Hashemi
  - Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
  - Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Parisa Vosough
  - Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
  - Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Saeed Taghizadeh
  - Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
  - Pharmaceutical Science Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Amir Savardashtaki
  - Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
  - Infertility Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
6
Cabas-Mora G, Daza A, Soto-García N, Garrido V, Alvarez D, Navarrete M, Sarmiento-Varón L, Sepúlveda Yañez JH, Davari MD, Cadet F, Olivera-Nappa Á, Uribe-Paredes R, Medina-Ortiz D. Peptipedia v2.0: a peptide sequence database and user-friendly web platform. A major update. Database (Oxford) 2024; 2024:baae113. [PMID: 39514414 PMCID: PMC11734279 DOI: 10.1093/database/baae113] [Received: 06/25/2024] [Revised: 08/23/2024] [Accepted: 09/27/2024] [Indexed: 11/16/2024]
Abstract
In recent years, peptides have gained significant relevance due to their therapeutic properties. The surge in peptide production and synthesis has generated vast amounts of data, enabling the creation of comprehensive databases and information repositories. Advances in sequencing techniques and artificial intelligence have further accelerated the design of tailor-made peptides. However, leveraging these techniques requires versatile and continuously updated storage systems, along with tools that facilitate peptide research and the implementation of machine learning for predictive systems. This work introduces Peptipedia v2.0, one of the most comprehensive public repositories of peptides, supporting biotechnological research by simplifying peptide study and annotation. Peptipedia v2.0 has expanded its collection by over 45% with peptide sequences that have reported biological activities. The functional biological activity tree has been revised and enhanced, incorporating new categories such as cosmetic and dermatological activities, molecular binding, and antiageing properties. Utilizing protein language models and machine learning, more than 90 binary classification models have been trained, validated, and incorporated into Peptipedia v2.0. These models exhibit average sensitivities and specificities of 0.877±0.0530 and 0.873±0.054, respectively, facilitating the annotation of more than 3.6 million peptide sequences with unknown biological activities, also registered in Peptipedia v2.0. Additionally, Peptipedia v2.0 introduces description tools based on structural and ontological properties and user-friendly machine learning tools to facilitate the application of machine learning strategies to study peptide sequences. Database URL: https://peptipedia.cl/.
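The sensitivity and specificity figures quoted in this abstract are standard binary-classification metrics. As a minimal sketch (not Peptipedia's own code), the Python below computes both from 0/1 labels and then summarizes a set of per-model scores as mean and sample standard deviation. Reading the "±" in the abstract as a standard deviation is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Sensitivity: fraction of true positives recovered (TP / (TP + FN)).
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of true negatives recovered (TN / (TN + FP)).
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def summarize(per_model_scores):
    """Aggregate (sensitivity, specificity) pairs from several models.

    Returns ((mean_sens, sd_sens), (mean_spec, sd_spec)); needs >= 2 models
    because the sample standard deviation is undefined for a single value.
    """
    sens = [s for s, _ in per_model_scores]
    spec = [s for _, s in per_model_scores]
    return (mean(sens), stdev(sens)), (mean(spec), stdev(spec))
```

Reporting both metrics matters for activity classifiers because peptide datasets are often imbalanced, and accuracy alone can hide poor recall on the rarer class.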
Affiliation(s)
- Gabriel Cabas-Mora
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, Punta Arenas 6210427, Chile
- Anamaría Daza
  - Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Avenida Beauchef 851, Santiago 8320000, Chile
- Nicole Soto-García
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, Punta Arenas 6210427, Chile
- Valentina Garrido
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, Punta Arenas 6210427, Chile
- Diego Alvarez
  - Centro Asistencial de Docencia e Investigación, CADI, Universidad de Magallanes, Av. Los Flamencos 01364, Punta Arenas 6210005, Chile
- Marcelo Navarrete
  - Centro Asistencial de Docencia e Investigación, CADI, Universidad de Magallanes, Av. Los Flamencos 01364, Punta Arenas 6210005, Chile
  - Escuela de Medicina, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, Punta Arenas 6210427, Chile
- Lindybeth Sarmiento-Varón
  - Centro Asistencial de Docencia e Investigación, CADI, Universidad de Magallanes, Av. Los Flamencos 01364, Punta Arenas 6210005, Chile
- Julieta H Sepúlveda Yañez
  - Centro Asistencial de Docencia e Investigación, CADI, Universidad de Magallanes, Av. Los Flamencos 01364, Punta Arenas 6210005, Chile
  - Facultad de Ciencias de la Salud, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, Punta Arenas 6210427, Chile
- Mehdi D Davari
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, Halle 06120, Germany
- Frederic Cadet
  - PEACCEL, Artificial Intelligence Department, AI for Biologics, Paris 75013, France
- Álvaro Olivera-Nappa
  - Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Avenida Beauchef 851, Santiago 8320000, Chile
- Roberto Uribe-Paredes
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, Punta Arenas 6210427, Chile
  - Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Avenida Beauchef 851, Santiago 8320000, Chile
- David Medina-Ortiz
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, Punta Arenas 6210427, Chile
  - Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Avenida Beauchef 851, Santiago 8320000, Chile
7
Zareei S, Khorsand B, Dantism A, Zareei N, Asgharzadeh F, Zahraee SS, Mashreghi Kashan S, Hekmatirad S, Amini S, Ghasemi F, Moradnia M, Vaghf A, Hemmatpour A, Hourfar H, Niknia S, Johari A, Salimi F, Fariborzi N, Shojaei Z, Asiaei E, Shabani H. PeptiHub: a curated repository of precisely annotated cancer-related peptides with advanced utilities for peptide exploration and discovery. Database (Oxford) 2024; 2024:baae092. [PMID: 39308247 PMCID: PMC11417155 DOI: 10.1093/database/baae092] [Received: 12/23/2023] [Revised: 08/07/2024] [Accepted: 09/07/2024] [Indexed: 09/26/2024]
Abstract
PeptiHub (https://bioinformaticscollege.ir/peptihub/) is a meticulously curated repository of cancer-related peptides (CRPs) that have been documented in scientific literature. A diverse collection of CRPs is included in PeptiHub, showcasing a spectrum of effects and activities. While some peptides demonstrated significant anticancer efficacy, others exhibited no discernible impact, and some even possessed alternative non-drug functionalities, including drug carrier or carcinogenic attributes. Presently, PeptiHub houses 874 CRPs, subjected to evaluation across 10 distinct organism categories, 26 organs, and 438 cell lines. Each entry in the database is accompanied by easily accessible 3D conformations, obtained either experimentally or through predictive methodology. Users are provided with three search frameworks offering basic, advanced, and BLAST sequence search options. Furthermore, precise annotations of peptides enable users to explore CRPs based on their specific activities (anticancer, no effect, insignificant effect, carcinogen, and others) and their effectiveness (rate and IC50) under cancer conditions, specifically within individual organs. This unique property facilitates the construction of robust training and testing datasets. Additionally, PeptiHub offers 1141 features, from which users can select those most pertinent to their specific research questions. Features include aaindex1 (in six main subcategories: alpha propensities, beta propensity, composition indices, hydrophobicity, physicochemical properties, and other properties), amino acid composition (amino acid composition and dipeptide composition), and grouped amino acid composition (grouped amino acid composition, grouped dipeptide composition, and conjoint triad) categories. These utilities not only speed up machine learning-based peptide design but also facilitate peptide classification. Database URL: https://bioinformaticscollege.ir/peptihub/.
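Two of the feature families named in this abstract, amino acid composition and dipeptide composition, have simple textbook definitions. The following Python is a minimal sketch of those generic definitions, not PeptiHub's implementation; the function names and the handling of non-standard residues are assumptions made for illustration.

```python
from itertools import product

# The 20 canonical amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return a 20-entry dict: fraction of the sequence made up by each residue.

    Non-standard letters are not reported but still count towards the
    sequence length (an assumption of this sketch).
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return a 400-entry dict: fraction of adjacent residue pairs of each type."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        raise ValueError("sequence must contain at least two residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for pair in pairs:
        if pair in counts:  # skip pairs that contain a non-standard letter
            counts[pair] += 1
    return {pair: n / len(pairs) for pair, n in counts.items()}
```

Both functions return fixed-length vectors regardless of peptide length, which is what makes them convenient inputs for conventional classifiers.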
Affiliation(s)
- Sara Zareei
  - Department of Cell & Molecular Biology, Faculty of Biological Sciences, Kharazmi University, South Mofateh Ave., Tehran 15719-14911, Iran
- Babak Khorsand
  - Department of Neurology, University of California, 200 S. Manchester Ave., Suite 206 Orange, Irvine, CA 92868-4280, USA
  - Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Azadi Square, Mashhad 9177948974, Iran
- Alireza Dantism
  - Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Jalal AlAhmad HWY, Tehran 14115-111, Iran
- Neda Zareei
  - Transplant Research Center, Shiraz University of Medical Sciences, Khalili Str, Shiraz 7193711351, Iran
- Fereshteh Asgharzadeh
  - Department of Medical Physiology, Faculty of Medicine, Mashhad University of Medical Sciences, Azadi Sq., Mashhad 9177948564, Iran
- Shadi Shams Zahraee
  - Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Dr. Shahriari Sq., Tehran 1983969411, Iran
- Samane Mashreghi Kashan
  - Department of Medicinal Biotechnology, Faculty of Advanced Technology in Medicine, Golestan University of Medical Sciences, Shast Kola Road, Gorgan 4918936316, Iran
- Shirin Hekmatirad
  - Department of Toxicology and Pharmacology, Faculty of Pharmacy, Tehran University of Medical Sciences, 16 Azar Ave, Tehran 1416753955, Iran
- Shila Amini
  - Department of Genetics, Faculty of Advanced Science and Technology, Medical Sciences Branch, Islamic Azad University, Shariati St., Tehran 19395/1495, Iran
- Fatemeh Ghasemi
  - Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Azadi Square, Mashhad 9177948974, Iran
- Maryam Moradnia
  - Division of Occupational and Environmental Medicine, Department of Laboratory Medicine, Faculty of Medicine, Lund University, Lund BOX 117,221 00, Sweden
- Atena Vaghf
  - Department of Medical Biotechnology, Faculty of Advanced Technologies, Shahrekord University of Medical Science, Kashani BLVD., Shahrekord 8815713471, Iran
- Anahid Hemmatpour
  - Department of Clinical Biochemistry, Faculty of Medicine, Shahid Sadoughi University of Medical Sciences and Health Services, Aalam Sq., Yazd 8915173149, Iran
- Hamdam Hourfar
  - Bioprocess Engineering Research Group, Department of Industrial and Environmental Biotechnology, National Institute for Genetic Engineering and Biotechnology, Tehran-Karaj HWY, Tehran 14965/161, Iran
- Soudabeh Niknia
  - Department of Biology, Kavian Institute of Higher Education, Elahiyeh Blv., Mashhad 91863-74915, Iran
- Ali Johari
  - Department of Biology, Kavian Institute of Higher Education, Elahiyeh Blv., Mashhad 91863-74915, Iran
- Fatemeh Salimi
  - Department of Clinical Science, Faculty of Veterinary Medicine, Razi University, Taq-e Bostan, Kermanshah 6714414971, Iran
- Neda Fariborzi
  - Department of Biology and Biotechnology, Faculty of Molecular Biology and Genetics, University of Pavia, S.da Nuova, Pavia 65, 27100, Italy
- Zohreh Shojaei
  - Department of Cell & Molecular Biology, Faculty of Biological Sciences, Kharazmi University, South Mofateh Ave., Tehran 15719-14911, Iran
- Elaheh Asiaei
  - Systems Biotechnology Research Group, Department of Industrial and Environmental Biotechnology, National Institute for Genetic Engineering and Biotechnology, Tehran-Karaj HWY., Tehran 14965/161, Iran
- Hossein Shabani
  - Department of Biology, Faculty of Biosciences, Tehran North Branch, Islamic Azad University, Vafadar Blv., Tehran 1651153311, Iran
8
Isaac KS, Combe M, Potter G, Sokolenko S. Machine learning tools for peptide bioactivity evaluation - Implications for cell culture media optimization and the broader cultivated meat industry. Curr Res Food Sci 2024; 9:100842. [PMID: 39435450 PMCID: PMC11491887 DOI: 10.1016/j.crfs.2024.100842] [Received: 05/31/2024] [Accepted: 09/07/2024] [Indexed: 10/23/2024]
Abstract
Although bioactive peptides have traditionally been studied for their health-promoting qualities in the context of nutrition and medicine, the past twenty years have seen a steady increase in their application to cell culture media optimization. Complex natural sources of bioactive peptides, such as hydrolysates, offer a sustainable and cost-effective means of promoting cellular growth, making them an essential component of scaling-up cultivated meat production. However, the sheer diversity of hydrolysates makes product selection difficult, highlighting the need for functional characterization. Traditional wet-lab techniques for isolating and estimating peptide bioactivity cannot keep pace with peptide identification using high-throughput tools such as mass spectrometry, requiring the development and use of machine learning-based classifiers. This review provides a comprehensive list of available software tools to evaluate peptide bioactivity, classified and compared based on the algorithm, training set, functionality, and limitations of the underlying models. We curated independent test sets to compare the predictive performance of different models based on specific bioactivity classification relevant to promoting cell culture growth: antioxidant and anti-inflammatory. A comprehensive screening of all bioactivity classifiers revealed that while there are approximately fifty tools to elucidate antimicrobial activity and sixteen that predict anti-inflammatory activity, fewer tools are available for other functionalities related to cell growth - five that predict antioxidant activity and two for growth factor and/or cell signaling prediction. A thorough evaluation of the available tools revealed significant issues with sensitivity, specificity, and overall accuracy. 
Despite the overall interest in estimating peptide bioactivity, our work highlights key gaps in the broader adoption of existing software for the specific application of cell culture media optimization in the context of cultivated meat and beyond.
Affiliation(s)
- Kathy Sharon Isaac
  - Process Engineering and Applied Science, Dalhousie University, 5273 DaCosta Row, PO Box 15000, Halifax, B3H 4R2, NS, Canada
- Michelle Combe
  - Process Engineering and Applied Science, Dalhousie University, 5273 DaCosta Row, PO Box 15000, Halifax, B3H 4R2, NS, Canada
- Stanislav Sokolenko
  - Process Engineering and Applied Science, Dalhousie University, 5273 DaCosta Row, PO Box 15000, Halifax, B3H 4R2, NS, Canada
9
Fernández-Díaz R, Cossio-Pérez R, Agoni C, Lam HT, Lopez V, Shields DC. AutoPeptideML: a study on how to build more trustworthy peptide bioactivity predictors. Bioinformatics 2024; 40:btae555. [PMID: 39292535 PMCID: PMC11438549 DOI: 10.1093/bioinformatics/btae555] [Received: 06/24/2024] [Revised: 08/08/2024] [Accepted: 09/17/2024] [Indexed: 09/20/2024]
Abstract
MOTIVATION Automated machine learning (AutoML) solutions can bridge the gap between new computational advances and their real-world applications by enabling experimental scientists to build their own custom models. We examine different steps in the development life-cycle of peptide bioactivity binary predictors and identify key steps where automation cannot only result in a more accessible method, but also more robust and interpretable evaluation leading to more trustworthy models. RESULTS We present a new automated method for drawing negative peptides that achieves better balance between specificity and generalization than current alternatives. We study the effect of homology-based partitioning for generating the training and testing data subsets and demonstrate that model performance is overestimated when no such homology correction is used, which indicates that prior studies may have overestimated their performance when applied to new peptide sequences. We also conduct a systematic analysis of different protein language models as peptide representation methods and find that they can serve as better descriptors than a naive alternative, but that there is no significant difference across models with different sizes or algorithms. Finally, we demonstrate that an ensemble of optimized traditional machine learning algorithms can compete with more complex neural network models, while being more computationally efficient. We integrate these findings into AutoPeptideML, an easy-to-use AutoML tool to allow researchers without a computational background to build new predictive models for peptide bioactivity in a matter of minutes. AVAILABILITY AND IMPLEMENTATION Source code, documentation, and data are available at https://github.com/IBM/AutoPeptideML and a dedicated web-server at http://peptide.ucd.ie/AutoPeptideML. A static version of the software to ensure the reproduction of the results is available at https://zenodo.org/records/13363975.
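The idea of homology-based partitioning described in this abstract can be illustrated with a small sketch: group sequences that resemble one another, then assign whole groups to either the training or the test subset so that no test peptide has a close relative in training. The Python below is not AutoPeptideML's procedure. It assumes `difflib.SequenceMatcher.ratio()` as a crude stand-in for alignment-based sequence identity, uses single-linkage grouping, and all function names and default thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def similarity_components(seqs, threshold):
    """Single-linkage groups: sequences linked by any pair >= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if similarity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def homology_aware_split(seqs, threshold=0.5, test_fraction=0.2):
    """Return (train_indices, test_indices) with whole groups kept together.

    Smallest groups are moved to the test set first until adding another
    group would exceed the requested test size.
    """
    components = sorted(similarity_components(seqs, threshold), key=len)
    target = test_fraction * len(seqs)
    train, test = [], []
    for comp in components:
        if len(test) + len(comp) <= target:
            test.extend(comp)
        else:
            train.extend(comp)
    return sorted(train), sorted(test)
```

Because every pair at or above the threshold ends up in the same group, no train/test pair can reach the threshold, which is the property a random split lacks. The pairwise comparison is quadratic in the number of sequences, so this sketch only suits small datasets.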
Affiliation(s)
- Raúl Fernández-Díaz
  - IBM Research, Dublin, Dublin D15 HN66, Ireland
  - School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland
  - Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin D04 C1P, Ireland
  - The SFI Centre for Research Training in Genomics Data Science, Ireland
- Rodrigo Cossio-Pérez
  - School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland
  - Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin D04 C1P, Ireland
  - Department of Science and Technology, National University of Quilmes, Bernal B1876, Provincia de Buenos Aires, Argentina
- Clement Agoni
  - School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland
  - Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin D04 C1P, Ireland
  - Discipline of Pharmaceutical Sciences, School of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Denis C Shields
  - School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland
  - Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin D04 C1P, Ireland
10
Goles M, Daza A, Cabas-Mora G, Sarmiento-Varón L, Sepúlveda-Yañez J, Anvari-Kazemabad H, Davari MD, Uribe-Paredes R, Olivera-Nappa Á, Navarrete MA, Medina-Ortiz D. Peptide-based drug discovery through artificial intelligence: towards an autonomous design of therapeutic peptides. Brief Bioinform 2024; 25:bbae275. [PMID: 38856172 PMCID: PMC11163380 DOI: 10.1093/bib/bbae275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 04/23/2024] [Accepted: 06/04/2024] [Indexed: 06/11/2024] [Open Access]
Abstract
With their diverse biological activities, peptides are promising candidates for therapeutic applications, showing antimicrobial, antitumour and hormonal signalling capabilities. Despite their advantages, therapeutic peptides face challenges such as short half-life, limited oral bioavailability and susceptibility to plasma degradation. The rise of computational tools and artificial intelligence (AI) in peptide research has spurred the development of advanced methodologies and databases that are pivotal in the exploration of these complex macromolecules. This perspective delves into integrating AI in peptide development, encompassing classifier methods, predictive systems and the avant-garde design facilitated by deep-generative models like generative adversarial networks and variational autoencoders. There are still challenges, such as the need for processing optimization and careful validation of predictive models. This work outlines traditional strategies for machine learning model construction and training techniques and proposes a comprehensive AI-assisted peptide design and validation pipeline. The evolving landscape of peptide design using AI is emphasized, showcasing the practicality of these methods in expediting the development and discovery of novel peptides within the context of peptide-based drug discovery.
Affiliation(s)
- Montserrat Goles
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
  - Departamento de Ingeniería Química, Biotecnología y Materiales, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile
- Anamaría Daza
  - Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile
- Gabriel Cabas-Mora
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Lindybeth Sarmiento-Varón
  - Centro Asistencial de Docencia e Investigación, CADI, Universidad de Magallanes, Av. Los Flamencos 01364, 6210005, Punta Arenas, Chile
- Julieta Sepúlveda-Yañez
  - Facultad de Ciencias de la Salud, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Hoda Anvari-Kazemabad
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Mehdi D Davari
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120, Halle, Germany
- Roberto Uribe-Paredes
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Álvaro Olivera-Nappa
  - Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile
- Marcelo A Navarrete
  - Centro Asistencial de Docencia e Investigación, CADI, Universidad de Magallanes, Av. Los Flamencos 01364, 6210005, Punta Arenas, Chile
  - Escuela de Medicina, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- David Medina-Ortiz
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
  - Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile
11
Hesamzadeh P, Seif A, Mahmoudzadeh K, Ganjali Koli M, Mostafazadeh A, Nayeri K, Mirjafary Z, Saeidian H. De novo antioxidant peptide design via machine learning and DFT studies. Sci Rep 2024; 14:6473. [PMID: 38499731 PMCID: PMC10948870 DOI: 10.1038/s41598-024-57247-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Accepted: 03/15/2024] [Indexed: 03/20/2024] [Open Access]
Abstract
Antioxidant peptides (AOPs) are highly valued in the food and pharmaceutical industries due to their significant role in human function. This study introduces a novel approach to identifying robust AOPs using a deep generative model based on sequence representation. Through filtration with a deep-learning classification model and subsequent clustering via the Butina clustering algorithm, twelve peptides (GP1-GP12) with potential antioxidant capacity were predicted. Density functional theory (DFT) calculations guided the selection of six peptides for synthesis and biological experiments. Molecular orbital representations revealed that the HOMO of these peptides is primarily localized on the indole segment, underscoring its pivotal role in antioxidant activity. All six synthesized peptides exhibited antioxidant activity in the DPPH assay, while the hydroxyl radical test showed suboptimal results. A hemolysis assay confirmed the non-hemolytic nature of the generated peptides. Additionally, an in silico investigation explored the potential inhibitory interaction between the peptides and the Keap1 protein. The analysis revealed that ligands GP3, GP4, and GP12 induced significant structural changes in the protein, affecting its stability and flexibility. These findings highlight the capability of machine learning approaches to generate novel antioxidant peptides.
Affiliation(s)
- Parsa Hesamzadeh
  - Department of Chemistry, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Abdolvahab Seif
  - Dipartimento di Fisica, Universita' di Padova, Via Marzolo 8, 35131, Padua, Italy
  - Department of Chemistry, University of Turin, Via Pietro Giuria 7, 10125, Turin, Italy
- Kazem Mahmoudzadeh
  - Department of Organic Chemistry and Oil, Faculty of Chemistry, Shahid Beheshti University, Tehran, Iran
- Amrollah Mostafazadeh
  - Cellular and Molecular Biology Research Center, Health Research Institute, Babol University of Medical Sciences, Babol, Iran
- Kosar Nayeri
  - Student Research Committee, Babol University of Medical Sciences, Babol, Iran
- Zohreh Mirjafary
  - Department of Chemistry, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Hamid Saeidian
  - Department of Science, Payame Noor University (PNU), PO Box: 19395-4697, Tehran, Iran
12
Iwaniak A, Minkiewicz P, Darewicz M. Bioinformatics and bioactive peptides from foods: Do they work together? Advances in Food and Nutrition Research 2024; 108:35-111. [PMID: 38461003 DOI: 10.1016/bs.afnr.2023.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2024]
Abstract
We live in the Big Data Era, which affects many aspects of science, including research on bioactive peptides derived from foods, a focus of scientific interest over the last few decades. These two developments, i.e., advances in computer technologies and progress in the discovery of novel peptides with health-beneficial properties, are closely interrelated. This chapter presents example applications of bioinformatics for studying biopeptides, taking the main aspects of peptide analysis as the starting point: (i) the role of peptide databases; (ii) aspects of bioactivity prediction; and (iii) simulation of peptide release from proteins. Bioinformatics can also be used to predict other features of peptides, including ADMET, QSAR, structure, and taste. As for the question "bioinformatics and bioactive peptides from foods: do they work together?", it is currently almost impossible to find examples of peptide research with no bioinformatics involved. However, theoretical predictions are not equivalent to experimental work and always require critical scrutiny. The compatibility of in silico and in vitro results is also summarized herein.
Affiliation(s)
- Anna Iwaniak
  - Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
- Piotr Minkiewicz
  - Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
- Małgorzata Darewicz
  - Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
13
Asega AF, Barros BCSC, Chaves AFA, Oliveira AK, Bertholim L, Kitano ES, Serrano SMT. Mouse skin peptidomic analysis of the hemorrhage induced by a snake venom metalloprotease. Amino Acids 2023; 55:1103-1119. [PMID: 37389729 DOI: 10.1007/s00726-023-03299-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 06/22/2023] [Indexed: 07/01/2023]
Abstract
Hemorrhage induced by snake venom metalloproteases (SVMPs) results from proteolysis, capillary disruption, and blood extravasation. HF3, a potent SVMP of Bothrops jararaca, induces hemorrhage at pmol doses in the mouse skin. To gain insight into the hemorrhagic process, the main goal of this study was to analyze changes in the skin peptidome generated by injection of HF3, using approaches of mass spectrometry-based untargeted peptidomics. The results revealed that the sets of peptides found in the control and HF3-treated skin samples were distinct and derived from the cleavage of different proteins. Peptide bond cleavage site identification in the HF3-treated skin showed compatibility with trypsin-like serine proteases and cathepsins, suggesting the activation of host proteinases. Acetylated peptides, which originated from the cleavage at positions in the N-terminal region of proteins in both samples, were identified for the first time in the mouse skin peptidome. The number of peptides acetylated at the residue after the first Met residue, mostly Ser and Ala, was higher than that of peptides acetylated at the initial Met. Proteins cleaved in the hemorrhagic skin participate in cholesterol metabolism, PPAR signaling, and in the complement and coagulation cascades, indicating the impairment of these biological processes. The peptidomic analysis also indicated the emergence of peptides with potential biological activities, including pheromone, cell penetrating, quorum sensing, defense, and cell-cell communication in the mouse skin. Interestingly, peptides generated in the hemorrhagic skin promoted the inhibition of collagen-induced platelet aggregation and could act synergistically in the local tissue damage induced by HF3.
Affiliation(s)
- Amanda F Asega
  - Laboratory of Applied Toxinology, Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Instituto Butantan, Av. Vital Brasil 1500, São Paulo, 05503-000, Brazil
- Bianca C S C Barros
  - Laboratory of Applied Toxinology, Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Instituto Butantan, Av. Vital Brasil 1500, São Paulo, 05503-000, Brazil
- Alison F A Chaves
  - Laboratory of Applied Toxinology, Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Instituto Butantan, Av. Vital Brasil 1500, São Paulo, 05503-000, Brazil
- Ana K Oliveira
  - Laboratory of Applied Toxinology, Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Instituto Butantan, Av. Vital Brasil 1500, São Paulo, 05503-000, Brazil
- Luciana Bertholim
  - Laboratory of Applied Toxinology, Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Instituto Butantan, Av. Vital Brasil 1500, São Paulo, 05503-000, Brazil
- Eduardo S Kitano
  - Laboratory of Applied Toxinology, Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Instituto Butantan, Av. Vital Brasil 1500, São Paulo, 05503-000, Brazil
- Solange M T Serrano
  - Laboratory of Applied Toxinology, Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Instituto Butantan, Av. Vital Brasil 1500, São Paulo, 05503-000, Brazil
14
Li L, Wu J, Lyon CJ, Jiang L, Hu TY. Clinical Peptidomics: Advances in Instrumentation, Analyses, and Applications. BME Frontiers 2023; 4:0019. [PMID: 37849662 PMCID: PMC10521655 DOI: 10.34133/bmef.0019] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/11/2023] [Accepted: 04/19/2023] [Indexed: 10/19/2023] [Open Access]
Abstract
Extensive effort has been devoted to the discovery, development, and validation of biomarkers for early disease diagnosis and prognosis as well as rapid evaluation of the response to therapeutic interventions. Genomic and transcriptomic profiling are well-established means to identify disease-associated biomarkers. However, analysis of disease-associated peptidomes can also identify novel peptide biomarkers or signatures that provide sensitive and specific diagnostic and prognostic information for specific malignant, chronic, and infectious diseases. Growing evidence also suggests that peptidomic changes in liquid biopsies may more effectively detect changes in disease pathophysiology than other molecular methods. Knowledge gained from peptide-based diagnostic, therapeutic, and imaging approaches has led to promising new theranostic applications that can increase their bioavailability in target tissues at reduced doses to decrease side effects and improve treatment responses. However, despite major advances, multiple factors can still affect the utility of peptidomic data. This review summarizes several remaining challenges that affect peptide biomarker discovery and their use as diagnostics, with a focus on technological advances that can improve the detection, identification, and monitoring of peptide biomarkers for personalized medicine.
Affiliation(s)
- Lin Li
  - Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, LA, USA
  - Department of Laboratory Medicine and Sichuan Provincial Key Laboratory for Human Disease Gene Study, Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, Chengdu, China
- Jing Wu
  - Department of Clinical Laboratory, Third Central Hospital of Tianjin, Tianjin Institute of Hepatobiliary Disease, Tianjin Key Laboratory of Artificial Cell, Artificial Cell Engineering Technology Research Center of Public Health Ministry, Tianjin, China
- Christopher J. Lyon
  - Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, LA, USA
- Li Jiang
  - Department of Laboratory Medicine and Sichuan Provincial Key Laboratory for Human Disease Gene Study, Sichuan Academy of Medical Sciences and Sichuan Provincial People’s Hospital, Chengdu, China
- Tony Y. Hu
  - Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, LA, USA
  - Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA, USA
15
Sowers A, Wang G, Xing M, Li B. Advances in Antimicrobial Peptide Discovery via Machine Learning and Delivery via Nanotechnology. Microorganisms 2023; 11:1129. [PMID: 37317103 PMCID: PMC10223199 DOI: 10.3390/microorganisms11051129] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/01/2023] [Revised: 04/13/2023] [Accepted: 04/19/2023] [Indexed: 06/16/2023] [Open Access]
Abstract
Antimicrobial peptides (AMPs) have been investigated as a potential alternative to antibiotics because of the increased demand for new antimicrobial agents. AMPs, widely found in nature and obtained from microorganisms, offer a broad range of antimicrobial protection, allowing them to be applied in the treatment of infections caused by various pathogenic microorganisms. Since these peptides are primarily cationic, they are drawn to anionic bacterial membranes by electrostatic interactions. However, the applications of AMPs are currently limited by their hemolytic activity, poor bioavailability, degradation by proteolytic enzymes, and high production costs. To overcome these limitations, nanotechnology has been used to improve AMP bioavailability, permeation across barriers, and/or protection against degradation. In addition, machine learning has been investigated for predicting AMPs because its algorithms save time and cost. Numerous databases are available for training machine learning models. In this review, we focus on nanotechnology approaches for AMP delivery and advances in AMP design via machine learning. The AMP sources, classification, structures, antimicrobial mechanisms, role in diseases, peptide engineering technologies, currently available databases, and machine learning techniques used to predict AMPs with minimal toxicity are discussed in detail.
Affiliation(s)
- Alexa Sowers
  - Department of Orthopaedics, School of Medicine, West Virginia University, Morgantown, WV 26506, USA
  - School of Pharmacy, West Virginia University, Morgantown, WV 26506, USA
- Guangshun Wang
  - Department of Pathology and Microbiology, College of Medicine, University of Nebraska Medical Center, 985900 Nebraska Medical Center, Omaha, NE 68198, USA
- Malcolm Xing
  - Department of Mechanical Engineering, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
- Bingyun Li
  - Department of Orthopaedics, School of Medicine, West Virginia University, Morgantown, WV 26506, USA
16
Thi Phan L, Woo Park H, Pitti T, Madhavan T, Jeon YJ, Manavalan B. MLACP 2.0: An updated machine learning tool for anticancer peptide prediction. Comput Struct Biotechnol J 2022; 20:4473-4480. [PMID: 36051870 PMCID: PMC9421197 DOI: 10.1016/j.csbj.2022.07.043] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 04/18/2022] [Revised: 07/25/2022] [Accepted: 07/25/2022] [Indexed: 12/24/2022] [Open Access]
Abstract
We present a novel meta-approach, MLACP 2.0, and implement it as a user-friendly webserver for the accurate identification of anticancer peptides (ACPs). MLACP 2.0 employs 11 different encoding schemes and eight different classifiers, including convolutional neural networks, to create a stable meta-model. A benchmarking study demonstrated that MLACP 2.0 achieves superior performance in ACP prediction compared to publicly available state-of-the-art predictors.
Anticancer peptides are emerging anticancer drugs that offer fewer side effects and are more effective than chemotherapy and targeted therapy. Predicting anticancer peptides from sequence information is one of the most challenging tasks in immunoinformatics. Over the past ten years, machine learning-based approaches have been proposed for identifying ACP activity from peptide sequences. These methods include our previous method, MLACP (developed in 2017), which made a significant impact on anticancer research. The MLACP tool has been widely used by the research community; however, its robustness must be improved significantly for continued practical application. In this study, the first large non-redundant training and independent datasets were constructed for ACP research. Using the training dataset, the study explored a wide range of feature encodings and developed their respective models using seven different conventional classifiers. Subsequently, a subset of encoding-based models was selected for each classifier based on performance; their predicted scores were concatenated and used to train a convolutional neural network (CNN), and the resulting predictor is named MLACP 2.0. Evaluation of MLACP 2.0 on a very diverse independent dataset showed excellent performance, significantly outperforming recent ACP prediction tools. Additionally, MLACP 2.0 exhibits superior performance during cross-validation and independent assessment when compared to CNN-based embedding models and conventional single models. Consequently, we anticipate that MLACP 2.0 will facilitate the design of hypothesis-driven experiments by making it easier to discover novel ACPs. MLACP 2.0 is freely available at https://balalab-skku.org/mlacp2.
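The meta-model idea in this abstract (concatenate the scores of several base models, then train a second-stage model on them) can be sketched as follows. This is a toy illustration under stated assumptions, not MLACP 2.0: the three base scorers are hypothetical hand-written stand-ins for trained encoding-specific classifiers, a logistic regression is swapped in for the CNN meta-learner, and the sequences and labels are invented for the example.

```python
import math

CATIONIC = set("KR")
HYDROPHOBIC = set("AILMFWV")


def cationic_fraction(seq):
    """Hypothetical base model 1: share of lysine/arginine residues."""
    return sum(aa in CATIONIC for aa in seq) / len(seq)


def hydrophobic_fraction(seq):
    """Hypothetical base model 2: share of hydrophobic residues."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def scaled_length(seq):
    """Hypothetical base model 3: length capped at 50 and scaled to [0, 1]."""
    return min(len(seq), 50) / 50


BASE_MODELS = [cationic_fraction, hydrophobic_fraction, scaled_length]


def meta_features(seq):
    """Concatenate the base-model scores into one meta-feature vector."""
    return [model(seq) for model in BASE_MODELS]


def predict(weights, features):
    """Logistic meta-learner: weights[0] is the bias term."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, X, y):
    """Mean binary cross-entropy of the meta-learner on (X, y)."""
    total = 0.0
    for features, label in zip(X, y):
        p = predict(weights, features)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(X)


def fit_meta(X, y, lr=0.5, steps=200):
    """Plain batch gradient descent on the mean log loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for features, label in zip(X, y):
            err = predict(weights, features) - label
            grad[0] += err
            for j, value in enumerate(features):
                grad[j + 1] += err * value
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


# Invented toy data for illustration only (1 = toy positive, 0 = toy negative).
TOY_SEQS = ["KKRKKLLK", "RKKRLAKK", "GSGSGSTT", "DEDEGSNQ"]
TOY_LABELS = [1, 1, 0, 0]
```

A typical call is `fit_meta([meta_features(s) for s in TOY_SEQS], TOY_LABELS)`, after which `predict` scores a new meta-feature vector. The design point is that the second stage never sees raw sequences, only the base-model outputs.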
17
Medina-Ortiz D, Contreras S, Amado-Hinojosa J, Torres-Almonacid J, Asenjo JA, Navarrete M, Olivera-Nappa Á. Generalized Property-Based Encoders and Digital Signal Processing Facilitate Predictive Tasks in Protein Engineering. Front Mol Biosci 2022; 9:898627. [PMID: 35911960 PMCID: PMC9329607 DOI: 10.3389/fmolb.2022.898627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 06/23/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Computational methods in protein engineering often require encoding amino acid sequences, i.e., converting them into numeric arrays. Physicochemical properties are a typical choice to define encoders, where we replace each amino acid by its value for a given property. However, what property (or group thereof) is best for a given predictive task remains an open problem. In this work, we generalize property-based encoding strategies to maximize the performance of predictive models in protein engineering. First, combining text mining and unsupervised learning, we partitioned the AAIndex database into eight semantically-consistent groups of properties. We then applied a non-linear PCA within each group to define a single encoder to represent it. Then, in several case studies, we assess the performance of predictive models for protein and peptide function, folding, and biological activity, trained using the proposed encoders and classical methods (One Hot Encoder and TAPE embeddings). Models trained on datasets encoded with our encoders and converted to signals through the Fast Fourier Transform (FFT) increased their precision and reduced their overfitting substantially, outperforming classical approaches in most cases. Finally, we propose a preliminary methodology to create de novo sequences with desired properties. All these results offer simple ways to increase the performance of general and complex predictive tasks in protein engineering without increasing their complexity.
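The property-based encoding followed by a Fourier transform, as described in this abstract, can be sketched in a few lines. The sketch makes several assumptions: a single property (the Kyte-Doolittle hydropathy scale) stands in for the paper's group-level encoders derived from AAIndex, a direct discrete Fourier transform replaces an FFT routine, and the function names and the padding length of 16 are illustrative.

```python
import cmath

# Kyte-Doolittle hydropathy values, used here as one example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq, length=16):
    """Replace each residue by its property value and zero-pad to a fixed length."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if len(values) > length:
        raise ValueError("sequence is longer than the padding length")
    return values + [0.0] * (length - len(values))


def dft_magnitudes(signal):
    """Magnitudes of the non-redundant half of the discrete Fourier spectrum."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) / n)
    return spectrum


def spectral_features(seq, length=16):
    """Property encoding followed by the Fourier step."""
    return dft_magnitudes(encode(seq, length))
```

`spectral_features("KLAKLAK")` yields a fixed-length vector (9 values for a padding length of 16) that can be fed to any downstream model regardless of the original sequence length, which is one practical motivation for the transform.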
Affiliation(s)
- David Medina-Ortiz
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Punta Arenas, Chile
- Sebastian Contreras
  - Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
  - *Correspondence: Sebastian Contreras; Álvaro Olivera-Nappa
- Juan Amado-Hinojosa
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile
  - Departamento de Ingeniería Química, Biotecnología y Materiales, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
- Jorge Torres-Almonacid
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Punta Arenas, Chile
- Juan A. Asenjo
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile
  - Departamento de Ingeniería Química, Biotecnología y Materiales, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
- Álvaro Olivera-Nappa
  - Centre for Biotechnology and Bioengineering, Universidad de Chile, Santiago, Chile
  - Departamento de Ingeniería Química, Biotecnología y Materiales, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
  - *Correspondence: Sebastian Contreras; Álvaro Olivera-Nappa
18
Li Y, Li X, Liu Y, Yao Y, Huang G. MPMABP: A CNN and Bi-LSTM-Based Method for Predicting Multi-Activities of Bioactive Peptides. Pharmaceuticals (Basel) 2022; 15:707. [PMID: 35745625 PMCID: PMC9231127 DOI: 10.3390/ph15060707] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/22/2022] [Revised: 05/23/2022] [Accepted: 05/30/2022] [Indexed: 12/30/2022] [Open Access]
Abstract
Bioactive peptides are typically small functional peptides with 2-20 amino acid residues that play versatile roles in metabolic and biological processes. Because bioactive peptides are multi-functional, accurately detecting all of their functions simultaneously is very challenging. We propose a deep learning method based on a convolutional neural network (CNN) and bi-directional long short-term memory (Bi-LSTM), called MPMABP, for recognizing the multiple activities of bioactive peptides. MPMABP stacks five CNNs at different scales and uses a residual network to prevent information loss. The empirical results showed that MPMABP is superior to state-of-the-art methods. Analysis of the amino acid distribution indicated that lysine tends to appear in anti-cancer peptides, leucine in anti-diabetic peptides, and proline in anti-hypertensive peptides. The method and analysis are useful for recognizing the multiple activities of bioactive peptides.
Affiliation(s)
- You Li
  - School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
- Xueyong Li
  - School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
- Yuewu Liu
  - College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Yuhua Yao
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
- Guohua Huang
  - School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China
19
Grønning AGB, Kacprowski T, Schéele C. MultiPep: a hierarchical deep learning approach for multi-label classification of peptide bioactivities. Biol Methods Protoc 2021; 6:bpab021. [PMID: 34909478 PMCID: PMC8665375 DOI: 10.1093/biomethods/bpab021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/08/2021] [Revised: 10/28/2021] [Accepted: 11/17/2021] [Indexed: 11/14/2022] [Open Access]
Abstract
Peptide-based therapeutics are here to stay and will prosper in the future. A key step in identifying novel peptide drugs is the determination of their bioactivities. Recent advances in peptidomics screening approaches hold promise as a strategy for identifying novel drug targets. However, these screenings typically generate an immense number of peptides, and tools for ranking these peptides prior to planning functional studies are warranted. Although a couple of tools in the literature predict multiple classes, they are constructed from multiple binary classifiers. Here we aimed to use an innovative deep learning approach to generate an improved peptide bioactivity classifier capable of distinguishing between multiple classes. We present MultiPep, a deep learning multi-label classifier that assigns peptides to zero or more of 20 bioactivity classes. We train and test MultiPep on data from several publicly available databases. The same data are used for hierarchical clustering, whose dendrogram shapes the architecture of MultiPep. We test a new loss function that combines a customized version of the Matthews correlation coefficient with binary cross entropy (BCE), and show that it performs better than class-weighted BCE as a loss function. Further, we show that MultiPep surpasses state-of-the-art peptide bioactivity classifiers and that it predicts known and novel bioactivities of FDA-approved therapeutic peptides. In conclusion, we present innovative machine learning techniques used to produce a peptide prediction tool to aid peptide-based therapy development and hypothesis generation.
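A loss that combines a Matthews correlation coefficient (MCC) term with binary cross entropy, as mentioned in this abstract, might look roughly like the sketch below. The exact customization used by MultiPep is not given here, so everything specific is an assumption: the "soft" MCC computed from predicted probabilities, the mixing weight `alpha`, the way the two terms are added, and the restriction to a single class (a multi-label model would apply it per class).

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean BCE with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def soft_mcc(probs, labels, eps=1e-7):
    """MCC computed from soft confusion-matrix counts (probabilities, not hard calls)."""
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    tn = sum((1 - p) * (1 - y) for p, y in zip(probs, labels))
    # eps keeps the ratio defined when one of the marginal sums is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) + eps
    return (tp * tn - fp * fn) / denom


def combined_loss(probs, labels, alpha=0.5):
    """Assumed combination: weighted sum of BCE and (1 - soft MCC)."""
    bce = binary_cross_entropy(probs, labels)
    return alpha * bce + (1.0 - alpha) * (1.0 - soft_mcc(probs, labels))
```

The MCC term depends on all four confusion-matrix cells at once, so it penalizes a model that ignores a rare class even when the per-example BCE is small. That is the usual motivation for adding it in imbalanced settings.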
Affiliation(s)
- Alexander G B Grønning
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Tim Kacprowski
  - Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, 38106 Braunschweig, Germany
  - Braunschweig Integrated Centre for Systems Biology (BRICS), 38106 Braunschweig, Germany
- Camilla Schéele
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark