1. Su L, Ma Z, Ji H, Kong J, Yan W, Zhang Q, Li J, Zuo M. From prediction to design: Revealing the mechanisms of umami peptides using interpretable deep learning, quantum chemical simulations, and module substitution. Food Chem 2025; 483:144301. [PMID: 40233511 DOI: 10.1016/j.foodchem.2025.144301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2025] [Revised: 03/24/2025] [Accepted: 04/08/2025] [Indexed: 04/17/2025]
Abstract
This study screened and designed umami peptides using a deep learning model and a module substitution strategy. The predictive model, which integrates pre-training, enhanced-feature, and contrastive learning modules, achieved an accuracy of 0.94, outperforming other models by 2-9%. Umami peptides were identified through virtual hydrolysis, model predictions, and sensory evaluation. Peptides EN, ETR, GK4, RK5, ER6, EF7, IL8, VR9, DL10, and PK14 demonstrated umami taste and exhibited umami-enhancing effects with MSG. The module substitution strategy, in which highly contributive modules from umami peptides replace the corresponding modules in bitter peptides, facilitates peptide design and modification. The mechanisms underlying module substitution and taste presentation were elucidated via molecular docking and active site analysis, revealing that substituted peptides form more hydrogen bonds and hydrophobic interactions with T1R1/T1R3. Amino acids D, E, Q, K, and R were critical for umami taste. This study provides an efficient tool for rapid umami peptide screening and expands the umami peptide repository.
Affiliation(s)
- Lijun Su
  - National Engineering Research Center for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China; School of Food and Health, Beijing Technology and Business University, Beijing 100048, China
- Zhenren Ma
  - National Engineering Research Center for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China
- Huizhuo Ji
  - National Engineering Research Center for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China; School of Food and Health, Beijing Technology and Business University, Beijing 100048, China
- Jianlei Kong
  - National Engineering Research Center for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China.
- Wenjing Yan
  - National Engineering Research Center for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China
- Qingchuan Zhang
  - National Engineering Research Center for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China
- Jian Li
  - School of Food and Health, Beijing Technology and Business University, Beijing 100048, China
- Min Zuo
  - School of Information, Beijing Wuzi University, Beijing 101126, China.
2. Gradl K, Richter P, Somoza V. Bitter peptides formed during in-vitro gastric digestion induce mechanisms of gastric acid secretion and release satiating serotonin via bitter taste receptors TAS2R4 and TAS2R43 in human parietal cells in culture. Food Chem 2025; 482:144174. [PMID: 40184744 DOI: 10.1016/j.foodchem.2025.144174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2024] [Revised: 03/02/2025] [Accepted: 03/30/2025] [Indexed: 04/07/2025]
Abstract
A key barrier in transitioning to plant-based, more satiating diets is the bitter taste of plant proteins. We hypothesize that both a more bitter tasting (MBT) and a less bitter tasting (LBT) pea protein hydrolysate (PPH) can be digested in the stomach into bitter-tasting peptides that stimulate proton secretion (PS) and serotonin release, two of the key gastric satiety signals, via the functional involvement of bitter taste receptors (TAS2Rs). Using a sensory-guided LC-MS approach, we identified six bitter peptides that were released from LBT-PPH and MBT-PPH during gastric digestion in vitro. TAS2R4 and TAS2R43 involvement in PS and serotonin release was confirmed via CRISPR-Cas9 knockout experiments. Our hypothesis was confirmed: all six peptides equally stimulated PS in immortalized human gastric HGT-1 cells, and LBT-PPH-derived peptides elicited a higher serotonin release in HGT-1 cells than MBT-PPH peptides, indicating a satiating potential of less bitter tasting protein hydrolysates.
Affiliation(s)
- Katrin Gradl
  - TUM School of Life Sciences, Technical University of Munich, Alte Akademie 8, 85354 Freising, Germany; Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Straße 34, 85354 Freising, Germany
- Phil Richter
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Straße 34, 85354 Freising, Germany
- Veronika Somoza
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Straße 34, 85354 Freising, Germany; Chair of Nutritional Systems Biology, Technical University of Munich, Lise-Meitner-Straße 34, 85354 Freising, Germany; Department of Physiological Chemistry, Faculty of Chemistry, University of Vienna, Althanstrasse 14, 1090 Vienna, Austria.
3. Ahmed S, Schaduangrat N, Chumnanpuen P, Shoombuatong W. GRU4ACE: Enhancing ACE inhibitory peptide prediction by integrating gated recurrent unit with multi-source feature embeddings. Protein Sci 2025; 34:e70026. [PMID: 40371738 PMCID: PMC12079467 DOI: 10.1002/pro.70026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Revised: 12/12/2024] [Accepted: 12/19/2024] [Indexed: 05/16/2025]
Abstract
Accurate identification of angiotensin-I-converting enzyme (ACE) inhibitory peptides is essential for understanding the primary factor regulating the renin-angiotensin system and guiding the development of new drug candidates. Given the inherent challenges in experimental processes, computational methods for in silico peptide identification can be invaluable for enabling high-throughput characterization of ACE inhibitory peptides. This study introduces GRU4ACE, an innovative deep learning framework based on multi-view information for identifying ACE inhibitory peptides. First, GRU4ACE utilizes multi-source feature encoding methods to capture the information embedded in ACE inhibitory peptides, including sequential information, graphical information, semantic information, and contextual information. Specifically, the feature representations used herein are derived from conventional feature descriptors, natural language processing (NLP)-based embeddings, and pre-trained protein language model (PLM)-based embeddings. Next, multiple feature embeddings were fused, and the elastic net was employed for feature optimization. Finally, the optimal feature subset with strong feature representation was input into a gated recurrent unit (GRU). The proposed GRU4ACE approach demonstrated superior performance over existing methods in terms of the independent test. To be specific, the balanced accuracy, sensitivity, and MCC scores of GRU4ACE reached 0.948, 0.934, and 0.895, which were 6.46%, 8.92%, and 12.51% higher than those of the compared methods, respectively. In addition, when comparing well-regarded feature descriptors, we found that the proposed multi-view features effectively captured crucial information, leading to improved ACE inhibitory peptide prediction performance. These comprehensive results highlight that GRU4ACE enhances prediction accuracy and significantly narrows down the search for new potential antihypertensive drugs.
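The balanced accuracy, sensitivity, and MCC scores quoted in this abstract are all derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, balanced accuracy, MCC).

    Assumes at least one positive and one negative sample, so the
    sensitivity and specificity denominators are non-zero.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    balanced_accuracy = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, balanced_accuracy, mcc


# Hypothetical counts, not taken from the paper.
sens, spec, bacc, mcc = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"sensitivity={sens:.3f} balanced_accuracy={bacc:.3f} MCC={mcc:.3f}")
```

Balanced accuracy averages the per-class recall, which is why it is often reported alongside MCC when the positive and negative classes differ in size.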
Affiliation(s)
- Saeed Ahmed
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
  - Department of Computer Science, University of Swabi, Swabi, Pakistan
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, Thailand
  - Kasetsart University International College (KUIC), Kasetsart University, Bangkok, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
4. Lv J, Geng A, Pan Z, Wei L, Zou Q, Zhang Z, Cui F. iBitter-GRE: A Novel Stacked Bitter Peptide Predictor with ESM-2 and Multi-View Features. J Mol Biol 2025; 437:169005. [PMID: 39954778 DOI: 10.1016/j.jmb.2025.169005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2024] [Revised: 01/08/2025] [Accepted: 02/10/2025] [Indexed: 02/17/2025]
Abstract
Accurate identification of bitter peptides is essential for research. Although models using sequence information have evolved in the context of bitter peptides, there is still room for improvement in their predictive performance. In the present study, we introduced a novel predictive tool, iBitter-GRE, designed to improve the accuracy of bitter peptide identification. For feature extraction, our model uses ESM-2 and traditional descriptors that capture the physical and biochemical properties of bitter peptides. To expand the model's learning capabilities, we adopted a stacking approach to integrate multiple learners. Feature contributions were analyzed using SHAP values. Validation by domain experts confirmed that our model effectively identifies the key biochemical characteristics of bitter peptides. Benchmark experiments showed that iBitter-GRE achieves higher accuracy than existing methods. To assist researchers, we created a web server accessible at http://www.bioai-lab.com/iBitter-GRE. We believe that iBitter-GRE is a valuable tool for the discovery and identification of bitter peptides.
Affiliation(s)
- Jingwei Lv
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Aoyun Geng
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Zhuoyu Pan
  - International Business School, Hainan University, Haikou 570228, China
- Leyi Wei
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR, China; School of Informatics, Xiamen University, Xiamen, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China.
5. Asim MN, Asif T, Mehmood F, Dengel A. Peptide classification landscape: An in-depth systematic literature review on peptide types, databases, datasets, predictors architectures and performance. Comput Biol Med 2025; 188:109821. [PMID: 39987697 DOI: 10.1016/j.compbiomed.2025.109821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 02/03/2025] [Accepted: 02/05/2025] [Indexed: 02/25/2025]
Abstract
Peptides are gaining significant attention in diverse fields; for example, the pharmaceutical market has seen a steady rise in peptide-based therapeutics over the past six decades. Peptides have been utilized in the development of distinct applications, including inhibitors of SARS-CoV-2 and treatments for conditions like cancer and diabetes. Distinct types of peptides possess unique characteristics, and the development of peptide-specific applications requires the discrimination of one peptide type from others. To the best of our knowledge, approximately 230 Artificial Intelligence (AI)-driven applications have been developed for 22 distinct types of peptides, yet there remains significant room for the development of new predictors. This comprehensive review addresses this critical gap by providing a consolidated platform for the development of AI-driven peptide classification applications. This paper offers several key contributions, including presenting the biological foundations of 22 unique peptide types and categorizing them into four main classes: Regulatory, Therapeutic, Nutritional, and Delivery Peptides. It offers an in-depth overview of 47 databases that have been used to develop peptide classification benchmark datasets. It summarizes details of 288 benchmark datasets that are used in the development of diverse types of AI-driven peptide classification applications. It provides a detailed summary of 197 sequence representation learning methods and 94 classifiers that have been used to develop 230 distinct AI-driven peptide classification applications. Across 22 distinct types of peptide classification tasks related to 288 benchmark datasets, it reports performance values of 230 AI-driven peptide classification applications. It summarizes experimental settings and various evaluation measures that have been employed to assess the performance of AI-driven peptide classification applications.
The primary focus of this manuscript is to consolidate scattered information into a single comprehensive platform. This resource will greatly assist researchers who are interested in developing new AI-driven peptide classification applications.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany; Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany.
- Tayyaba Asif
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Faiza Mehmood
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; Institute of Data Sciences, University of Engineering and Technology, Lahore, Pakistan
- Andreas Dengel
  - German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany; Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
6. Li F, Bin Y, Zhao J, Zheng C. DeepPD: A Deep Learning Method for Predicting Peptide Detectability Based on Multi-feature Representation and Information Bottleneck. Interdiscip Sci 2025; 17:200-214. [PMID: 39661307 DOI: 10.1007/s12539-024-00665-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 10/07/2024] [Accepted: 10/09/2024] [Indexed: 12/12/2024]
Abstract
Peptide detectability measures the relationship between the protein composition and abundance in the sample and the peptides identified during the analytical procedure. This relationship has significant implications for the fundamental tasks of proteomics. Existing methods primarily rely on a single type of feature representation, which limits their ability to capture the intricate and diverse characteristics of peptides. In response to this limitation, we introduce DeepPD, an innovative deep learning framework incorporating multi-feature representation and the information bottleneck principle (IBP) to predict peptide detectability. DeepPD extracts semantic information from peptides using evolutionary scale modeling 2 (ESM-2) and integrates sequence and evolutionary information to construct the feature space collaboratively. The IBP effectively guides the feature learning process, minimizing redundancy in the feature space. Experimental results across various datasets demonstrate that DeepPD outperforms state-of-the-art methods. Furthermore, we demonstrate that DeepPD exhibits competitive generalization and transfer learning capabilities across diverse datasets and species. In conclusion, DeepPD emerges as the most effective method for predicting peptide detectability, showcasing its potential applicability to other protein sequence prediction tasks.
Affiliation(s)
- Fenglin Li
  - College of Mathematics and System Science, Xinjiang University, Urumqi, 830046, China
- Yannan Bin
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Information Materials and Intelligent Sensing Laboratory of Anhui Province, and School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Jianping Zhao
  - College of Mathematics and System Science, Xinjiang University, Urumqi, 830046, China.
- Chunhou Zheng
  - College of Mathematics and System Science, Xinjiang University, Urumqi, 830046, China.
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Information Materials and Intelligent Sensing Laboratory of Anhui Province, and School of Artificial Intelligence, Anhui University, Hefei, 230601, China.
7. Charoenkwan P, Chumnanpuen P, Schaduangrat N, Shoombuatong W. Deepstack-ACE: A deep stacking-based ensemble learning framework for the accelerated discovery of ACE inhibitory peptides. Methods 2025; 234:131-140. [PMID: 39709069 DOI: 10.1016/j.ymeth.2024.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 11/27/2024] [Accepted: 12/07/2024] [Indexed: 12/23/2024] Open
Abstract
Identifying angiotensin-I-converting enzyme (ACE) inhibitory peptides accurately is crucial for understanding the primary factor that regulates the renin-angiotensin system and for providing guidance in developing new potential drugs. Given the inherent experimental complexities, using computational methods for in silico peptide identification could be indispensable for facilitating the high-throughput characterization of ACE inhibitory peptides. In this paper, we propose a novel deep stacking-based ensemble learning framework, termed Deepstack-ACE, to precisely identify ACE inhibitory peptides. In Deepstack-ACE, the input peptide sequences are fed into the word2vec embedding technique to generate sequence representations. Then, these representations were employed to train five powerful deep learning methods, including long short-term memory, convolutional neural network, multi-layer perceptron, gated recurrent unit network, and recurrent neural network, for the construction of base-classifiers. Finally, the optimized stacked model was constructed based on the best combination of selected base-classifiers. Benchmarking experiments showed that Deepstack-ACE attained a more accurate and robust identification of ACE inhibitory peptides compared to its base-classifiers and several conventional machine learning classifiers. Remarkably, in the independent test, our proposed model significantly outperformed the current state-of-the-art methods, with a balanced accuracy of 0.916, sensitivity of 0.911, and Matthews correlation coefficient scores of 0.826. Moreover, we developed a user-friendly web server for Deepstack-ACE, which is freely available at https://pmlabqsar.pythonanywhere.com/Deepstack-ACE. We anticipate that our proposed Deepstack-ACE model can provide a faster and reasonably accurate identification of ACE inhibitory peptides.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; Kasetsart University International College (KUIC), Kasetsart University, Bangkok 10900, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
8. Zhang R, Li Y, Jiang Q, Li Y, Cai Z, Zhang H. ESMR4FBP: A pLM-based regression prediction model for specific properties of food-derived peptides optimized multiple bionic metaheuristic algorithms. Food Chem 2025; 464:141840. [PMID: 39509883 DOI: 10.1016/j.foodchem.2024.141840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 09/12/2024] [Accepted: 10/27/2024] [Indexed: 11/15/2024]
Abstract
Due to the growing emphasis on food safety, peptide research is increasingly focusing on food sources. Traditional methods for determining peptide properties are expensive. While artificial intelligence (AI) models can reduce cost, existing peptide models often lack accuracy. This study aimed to develop a regression model capable of predicting peptide properties. We integrated the ESM-2 model with the LSTM architecture and optimized the model structure using three metaheuristic algorithms: WOA, SSA, and HHO. Using an antioxidant tripeptide dataset, our model achieved an R² of 0.9458 and an RMSE of 0.3135, outperforming the state-of-the-art (SOTA) model by 11.66% and 50.00%, respectively. The developed model was further applied to the bitter peptide dataset, resulting in an R² of 0.8385 and an RMSE of 0.4414. These results suggest that our model has the potential to accurately predict the properties of various types of peptides.
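The two regression metrics reported in this abstract, the coefficient of determination and the root mean squared error, can be computed directly from paired true and predicted values. The sketch below uses the standard textbook definitions; it is not the authors' code, and the function name `r2_and_rmse` and the sample values are hypothetical.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for paired true and predicted values.

    Assumes both sequences have the same non-zero length and that
    y_true is not constant (otherwise R^2 is undefined).
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: error left unexplained by the model.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variance of the targets around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical values, not taken from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r2, rmse = r2_and_rmse(y_true, y_pred)
print(f"R2={r2:.4f} RMSE={rmse:.4f}")
```

A higher R² and a lower RMSE both indicate a better fit, so improvements in the two metrics point in opposite numerical directions.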
Affiliation(s)
- Ruihao Zhang
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, PR China; Future Food Laboratory, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, PR China
- Yonghui Li
  - Department of Grain Science and Industry, Kansas State University, Manhattan, KS 66506, USA
- Qinbo Jiang
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, PR China
- Yang Li
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, PR China
- Zhe Cai
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, PR China
- Hui Zhang
  - College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, PR China.
9. Luo J, Zhao K, Chen J, Yang C, Qu F, Liu Y, Jin X, Yan K, Zhang Y, Liu B. iMFP-LG: Identify Novel Multi-functional Peptides Using Protein Language Models and Graph-based Deep Learning. Genomics, Proteomics & Bioinformatics 2025; 22:qzae084. [PMID: 39585308 PMCID: PMC12011362 DOI: 10.1093/gpbjnl/qzae084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 10/25/2024] [Accepted: 11/21/2024] [Indexed: 11/26/2024]
Abstract
Functional peptides are short amino acid fragments that have a wide range of beneficial functions for living organisms. The majority of previous studies have focused on mono-functional peptides, but an increasing number of multi-functional peptides have been discovered. Although there have been enormous experimental efforts to assay multi-functional peptides, only a small portion of millions of known peptides has been explored. The development of effective and accurate techniques for identifying multi-functional peptides can facilitate their discovery and mechanistic understanding. In this study, we presented iMFP-LG, a method for multi-functional peptide identification based on protein language models (pLMs) and graph attention networks (GATs). Our comparative analyses demonstrated that iMFP-LG outperformed the state-of-the-art methods in identifying both multi-functional bioactive peptides and multi-functional therapeutic peptides. The interpretability of iMFP-LG was also illustrated by visualizing attention patterns in pLMs and GATs. Given the outstanding performance of iMFP-LG on the identification of multi-functional peptides, we employed iMFP-LG to screen novel peptides with both anti-microbial and anti-cancer functions from millions of known peptides in the UniRef90 database. As a result, eight candidate peptides were identified, among which one candidate was validated to possess both anti-bacterial and anti-cancer properties through molecular structure alignment and biological experiments. We anticipate that iMFP-LG can assist in the discovery of multi-functional peptides and contribute to the advancement of peptide drug design.
Affiliation(s)
- Jiawei Luo
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
- Kejuan Zhao
  - School of Science, Harbin Institute of Technology, Shenzhen 518055, China
- Junjie Chen
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
- Caihua Yang
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
- Fuchuan Qu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
- Yumeng Liu
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518055, China
- Xiaopeng Jin
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518055, China
- Ke Yan
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 10081, China
- Yang Zhang
  - School of Science, Harbin Institute of Technology, Shenzhen 518055, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 10081, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 10081, China
10. Guan C, Fernandes FC, Franco OL, de la Fuente-Nunez C. Leveraging large language models for peptide antibiotic design. Cell Reports Physical Science 2025; 6:102359. [PMID: 39949833 PMCID: PMC11823563 DOI: 10.1016/j.xcrp.2024.102359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2025]
Abstract
Large language models (LLMs) have significantly impacted various domains of our society, including recent applications in complex fields such as biology and chemistry. These models, built on sophisticated neural network architectures and trained on extensive datasets, are powerful tools for designing, optimizing, and generating molecules. This review explores the role of LLMs in discovering and designing antibiotics, focusing on peptide molecules. We highlight advancements in drug design and outline the challenges of applying LLMs in these areas.
Affiliation(s)
- Changge Guan
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
  - These authors contributed equally
- Fabiano C. Fernandes
  - Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
  - Departamento de Ciência da Computação, Instituto Federal de Brasília, Campus Taguatinga, Brasília, Brazil
  - These authors contributed equally
- Octavio L. Franco
  - Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
  - S-Inova Biotech, Programa de Pós-Graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, Brazil
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
11. Cui Z, Qi C, Zhou T, Yu Y, Wang Y, Zhang Z, Zhang Y, Wang W, Liu Y. Artificial intelligence and food flavor: How AI models are shaping the future and revolutionary technologies for flavor food development. Compr Rev Food Sci Food Saf 2025; 24:e70068. [PMID: 39783879 DOI: 10.1111/1541-4337.70068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 10/16/2024] [Accepted: 11/04/2024] [Indexed: 01/12/2025]
Abstract
Food flavor science, traditionally reliant on experimental methods, is now entering a promising era with the help of artificial intelligence (AI). By integrating existing technologies with AI, researchers can explore and develop new flavor substances in a digital environment, saving time and resources. More and more research will use AI and big data to enhance product flavor, improve product quality, meet consumer needs, and drive the industry toward a smarter and more sustainable future. In this review, we elaborate on the mechanisms of flavor recognition and their potential impact on nutritional regulation. With the increase of data accumulation and the development of internet information technology, food flavor databases and food ingredient databases have made great progress. These databases provide detailed information on the nutritional content, flavor molecules, and chemical properties of various food compounds, providing valuable data support for the rapid evaluation of flavor components and the construction of screening technology. With the popularization of AI in various fields, the field of food flavor has also ushered in new development opportunities. This review explores the mechanisms of flavor recognition and the role of AI in enhancing food flavor analysis through high-throughput omics data and screening technologies. AI algorithms offer a pathway to scientifically improve product formulations, thereby enhancing flavor and enabling customized meals. Furthermore, this review discusses the safety challenges of integrating AI into the food flavor industry.
Affiliation(s)
- Zhiyong Cui
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Chengliang Qi
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Tianxing Zhou
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
- Department of Bioinformatics, Faculty of Science, The University of Melbourne, Melbourne, Victoria, Australia
| | - Yanyang Yu
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Yueming Wang
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Zhiwei Zhang
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Yin Zhang
- Key Laboratory of Meat Processing of Sichuan, Chengdu University, Chengdu, China
| | - Wenli Wang
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
| | - Yuan Liu
- Department of Food Science & Technology, School of Agriculture & Biology, Shanghai Jiao Tong University, Shanghai, China
- School of Food Science and Engineering, Ningxia University, Yinchuan, China
|
12
|
Lu Q, Xu J, Zhang R, Liu H, Wang M, Liu X, Yue Z, Gao Y. RiceSNP-ABST: a deep learning approach to identify abiotic stress-associated single nucleotide polymorphisms in rice. Brief Bioinform 2024; 26:bbae702. [PMID: 39757606 PMCID: PMC11962596 DOI: 10.1093/bib/bbae702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 11/16/2024] [Accepted: 12/23/2024] [Indexed: 01/07/2025] Open
Abstract
Given the adverse effects of abiotic stresses on rice, the precise and rapid identification of single nucleotide polymorphisms (SNPs) associated with abiotic stress traits (ABST-SNPs) is crucial for developing resistant rice varieties. The scarcity of high-quality data related to abiotic stress in rice has hindered the development of computational models and constrained research efforts aimed at rice improvement and breeding. Genome-wide association studies provide better statistical power for identifying ABST-SNPs in rice. Meanwhile, deep learning methods have shown their capability in predicting disease- or phenotype-associated loci, but have primarily focused on humans. Therefore, developing predictive models for identifying ABST-SNPs in rice is both urgent and valuable. In this paper, a model called RiceSNP-ABST is proposed for predicting ABST-SNPs in rice. First, six training datasets were generated using a novel strategy for negative sample construction. Second, four feature encoding methods were proposed based on DNA sequence fragments, followed by feature selection. Finally, convolutional neural networks with residual connections were used to determine whether the sequences contained rice ABST-SNPs. RiceSNP-ABST outperformed traditional machine learning and state-of-the-art methods on the benchmark dataset and demonstrated consistent generalization on an independent dataset and cross-species datasets. Notably, multi-granularity causal structure learning was employed to elucidate the relationships among DNA structural features, aiming to identify key genetic variants more effectively. The web-based tool for RiceSNP-ABST can be accessed at http://rice-snp-abst.aielab.cc.
Affiliation(s)
- Quan Lu
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
| | - Jiajun Xu
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
| | - Renyi Zhang
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
| | - Hangcheng Liu
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
| | - Meng Wang
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
| | - Xiaoshuang Liu
- Research Center for Biological Breeding Technology, Advance Academy, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
| | - Zhenyu Yue
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
- Research Center for Biological Breeding Technology, Advance Academy, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
| | - Yujia Gao
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
- Research Center for Biological Breeding Technology, Advance Academy, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
|
13
|
Zheng L, Zhang Q, Luo P, Shi F, Zhang Y, He X, An Y, Cheng G, Pan X, Li Z, Zhou B, Wang J. Chemical Proteomics Approaches for Screening Small Molecule Inhibitors Covalently Binding to SARS-Cov-2. Adv Biol (Weinh) 2024; 8:e2300612. [PMID: 39410782 DOI: 10.1002/adbi.202300612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Revised: 02/01/2024] [Indexed: 11/13/2024]
Abstract
Although various strategies have been used to prevent and treat SARS-CoV-2, the virus continues to spread and evolve rapidly. The emerging Omicron variant and its sublineages spread more readily and escape nearly all current monoclonal antibody treatments, highlighting an urgent need to develop therapeutics with breadth and potency against current and emerging Omicron variants or recombinants. Here, small molecule drugs that can covalently bind to the receptor-binding domain (RBD) protein of Omicron are rapidly identified through the combined application of artificial intelligence (AI) and activity-based protein profiling (ABPP) technology. Surface plasmon resonance (SPR) and pseudo-virus neutralization experiments further reveal that gallic acid, an FDA-approved drug, has robust neutralization potency against Omicron pseudo-virus, with an IC50 value of 23.56 × 10⁻⁶ M. Taken together, a platform combining AI, biochemical, SPR, molecular docking, and pseudo-virus-based screening for rapid identification and evaluation of potential anti-SARS-CoV-2 small molecule drugs is established, and the effectiveness of the platform is validated.
Affiliation(s)
- Liuhai Zheng
- Department of Pulmonary and Critical Care Medicine, Guangdong Provincial Clinical Research Center for Geriatrics, Shenzhen Institute of Respiratory Diseases, and Shenzhen Clinical Research Centre for Geriatrics, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, 518020, China
- Integrated Chinese and Western Medicine Postdoctoral Research Station, Jinan University, Guangzhou, 510632, Guangdong, China
| | - Qian Zhang
- School of Traditional Chinese Medicine and School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, 510515, China
| | - Piao Luo
- School of Traditional Chinese Medicine and School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, 510515, China
| | - Fei Shi
- Department of Pulmonary and Critical Care Medicine, Guangdong Provincial Clinical Research Center for Geriatrics, Shenzhen Institute of Respiratory Diseases, and Shenzhen Clinical Research Centre for Geriatrics, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, 518020, China
| | - Ying Zhang
- State Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, Artemisinin Research Center, and Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
| | - Xiaoxue He
- State Key Laboratory of Virology, Wuhan Institute of Virology, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Wuhan, 430207, China
| | - Yehai An
- School of Traditional Chinese Medicine and School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, 510515, China
| | - Guangqing Cheng
- Department of Pulmonary and Critical Care Medicine, Guangdong Provincial Clinical Research Center for Geriatrics, Shenzhen Institute of Respiratory Diseases, and Shenzhen Clinical Research Centre for Geriatrics, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, 518020, China
| | - Xiaoyan Pan
- Department of Pulmonary and Critical Care Medicine, Guangdong Provincial Clinical Research Center for Geriatrics, Shenzhen Institute of Respiratory Diseases, and Shenzhen Clinical Research Centre for Geriatrics, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, 518020, China
| | - Zhijie Li
- Department of Pulmonary and Critical Care Medicine, Guangdong Provincial Clinical Research Center for Geriatrics, Shenzhen Institute of Respiratory Diseases, and Shenzhen Clinical Research Centre for Geriatrics, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, 518020, China
| | - Boping Zhou
- Department of Pulmonary and Critical Care Medicine, Guangdong Provincial Clinical Research Center for Geriatrics, Shenzhen Institute of Respiratory Diseases, and Shenzhen Clinical Research Centre for Geriatrics, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, 518020, China
| | - Jigang Wang
- Department of Pulmonary and Critical Care Medicine, Guangdong Provincial Clinical Research Center for Geriatrics, Shenzhen Institute of Respiratory Diseases, and Shenzhen Clinical Research Centre for Geriatrics, Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, 518020, China
- School of Traditional Chinese Medicine and School of Pharmaceutical Sciences, Southern Medical University, Guangzhou, 510515, China
- State Key Laboratory for Quality Ensurance and Sustainable Use of Dao-di Herbs, Artemisinin Research Center, and Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- State Key Laboratory of Antiviral Drugs, School of Pharmacy, Henan University, Kaifeng, 475004, China
|
14
|
Tian Y, He Y, Xiong H, Sun Y. Rice Protein Peptides Alleviate Alcoholic Liver Disease via the PPARγ Signaling Pathway: Through Liver Metabolomics and Gut Microbiota Analysis. J Agric Food Chem 2024; 72:23790-23803. [PMID: 39406388 DOI: 10.1021/acs.jafc.4c02671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2024]
Abstract
Alcoholic liver disease (ALD) is the predominant type of liver disease worldwide, resulting in significant mortality and a high disease burden. ALD damages multiple organs, including the liver, gut, and brain, causing inflammation, oxidative stress, and fat deposition. In this study, we investigated the effects of rice protein peptides (RPP) on ALD in mice, with a primary focus on the gut microbiota and liver metabolites. The results showed that administration of RPP significantly alleviated the symptoms of ALD in mice, including adiposity, oxidative stress, and inflammation. KEGG pathway analysis showed that RPP downregulated the liver metabolite capric acid and the fatty acid biosynthesis pathway compared with the MOD group. Mechanistically, RPP downregulated the PPARγ signaling pathway and suppressed the expression of fatty acid biosynthesis genes (FASN, ACC1, ACSL1, and ACSL3). Furthermore, two active peptides (YLPTKQ and PKLPR) with potential therapeutic functions for ALD were screened by Caco-2 cell modeling and molecular docking techniques. In addition, RPP treatment alleviated gut microbiota dysbiosis by reversing the F/B ratio, increasing the relative abundance of Alloprevotella and Alistipes, and upregulating the level of short-chain fatty acids. In conclusion, RPP alleviates ALD steatosis through the PPARγ signaling pathway, via YLPTKQ and PKLPR, and regulates the gut microbiota.
Affiliation(s)
- Yue Tian
- State Key Laboratory of Food Science and Resources, Nanchang University, Nanchang, Jiangxi 330047, China
| | - Yangzheng He
- State Key Laboratory of Food Science and Resources, Nanchang University, Nanchang, Jiangxi 330047, China
| | - Hua Xiong
- State Key Laboratory of Food Science and Resources, Nanchang University, Nanchang, Jiangxi 330047, China
| | - Yong Sun
- State Key Laboratory of Food Science and Resources, Nanchang University, Nanchang, Jiangxi 330047, China
- Jiangxi Medicine Academy of Nutrition and Health Management, Nanchang, Jiangxi 330052, China
|
15
|
Srivastava P, Steuer A, Ferri F, Nicoli A, Schultz K, Bej S, Di Pizio A, Wolkenhauer O. Bitter peptide prediction using graph neural networks. J Cheminform 2024; 16:111. [PMID: 39375808 PMCID: PMC11459932 DOI: 10.1186/s13321-024-00909-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 09/22/2024] [Indexed: 10/09/2024] Open
Abstract
Bitter taste is an unpleasant taste modality that affects food consumption. Bitter peptides are generated during enzymatic processes that produce functional, bioactive protein hydrolysates or during the aging process of fermented products such as cheese, soybean protein, and wine. Understanding the underlying peptide sequences responsible for bitter taste can pave the way for more efficient identification of these peptides. This paper presents BitterPep-GCN, a feature-agnostic graph convolution network for bitter peptide prediction. The graph-based model learns the embedding of amino acids in the bitter peptide sequences and uses mixed pooling for bitter classification. BitterPep-GCN was benchmarked using BTP640, a publicly available bitter peptide dataset. The latent peptide embeddings generated by the trained model were used to analyze the activity of sequence motifs responsible for the bitter taste of the peptides. Particularly, we calculated the activity for individual amino acids and dipeptide, tripeptide, and tetrapeptide sequence motifs present in the peptides. Our analyses pinpoint specific amino acids, such as F, G, P, and R, as well as sequence motifs, notably tripeptide and tetrapeptide motifs containing FF, as key bitter signatures in peptides. This work not only provides a new predictor of bitter taste for a more efficient identification of bitter peptides in various food products but also gives a hint into the molecular basis of bitterness.
Scientific Contribution: Our work provides the first application of Graph Neural Networks for the prediction of peptide bitter taste. The best-developed model, BitterPep-GCN, learns the embedding of amino acids in the bitter peptide sequences and uses mixed pooling for bitter classification. The embeddings were used to analyze the sequence motifs responsible for the bitter taste.
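As a rough illustration of what enumerating dipeptide, tripeptide, and tetrapeptide motifs involves, the sketch below simply counts every length-k window in a set of peptide sequences. It is not the BitterPep-GCN activity analysis, which relies on the trained model's latent embeddings; the function name `motif_counts` and the example peptides are hypothetical and chosen only for illustration.

```python
from collections import Counter


def motif_counts(peptides, k):
    """Count every contiguous length-k motif across a list of peptide strings."""
    counts = Counter()
    for seq in peptides:
        # Slide a window of width k over the sequence; sequences shorter
        # than k contribute nothing because the range is empty.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Hypothetical peptides written in one-letter amino acid codes (not from BTP640).
example_peptides = ["FFPG", "GPFF", "RGPFFK"]

dipeptides = motif_counts(example_peptides, 2)
tripeptides = motif_counts(example_peptides, 3)
print(dipeptides.most_common(3))
print(tripeptides.most_common(3))
```

Raw counts like these only show which motifs are frequent; ranking motifs by their contribution to bitterness would additionally need labels or a model-derived score.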
Affiliation(s)
- Prashant Srivastava
- Institute of Computer Science, University of Rostock, 18051, Rostock, Germany
| | - Alexandra Steuer
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
- Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
| | - Francesco Ferri
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
- Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
| | - Alessandro Nicoli
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
- Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
| | - Kristian Schultz
- Institute of Computer Science, University of Rostock, 18051, Rostock, Germany
| | - Saptarshi Bej
- Indian Institute of Science Education and Research Thiruvananthapuram, Maruthamala P. O, Vithura, 695551, Kerala, India
| | - Antonella Di Pizio
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany.
- Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany.
| | - Olaf Wolkenhauer
- Institute of Computer Science, University of Rostock, 18051, Rostock, Germany.
- Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany.
|
16
|
Ahmed Z, Shahzadi K, Temesgen SA, Ahmad B, Chen X, Ning L, Zulfiqar H, Lin H, Jin YT. A protein pre-trained model-based approach for the identification of the liquid-liquid phase separation (LLPS) proteins. Int J Biol Macromol 2024; 277:134146. [PMID: 39067723 DOI: 10.1016/j.ijbiomac.2024.134146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Revised: 07/06/2024] [Accepted: 07/23/2024] [Indexed: 07/30/2024]
Abstract
Liquid-liquid phase separation (LLPS) regulates many biological processes, including RNA metabolism, chromatin rearrangement, and signal transduction. Aberrant LLPS potentially leads to serious diseases. Therefore, the identification of LLPS proteins is crucial. Traditional biochemistry-based methods for identifying LLPS proteins are costly, time-consuming, and laborious. In contrast, artificial intelligence-based approaches are fast and cost-effective and can be a better alternative. Previous research methods employed word2vec in conjunction with machine learning or deep learning algorithms. Although word2vec captures word semantics and relationships, it might not be effective in capturing features relevant to protein classification, such as physicochemical properties, evolutionary relationships, or structural features. Additionally, other studies often focused on a limited set of features for model training, including planar π contact frequency, pi-pi, and β-pairing propensities. To overcome these shortcomings, this study first constructed a reliable dataset containing 1206 protein sequences, including 603 LLPS and 603 non-LLPS protein sequences. Then a computational model was proposed to efficiently identify LLPS proteins by perceiving the semantic information of protein sequences directly, using an ESM2-36 pre-trained model based on the transformer architecture in conjunction with a convolutional neural network. The model achieved accuracies of 85.68% and 89.67% on the training and test data, respectively, surpassing the accuracy of previous studies. The performance demonstrates the potential of our computational methods as efficient alternatives for identifying LLPS proteins.
Affiliation(s)
- Zahoor Ahmed
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China.
| | - Kiran Shahzadi
- Department of Biotechnology, Women University of Azad Jammu and Kashmir, Bagh, Azad Kashmir, Pakistan.
| | - Sebu Aboma Temesgen
- School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China.
| | - Basharat Ahmad
- School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China.
| | - Xiang Chen
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China.
| | - Lin Ning
- School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China; School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China.
| | - Hasan Zulfiqar
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China.
| | - Hao Lin
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China.
| | - Yan-Ting Jin
- School of Life Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China.
|
17
|
Hu Y, Badar IH, Liu Y, Zhu Y, Yang L, Kong B, Xu B. Advancements in production, assessment, and food applications of salty and saltiness-enhancing peptides: A review. Food Chem 2024; 453:139664. [PMID: 38761739 DOI: 10.1016/j.foodchem.2024.139664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 05/01/2024] [Accepted: 05/12/2024] [Indexed: 05/20/2024]
Abstract
Salt is important for food flavor, but excessive sodium intake leads to adverse health consequences. Thus, salty and saltiness-enhancing peptides are being developed for sodium-reduced products. This review elucidates the saltiness perception process and analyzes the correlation between peptide structure and saltiness-enhancing ability. These peptides interact with taste receptors, including ENaC, TRPV1, and TMC4, to produce saltiness perception. This review also outlines the preparation, isolation, purification, characterization, screening, and assessment techniques for these peptides and discusses their potential applications. These peptides come from various sources and are produced through enzymatic hydrolysis, microbial fermentation, or the Maillard reaction, and are then separated, purified, identified, and screened. Sensory evaluation, electronic tongue, bioelectronic tongue, and cell and animal models are the primary saltiness assessment approaches. These peptides can be used in sodium-reduced food products to produce "clean label" items, and the peptides with biological activity can also serve as functional ingredients, making them very promising for the food industry.
Affiliation(s)
- Yingying Hu
- School of Food and Biological Engineering, Hefei University of Technology, Hefei, Anhui 230009, China; State Key Laboratory of Meat Quality Control and Cultured Meat Development, Jiangsu Yurun Meat Industry Group Co., Ltd, Nanjing, Jiangsu 210041, China
| | - Iftikhar Hussain Badar
- College of Food Science, Northeast Agricultural University, Harbin, Heilongjiang 150030, China; Department of Meat Science and Technology, University of Veterinary and Animal Sciences, Lahore 54000, Pakistan
| | - Yue Liu
- School of Food and Biological Engineering, Hefei University of Technology, Hefei, Anhui 230009, China
| | - Yuan Zhu
- State Key Laboratory of Meat Quality Control and Cultured Meat Development, Jiangsu Yurun Meat Industry Group Co., Ltd, Nanjing, Jiangsu 210041, China
| | - Linwei Yang
- State Key Laboratory of Meat Quality Control and Cultured Meat Development, Jiangsu Yurun Meat Industry Group Co., Ltd, Nanjing, Jiangsu 210041, China
| | - Baohua Kong
- College of Food Science, Northeast Agricultural University, Harbin, Heilongjiang 150030, China.
| | - Baocai Xu
- School of Food and Biological Engineering, Hefei University of Technology, Hefei, Anhui 230009, China.
|
18
|
Liu S, Shi T, Yu J, Li R, Lin H, Deng K. Research on Bitter Peptides in the Field of Bioinformatics: A Comprehensive Review. Int J Mol Sci 2024; 25:9844. [PMID: 39337334 PMCID: PMC11432553 DOI: 10.3390/ijms25189844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Revised: 09/06/2024] [Accepted: 09/09/2024] [Indexed: 09/30/2024] Open
Abstract
Bitter peptides are small peptides produced by the hydrolysis of proteins under acidic, alkaline, or enzymatic conditions. These peptides can enhance food flavor and offer various health benefits, with attributes such as antihypertensive, antidiabetic, antioxidant, antibacterial, and immune-regulating properties. They show significant potential in the development of functional foods and the prevention and treatment of diseases. This review introduces the diverse sources of bitter peptides and discusses the mechanisms of bitterness generation and their physiological functions in the taste system. Additionally, it emphasizes the application of bioinformatics in bitter peptide research, including the establishment and improvement of bitter peptide databases, the use of quantitative structure-activity relationship (QSAR) models to predict bitterness thresholds, and the latest advancements in classification prediction models built using machine learning and deep learning algorithms for bitter peptide identification. Future research directions include enhancing databases, diversifying models, and applying generative models to deepen bitter peptide research and uncover more practical applications.
Affiliation(s)
- Hao Lin
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; (S.L.); (T.S.); (J.Y.); (R.L.)
| | - Kejun Deng
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; (S.L.); (T.S.); (J.Y.); (R.L.)
|
19
|
Zhai S, Tan Y, Zhu C, Zhang C, Gao Y, Mao Q, Zhang Y, Duan H, Yin Y. PepExplainer: An explainable deep learning model for selection-based macrocyclic peptide bioactivity prediction and optimization. Eur J Med Chem 2024; 275:116628. [PMID: 38944933 DOI: 10.1016/j.ejmech.2024.116628] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/17/2024] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 07/02/2024]
Abstract
Macrocyclic peptides possess unique features, making them highly promising as a drug modality. However, evaluating their bioactivity through wet lab experiments is generally resource-intensive and time-consuming. Despite advancements in artificial intelligence (AI) for bioactivity prediction, challenges remain due to limited data availability and the interpretability issues in deep learning models, often leading to less-than-ideal predictions. To address these challenges, we developed PepExplainer, an explainable graph neural network based on substructure mask explanation (SME). This model excels at deciphering amino acid substructures, translating macrocyclic peptides into detailed molecular graphs at the atomic level, and efficiently handling non-canonical amino acids and complex macrocyclic peptide structures. PepExplainer's effectiveness is enhanced by utilizing the correlation between peptide enrichment data from a selection-based focused library and bioactivity data, and by employing transfer learning to improve bioactivity predictions of macrocyclic peptides against the IL-17C/IL-17RE interaction. Additionally, PepExplainer was further validated for bioactivity prediction using an additional set of thirteen newly synthesized macrocyclic peptides. Moreover, it enabled the optimization of the IC50 of a macrocyclic peptide, reducing it from 15 nM to 5.6 nM based on the contribution score provided by PepExplainer. This achievement underscores PepExplainer's skill in deciphering complex molecular patterns, highlighting its potential to accelerate the discovery and optimization of macrocyclic peptides.
Affiliation(s)
- Silong Zhai
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
| | - Yahong Tan
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
| | - Cheng Zhu
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
| | - Chengyun Zhang
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
| | - Yan Gao
- Qilu Institute of Technology, Jinan, 250200, China
| | - Qingyi Mao
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
| | - Youming Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
| | - Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China.
| | - Yizhen Yin
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China; Shandong Research Institute of Industrial Technology, Jinan, 250101, China.
|
20
|
Zhang J, Lu H, Jiang Y, Ma Y, Deng L. ncRNA Coding Potential Prediction Using BiLSTM and Transformer Encoder-Based Model. J Chem Inf Model 2024; 64:6712-6722. [PMID: 39120528 DOI: 10.1021/acs.jcim.4c01097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/10/2024]
Abstract
Many noncoding RNAs (ncRNAs) have been identified, and many of them play vital roles in various biological processes, including gene expression regulation, epigenetic regulation, transcription, and control. Recently, a few observations revealed that ncRNAs are translated into functional peptides. Moreover, many computational methods have been developed to predict the coding potential of these transcripts, which contributes to a deeper investigation of their functions. However, most of these methods are used to distinguish ncRNAs from mRNAs. It is important to develop a highly accurate computational tool for identifying the coding potential of ncRNAs, thereby contributing to the discovery of novel peptides. In this Article, we propose a novel BiLSTM And Transformer encoder-based model (nBAT) with intrinsic features encoded for ncRNA coding potential prediction. In nBAT, we introduce a learnable position encoding mechanism to better obtain the embeddings of the ncRNA sequence. Moreover, we extract 43 intrinsic features from different perspectives and encode these features into the Transformer encoder by calculating their distances. Our performance comparisons show that nBAT outperforms the state-of-the-art methods for coding potential prediction on different datasets. We also apply the method to new ncRNAs to identify their coding potential, and the results further indicate the competitive performance of nBAT. We expect that the method can serve as a useful tool for high-throughput coding potential prediction for ncRNAs.
Affiliation(s)
- Jingpu Zhang
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan 467000, China
- Hao Lu
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan 467000, China
- Ying Jiang
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
- Yuanyuan Ma
- School of Computer Engineering, Hubei University of Arts and Science, Xiangyang 441053, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
21
Richter P, Sebald K, Fischer K, Schnieke A, Jlilati M, Mittermeier-Klessinger V, Somoza V. Gastric digestion of the sweet-tasting plant protein thaumatin releases bitter peptides that reduce H. pylori induced pro-inflammatory IL-17A release via the TAS2R16 bitter taste receptor. Food Chem 2024; 448:139157. [PMID: 38569411 DOI: 10.1016/j.foodchem.2024.139157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 03/08/2024] [Accepted: 03/25/2024] [Indexed: 04/05/2024]
Abstract
About half of the world's population is infected with the bacterium Helicobacter pylori. For colonization, the bacterium neutralizes the low gastric pH and recruits immune cells to the stomach. The immune cells secrete cytokines, i.e., the pro-inflammatory IL-17A, which directly or indirectly damage surface epithelial cells. Since (I) dietary proteins are known to be digested into bitter tasting peptides in the gastric lumen, and (II) bitter tasting compounds have been demonstrated to reduce the release of pro-inflammatory cytokines through functional involvement of bitter taste receptors (TAS2Rs), we hypothesized that the sweet-tasting plant protein thaumatin would be cleaved into anti-inflammatory bitter peptides during gastric digestion. Using immortalized human parietal cells (HGT-1 cells), we demonstrated a bitter taste receptor TAS2R16-dependent reduction of a H. pylori-evoked IL-17A release by up to 89.7 ± 21.9% (p ≤ 0.01). Functional involvement of TAS2R16 was demonstrated by the study of specific antagonists and siRNA knock-down experiments.
Affiliation(s)
- Phil Richter
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Alte Akademie 8, 85354 Freising, Germany; Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Str. 34, 85354 Freising, Germany.
- Karin Sebald
- Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Str. 34, 85354 Freising, Germany.
- Konrad Fischer
- Livestock Biotechnology, TUM School of Life Sciences, Technical University of Munich, Liesel-Beckmann-Str. 1, 85354 Freising, Germany.
- Angelika Schnieke
- Livestock Biotechnology, TUM School of Life Sciences, Technical University of Munich, Liesel-Beckmann-Str. 1, 85354 Freising, Germany.
- Malek Jlilati
- Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Str. 34, 85354 Freising, Germany
- Verena Mittermeier-Klessinger
- Food Chemistry and Molecular Sensory Science, Technical University of Munich, Lise-Meitner-Str. 34, 85354 Freising, Germany.
- Veronika Somoza
- Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Str. 34, 85354 Freising, Germany; Nutritional Systems Biology, TUM School of Life Sciences, Technical University of Munich, Lise-Meitner-Str. 34, 85354 Freising, Germany; Department of Physiological Chemistry, Faculty of Chemistry, University of Vienna, Josef-Holaubek-Platz 2 (UZA II), 1090 Wien, Austria.
22
Nguyen VN, Ho TT, Doan TD, Le NQK. Using a hybrid neural network architecture for DNA sequence representation: A study on N4-methylcytosine sites. Comput Biol Med 2024; 178:108664. [PMID: 38875905 DOI: 10.1016/j.compbiomed.2024.108664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 05/11/2024] [Accepted: 05/26/2024] [Indexed: 06/16/2024]
Abstract
N4-methylcytosine (4mC) is a modified form of cytosine found in DNA, contributing to epigenetic regulation. It exists in various genomes, including the Rosaceae family encompassing significant fruit crops like apples, cherries, and roses. Previous investigations have examined the distribution and functional implications of 4mC sites within the Rosaceae genome, focusing on their potential roles in gene expression regulation, environmental adaptation, and evolution. This research aims to improve the accuracy of predicting 4mC sites within the genome of Fragaria vesca, a Rosaceae plant species. Building upon the original 4mc-w2vec method, which combines word embedding processing and a convolutional neural network (CNN), we have incorporated additional feature encoding techniques and leveraged pre-trained natural language processing (NLP) models with different deep learning architectures including different forms of CNN, recurrent neural networks (RNN) and long short-term memory (LSTM). Our assessments have shown that the best model is derived from a CNN model using fastText encoding. This model demonstrates enhanced performance, achieving a sensitivity of 0.909, specificity of 0.77, and accuracy of 0.879 on an independent dataset. Furthermore, our model surpasses previously published works on the same dataset, thus showcasing its superior predictive capabilities.
Affiliation(s)
- Van-Nui Nguyen
- University of Information and Communication Technology, Thai Nguyen University, Thai Nguyen, Viet Nam
- Trang-Thi Ho
- Department of Computer Science and Information Engineering, TamKang University, New Taipei, 251301, Taiwan
- Thu-Dung Doan
- International Degree Program in Animal Vaccine Technology, International College, National Pingtung University of Science and Technology, Pingtung, Taiwan
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, 110, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, 110, Taiwan; AIBioMed Research Group, Taipei Medical University, Taipei, 110, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, 110, Taiwan.
23
Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024; 29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
Affiliation(s)
- Antonio Lavecchia
- Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy.
24
Androutsos L, Pallante L, Bompotas A, Stojceski F, Grasso G, Piga D, Di Benedetto G, Alexakos C, Kalogeras A, Theofilatos K, Deriu MA, Mavroudi S. Predicting multiple taste sensations with a multiobjective machine learning method. NPJ Sci Food 2024; 8:47. [PMID: 39054312 PMCID: PMC11272927 DOI: 10.1038/s41538-024-00287-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 07/05/2024] [Indexed: 07/27/2024] Open
Abstract
Taste perception plays a pivotal role in guiding nutrient intake and aiding in the avoidance of potentially harmful substances through five basic tastes - sweet, bitter, umami, salty, and sour. Taste perception originates from molecular interactions in the oral cavity between taste receptors and chemical tastants. Hence, the recognition of taste receptors and the subsequent perception of taste heavily rely on the physicochemical properties of food ingredients. In recent years, several advances have been made towards the development of machine learning-based algorithms to classify chemical compounds' tastes using their molecular structures. Despite the great efforts, there remains significant room for improvement in developing multi-class models to predict the entire spectrum of basic tastes. Here, we present a multi-class predictor aimed at distinguishing bitter, sweet, and umami, from other taste sensations. The development of a multi-class taste predictor paves the way for a comprehensive understanding of the chemical attributes associated with each fundamental taste. It also opens the potential for integration into the evolving realm of multi-sensory perception, which encompasses visual, tactile, and olfactory sensations to holistically characterize flavour perception. This concept holds promise for introducing innovative methodologies in the rational design of foods, including pre-determining specific tastes and engineering complementary diets to augment traditional pharmacological treatments.
Affiliation(s)
- Lorenzo Pallante
- PolitoBIOMedLab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Torino, 10129, Italy
- Agorakis Bompotas
- Industrial Systems Institute, Athena Research Center, 265 04, Patras, Greece
- Filip Stojceski
- Department of Innovative Technologies, Dalle Molle Institute for Artificial Intelligence, Lugano-Viganello, 6962, Switzerland
- Gianvito Grasso
- Department of Innovative Technologies, Dalle Molle Institute for Artificial Intelligence, Lugano-Viganello, 6962, Switzerland
- Dario Piga
- Department of Innovative Technologies, Dalle Molle Institute for Artificial Intelligence, Lugano-Viganello, 6962, Switzerland
- Christos Alexakos
- Industrial Systems Institute, Athena Research Center, 265 04, Patras, Greece
- Marco A Deriu
- PolitoBIOMedLab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Torino, 10129, Italy
- Seferina Mavroudi
- InSyBio PC, Patras, 265 04, Greece
- Department of Nursing, University of Patras, 265 04, Patras, Greece
25
Xu T, Wang Q, Yang Z, Ying J. A BERT-based approach for identifying anti-inflammatory peptides using sequence information. Heliyon 2024; 10:e32951. [PMID: 38988537 PMCID: PMC11234020 DOI: 10.1016/j.heliyon.2024.e32951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Accepted: 05/22/2024] [Indexed: 07/12/2024] Open
Abstract
The use of anti-inflammatory peptides (AIPs) as an alternative therapeutic approach for inflammatory diseases holds great research significance. Due to the high cost and difficulty of identifying AIPs with experimental methods, the discovery and design of peptides by computational methods before the experimental stage has become a promising technology. In this study, we present BertAIP, a bidirectional encoder representation from transformers (BERT)-based method for predicting AIPs directly from their amino acid sequence without using any other information. BertAIP implements a BERT model to extract features of a protein, and uses a fully connected feed-forward network for AIP classification. It was constructed and evaluated using the AIP datasets that were reconstructed from the latest Immune Epitope Database. The experimental results showed that BertAIP achieved an accuracy of 0.751 and a Matthews correlation coefficient of 0.451, both higher than those of other commonly used methods. The results of the independent test suggested that BertAIP outperformed the existing AIP predictors. In addition, to enhance the interpretability of BertAIP, we explored and visualized the amino acids that the model considered important for AIP prediction. We believe that the BertAIP proposed herein will be a useful tool for large-scale screening and identifying novel AIPs for drug development and therapeutic research related to inflammatory diseases.
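For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how accuracy and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The function names and the example counts are illustrative assumptions; they are not taken from the BertAIP paper or its data.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when undefined."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Hypothetical counts, chosen only to illustrate the calculation.
tp, tn, fp, fn = 40, 35, 15, 10
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy={acc:.3f}, MCC={mcc:.3f}")
```

MCC ranges from -1 to 1 and takes all four cells of the confusion matrix into account, which is why it is often reported alongside accuracy for imbalanced peptide datasets.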
Affiliation(s)
- Teng Xu
- Institute of Translational Medicine, Baotou Central Hospital, Baotou, China
- Qian Wang
- Department of Clinical Laboratory, Wenzhou People's Hospital, The Third Clinical Institute Affiliated to Wenzhou Medical University, Wenzhou, China
- Zhigang Yang
- Institute of Translational Medicine, Baotou Central Hospital, Baotou, China
- Jianchao Ying
- Wenzhou Key Laboratory of Emergency, Critical Care, and Disaster Medicine, Department of Emergency, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Central Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
26
Zhang J, Zhao L, Wang W, Zhang Q, Wang XT, Xing DF, Ren NQ, Lee DJ, Chen C. Large language model for horizontal transfer of resistance gene: From resistance gene prevalence detection to plasmid conjugation rate evaluation. Sci Total Environ 2024; 931:172466. [PMID: 38626826 DOI: 10.1016/j.scitotenv.2024.172466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Revised: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024]
Abstract
The growing dissemination of plasmid-mediated resistance genes (ARGs) poses a significant threat to environmental integrity. However, the prediction of ARG prevalence is overlooked, especially for emerging ARGs that are potentially evolving gene exchange hotspots. Here, we used DNABERT to classify plasmid and chromosome sequences and to detect resistance gene prevalence. Initially, DNABERT fine-tuned on plasmid and chromosome sequences and followed by a multilayer perceptron (MLP) classifier achieved an area under the curve (AUC) of 0.764 on external datasets across 23 genera, 0.02 AUC higher than a traditional statistics-based model. Furthermore, single-genus models for Escherichia and Pseudomonas were also trained to explore their predictive performance in ARG prevalence detection. Integrating K-mer frequency attributes improved the prediction of ARG prevalence on an external dataset by 0.0281-0.0615 AUC in Escherichia and by 0.0196-0.0928 AUC in Pseudomonas. Finally, we established a random forest model for forecasting the relative conjugation transfer rate of plasmids, which achieved 0.7956 AUC, drawing on data from existing literature. It identifies the plasmid's repression status, cellular density, and temperature as the most important factors influencing transfer frequency. Combined, these two models provide a useful reference for quick and low-cost integrated evaluation of resistance gene transfer, accelerating computer-assisted quantitative risk assessment of ARG transfer in the environmental field.
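The abstract mentions K-mer frequency attributes without describing them. As a rough illustration of what such a feature vector usually looks like, here is a minimal sketch that turns a DNA sequence into normalized k-mer frequencies. The function name, the default k, and the handling of ambiguous bases are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Windows containing characters outside `alphabet` (for example N)
    are ignored, so the returned values sum to 1 whenever at least
    one valid k-mer is present.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]

# Hypothetical sequence, used only to show the shape of the output.
features = kmer_frequencies("ACGTACGT", k=2)
print(len(features))  # one entry per possible 2-mer
```

A vector like this has a fixed length of 4^k regardless of sequence length, so it can be concatenated with other fixed-size features before classification.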
Affiliation(s)
- Jiabin Zhang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China
- Lei Zhao
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China
- Wei Wang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China.
- Quan Zhang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China
- Xue-Ting Wang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China
- De-Feng Xing
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China
- Nan-Qi Ren
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China; Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518055, China
- Duu-Jong Lee
- Department of Mechanical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
- Chuan Chen
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China.
27
Fang Y, Luo M, Ren Z, Wei L, Wei DQ. CELA-MFP: a contrast-enhanced and label-adaptive framework for multi-functional therapeutic peptides prediction. Brief Bioinform 2024; 25:bbae348. [PMID: 39038935 PMCID: PMC11262836 DOI: 10.1093/bib/bbae348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 05/27/2024] [Accepted: 07/08/2024] [Indexed: 07/24/2024] Open
Abstract
Functional peptides play crucial roles in various biological processes and hold significant potential in many fields such as drug discovery and biotechnology. Accurately predicting the functions of peptides is essential for understanding their diverse effects and designing peptide-based therapeutics. Here, we propose CELA-MFP, a deep learning framework that incorporates feature Contrastive Enhancement and Label Adaptation for predicting Multi-Functional therapeutic Peptides. CELA-MFP utilizes a protein language model (pLM) to extract features from peptide sequences, which are then fed into a Transformer decoder for function prediction, effectively modeling correlations between different functions. To enhance the representation of each peptide sequence, contrastive learning is employed during training. Experimental results demonstrate that CELA-MFP outperforms state-of-the-art methods on most evaluation metrics for two widely used datasets, MFBP and MFTP. The interpretability of CELA-MFP is demonstrated by visualizing attention patterns in pLM and Transformer decoder. Finally, a user-friendly online server for predicting multi-functional peptides is established as the implementation of the proposed CELA-MFP and can be freely accessed at http://dreamai.cmii.online/CELA-MFP.
Affiliation(s)
- Yitian Fang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Peng Cheng Laboratory, 2 Xingke 1st Street, Nanshan District, Shenzhen 518055, China
- Mingshuang Luo
- Peng Cheng Laboratory, 2 Xingke 1st Street, Nanshan District, Shenzhen 518055, China
- Zhixiang Ren
- Peng Cheng Laboratory, 2 Xingke 1st Street, Nanshan District, Shenzhen 518055, China
- Leyi Wei
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- School of Informatics, Xiamen University, 422 Siming South Road, Xiamen 361005, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Peng Cheng Laboratory, 2 Xingke 1st Street, Nanshan District, Shenzhen 518055, China
28
Park JH, Prasad V, Newsom S, Najar F, Rajan R. IdMotif: An Interactive Motif Identification in Protein Sequences. IEEE Comput Graph Appl 2024; 44:114-125. [PMID: 38127603 DOI: 10.1109/mcg.2023.3345742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
This article presents a visual analytics framework, idMotif, to support domain experts in identifying motifs in protein sequences. A motif is a short sequence of amino acids usually associated with distinct functions of a protein, and identifying similar motifs in protein sequences helps us to predict certain types of disease or infection. idMotif can be used to explore, analyze, and visualize such motifs in protein sequences. We introduce a deep-learning-based method for grouping protein sequences and allow users to discover motif candidates of protein groups based on local explanations of the decision of a deep-learning model. idMotif provides several interactive linked views for between and within protein cluster/group and sequence analysis. Through a case study and experts' feedback, we demonstrate how the framework helps domain experts analyze protein sequences and motif identification.
29
Fang C, He J, Yamana H. MoRF_ESM: Prediction of MoRFs in disordered proteins based on a deep transformer protein language model. J Bioinform Comput Biol 2024; 22:2450006. [PMID: 38812466 DOI: 10.1142/s0219720024500069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024]
Abstract
Molecular recognition features (MoRFs) are particular functional segments of disordered proteins, which play crucial roles in regulating the phase transition of membrane-less organelles and frequently serve as central sites in cellular interaction networks. As the association between disordered proteins and severe diseases continues to be discovered, identifying MoRFs has gained growing significance. Due to the limited number of experimentally validated MoRFs, the performance of existing MoRF prediction algorithms still needs to be improved. In this research, we present a model named MoRF_ESM, which utilizes deep-learning protein representations to predict MoRFs in disordered proteins. This approach employs a pretrained ESM-2 protein language model to generate embedding representations of residues in the form of attention map matrices. These representations are combined with a self-learned TextCNN model for feature extraction and prediction. In addition, an averaging step was incorporated at the end of the MoRF_ESM model to refine the output and generate final prediction results. Compared with other methods on benchmark datasets, the MoRF_ESM approach demonstrates state-of-the-art performance, achieving [Formula: see text] higher AUC than other methods when tested on TEST1 and [Formula: see text] higher AUC when tested on TEST2. These results imply that the combination of ESM-2 and TextCNN can effectively extract deep evolutionary features related to protein structure and function, along with capturing shallow pattern features located in protein sequences, and is well qualified for the prediction task of MoRFs. Given that ESM-2 is a highly versatile protein language model, the methodology proposed in this study can be readily applied to other tasks involving the classification of protein sequences.
Affiliation(s)
- Chun Fang
- Department of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China
- Department of Computer Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku, Tokyo 169-8555, Japan
- Jiasheng He
- Department of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China
- Hayato Yamana
- Department of Computer Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku, Tokyo 169-8555, Japan
30
Hesamzadeh P, Seif A, Mahmoudzadeh K, Ganjali Koli M, Mostafazadeh A, Nayeri K, Mirjafary Z, Saeidian H. De novo antioxidant peptide design via machine learning and DFT studies. Sci Rep 2024; 14:6473. [PMID: 38499731 PMCID: PMC10948870 DOI: 10.1038/s41598-024-57247-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 12/29/2023] [Accepted: 03/15/2024] [Indexed: 03/20/2024] Open
Abstract
Antioxidant peptides (AOPs) are highly valued in food and pharmaceutical industries due to their significant role in human function. This study introduces a novel approach to identifying robust AOPs using a deep generative model based on sequence representation. Through filtration with a deep-learning classification model and subsequent clustering via the Butina cluster algorithm, twelve peptides (GP1-GP12) with potential antioxidant capacity were predicted. Density functional theory (DFT) calculations guided the selection of six peptides for synthesis and biological experiments. Molecular orbital representations revealed that the HOMO for these peptides is primarily localized on the indole segment, underscoring its pivotal role in antioxidant activity. All six synthesized peptides exhibited antioxidant activity in the DPPH assay, while the hydroxyl radical test showed suboptimal results. A hemolysis assay confirmed the non-hemolytic nature of the generated peptides. Additionally, an in silico investigation explored the potential inhibitory interaction between the peptides and the Keap1 protein. Analysis revealed that ligands GP3, GP4, and GP12 induced significant structural changes in proteins, affecting their stability and flexibility. These findings highlight the capability of machine learning approaches in generating novel antioxidant peptides.
Affiliation(s)
- Parsa Hesamzadeh
- Department of Chemistry, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Abdolvahab Seif
- Dipartimento di Fisica, Universita' di Padova, Via Marzolo 8, 35131, Padua, Italy
- Department of Chemistry, University of Turin, Via Pietro Giuria 7, 10125, Turin, Italy
- Kazem Mahmoudzadeh
- Department of Organic Chemistry and Oil, Faculty of Chemistry, Shahid Beheshti University, Tehran, Iran
- Amrollah Mostafazadeh
- Cellular and Molecular Biology Research Center, Health Research Institute, Babol University of Medical Sciences, Babol, Iran
- Kosar Nayeri
- Student Research Committee, Babol University of Medical Sciences, Babol, Iran
- Zohreh Mirjafary
- Department of Chemistry, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Hamid Saeidian
- Department of Science, Payame Noor University (PNU), PO Box: 19395-4697, Tehran, Iran.
31
He Y, Liu K, Liu Y, Han W. Prediction of bitterness based on modular designed graph neural network. Bioinform Adv 2024; 4:vbae041. [PMID: 38566918 PMCID: PMC10987211 DOI: 10.1093/bioadv/vbae041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/31/2024] [Accepted: 03/11/2024] [Indexed: 04/04/2024]
Abstract
Motivation: Bitterness plays a pivotal role in our ability to identify and evade harmful substances in food. As one of the five basic tastes, it constitutes a critical component of our sensory experiences. However, the reliance on human tasting for discerning flavors presents cost challenges, rendering in silico prediction of bitterness a more practical alternative. Results: In this study, we introduce the use of Graph Neural Networks (GNNs) in bitterness prediction, superseding traditional machine learning techniques. We developed an advanced model, a Hybrid Graph Neural Network (HGNN), surpassing conventional GNNs according to tests on public datasets. Using HGNN and three other GNNs, we designed BitterGNNs, a bitterness predictor that achieved an AUC value of 0.87 in both external bitter/non-bitter and bitter/sweet evaluations, outperforming the RDKFP-MLP predictor, which achieved AUC values of 0.86 and 0.85. We further created a bitterness prediction website and database, TastePD (https://www.tastepd.com/). The BitterGNNs predictor, built on GNNs, offers accurate bitterness predictions, enhancing the efficacy of bitterness prediction, aiding advanced food testing methodology development, and deepening our understanding of bitterness origins. Availability and implementation: TastePD is available at https://www.tastepd.com, and all code is at https://github.com/heyigacu/BitterGNN.
Affiliation(s)
- Yi He
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
- Kaifeng Liu
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
- Yuyang Liu
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
- Weiwei Han
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
32
Banić M, Butorac K, Čuljak N, Butorac A, Novak J, Pavunc AL, Rušanac A, Stanečić Ž, Lovrić M, Šušković J, Kos B. An Integrated Comprehensive Peptidomics and In Silico Analysis of Bioactive Peptide-Rich Milk Fermented by Three Autochthonous Cocci Strains. Int J Mol Sci 2024; 25:2431. [PMID: 38397111 PMCID: PMC10888711 DOI: 10.3390/ijms25042431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 02/12/2024] [Accepted: 02/17/2024] [Indexed: 02/25/2024] Open
Abstract
Bioactive peptides (BPs) are molecules of paramount importance with great potential for the development of functional foods, nutraceuticals or therapeutics for the prevention or treatment of various diseases. A functional BP-rich dairy product was produced by lyophilisation of bovine milk fermented by the autochthonous strains Lactococcus lactis subsp. lactis ZGBP5-51, Enterococcus faecium ZGBP5-52 and Enterococcus faecalis ZGBP5-53 isolated from the same artisanal fresh cheese. The efficiency of the proteolytic system of the implemented strains in the production of BPs was confirmed by a combined high-throughput mass spectrometry (MS)-based peptidome profiling and an in silico approach. First, peptides released by microbial fermentation were identified via a non-targeted peptide analysis (NTA) comprising reversed-phase nano-liquid chromatography (RP nano-LC) coupled with matrix-assisted laser desorption/ionisation-time-of-flight/time-of-flight (MALDI-TOF/TOF) MS, and then quantified by targeted peptide analysis (TA) involving RP ultrahigh-performance LC (RP-UHPLC) coupled with triple-quadrupole MS (QQQ-MS). A combined database and literature search revealed that 10 of the 25 peptides identified in this work have bioactive properties described in the literature. Finally, by combining the output of MS-based peptidome profiling with in silico bioactivity prediction tools, three peptides (75QFLPYPYYAKPA86, 40VAPFPEVFGK49, 117ARHPHPHLSF126), whose bioactive properties have not been previously reported in the literature, were identified as potential BP candidates.
Affiliation(s)
- Martina Banić
  - Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
- Katarina Butorac
  - Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
- Nina Čuljak
  - Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
- Ana Butorac
  - BICRO Biocentre Ltd., Borongajska cesta 83H, 10000 Zagreb, Croatia; (A.B.); (Ž.S.); (M.L.)
- Jasna Novak
  - Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
- Andreja Leboš Pavunc
  - Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
- Anamarija Rušanac
  - Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
- Željka Stanečić
  - BICRO Biocentre Ltd., Borongajska cesta 83H, 10000 Zagreb, Croatia; (A.B.); (Ž.S.); (M.L.)
- Marija Lovrić
  - BICRO Biocentre Ltd., Borongajska cesta 83H, 10000 Zagreb, Croatia; (A.B.); (Ž.S.); (M.L.)
- Jagoda Šušković
  - Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
- Blaženka Kos
  - Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)

33
Zou H. iDPPIV-SI: identifying dipeptidyl peptidase IV inhibitory peptides by using multiple sequence information. J Biomol Struct Dyn 2024; 42:2144-2152. [PMID: 37125813 DOI: 10.1080/07391102.2023.2203257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 04/10/2023] [Indexed: 05/02/2023]
Abstract
Diabetes has become a major threat to human health worldwide. Recent studies show that dipeptidyl peptidase IV (DPP-IV) inhibitory peptides may be potential pharmaceutical agents for treating diabetes. Thus, there is a need to discriminate DPP-IV inhibitory peptides from non-DPP-IV inhibitory peptides. To address this issue, a novel computational model called iDPPIV-SI was developed in this study. First, 50 different types of physicochemical (PC) properties were employed to represent the peptide sequences. Three different feature descriptors, namely the 1-order and 2-order correlation methods and the discrete wavelet transform, were applied to collect useful information from the PC matrix. Furthermore, the least absolute shrinkage and selection operator (LASSO) algorithm was employed to select the most discriminative features. All of the chosen features were fed into a support vector machine (SVM) for identifying DPP-IV inhibitory peptides. iDPPIV-SI achieved classification accuracies of 91.26% and 98.12% on the training and independent datasets, respectively. This is a significant improvement in classification performance compared with state-of-the-art predictors. The datasets and MATLAB codes (based on MATLAB2015b) used in the current study are available at https://figshare.com/articles/online_resource/iDPPIV-SI/20085878. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Hongliang Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China

34
Yu Y, Liu S, Zhang X, Yu W, Pei X, Liu L, Jin Y. Identification and prediction of milk-derived bitter taste peptides based on peptidomics technology and machine learning method. Food Chem 2024; 433:137288. [PMID: 37683467 DOI: 10.1016/j.foodchem.2023.137288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 07/19/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023]
Abstract
Bitter taste peptides (BPs) are vital for drug and nutrition research, but large-scale screening of them is still time-consuming and costly. This study developed a complete workflow for screening BPs based on peptidomics technology and a machine learning method. Using an expanded dataset and a new combination of BPs' characteristic factors, a novel classification prediction model (CPM-BP) based on the Light Gradient Boosting Machine algorithm was constructed, with an accuracy of 90.3% for predicting BPs. Among 724 peptides that differed significantly between spoiled and fresh UHT milk, 180 potential BPs were predicted using CPM-BP, eleven of which had been previously reported. One known BP (FALPQYLK) and three predicted potential BPs (FALPQYL, FFVAPFPEVFGKE, EMPFPKYP) were verified by measuring calcium mobilization in HEK293T cells expressing the human bitter taste receptor T2R4 (hT2R4). The three potential BPs activated hT2R4 and were thus demonstrated to be BPs, which confirmed the effectiveness of CPM-BP.
Affiliation(s)
- Yang Yu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning 116023, China
- Shengchi Liu
  - School of Information Science and Engineering, Dalian Polytechnic University, Dalian, Liaoning 116034, China
- Xinchen Zhang
  - School of Information Science and Engineering, Dalian Polytechnic University, Dalian, Liaoning 116034, China
- Wenhao Yu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning 116023, China
- Xiaoyan Pei
  - Inner Mongolia Yili Industrial Group Co., Ltd., Hohhot, Inner Mongolia 010110, China
- Li Liu
  - School of Information Science and Engineering, Dalian Polytechnic University, Dalian, Liaoning 116034, China
- Yan Jin
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning 116023, China

35
Zulfiqar H, Guo Z, Ahmad RM, Ahmed Z, Cai P, Chen X, Zhang Y, Lin H, Shi Z. Deep-STP: a deep learning-based approach to predict snake toxin proteins by using word embeddings. Front Med (Lausanne) 2024; 10:1291352. [PMID: 38298505 PMCID: PMC10829051 DOI: 10.3389/fmed.2023.1291352] [Citation(s) in RCA: 47] [Impact Index Per Article: 47.0] [Received: 09/09/2023] [Accepted: 12/26/2023] [Indexed: 02/02/2024] [Open Access]
Abstract
Snake venom contains many toxic proteins that can destroy the circulatory system or nervous system of prey. Studies have found that these snake venom proteins have the potential to treat cardiovascular and nervous system diseases. Therefore, the study of snake venom proteins is conducive to the development of related drugs. Techniques based on traditional biochemistry can accurately identify these proteins, but they are costly and time-consuming. Artificial intelligence provides a new computational strategy for large-scale screening of snake venom proteins. In this paper, we developed a sequence-based computational method to recognize snake toxin proteins. Specifically, we utilized three different feature descriptors, namely g-gap, natural vector and word2vec, to encode snake toxin protein sequences. Analysis of variance (ANOVA) and the gradient-boosting decision tree (GBDT) algorithm, combined with incremental feature selection (IFS), were used to optimize the features, and the optimized features were then input into the deep learning model for training. The results show that our model achieves an accuracy of 82.00% in 10-fold cross-validation. The model was further verified on independent data, where the accuracy reached 81.14%, demonstrating that our model has excellent prediction performance and robustness.
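The abstract names the g-gap descriptor without defining it. As a minimal sketch, assuming the common definition (the frequency of each ordered residue pair separated by `gap` intervening positions), the encoding could look like the following. The function name `g_gap_composition`, the 20-letter alphabet, and the toy sequence are illustrative assumptions and are not taken from the paper.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature indices are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_composition(sequence: str, gap: int) -> list[float]:
    """Return 400 frequencies of residue pairs separated by `gap` residues."""
    counts = dict.fromkeys(PAIRS, 0)
    total = len(sequence) - gap - 1  # number of pair positions in the sequence
    if total <= 0:
        return [0.0] * len(PAIRS)
    for i in range(total):
        pair = sequence[i] + sequence[i + gap + 1]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[p] / total for p in PAIRS]


# Toy sequence (hypothetical): with gap=1 the pairs are A_A and C_C.
features = g_gap_composition("ACAC", gap=1)
```

With `gap=0` this reduces to ordinary dipeptide composition; larger gaps capture longer-range residue correlations at the cost of fewer pair positions per sequence.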
Affiliation(s)
- Hasan Zulfiqar
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China
- Zhiling Guo
  - Beidahuang Industry Group General Hospital, Harbin, China
- Ramala Masood Ahmad
  - Department of Plant Breeding and Genetics, University of Agriculture Faisalabad, Faisalabad, Pakistan
- Zahoor Ahmed
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China
- Peiling Cai
  - School of Basic Medical Sciences, Chengdu University, Chengdu, China
- Xiang Chen
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Hao Lin
  - Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China
- Zheng Shi
  - Clinical Genetics Laboratory, Clinical Medical College & Affiliated Hospital, Chengdu University, Chengdu, China

36
Li W, Li G, Sun Y, Zhang L, Cui X, Jia Y, Zhao T. Prediction of SARS-CoV-2 Infection Phosphorylation Sites and Associations of these Modifications with Lung Cancer Development. Curr Gene Ther 2024; 24:239-248. [PMID: 37957848 DOI: 10.2174/0115665232268074231026111634] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/20/2023] [Revised: 10/05/2023] [Accepted: 10/07/2023] [Indexed: 11/15/2023]
Abstract
INTRODUCTION: Since the emergence of SARS-CoV-2 viruses, multiple mutant strains have been identified. Infection with SARS-CoV-2 virus leads to alterations in host cell phosphorylation signal, which systematically modulates the immune response. METHODS: Identification and analysis of SARS-CoV-2 virus infection phosphorylation sites enable insight into the mechanisms of viral infection and effects on host cells, providing important fundamental data for the study and development of potent drugs for the treatment of immune inflammatory diseases. In this paper, we have analyzed the SARS-CoV-2 virus-infected phosphorylation region and developed a transformer-based deep learning-assisted identification method for the specific identification of phosphorylation sites in SARS-CoV-2 virus-infected host cells. RESULTS: Furthermore, through association analysis with lung cancer, we found that SARS-CoV-2 infection may affect the regulatory role of the immune system, leading to an abnormal increase or decrease in the immune inflammatory response, which may be associated with the development and progression of cancer. CONCLUSION: We anticipate that this study will provide an important reference for SARS-CoV-2 virus evolution as well as immune-related studies and provide a reliable complementary screening tool for anti-SARS-CoV-2 virus drug and vaccine design.
Affiliation(s)
- Wei Li
  - Institute of Bioinformatics, Harbin Institute of Technology, Harbin, China
- Gen Li
  - Department of Radiation Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Yuzhi Sun
  - Institute for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Liyuan Zhang
  - Institute for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xinran Cui
  - Institute for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yuran Jia
  - Institute for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Tianyi Zhao
  - School of Medicine and Health, Harbin Institute of Technology, Harbin, China

37
Dhibar S, Jana B. Accurate Prediction of Antifreeze Protein from Sequences through Natural Language Text Processing and Interpretable Machine Learning Approaches. J Phys Chem Lett 2023; 14:10727-10735. [PMID: 38009833 DOI: 10.1021/acs.jpclett.3c02817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Antifreeze proteins (AFPs) bind to growing ice planes owing to their structural complementarity, thereby inhibiting ice-crystal growth by thermal hysteresis. Classification of AFPs from sequence is a difficult task due to their low sequence similarity, and therefore, the usual sequence similarity algorithms, like BLAST and PSI-BLAST, are not efficient. Here, a method combining n-gram feature vectors and machine learning models to accelerate the identification of potential AFPs from sequences is proposed. All these n-gram features are extracted with the k-mer counting method. The comparative analysis reveals that, among different machine learning models, XGBoost outperforms others in predicting AFPs from sequence when pentamers are used as a feature vector. When tested on an independent dataset, our method performed better compared to other existing ones with sensitivity of 97.50%, recall of 98.30%, and F1 score of 99.10%. Further, we used the SHAP method, which provides important insight into the functional activity of AFPs.
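The k-mer counting step mentioned in this abstract can be illustrated with a short sketch. It assumes overlapping k-mers counted with a sliding window of stride 1, which is the usual convention; the abstract does not spell out these details, and the helper names `kmer_counts` and `kmer_vector` and the toy sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer (n-gram of k residues) in a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty range and an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_vector(sequence: str, k: int, vocabulary: list[str]) -> list[int]:
    """Project k-mer counts onto a fixed vocabulary to get a feature vector."""
    counts = kmer_counts(sequence, k)
    return [counts.get(kmer, 0) for kmer in vocabulary]


# Toy sequence (not from the paper), using 3-mers to keep the example small.
toy = "ACDACDAC"
counts = kmer_counts(toy, 3)
vocab = sorted(counts)              # ['ACD', 'CDA', 'DAC']
vector = kmer_vector(toy, 3, vocab)  # one count per vocabulary entry
```

For pentamers one would call `kmer_counts(seq, 5)`; because the space of possible 5-mers over 20 residues is very large (20^5), building the vocabulary from k-mers observed in the training set, as done here, is a common way to keep the vector sparse and manageable.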
Affiliation(s)
- Saikat Dhibar
  - School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
- Biman Jana
  - School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India

38
Dutta P, Jain D, Gupta R, Rai B. Classification of tastants: A deep learning based approach. Mol Inform 2023; 42:e202300146. [PMID: 37885360 DOI: 10.1002/minf.202300146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 09/26/2023] [Accepted: 10/26/2023] [Indexed: 10/28/2023]
Abstract
Predicting the taste of molecules is of critical importance in the food and beverage, flavor, and pharmaceutical industries for the design and screening of new tastants. In this work, we have built deep learning models to classify sweet, bitter, and umami molecules, the three basic tastes whose sensation is mediated by G protein-coupled receptors. An extensive dataset containing 1466 bitter, 1764 sweet, and 238 umami tastants was curated from existing literature. We analyzed the chemical characteristics of the molecules, with special focus on the presence of different functional groups. A deep neural network model based on molecular descriptors and a graph neural network model were trained for taste prediction. The class imbalance due to fewer umami molecules was tackled using special sampling techniques. Both models show comparable performance during evaluation, but the graph-based model can learn task-specific representations from the molecular structure without requiring handcrafted features. We further explain the deep neural network predictions using Shapley additive explanations. Finally, we demonstrated the applicability of the models by screening bitter, sweet, and umami molecules from a large food database. This study develops an in silico approach to classify molecules based on their taste by leveraging the recent progress in deep learning, which can serve as a powerful tool for tastant design.
Affiliation(s)
- Prantar Dutta
  - Physical Sciences Research Area, Tata Research Development and Design Centre, TCS Research, 54-B, Hadapsar Industrial Estate, Pune, 411013, India
- Deepak Jain
  - Physical Sciences Research Area, Tata Research Development and Design Centre, TCS Research, 54-B, Hadapsar Industrial Estate, Pune, 411013, India
- Rakesh Gupta
  - Physical Sciences Research Area, Tata Research Development and Design Centre, TCS Research, 54-B, Hadapsar Industrial Estate, Pune, 411013, India
- Beena Rai
  - Physical Sciences Research Area, Tata Research Development and Design Centre, TCS Research, 54-B, Hadapsar Industrial Estate, Pune, 411013, India

39
Le NQK. Leveraging transformers-based language models in proteome bioinformatics. Proteomics 2023; 23:e2300011. [PMID: 37381841 DOI: 10.1002/pmic.202300011] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 05/14/2023] [Revised: 06/13/2023] [Accepted: 06/13/2023] [Indexed: 06/30/2023]
Abstract
In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining to analyze biological data. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review paper, we discuss the recent advancements in transformer-based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer-based NLP models to revolutionize proteome bioinformatics.
Affiliation(s)
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
  - AIBioMed Research Group, Taipei Medical University, Taipei, Taiwan
  - Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan
  - Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, Taiwan

40
Zhang J, Yan W, Zhang Q, Li Z, Liang L, Zuo M, Zhang Y. Umami-BERT: An interpretable BERT-based model for umami peptides prediction. Food Res Int 2023; 172:113142. [PMID: 37689906 DOI: 10.1016/j.foodres.2023.113142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 06/12/2023] [Accepted: 06/13/2023] [Indexed: 09/11/2023]
Abstract
Umami peptides have received extensive attention due to their ability to enhance flavors and provide nutritional benefits. The increasing demand for novel umami peptides and the vast number of peptides present in food call for more efficient screening methods. Therefore, the purpose of this study is to develop a deep learning (DL) model for rapid screening of umami peptides. The Umami-BERT model was devised using a novel two-stage training strategy with Bidirectional Encoder Representations from Transformers (BERT) and the inception network. In the pre-training stage, attention mechanisms were applied to a large number of bioactive peptide sequences to acquire high-dimensional generalized features. In the re-training stage, umami peptide prediction was carried out on the UMP789 dataset, which was compiled from the latest research. The model achieved an accuracy (ACC) of 93.23% and MCC of 0.78 on the balanced dataset, as well as an ACC of 95.00% and MCC of 0.85 on the unbalanced dataset. The results demonstrated that Umami-BERT could predict umami peptides directly from their amino acid sequences and exceeded the performance of other models. Furthermore, Umami-BERT enabled analysis of the attention patterns learned by the model. The amino acids alanine (A), cysteine (C), aspartate (D), and glutamic acid (E) were found to be the most significant contributors to umami peptides. Additionally, the patterns of summarized umami peptides involving A, C, D, and E were analyzed based on the learned attention weights. Consequently, Umami-BERT exhibited great potential in the large-scale screening of candidate peptides and offers novel insight for the further exploration of umami peptides.
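The abstract reports both ACC and the Matthews correlation coefficient (MCC). As a brief illustration of how the two are computed from a binary confusion matrix, the sketch below uses the standard textbook formulas. The counts in the example are made up for illustration and are not the paper's results.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # a common convention when a row or column sum is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion matrix for a balanced test set of 100 peptides.
acc_value = accuracy(tp=45, tn=45, fp=5, fn=5)  # 90 / 100
mcc_value = mcc(tp=45, tn=45, fp=5, fn=5)       # (2025 - 25) / 2500
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside ACC when a dataset is unbalanced.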
Affiliation(s)
- Jingcheng Zhang
  - Food Laboratory of Zhongyuan, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China; Key Laboratory of Flavor Science of China General Chamber of Commerce, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
- Wenjing Yan
  - National Engineering Research Centre for Agri-product Quality Traceability, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
- Qingchuan Zhang
  - National Engineering Research Centre for Agri-product Quality Traceability, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
- Zihan Li
  - National Engineering Research Centre for Agri-product Quality Traceability, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
- Li Liang
  - Food Laboratory of Zhongyuan, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China; Key Laboratory of Flavor Science of China General Chamber of Commerce, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
- Min Zuo
  - National Engineering Research Centre for Agri-product Quality Traceability, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
- Yuyu Zhang
  - Food Laboratory of Zhongyuan, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China; Key Laboratory of Flavor Science of China General Chamber of Commerce, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China

41
He J, Zhang S, Fang C. AAindex-PPII: Predicting polyproline type II helix structure based on amino acid indexes with an improved BiGRU-TextCNN model. J Bioinform Comput Biol 2023; 21:2350022. [PMID: 37899354 DOI: 10.1142/s0219720023500221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2023]
Abstract
The polyproline-II (PPII) structure domain is crucial in organisms' signal transduction, transcription, cell metabolism, and immune response. It is also a critical structural domain for specific vital disease-associated proteins. Recognizing PPII is essential for understanding protein structure and function. To accurately predict PPII in proteins, we propose a novel method, AAindex-PPII, which uses only amino acid indexes to characterize protein sequences and a Bidirectional Gated Recurrent Unit (BiGRU)-Improved TextCNN composite deep learning model to predict PPII in proteins. Experimental results show that, when tested on the same datasets, our method outperforms the state-of-the-art BERT-PPII method, achieving an AUC value of 0.845 on the strict data and an AUC value of 0.813 on the non-strict data, which are 0.024 and 0.03 higher, respectively, than those of the BERT-PPII method. This study demonstrates that our proposed method is simple and efficient for PPII prediction without using pre-trained large models or complex features such as position-specific scoring matrices.
Affiliation(s)
- Jiasheng He
  - College of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China
- Shun Zhang
  - College of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China
- Chun Fang
  - College of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China

42
Zhang X, Guo H, Zhang F, Wang X, Wu K, Qiu S, Liu B, Wang Y, Hu Y, Li J. HNetGO: protein function prediction via heterogeneous network transformer. Brief Bioinform 2023; 24:bbab556. [PMID: 37861172 PMCID: PMC10588005 DOI: 10.1093/bib/bbab556] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/05/2021] [Revised: 11/18/2021] [Accepted: 12/04/2021] [Indexed: 10/21/2023] [Open Access]
Abstract
Protein function annotation is one of the most important research topics for revealing the essence of life at the molecular level in the post-genome era. Current research shows that integrating multisource data can effectively improve the performance of protein function prediction models. However, the heavy reliance on complex feature engineering and model integration methods limits the development of existing methods. In addition, models based on deep learning use only the labeled data in a given dataset to extract sequence features, thus ignoring a large amount of existing unlabeled sequence data. Here, we propose an end-to-end protein function annotation model named HNetGO, which uses a heterogeneous network to integrate protein sequence similarity and protein-protein interaction network information and combines it with a pretrained model to extract the semantic features of the protein sequence. We also design an attention-based graph neural network model, which can effectively extract node-level features from heterogeneous networks and predict protein function by measuring the similarity between protein nodes and gene ontology term nodes. Comparative experiments on the human dataset show that HNetGO achieves state-of-the-art performance on the cellular component and molecular function branches.
Affiliation(s)
- Xiaoshuai Zhang
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Huannan Guo
  - General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin 150086, China
- Fan Zhang
  - Center NHC Key Laboratory of Cell Transplantation, The First Affiliated Hospital of Harbin Medical University, Harbin 150086, China
- Xuan Wang
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Kaitao Wu
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Shizheng Qiu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Bo Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Yadong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Yang Hu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Junyi Li
  - School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China

43
Rojas C, Ballabio D, Consonni V, Suárez-Estrella D, Todeschini R. Classification-based machine learning approaches to predict the taste of molecules: A review. Food Res Int 2023; 171:113036. [PMID: 37330849 DOI: 10.1016/j.foodres.2023.113036] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/14/2023] [Revised: 05/02/2023] [Accepted: 05/22/2023] [Indexed: 06/19/2023]
Abstract
The capacity to discriminate safe from dangerous compounds has played an important role in the evolution of species, including human beings. Highly evolved senses such as taste allow humans to navigate and survive in the environment through information that reaches the brain as electrical pulses. Specifically, taste receptors provide multiple bits of information about the substances that are introduced orally. These substances may be pleasant or unpleasant according to the taste responses that they trigger. Tastes have been classified into basic (sweet, bitter, umami, sour and salty) or non-basic (astringent, chilling, cooling, heating, pungent), while some compounds are considered multitaste, taste modifiers or tasteless. Classification-based machine learning approaches are useful tools for developing mathematical relationships that predict the taste class of new molecules from their chemical structure. This work reviews the history of multicriteria quantitative structure-taste relationship modelling, starting from the first ligand-based (LB) classifier proposed in 1980 by Lemont B. Kier and concluding with the most recent studies published in 2022.
Collapse
Affiliation(s)
- Cristian Rojas
- Grupo de Investigación en Quimiometría y QSAR, Facultad de Ciencia y Tecnología, Universidad del Azuay, Av. 24 de Mayo 7-77 y Hernán Malo, Cuenca 010107, Ecuador.
| | - Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1-20126, Milano, Italy
| | - Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1-20126, Milano, Italy
| | - Diego Suárez-Estrella
- Grupo de Investigación en Quimiometría y QSAR, Facultad de Ciencia y Tecnología, Universidad del Azuay, Av. 24 de Mayo 7-77 y Hernán Malo, Cuenca 010107, Ecuador
| | - Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1-20126, Milano, Italy
|
44
|
Zhai S, Tan Y, Zhang C, Hipolito CJ, Song L, Zhu C, Zhang Y, Duan H, Yin Y. PepScaf: Harnessing Machine Learning with In Vitro Selection toward De Novo Macrocyclic Peptides against IL-17C/IL-17RE Interaction. J Med Chem 2023; 66:11187-11200. [PMID: 37480587 DOI: 10.1021/acs.jmedchem.3c00627] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 07/24/2023]
Abstract
The combination of library-based screening and artificial intelligence (AI) has been accelerating the discovery and optimization of hit ligands. However, the potential of AI to assist in de novo macrocyclic peptide ligand discovery has yet to be fully explored. In this study, an integrated AI framework called PepScaf was developed to extract the critical scaffold relative to bioactivity based on a vast dataset from an initial in vitro selection campaign against a model protein target, interleukin-17C (IL-17C). Taking the generated scaffold, a focused macrocyclic peptide library was rationally constructed to target IL-17C, yielding over 20 potent peptides that effectively inhibited IL-17C/IL-17RE interaction. Notably, the top two peptides displayed exceptional potency with IC50 values of 1.4 nM. This approach presents a viable methodology for more efficient macrocyclic peptide discovery, offering potential time and cost savings. Additionally, this is also the first report regarding the discovery of macrocyclic peptides against IL-17C/IL-17RE interaction.
Affiliation(s)
- Silong Zhai
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
| | - Yahong Tan
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
| | - Chengyun Zhang
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
| | - Christopher John Hipolito
- Screening & Compound Profiling, Quantitative Biosciences, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States
| | - Lulu Song
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
| | - Cheng Zhu
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
| | - Youming Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
| | - Hongliang Duan
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
| | - Yizhen Yin
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Shandong Research Institute of Industrial Technology, Jinan 250101, China
|
45
|
Charoenkwan P, Schaduangrat N, Shoombuatong W. StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens. BMC Bioinformatics 2023; 24:301. [PMID: 37507654 PMCID: PMC10386778 DOI: 10.1186/s12859-023-05421-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/11/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023] [Open Access]
Abstract
BACKGROUND The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and utilizing their potential in anticancer vaccine development. In this context, TTCAs are highly promising. Meanwhile, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need to develop a robust model that can achieve higher accuracy and precision. RESULTS In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. Firstly, we constructed 156 different baseline models by using 12 different feature encoding schemes and 13 popular ML algorithms. Secondly, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined based on the feature selection strategy and then used for the construction of our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and the existing methods on the independent test, with an accuracy of 0.932 and a Matthews correlation coefficient of 0.866. CONCLUSIONS In summary, the proposed stacking ensemble learning-based framework of StackTTCA could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server ( http://2pmlab.camt.cmu.ac.th/StackTTCA ) to maximize user convenience for high-throughput screening of novel TTCAs.
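The two metrics quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), can both be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard definitions; the confusion counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal sum is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not the paper's results).
tp, tn, fp, fn = 45, 48, 2, 5
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells symmetrically, so it stays informative when the positive and negative classes are imbalanced.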
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
|
46
|
Li F, Liu S, Li K, Zhang Y, Duan M, Yao Z, Zhu G, Guo Y, Wang Y, Huang L, Zhou F. EpiTEAmDNA: Sequence feature representation via transfer learning and ensemble learning for identifying multiple DNA epigenetic modification types across species. Comput Biol Med 2023; 160:107030. [PMID: 37196456 DOI: 10.1016/j.compbiomed.2023.107030] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/02/2023] [Revised: 04/21/2023] [Accepted: 05/10/2023] [Indexed: 05/19/2023]
Abstract
Methylation is a major DNA epigenetic modification that regulates biological processes without altering the DNA sequence, and multiple types of DNA methylation have been discovered, including 6mA, 5hmC, and 4mC. Multiple computational approaches have been developed to automatically identify DNA methylation residues using machine learning or deep learning algorithms. Machine learning (ML)-based methods are difficult to transfer to other DNA methylation site prediction tasks using additional knowledge. Deep learning (DL) methods may facilitate the transfer of knowledge from similar tasks, but they are often ineffective on small datasets. This study proposes an integrated feature representation framework, EpiTEAmDNA, based on the strategies of transfer learning and ensemble learning, which is evaluated on multiple DNA methylation types across 15 species. EpiTEAmDNA integrates a convolutional neural network (CNN) with conventional machine learning methods, and performs better than the existing DL-based methods on small datasets when no additional knowledge is available. The experimental data suggest that the EpiTEAmDNA models may be further improved via transfer learning based on additional knowledge. The evaluation experiments on the independent test datasets also suggest that the proposed EpiTEAmDNA framework outperforms the existing models in most prediction tasks of the 3 DNA methylation types across 15 species. The source code, pre-trained global model, and the EpiTEAmDNA feature representation framework are freely available at http://www.healthinformaticslab.org/supp/.
Affiliation(s)
- Fei Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Shuai Liu
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Kewei Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Yaqi Zhang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Meiyu Duan
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Zhaomin Yao
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110167, China
| | - Gancheng Zhu
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Yutong Guo
- College of Life Sciences, Jilin University, Changchun, Jilin, 130012, China
| | - Ying Wang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Lan Huang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Fengfeng Zhou
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
|
47
|
Schaduangrat N, Anuwongcharoen N, Charoenkwan P, Shoombuatong W. DeepAR: a novel deep learning-based hybrid framework for the interpretable prediction of androgen receptor antagonists. J Cheminform 2023; 15:50. [PMID: 37149650 PMCID: PMC10163717 DOI: 10.1186/s13321-023-00721-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/26/2022] [Accepted: 04/08/2023] [Indexed: 05/08/2023] [Open Access]
Abstract
Drug resistance represents a major obstacle to therapeutic innovations and is a prevalent feature in prostate cancer (PCa). Androgen receptors (ARs) are the hallmark therapeutic target for prostate cancer modulation and AR antagonists have achieved great success. However, rapid emergence of resistance contributing to PCa progression is the ultimate burden of their long-term usage. Hence, the discovery and development of AR antagonists capable of combating this resistance remains an avenue for further exploration. Therefore, this study proposes a novel deep learning (DL)-based hybrid framework, named DeepAR, to accurately and rapidly identify AR antagonists by using only the SMILES notation. Specifically, DeepAR is capable of extracting and learning the key information embedded in AR antagonists. Firstly, we established a benchmark dataset by collecting active and inactive compounds against AR from the ChEMBL database. Based on this dataset, we developed and optimized a collection of baseline models by using a comprehensive set of well-known molecular descriptors and machine learning algorithms. Then, these baseline models were utilized for creating probabilistic features. Finally, these probabilistic features were combined and used for the construction of a meta-model based on a one-dimensional convolutional neural network. Experimental results indicated that DeepAR is a more accurate and stable approach for identifying AR antagonists on the independent test dataset, achieving an accuracy of 0.911 and MCC of 0.823. In addition, our proposed framework is able to provide feature importance information by leveraging a popular computational approach, named SHapley Additive exPlanations (SHAP). Meanwhile, the characterization and analysis of potential AR antagonist candidates were achieved through the SHAP waterfall plot and molecular docking. The analysis inferred that N-heterocyclic moieties, halogenated substituents, and a cyano functional group were significant determinants of potential AR antagonists. Lastly, we implemented an online web server by using DeepAR (at http://pmlabstack.pythonanywhere.com/DeepAR ). We anticipate that DeepAR could be a useful computational tool for community-wide facilitation of AR candidates from a large number of uncharacterized compounds.
Affiliation(s)
- Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Nuttapat Anuwongcharoen
- Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
|
48
|
Thafar MA, Albaradei S, Uludag M, Alshahrani M, Gojobori T, Essack M, Gao X. OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features. Front Genet 2023; 14:1139626. [PMID: 37091791 PMCID: PMC10117673 DOI: 10.3389/fgene.2023.1139626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Accepted: 03/24/2023] [Indexed: 04/08/2023] [Open Access]
Abstract
Late-stage drug development failures are usually a consequence of ineffective targets. Thus, proper target identification is needed, which may be possible using computational approaches. This is because effective targets have disease-relevant biological functions, and omics data unveil the proteins involved in these functions. Also, properties that favor binding between drug and target are deducible from the protein’s amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets using DL approaches. First, we created the “OncologyTT” datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes: omics features, the proteins’ amino-acid sequence BERT embeddings, and the integrated features to train and test the DL classifiers separately. The models achieved high prediction performance in terms of area under the curve (AUC), i.e., AUC greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. Also, OncoRTT outperformed the state-of-the-art method using their data in five out of seven cancer types commonly assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence using the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.
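The AUC figures quoted in this abstract summarize how well a classifier ranks positives above negatives. As a minimal sketch, assuming binary labels and real-valued scores, the ROC AUC can be computed as the fraction of (positive, negative) pairs that are ordered correctly, with ties counted as half. The function name `roc_auc` and the toy data are illustrative and do not come from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Equivalent to the normalized Mann-Whitney U statistic. This pairwise form
    is O(n_pos * n_neg); it is meant for clarity, not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ordered pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: one of the four positive/negative pairs is mis-ordered.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, independent of any single decision threshold.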
Affiliation(s)
- Maha A. Thafar
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Saudi Arabia
| | - Somayah Albaradei
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Mahmut Uludag
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Mona Alshahrani
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
| | - Takashi Gojobori
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao; Magbubah Essack
| | - Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao; Magbubah Essack
|
49
|
Jiang J, Li J, Li J, Pei H, Li M, Zou Q, Lv Z. A Machine Learning Method to Identify Umami Peptide Sequences by Using Multiplicative LSTM Embedded Features. Foods 2023; 12:1498. [PMID: 37048319 PMCID: PMC10094688 DOI: 10.3390/foods12071498] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/26/2023] [Revised: 03/24/2023] [Accepted: 03/30/2023] [Indexed: 04/05/2023] [Open Access]
Abstract
Umami peptides enhance the umami taste of food and have good food processing properties, nutritional value, and numerous potential applications. Wet-lab testing for the identification of umami peptides is a time-consuming and expensive process. Here, we report iUmami-DRLF, which applies a logistic regression (LR) method to features extracted from peptide sequences solely by a pre-trained deep learning model, unified representation (UniRep, based on a multiplicative LSTM). The findings demonstrate that deep representation learning significantly enhanced the capability and predictive precision of models in identifying umami peptides solely from peptide sequence information. Newly validated taste sequences were also used to test iUmami-DRLF and other predictors, and the results indicate that iUmami-DRLF has better robustness and accuracy and remains valid at higher probability thresholds. The iUmami-DRLF method can aid further studies on enhancing the umami flavor of food to satisfy the need for an umami-flavored diet.
Affiliation(s)
- Jici Jiang
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Jiayu Li
- College of Life Science, Sichuan University, Chengdu 610065, China
| | - Junxian Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Hongdi Pei
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Wu Yuzhang Honors College, Sichuan University, Chengdu 610065, China
| | - Mingxin Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
| | - Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
|
50
|
Fan Y, Sun G, Pan X. ELMo4m6A: A Contextual Language Embedding-Based Predictor for Detecting RNA N6-Methyladenosine Sites. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:944-954. [PMID: 35536814 DOI: 10.1109/tcbb.2022.3173323] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
N6-methyladenosine (m6A) is a universal post-transcriptional modification of RNAs, and it is widely involved in various biological processes. Identifying m6A modification sites accurately is indispensable for further investigating m6A-mediated biological functions. How to better represent RNA sequences is crucial for building effective computational methods for detecting m6A modification sites. However, traditional encoding methods require complex biological prior knowledge and are time-consuming. Furthermore, most of the existing m6A site prediction methods are limited to a single species, and few methods are able to predict m6A sites across different species and tissues. Thus, it is necessary to design a more efficient computational method to predict m6A sites across multiple species and tissues. In this paper, we propose ELMo4m6A, a contextual language embedding-based method for predicting m6A sites from RNA sequences without any prior knowledge. ELMo4m6A first learns embeddings of RNA sequences using the language model ELMo, then uses a hybrid convolutional neural network (CNN) and long short-term memory (LSTM) to identify m6A sites. The results of 5-fold cross-validation and independent testing demonstrate that ELMo4m6A is superior to state-of-the-art methods. Moreover, we applied integrated gradients to find potential sequence patterns contributing to m6A sites.
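Integrated gradients, the attribution method mentioned at the end of this abstract, assigns each input feature the path integral of the model's gradient along a straight line from a baseline input to the actual input. The sketch below illustrates the technique on a small logistic model used here as a stand-in; it is not the CNN-LSTM network of ELMo4m6A, and the weights, input, and all-zero baseline are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x: List[float], w: List[float], bias: float) -> float:
    """Stand-in differentiable model: logistic regression output in (0, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)


def gradient(x: List[float], w: List[float], bias: float) -> List[float]:
    """Analytic gradient of the logistic output with respect to each input."""
    p = model(x, w, bias)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, bias, steps: int = 200) -> List[float]:
    """Approximate integrated gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        total = [t + g for t, g in zip(total, gradient(point, w, bias))]
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [(xi - b0) * t / steps for xi, b0, t in zip(x, baseline, total)]


# Hypothetical weights and a toy binary input; the baseline is all zeros.
w, bias = [1.5, -2.0, 0.5, 0.0], -0.2
x, baseline = [1.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, bias)
print(attributions)
# Completeness check: attributions sum to model(x) - model(baseline).
print(sum(attributions), model(x, w, bias) - model(baseline, w, bias))
```

The completeness property shown in the last line is a useful sanity check: if the attributions do not sum to the change in model output, the number of integration steps is too small.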
|