1
Yuan S, Zhao C, Liu L, Zhou G. MGDM: Molecular generation using a multinomial diffusion model. Methods 2025; 239:1-9. [PMID: 40049434 DOI: 10.1016/j.ymeth.2025.03.001] [Received: 01/08/2025] [Revised: 02/12/2025] [Accepted: 03/03/2025] [Indexed: 03/20/2025]
Abstract
Accurate analysis of molecular structures and the rapid generation of valid molecules remain significant challenges in de novo drug design. In this study, we propose the Multinomial Generated Diffusion Model (MGDM) for molecular generation. This model leverages a multinomial diffusion framework to process discrete data, with a focus on learning the multinomial distribution inherent in the dataset. During the generation process, the model progressively denoises molecules, starting from a uniform noise distribution and ultimately producing valid molecular structures. Initially, we generate molecules unconditionally to expand the compound library. In the next phase, we focus on generating molecules with specific properties to assess the model's capacity for conditional generation. For this, we implement a classifier-free guidance strategy, which directs the diffusion model's task without the need for training separate classifier models. To validate the effectiveness of our framework, we conducted experiments using the Molecular Sets (MOSES) dataset. The results demonstrate that, compared to several state-of-the-art methods, MGDM generates valid molecules while achieving superior or comparable performance in terms of novelty and diversity.
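The noising half of a multinomial diffusion process can be illustrated with a minimal sketch. It assumes the standard categorical formulation, in which each forward step mixes the current class probabilities with a uniform distribution over K classes; the function names, the 4-symbol vocabulary, and the constant schedule `betas` are hypothetical and are not taken from the MGDM paper.

```python
# Illustrative sketch only: standard multinomial (categorical) diffusion
# forward process. Names and noise schedule are assumptions, not MGDM's code.

def noise_step(probs, beta):
    """One forward step: keep the state with weight (1 - beta), else resample uniformly."""
    k = len(probs)
    return [(1.0 - beta) * p + beta / k for p in probs]


def noise_marginal(x0, betas):
    """Closed form after all steps: alpha_bar * x0 + (1 - alpha_bar) / K."""
    k = len(x0)
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
    return [alpha_bar * p + (1.0 - alpha_bar) / k for p in x0]


# One-hot encoding of a single token over a hypothetical 4-symbol vocabulary.
x0 = [0.0, 1.0, 0.0, 0.0]
betas = [0.1] * 50  # hypothetical constant noise schedule

probs = x0
for beta in betas:
    probs = noise_step(probs, beta)

# After many steps the distribution is close to uniform (0.25 per class).
print([round(p, 3) for p in probs])
```

Generation runs this process in reverse: a learned network starts from the near-uniform distribution and denoises step by step. That learned reverse model is outside the scope of this sketch.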
Affiliation(s)
- Sisi Yuan
  - Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, Charlotte, NC, USA
- Chen Zhao
  - College of Information Science and Engineering, Hunan University, Changsha, Hunan 410086, PR China
- Lin Liu
  - School of Information Science and Technology, Yunnan Normal University, Kunming 650500, PR China
- Guifei Zhou
  - School of Information Science and Technology, Yunnan Normal University, Kunming 650500, PR China
2
Li N, Qiao J, Gao F, Wang Y, Shi H, Zhang Z, Cui F, Zhang L, Wei L. GICL: A Cross-Modal Drug Property Prediction Framework Based on Knowledge Enhancement of Large Language Models. J Chem Inf Model 2025. [PMID: 40432191 DOI: 10.1021/acs.jcim.5c00895] [Indexed: 05/29/2025]
Abstract
Deep learning models have demonstrated their potential in learning effective molecular representations critical for drug property prediction and drug discovery. Despite significant advancements in leveraging multimodal drug molecule semantics, existing approaches often struggle with challenges such as low-quality data and structural complexity. Large language models (LLMs) excel in generating high-quality molecular representations due to their robust characterization capabilities. In this work, we introduce GICL, a cross-modal contrastive learning framework that integrates LLM-derived embeddings with molecular image representations. Specifically, LLMs extract feature representations from the SMILES strings of drug molecules, which are then contrasted with graphical representations of molecular images to achieve a holistic understanding of molecular features. Experimental results demonstrate that GICL achieves state-of-the-art performance on the ADMET task while offering interpretable insights into drug properties, thereby facilitating more efficient drug design and discovery.
Affiliation(s)
- Na Li
  - School of Computer and Information Engineering, Qilu Institute of Technology, Jinan 250200, China
- Jianbo Qiao
  - School of Software, Shandong University, Jinan 250100, China
- Fei Gao
  - School of Computer and Information Engineering, Qilu Institute of Technology, Jinan 250200, China
- Yanling Wang
  - School of Computer and Information Engineering, Qilu Institute of Technology, Jinan 250200, China
- Hua Shi
  - School of Optoelectronic and Communication Engineering, Xiamen University of Technology, Xiamen 361005, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Lichao Zhang
  - School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen 518172, China
- Leyi Wei
  - Macao Polytechnic University, Faculty of Applied Science, Centre for Artificial Intelligence Driven Drug Discovery, Macau 999078, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250100, China
3
Chen L, Li Y, Ma Y, Gao L, Yu L. Multiscale graph equivariant diffusion model for 3D molecule design. SCIENCE ADVANCES 2025; 11:eadv0778. [PMID: 40238892 DOI: 10.1126/sciadv.adv0778] [Received: 12/04/2024] [Accepted: 03/07/2025] [Indexed: 04/18/2025]
Abstract
Three-dimensional molecular generation is critical in drug design. However, current methods often rely on point clouds or oversimplified interaction models, limiting their ability to accurately represent molecular structures. To address these challenges, this paper proposes the multiscale graph equivariant diffusion model for 3D molecule design (MD3MD). MD3MD partitions molecular conformations into multiscale graphs, assigning different weights to capture atomic interactions across scales. This framework guides the diffusion process, enabling high-quality 3D molecular generation. Experimental results demonstrate that MD3MD excels in both unconditional and conditional generation tasks, producing diverse, stable, and innovative molecules that meet specified conditions. Visualization highlights MD3MD's ability to learn domain-specific patterns and generate molecules distinct from existing datasets while maintaining distributional consistency. By effectively exploring chemical space, MD3MD surpasses previous methods in generating innovative and chemically diverse molecules, offering a notable advancement in the field of molecular design.
Affiliation(s)
- Lu Chen
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Yan Li
  - School of Management, Xi'an Polytechnic University, Xi'an 710000, Shaanxi, China
- Yanjie Ma
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
4
Shang Y, Wang Z, Chen Y, Yang X, Ren Z, Zeng X, Xu L. HNF-DDA: subgraph contrastive-driven transformer-style heterogeneous network embedding for drug-disease association prediction. BMC Biol 2025; 23:101. [PMID: 40241152 PMCID: PMC12004644 DOI: 10.1186/s12915-025-02206-x] [Received: 11/15/2024] [Accepted: 04/03/2025] [Indexed: 04/18/2025]
Abstract
BACKGROUND: Drug-disease association (DDA) prediction aims to identify potential links between drugs and diseases, facilitating the discovery of new therapeutic potentials and reducing the cost and time associated with traditional drug development. However, existing DDA prediction methods often overlook the global relational information provided by other biological entities and the complex association structure between drugs and diseases, limiting the potential correlations of drug and disease embeddings. RESULTS: In this study, we propose HNF-DDA, a subgraph contrastive-driven transformer-style heterogeneous network embedding model for DDA prediction. Specifically, HNF-DDA adopts an all-pairs message passing strategy to capture the global structure of the network, fully integrating multi-omics information. HNF-DDA also introduces subgraph contrastive learning to capture the local structure of drug-disease subgraphs, learning the high-order semantic information of nodes. Experimental results on two benchmark datasets demonstrate that HNF-DDA outperforms several state-of-the-art methods. Additionally, it shows superior performance across different dataset splitting schemes, indicating HNF-DDA's capability to generalize to novel drug and disease categories. Case studies for breast cancer and prostate cancer reveal that 9 out of the top 10 predicted candidate drugs for breast cancer and 8 out of the top 10 for prostate cancer have documented therapeutic effects. CONCLUSIONS: HNF-DDA incorporates all-pairs message passing and subgraph capture strategies into heterogeneous network embedding, enabling effective learning of drug and disease representations enriched with heterogeneous information, while also demonstrating significant potential for applications in drug repositioning.
Affiliation(s)
- Yifan Shang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Zixu Wang
  - Department of Computer Science, University of Tsukuba, Tsukuba, 305-8577, Japan
- Yangyang Chen
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Xinyu Yang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Zhonghao Ren
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Xiangxiang Zeng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, 518055, China
5
Meng W, Xu X, Xiao Z, Gao L, Yu L. Cancer Drug Sensitivity Prediction Based on Deep Transfer Learning. Int J Mol Sci 2025; 26:2468. [PMID: 40141112 PMCID: PMC11942577 DOI: 10.3390/ijms26062468] [Received: 01/11/2025] [Revised: 02/27/2025] [Accepted: 03/06/2025] [Indexed: 03/28/2025]
Abstract
In recent years, many approved drugs have been discovered using phenotypic screening, which elaborates the exact mechanisms of action or molecular targets of drugs. Drug susceptibility prediction is an important type of phenotypic screening. Large-scale pharmacogenomics studies have provided us with large amounts of drug sensitivity data. By analyzing these data using computational methods, we can effectively build models to predict drug susceptibility. However, due to the differences in data distribution among databases, researchers cannot directly utilize data from multiple sources. In this study, we propose a deep transfer learning model. Using a domain-adaptation approach, we integrate the genomic characterization of cancer cell lines with chemical information on compounds, combining the Cancer Cell Line Encyclopedia (CCLE) and the Genomics of Drug Sensitivity in Cancer (GDSC) datasets, and predict the half-maximal inhibitory concentrations (IC50 values). Afterward, the validity of the prediction results of our model is verified. This study effectively addresses the challenge of cross-database distribution discrepancies in drug sensitivity prediction by integrating multi-source heterogeneous data and constructing a deep transfer learning model. This model serves as a reliable computational tool for precision drug development. Its widespread application can facilitate the optimization of therapeutic strategies in personalized medicine while also providing technical support for high-throughput drug screening and the discovery of new drug targets.
Affiliation(s)
- Weijun Meng
  - School of Computer Science and Technology, Xi’an University of Posts & Telecommunications, Xi’an 710071, China
- Xinyu Xu
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, China
- Zhichao Xiao
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi’an 710071, China
6
Ma L, Liu J, Sun W, Zhao C, Yu L. scMFG: a single-cell multi-omics integration method based on feature grouping. BMC Genomics 2025; 26:132. [PMID: 39934664 PMCID: PMC11817349 DOI: 10.1186/s12864-025-11319-0] [Received: 08/20/2024] [Accepted: 02/03/2025] [Indexed: 02/13/2025]
Abstract
BACKGROUND: Recent advancements in methodologies and technologies have enabled the simultaneous measurement of multiple omics data, which provides a comprehensive understanding of cellular heterogeneity. However, existing methods have limitations in accurately identifying cell types while maintaining model interpretability, especially in the presence of noise. METHODS: We propose a novel method called scMFG, which leverages feature grouping and group integration techniques for the integration of single-cell multi-omics data. scMFG organizes features with similar characteristics within each omics layer through feature grouping. Furthermore, scMFG ensures a consistent feature grouping approach across different omics layers, promoting comparability of diverse data types. Additionally, scMFG incorporates a matrix factorization-based approach so that the integrated results remain interpretable. RESULTS: We comprehensively evaluated scMFG's performance on four complex real-world datasets generated using diverse sequencing technologies, highlighting its robustness in accurately identifying cell types. Notably, scMFG exhibited superior performance in deciphering cellular heterogeneity at a finer resolution compared to existing methods when applied to simulated datasets. Furthermore, our method proved highly effective in identifying rare cell types, showcasing its robust performance and suitability for detecting low-abundance cellular populations. The interpretability of scMFG was validated through the association of its outputs with specific cell types or states observed in the neonatal mouse cerebral cortex dataset. Moreover, we demonstrated that scMFG is capable of identifying cell developmental trajectories even in datasets with batch effects. CONCLUSIONS: Our work presents a robust framework for the analysis of single-cell multi-omics data, advancing our understanding of cellular heterogeneity in a comprehensive and interpretable manner.
Affiliation(s)
- Litian Ma
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
- Jingtao Liu
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
- Wei Sun
  - Department of Rehabilitation Medicine, Xijing Hospital, Fourth Military Medical University, Xi'an, 710032, China
- Chenguang Zhao
  - Department of Rehabilitation Medicine, Xijing Hospital, Fourth Military Medical University, Xi'an, 710032, China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
7
Meng C, Hou Y, Zou Q, Shi L, Su X, Ju Y. Rore: robust and efficient antioxidant protein classification via a novel dimensionality reduction strategy based on learning of fewer features. Genomics Inform 2024; 22:29. [PMID: 39633440 PMCID: PMC11616364 DOI: 10.1186/s44342-024-00026-z] [Received: 07/27/2024] [Accepted: 10/03/2024] [Indexed: 12/07/2024]
Abstract
In protein identification, researchers increasingly aim to achieve efficient classification using fewer features. While many feature selection methods effectively reduce the number of model features, they often incur information loss because they merely select or discard features, which limits classifier performance. To address this issue, we present Rore, an algorithm based on a feature-dimensionality reduction strategy. By mapping the original features to a latent space, Rore retains all relevant feature information while using fewer latent features. This approach preserves much of the original information and overcomes the information loss problem associated with previous feature selection methods. Through extensive experimental validation and analysis, Rore demonstrated excellent performance on an antioxidant protein dataset, achieving an accuracy of 95.88% and an MCC of 91.78% using vectors of only 15 features. The Rore algorithm is available online at http://112.124.26.17:8021/Rore.
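For readers unfamiliar with the two reported metrics, the following minimal sketch shows how accuracy and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The counts in the example are made up for illustration and are not the Rore results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: MCC is taken as 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
tp, tn, fp, fn = 45, 45, 5, 5
print(accuracy(tp, tn, fp, fn))  # 0.9
print(mcc(tp, tn, fp, fn))       # 0.8
```

MCC ranges from -1 to 1, so a value reported as a percentage (as in the abstract) corresponds to this quantity multiplied by 100.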
Affiliation(s)
- Chaolu Meng
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
  - Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, Hohhot, China
- Yongqi Hou
  - School of Computer Science, Inner Mongolia University, Hohhot, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Lei Shi
  - Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Huangpu District, No. 415, Fengyang Road, Shanghai, China
- Xi Su
  - Foshan Women and Children Hospital, Foshan, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
8
Wu CY, Xu ZX, Li N, Qi DY, Hao ZH, Wu HY, Gao R, Jin YT. Accurately identifying positive and negative regulation of apoptosis using fusion features and machine learning methods. Comput Biol Chem 2024; 113:108207. [PMID: 39265463 DOI: 10.1016/j.compbiolchem.2024.108207] [Received: 06/28/2024] [Revised: 08/20/2024] [Accepted: 09/06/2024] [Indexed: 09/14/2024]
Abstract
Apoptotic proteins play a crucial role in the apoptosis process, ensuring a balance between cell proliferation and death. Thus, further elucidating the regulatory mechanisms of apoptosis will enhance our understanding of their functions. However, the development of computational methods to accurately identify positive and negative regulation of apoptosis remains a significant challenge. This work proposes a machine learning model based on multi-feature fusion to effectively identify proteins involved in positive and negative regulation of apoptosis. Initially, we constructed a reliable benchmark dataset containing 200 proteins involved in positive regulation of apoptosis and 241 involved in negative regulation. Subsequently, we developed a classifier that combines the support vector machine (SVM) with pseudo composition of k-spaced amino acid pairs (PseCKSAAP), composition transition distribution (CTD), dipeptide deviation from expected mean (DDE), and PSSM-composition to identify these proteins. Analysis of variance (ANOVA) was employed to select the optimized features that yield the maximum prediction performance. Evaluation of the proposed model on independent data yielded an accuracy of 0.781 and an AUROC of 0.837, demonstrating the model's strong predictive capability.
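The ANOVA-based selection step can be illustrated with a short sketch that scores each feature by its one-way F statistic across the two classes and ranks features by that score. This is a generic illustration of ANOVA feature ranking under assumed toy data; the feature names and values are hypothetical, and the sketch does not reproduce the authors' pipeline or their SVM classifier.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a single feature split into class groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if ss_within == 0:
        return float("inf")  # classes perfectly separated with zero spread
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical feature values for positive and negative regulators.
features = {
    "feature_a": ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),
    "feature_b": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),
}

# Rank features from most to least discriminative.
ranked = sorted(features, key=lambda name: anova_f(features[name]), reverse=True)
print(ranked)  # ['feature_b', 'feature_a']
```

In practice the top-ranked features would then be passed to the classifier, with the cutoff chosen by cross-validated performance.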
Affiliation(s)
- Cheng-Yan Wu
  - Key Laboratory of Magnetism and Magnetic Materials at Universities of Inner Mongolia Autonomous Region, Baotou Teacher's College, Baotou 014010, China
- Zhi-Xue Xu
  - Key Laboratory of Magnetism and Magnetic Materials at Universities of Inner Mongolia Autonomous Region, Baotou Teacher's College, Baotou 014010, China
- Nan Li
  - Key Laboratory of Magnetism and Magnetic Materials at Universities of Inner Mongolia Autonomous Region, Baotou Teacher's College, Baotou 014010, China
- Dan-Yang Qi
  - Key Laboratory of Magnetism and Magnetic Materials at Universities of Inner Mongolia Autonomous Region, Baotou Teacher's College, Baotou 014010, China
- Zhi-Hong Hao
  - Key Laboratory of Magnetism and Magnetic Materials at Universities of Inner Mongolia Autonomous Region, Baotou Teacher's College, Baotou 014010, China
- Hong-Ye Wu
  - Key Laboratory of Magnetism and Magnetic Materials at Universities of Inner Mongolia Autonomous Region, Baotou Teacher's College, Baotou 014010, China
- Ru Gao
  - The People's Hospital of Wenjiang, Chengdu, Sichuan 611130, China
- Yan-Ting Jin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
9
Ru X, Zhao S, Zou Q, Xu L. Identify potential drug candidates within a high-quality compound search space. Brief Bioinform 2024; 26:bbaf024. [PMID: 39853109 PMCID: PMC11758506 DOI: 10.1093/bib/bbaf024] [Received: 06/24/2024] [Revised: 12/10/2024] [Accepted: 01/14/2025] [Indexed: 01/26/2025]
Abstract
The identification of potentially effective drug candidates is a fundamental step in new drug discovery, with profound implications for pharmaceutical research and the healthcare sector. While many computational methods have been developed for such predictions and have yielded promising results, two challenges persist: (i) the cold-start problem for new drugs, which makes prediction harder because of the lack of historical data or prior knowledge, and (ii) the vastness of the compound search space for potential drug candidates. In this study, we present a promising method that not only enhances the accuracy of identifying potential novel drug candidates but also refines the search space. Drawing inspiration from solutions to the cold-start problem in recommender systems, we apply 'learning to rank' techniques to the field of new drug discovery. Furthermore, we propose using three similarity metrics to condense the compound search space into compact yet high-quality spaces, allowing for more efficient screening of potential drug candidates. Experimental results from two widely used datasets demonstrate that our method outperforms other state-of-the-art approaches in the new drug cold-start scenario. Additionally, we have verified that it is feasible to identify potential drug candidates within these high-quality compound search spaces. To our knowledge, this study is the first to address the drug cold-start problem in such a confined space, potentially providing valuable insights and guidance for drug screening.
Affiliation(s)
- Xiaoqing Ru
  - The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, No. 100, Minjiang Avenue, Smart New Town, Quzhou, Zhejiang Province, 324000, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No. 1, Chengdian Road, Kecheng District, Quzhou, Zhejiang Province, 324003, China
- Shulin Zhao
  - The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, No. 100, Minjiang Avenue, Smart New Town, Quzhou, Zhejiang Province, 324000, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No. 1, Chengdian Road, Kecheng District, Quzhou, Zhejiang Province, 324003, China
- Lifeng Xu
  - The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, No. 100, Minjiang Avenue, Smart New Town, Quzhou, Zhejiang Province, 324000, China
10
Ren J, Guo Z, Qi Y, Zhang Z, Liu L. Prediction of YY1 loop anchor based on multi-omics features. Methods 2024; 232:96-106. [PMID: 39521361 DOI: 10.1016/j.ymeth.2024.11.004] [Received: 05/10/2024] [Revised: 10/22/2024] [Accepted: 11/06/2024] [Indexed: 11/16/2024]
Abstract
The three-dimensional structure of chromatin is crucial for the regulation of gene expression. YY1 promotes enhancer-promoter interactions in a manner analogous to CTCF-mediated chromatin interactions. However, little is known about which YY1 binding sites can form loop anchors. In this study, the LightGBM model was used to predict YY1-loop anchors by integrating multi-omics data. Due to the large imbalance in the number of positive and negative samples, we use AUPRC to reflect the quality of the classifier. The results show that the LightGBM model exhibits strong predictive performance (AUPRC≥0.93). To verify the robustness of the model, the dataset was divided into training and test sets at a 4:1 ratio. The results show that the model performs well for YY1-loop anchor prediction on both the training and independent test sets. Additionally, we ranked the importance of the features and found that the formation of YY1-loop anchors is primarily influenced by the co-binding of transcription factors CTCF, SMC3, and RAD21, as well as histone modifications and sequence context.
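The choice of AUPRC for an imbalanced dataset can be made concrete with a small sketch. It computes average precision, one common estimator of the area under the precision-recall curve, from labels and scores; the labels and scores below are invented for illustration, ties between scores are not treated specially, and nothing here reproduces the authors' LightGBM model.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision values at each positive, by rank."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive samples")
    # Rank samples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall level
    return total / n_pos


# Hypothetical imbalanced toy set: 2 positives among 10 samples.
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1, 0.4, 0.35, 0.05, 0.15]

ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)
print(round(ap, 4), prevalence)  # 0.8333 0.2
```

The expected score of a random ranking is roughly the positive-class prevalence (0.2 here), which is why AUPRC is informative when negatives greatly outnumber positives.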
Affiliation(s)
- Jun Ren
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Zhiling Guo
  - Beidahuang Industry Group General Hospital, Harbin, China
- Yixuan Qi
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zheng Zhang
  - Computer Science and Information Systems, Murray State University, Murray, USA
- Li Liu
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
11
Duan X, Nie Y, Xie X, Zhang Q, Zhu C, Zhu H, Chen R, Xu J, Zhang J, Yang C, Yu Q, Cai K, Wang Y, Tian W. Sex differences and testosterone interfere with the structure of the gut microbiota through the bile acid signaling pathway. Front Microbiol 2024; 15:1421608. [PMID: 39493843 PMCID: PMC11527610 DOI: 10.3389/fmicb.2024.1421608] [Received: 04/23/2024] [Accepted: 09/26/2024] [Indexed: 11/05/2024]
Abstract
Background: The gut microbiome has a significant impact on human wellness, contributing to the emergence and progression of a range of health issues including inflammatory and autoimmune conditions, metabolic disorders, cardiovascular problems, and psychiatric disorders. Notably, clinical observations have revealed that these illnesses can display differences in incidence and presentation between genders. The present study aimed to evaluate whether the composition of gut microbiota is associated with sex-specific differences and to elucidate the mechanism. Methods: 16S rRNA sequencing, hormone analysis, gut microbiota transplantation, gonadectomy, and hormone treatment were employed to investigate the correlation between the gut microbiome and sex or sex hormones. Meanwhile, genes and proteins involved in the bile acid signaling pathway were analyzed in both liver and ileum tissues. Results: The composition and diversity of the microbiota from the jejunum and feces and the level of sex hormones in the serum differed between the sexes in young and middle-aged Sprague Dawley (SD) rats. However, no similar phenomenon was found in geriatric rats. Interestingly, in young, middle-aged, and old rats alike, the composition of the microbiota and bacterial diversity differed between the jejunum and feces. Gut microbiota transplantation, gonadectomy, and hormone replacement also suggested that hormones, particularly testosterone (T), influenced the composition of the gut microbiota in rats. Meanwhile, the mRNA and protein levels of genes involved in the bile acid signaling pathway (specifically SHP, FXR, CYP7A1, and ASBT) exhibited sex-specific differences, and T may play a significant role in mediating the expression of this pathway. Conclusion: Sex-specific differences in the structure of the gut microbiota are mediated by T through the bile acid signaling pathway, pointing to potential targets for disease prevention and management.
Affiliation(s)
- Xueqing Duan
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
- Yinli Nie
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
- Xin Xie
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
- Qi Zhang
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
- Chen Zhu
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
- Han Zhu
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
- Rui Chen
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
- Jun Xu
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
- Jinqiang Zhang
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
- Changfu Yang
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
- Qi Yu
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
- Kun Cai
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
- Yong Wang
  - CAS-Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Weiyi Tian
  - School of Basic Medical Sciences, Guizhou University of Traditional Chinese Medicine, Gui Yang, China
12
Chen Y, Wang J, Zou Q, Niu M, Ding Y, Song J, Wang Y. DrugDAGT: a dual-attention graph transformer with contrastive learning improves drug-drug interaction prediction. BMC Biol 2024; 22:233. [PMID: 39396972 PMCID: PMC11472440 DOI: 10.1186/s12915-024-02030-9] [Received: 06/13/2024] [Accepted: 10/02/2024] [Indexed: 10/15/2024]
Abstract
BACKGROUND: Drug-drug interactions (DDIs) can result in unexpected pharmacological outcomes, including adverse drug events, which are crucial for drug discovery. Graph neural networks have substantially advanced our ability to model molecular representations; however, the precise identification of key local structures and the capture of long-distance structural correlations for better DDI prediction and interpretation remain significant challenges. RESULTS: Here, we present DrugDAGT, a dual-attention graph transformer framework with contrastive learning for predicting multiple DDI types. The dual-attention graph transformer incorporates attention mechanisms at both the bond and atomic levels, thereby enabling the integration of short and long-range dependencies within drug molecules to pinpoint key local structures essential for DDI discovery. Moreover, DrugDAGT further implements graph contrastive learning to maximize the similarity of representations across different views for better discrimination of molecular structures. Experiments in both warm-start and cold-start scenarios demonstrate that DrugDAGT outperforms state-of-the-art baseline models, achieving superior overall performance. Furthermore, visualization of the learned representations of drug pairs and the attention map provides interpretable insights instead of black-box results. CONCLUSIONS: DrugDAGT provides an effective tool for accurately predicting multiple DDI types by identifying key local chemical structures, offering valuable insights for prescribing medications, and guiding drug development. All data and code of our DrugDAGT can be found at https://github.com/codejiajia/DrugDAGT.
Affiliation(s)
- Yaojia Chen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia
- Wenzhou Medical University-Monash Biomedicine Discovery Institute Alliance in Clinical and Experimental Biomedicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325035, China
| | - Jiacheng Wang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Mengting Niu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic University, Shenzhen, China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia.
- Wenzhou Medical University-Monash Biomedicine Discovery Institute Alliance in Clinical and Experimental Biomedicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325035, China.
| | - Yansu Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
| |
|
13
|
Geng G, Wang L, Xu Y, Wang T, Ma W, Duan H, Zhang J, Mao A. MGDDI: A multi-scale graph neural networks for drug-drug interaction prediction. Methods 2024; 228:22-29. [PMID: 38754712 DOI: 10.1016/j.ymeth.2024.05.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 05/09/2024] [Accepted: 05/12/2024] [Indexed: 05/18/2024] Open
Abstract
Drug-drug interaction (DDI) prediction is crucial for identifying interactions within drug combinations, especially adverse effects due to physicochemical incompatibility. While current methods have made strides in predicting adverse drug interactions, limitations persist. Most methods rely on handcrafted features, restricting their applicability. They predominantly extract information from individual drugs, neglecting the importance of interaction details between drug pairs. To address these issues, we propose MGDDI, a graph neural network-based model for predicting potential adverse drug interactions. Notably, we use a multiscale graph neural network (MGNN) to learn drug molecule representations, addressing substructure size variations and preventing gradient issues. For capturing interaction details between drug pairs, we integrate a substructure interaction learning module based on attention mechanisms. Our experimental results demonstrate MGDDI's superiority in predicting adverse drug interactions, offering a solution to current methodological limitations.
Affiliation(s)
- Guannan Geng
- Department of Endocrinology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Lizhuang Wang
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Yanwei Xu
- Beidahuang Group Neuropsychiatric Hospital, Jiamusi, China; Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
| | - Tianshuo Wang
- School of Software, Shandong University, Jinan, China
| | - Wei Ma
- Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
| | - Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
| | - Jiahui Zhang
- Department of Stomatology and Dental Hygiene, The Fourth Affiliated Hospital, Harbin Medical University, Harbin, China.
| | - Anqiong Mao
- The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Department of Anesthesiology, Luzhou, China.
| |
|
14
|
Chen S, Gao N, Li C, Zhai F, Jiang X, Zhang P, Guan J, Li K, Xiang R, Ling G. DrugSK: A Stacked Ensemble Learning Framework for Predicting Drug Combinations of Multiple Diseases. J Chem Inf Model 2024; 64:5317-5327. [PMID: 38900583 DOI: 10.1021/acs.jcim.4c00296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2024]
Abstract
Combination therapy is an important direction of continuous exploration in the field of medicine, with the core goals of improving treatment efficacy, reducing adverse reactions, and optimizing clinical outcomes. Machine learning technology holds great promise in improving the prediction of drug synergy combinations. However, most studies focus on single disease-oriented collaborative predictive models or involve excessive feature categories, making it challenging to predict the majority of new drugs. To address these challenges, the DrugSK comprehensive model was developed, which utilizes SMILES-BERT to extract structural information from 3492 drugs and trains on reactions from 48,756 drug combinations. DrugSK is an integrated learning model capable of predicting interactions among various drug categories. First, the primary learner is trained from the initial data set. Random forest, support vector machine, and XGboost model are selected as primary learners and logistic regression as secondary learners. A new data set is then "generated" to train level 2 learners, which can be thought of as a prediction for each model. Finally, the results are filtered using logistic regression. Furthermore, the combination of the new antibacterial drug Drafloxacin with other antibacterial agents was tested. The synergistic effect of Drafloxacin and Isavuconazonium in the fight against Candida albicans has been confirmed, providing enlightenment for the clinical treatment of skin infection. DrugSK's prediction is accurate in practical application and can also predict the probability of the outcome. In addition, the tendency of Drafloxacin and antifungal drugs to be synergistic was found. The development of DrugSK will provide a new blueprint for predicting drug combination synergies.
Affiliation(s)
- Siqi Chen
- College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
| | - Nan Gao
- Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
| | - Chunzhi Li
- College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
| | - Fei Zhai
- College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
| | - Xiwei Jiang
- College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
| | - Peng Zhang
- Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
| | - Jibin Guan
- Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | - Kefeng Li
- Center for Artificial Intelligence-Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SR 999708, China
| | - Rongwu Xiang
- College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Liaoning Medical Big Data and Artificial Intelligence Engineering Technology Research Center, Shenyang 110016, China
| | - Guixia Ling
- College of Medical Devices, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
- Wuya College of Innovation, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang 110016, China
| |
|
15
|
Basith S, Pham NT, Manavalan B, Lee G. SEP-AlgPro: An efficient allergen prediction tool utilizing traditional machine learning and deep learning techniques with protein language model features. Int J Biol Macromol 2024; 273:133085. [PMID: 38871100 DOI: 10.1016/j.ijbiomac.2024.133085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 05/20/2024] [Accepted: 06/09/2024] [Indexed: 06/15/2024]
Abstract
Allergy is a hypersensitive condition in which individuals develop objective symptoms when exposed to harmless substances at a dose that would cause no harm to a "normal" person. Most current computational methods for allergen identification rely on homology or conventional machine learning using limited set of feature descriptors or validation on specific datasets, making them inefficient and inaccurate. Here, we propose SEP-AlgPro for the accurate identification of allergen protein from sequence information. We analyzed 10 conventional protein-based features and 14 different features derived from protein language models to gauge their effectiveness in differentiating allergens from non-allergens using 15 different classifiers. However, the final optimized model employs top 10 feature descriptors with top seven machine learning classifiers. Results show that the features derived from protein language models exhibit superior discriminative capabilities compared to traditional feature sets. This enabled us to select the most discriminatory baseline models, whose predicted outputs were aggregated and used as input to a deep neural network for the final allergen prediction. Extensive case studies showed that SEP-AlgPro outperforms state-of-the-art predictors in accurately identifying allergens. A user-friendly web server was developed and made freely available at https://balalab-skku.org/SEP-AlgPro/, making it a powerful tool for identifying potential allergens.
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea.
| | - Nhat Truong Pham
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Republic of Korea
| | - Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Republic of Korea.
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Republic of Korea; Department of Molecular Science and Technology, Ajou University, Suwon 16499, Republic of Korea.
| |
|
16
|
Liu Z, Bai T, Liu B, Yu L. MulStack: An ensemble learning prediction model of multilabel mRNA subcellular localization. Comput Biol Med 2024; 175:108289. [PMID: 38688123 DOI: 10.1016/j.compbiomed.2024.108289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 02/28/2024] [Accepted: 03/12/2024] [Indexed: 05/02/2024]
Abstract
Subcellular localization of mRNA is related to protein synthesis, cell polarity, cell movement and other biological regulation mechanisms. The distribution of mRNAs across subcellular compartments is similar to that of proteins, and most mRNAs are distributed in multiple compartments. Recently, some computational methods have been designed to predict the subcellular localization of mRNA. However, these methods only employed a single level of mRNA features and did not employ the position encoding of nucleotides in mRNA. In this paper, an ensemble learning prediction model is proposed, named MulStack, which is based on random forest and deep learning for multilabel mRNA subcellular localization. The proposed method employs two levels of mRNA features, including sequence-level and residue-level features, and position encoding is employed for the first time in the field of subcellular localization of mRNA. Random forest is employed to learn the mRNA sequence-level features, while deep learning is employed to learn the mRNA sequence-level features and the mRNA residue-level features combined with position encoding. The outputs of the random forest and the deep learning model are then combined by a weighted sum to give the prediction probability. Compared with existing methods, the results show that MulStack is the best in the localization of the nucleus, cytosol and exosome. In addition, position weight matrices (PWMs) are extracted by convolutional neural networks (CNNs) that can be matched with known RNA binding protein motifs. Gene ontology (GO) enrichment analysis shows biological processes, molecular functions and cellular components of mRNA genes. The prediction web server of MulStack is freely accessible at http://bliulab.net/MulStack.
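The weighted-sum step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `weighted_ensemble`, the weight `rf_weight`, the decision `threshold`, and the example probabilities are all assumptions made for illustration. The abstract does not state the weights or the threshold MulStack uses.

```python
def weighted_ensemble(rf_probs, dl_probs, rf_weight=0.5, threshold=0.5):
    """Combine per-label probabilities from two models by a weighted sum.

    rf_probs / dl_probs: one probability per localization label, in the
    same label order. rf_weight and threshold are illustrative defaults,
    not values taken from the MulStack paper.
    """
    if len(rf_probs) != len(dl_probs):
        raise ValueError("both models must score the same set of labels")
    if not 0.0 <= rf_weight <= 1.0:
        raise ValueError("rf_weight must lie in [0, 1]")
    dl_weight = 1.0 - rf_weight
    # Weighted sum, computed independently for every label (multilabel setting).
    combined = [rf_weight * r + dl_weight * d for r, d in zip(rf_probs, dl_probs)]
    # Each label is switched on or off on its own, so several can be active.
    labels = [int(p >= threshold) for p in combined]
    return combined, labels


if __name__ == "__main__":
    # Hypothetical scores for three labels: nucleus, cytosol, exosome.
    combined, labels = weighted_ensemble([0.8, 0.2, 0.6], [0.6, 0.4, 0.2])
    print(combined, labels)
```

Because each label is thresholded independently, one mRNA can be assigned to several compartments at once, which matches the multilabel setting the abstract describes.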
Affiliation(s)
- Ziqi Liu
- School of Computer Science and Technology, Xidian University, Xian, 710075, China.
| | - Tao Bai
- School of Mathematics & Computer Science, Yan'an University, Shaanxi, 716000, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China.
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China.
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xian, 710075, China.
| |
|
17
|
Liu L, Jia R, Hou R, Huang C. Prediction of cell-type-specific cohesin-mediated chromatin loops based on chromatin state. Methods 2024; 226:151-160. [PMID: 38670416 DOI: 10.1016/j.ymeth.2024.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Revised: 04/02/2024] [Accepted: 04/18/2024] [Indexed: 04/28/2024] Open
Abstract
Chromatin loop is of crucial importance for the regulation of gene transcription. Cohesin is a type of chromatin-associated protein that mediates the interaction of chromatin through the loop extrusion. Cohesin-mediated chromatin interactions have strong cell-type specificity, posing a challenge for predicting chromatin loops. Existing computational methods perform poorly in predicting cell-type-specific chromatin loops. To address this issue, we propose a random forest model to predict cell-type-specific cohesin-mediated chromatin loops based on chromatin states identified by ChromHMM and the occupancy of related factors. Our results show that chromatin state is responsible for cell-type-specificity of loops. Using only chromatin states as features, the model achieved high accuracy in predicting cell-type-specific loops between two cell types and can be applied to different cell types. Furthermore, when chromatin states are combined with the occurrence frequency of CTCF, RAD21, YY1, and H3K27ac ChIP-seq peaks, more accurate prediction can be achieved. Our feature extraction method provides novel insights into predicting cell-type-specific chromatin loops and reveals the relationship between chromatin state and chromatin loop formation.
Affiliation(s)
- Li Liu
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China.
| | - Ranran Jia
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China.
| | - Rui Hou
- College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010051, China.
| | - Chengbing Huang
- School of Computer Science and Technology, Aba Teachers University, Aba 623002, China.
| |
|
18
|
Le VT, Malik MS, Tseng YH, Lee YC, Huang CI, Ou YY. DeepPLM_mCNN: An approach for enhancing ion channel and ion transporter recognition by multi-window CNN based on features from pre-trained language models. Comput Biol Chem 2024; 110:108055. [PMID: 38555810 DOI: 10.1016/j.compbiolchem.2024.108055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 02/28/2024] [Accepted: 03/19/2024] [Indexed: 04/02/2024]
Abstract
Accurate classification of membrane proteins like ion channels and transporters is critical for elucidating cellular processes and drug development. We present DeepPLM_mCNN, a novel framework combining Pretrained Language Models (PLMs) and multi-window convolutional neural networks (mCNNs) for effective classification of membrane proteins into ion channels and ion transporters. Our approach extracts informative features from protein sequences by utilizing various PLMs, including TAPE, ProtT5_XL_U50, ESM-1b, ESM-2_480, and ESM-2_1280. These PLM-derived features are then input into a mCNN architecture to learn conserved motifs important for classification. When evaluated on ion transporters, our best performing model utilizing ProtT5 achieved 90% sensitivity, 95.8% specificity, and 95.4% overall accuracy. For ion channels, we obtained 88.3% sensitivity, 95.7% specificity, and 95.2% overall accuracy using ESM-1b features. Our proposed DeepPLM_mCNN framework demonstrates significant improvements over previous methods on unseen test data. This study illustrates the potential of combining PLMs and deep learning for accurate computational identification of membrane proteins from sequence data alone. Our findings have important implications for membrane protein research and drug development targeting ion channels and transporters. The data and source codes in this study are publicly available at the following link: https://github.com/s1129108/DeepPLM_mCNN.
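The sensitivity, specificity and overall accuracy quoted in this abstract are standard confusion-matrix metrics. The sketch below shows how they are computed; the function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: positives predicted correctly / missed.
    tn/fp: negatives predicted correctly / wrongly flagged as positive.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)            # fraction of true positives recovered
    specificity = tn / (tn + fp)            # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 50 positives, 200 negatives.
    sens, spec, acc = binary_metrics(tp=45, fn=5, tn=190, fp=10)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

When negatives outnumber positives, overall accuracy is dominated by specificity, which is one plausible reading of why the reported accuracies sit close to the reported specificities.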
Affiliation(s)
- Van-The Le
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
| | - Muhammad-Shahid Malik
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan; Department of Computer Science and Engineering, Karakoram International University, Pakistan
| | - Yi-Hsuan Tseng
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
| | - Yu-Cheng Lee
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
| | - Cheng-I Huang
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
| | - Yu-Yen Ou
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan; Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li, 32003, Taiwan.
| |
|
19
|
Cheng N, Wang L, Liu Y, Song B, Ding C. HANSynergy: Heterogeneous Graph Attention Network for Drug Synergy Prediction. J Chem Inf Model 2024; 64:4334-4347. [PMID: 38709204 PMCID: PMC11135324 DOI: 10.1021/acs.jcim.4c00003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 05/07/2024]
Abstract
Drug synergy therapy is a promising strategy for cancer treatment. However, the extensive variety of available drugs and the time-intensive process of determining effective drug combinations through clinical trials pose significant challenges. It requires a reliable method for the rapid and precise selection of drug synergies. In response, various computational strategies have been developed for predicting drug synergies, yet the exploitation of heterogeneous biological network features remains underexplored. In this study, we construct a heterogeneous graph that encompasses diverse biological entities and interactions, utilizing rich data sets from sources, such as DrugCombDB, PubChem, UniProt, and cancer cell line encyclopedia (CCLE). We initialize node feature representations and introduce a novel virtual node to enhance drug representation. Our proposed method, the heterogeneous graph attention network for drug-drug synergy prediction (HANSynergy), has been experimentally validated to demonstrate that the heterogeneous graph attention network can extract key node features, efficiently harness the diversity of information, and further enhance network functionality through the incorporation of a multihead attention mechanism. In the comparative experiment, the highest accuracy (Acc) and area under the curve (AUC) are 0.877 and 0.947, respectively, in DrugCombDB_early data set, demonstrating the superiority of HANSynergy over the competing methods. Moreover, protein-protein interactions are important in understanding the mechanism of action of drugs. The heterogeneous attention mechanism facilitates protein-protein interaction analysis. By analyzing the changes of attention weight before and after heterogeneous network training, we investigated proteins that may be associated with drug combinations. Additionally, case studies align our findings with existing research, underscoring the potential of HANSynergy in drug synergy prediction. 
This advancement not only contributes to the burgeoning field of drug synergy prediction but also holds the potential to provide valuable insights and uncover new drug synergies for combating cancer.
Affiliation(s)
- Ning Cheng
- School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
| | - Li Wang
- Degree Programs in Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
| | - Yiping Liu
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Bosheng Song
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
| | - Changsong Ding
- School of Informatics, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
- Big Data Analysis Laboratory of Traditional Chinese Medicine, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China
| |
|
20
|
Gao M, Zhang D, Chen Y, Zhang Y, Wang Z, Wang X, Li S, Guo Y, Webb GI, Nguyen ATN, May L, Song J. GraphormerDTI: A graph transformer-based approach for drug-target interaction prediction. Comput Biol Med 2024; 173:108339. [PMID: 38547658 DOI: 10.1016/j.compbiomed.2024.108339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Revised: 03/05/2024] [Accepted: 03/17/2024] [Indexed: 04/17/2024]
Abstract
The application of Artificial Intelligence (AI) to screen drug molecules with potential therapeutic effects has revolutionized the drug discovery process, with significantly lower economic cost and time consumption than the traditional drug discovery pipeline. With the great power of AI, it is possible to rapidly search the vast chemical space for potential drug-target interactions (DTIs) between candidate drug molecules and disease protein targets. However, only a small proportion of molecules have labelled DTIs, consequently limiting the performance of AI-based drug screening. To solve this problem, a machine learning-based approach with great ability to generalize DTI prediction across molecules is desirable. Many existing machine learning approaches for DTI identification failed to exploit the full information with respect to the topological structures of candidate molecules. To develop a better approach for DTI prediction, we propose GraphormerDTI, which employs the powerful Graph Transformer neural network to model molecular structures. GraphormerDTI embeds molecular graphs into vector-format representations through iterative Transformer-based message passing, which encodes molecules' structural characteristics by node centrality encoding, node spatial encoding and edge encoding. With a strong structural inductive bias, the proposed GraphormerDTI approach can effectively infer informative representations for out-of-sample molecules and as such, it is capable of predicting DTIs across molecules with an exceptional performance. GraphormerDTI integrates the Graph Transformer neural network with a 1-dimensional Convolutional Neural Network (1D-CNN) to extract the drugs' and target proteins' representations and leverages an attention mechanism to model the interactions between them. 
To examine GraphormerDTI's performance for DTI prediction, we conduct experiments on three benchmark datasets, where GraphormerDTI achieves a superior performance than five state-of-the-art baselines for out-of-molecule DTI prediction, including GNN-CPI, GNN-PT, DeepEmbedding-DTI, MolTrans and HyperAttentionDTI, and is on a par with the best baseline for transductive DTI prediction. The source codes and datasets are publicly accessible at https://github.com/mengmeng34/GraphormerDTI.
Affiliation(s)
- Mengmeng Gao
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Daokun Zhang
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Melbourne, Australia.
| | - Yi Chen
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
| | - Yiwen Zhang
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
| | - Zhikang Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
| | - Xiaoyu Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
| | - Shanshan Li
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
| | - Yuming Guo
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
| | - Geoffrey I Webb
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Melbourne, Australia
| | - Anh T N Nguyen
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, Australia
| | - Lauren May
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, Australia
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia.
| |
|
21
|
Rafiei F, Zeraati H, Abbasi K, Razzaghi P, Ghasemi JB, Parsaeian M, Masoudi-Nejad A. CFSSynergy: Combining Feature-Based and Similarity-Based Methods for Drug Synergy Prediction. J Chem Inf Model 2024; 64:2577-2585. [PMID: 38514966 DOI: 10.1021/acs.jcim.3c01486] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Indexed: 03/23/2024]
Abstract
Drug synergy prediction plays a vital role in cancer treatment. Because experimental approaches are labor-intensive and expensive, computational-based approaches get more attention. There are two types of computational methods for drug synergy prediction: feature-based and similarity-based. In feature-based methods, the main focus is to extract more discriminative features from drug pairs and cell lines to pass to the task predictor. In similarity-based methods, the similarities among all drugs and cell lines are utilized as features and fed into the task predictor. In this work, a novel approach, called CFSSynergy, that combines these two viewpoints is proposed. First, a discriminative representation is extracted for paired drugs and cell lines as input. We have utilized transformer-based architecture for drugs. For cell lines, we have created a similarity matrix between proteins using the Node2Vec algorithm. Then, the new cell line representation is computed by multiplying the protein-protein similarity matrix and the initial cell line representation. Next, we compute the similarity between unique drugs and unique cells using the learned representation for paired drugs and cell lines. Then, we compute a new representation for paired drugs and cell lines based on the similarity-based features and the learned features. Finally, these features are fed to XGBoost as a task predictor. Two well-known data sets were used to evaluate the performance of our proposed method: DrugCombDB and OncologyScreen. The CFSSynergy approach consistently outperformed existing methods in comparative evaluations. This substantiates the efficacy of our approach in capturing complex synergistic interactions between drugs and cell lines, setting it apart from conventional similarity-based or feature-based methods.
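The step in which a new cell line representation is obtained by multiplying the protein-protein similarity matrix with the initial cell line representation can be sketched as a plain matrix-vector product. This is a hedged illustration only: the function name `smooth_cell_line`, the 3x3 similarity matrix, and the orientation of the product (similarity matrix on the left, one value per protein in the cell line vector) are assumptions, since the abstract does not give shapes or values.

```python
def smooth_cell_line(similarity, cell_line):
    """Multiply a protein-protein similarity matrix by a cell line vector.

    similarity: square list-of-lists, similarity[i][j] between proteins i and j.
    cell_line: initial representation with one value per protein.
    Returns a vector in which each protein's value is a similarity-weighted
    sum over all proteins.
    """
    n = len(cell_line)
    if any(len(row) != n for row in similarity) or len(similarity) != n:
        raise ValueError("similarity must be n x n for a cell line of length n")
    return [sum(s * x for s, x in zip(row, cell_line)) for row in similarity]


if __name__ == "__main__":
    # Hypothetical similarities among three proteins.
    similarity = [
        [1.0, 0.5, 0.0],
        [0.5, 1.0, 0.2],
        [0.0, 0.2, 1.0],
    ]
    cell_line = [1.0, 0.0, 2.0]  # hypothetical initial per-protein values
    print(smooth_cell_line(similarity, cell_line))
```

In this toy example the second protein has no signal of its own but picks up a non-zero value from its similar neighbours, which is the kind of information sharing the multiplication is meant to provide.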
Affiliation(s)
- Fatemeh Rafiei
- Department of Epidemiology and Biostatistics, School of Health, Tehran University of Medical Sciences, Tehran 14167-53955, Iran
| | - Hojjat Zeraati
- Department of Epidemiology and Biostatistics, School of Health, Tehran University of Medical Sciences, Tehran 14167-53955, Iran
| | - Karim Abbasi
- Laboratory of System Biology, Bioinformatics & Artificial Intelligence in Medicine (LBB&AI), Faculty of Mathematics and Computer Science, Kharazmi University, Tehran 14588-89694, Iran
| | - Parvin Razzaghi
- Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan 45137-66731, Iran
| | - Jahan B Ghasemi
- Chemistry Department, Faculty of Chemistry, School of Sciences, University of Tehran, Tehran 14174-66191, Iran
| | - Mahboubeh Parsaeian
- Department of Epidemiology and Biostatistics, School of Health, Tehran University of Medical Sciences, Tehran 14167-53955, Iran
- Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, U.K
| | - Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 13145-1365, Iran
| |
|
22
|
Ren L, Huang D, Liu H, Ning L, Cai P, Yu X, Zhang Y, Luo N, Lin H, Su J, Zhang Y. Applications of single‑cell omics and spatial transcriptomics technologies in gastric cancer (Review). Oncol Lett 2024; 27:152. [PMID: 38406595 PMCID: PMC10885005 DOI: 10.3892/ol.2024.14285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 01/19/2024] [Indexed: 02/27/2024] Open
Abstract
Gastric cancer (GC) is a prominent contributor to global cancer-related mortalities, and a deeper understanding of its molecular characteristics and tumor heterogeneity is required. Single-cell omics and spatial transcriptomics (ST) technologies have revolutionized cancer research by enabling the exploration of cellular heterogeneity and molecular landscapes at the single-cell level. In the present review, an overview of the advancements in single-cell omics and ST technologies and their applications in GC research is provided. Firstly, multiple single-cell omics and ST methods are discussed, highlighting their ability to offer unique insights into gene expression, genetic alterations, epigenomic modifications, protein expression patterns and cellular location in tissues. Furthermore, a summary is provided of key findings from previous research on single-cell omics and ST methods used in GC, which have provided valuable insights into genetic alterations, tumor diagnosis and prognosis, tumor microenvironment analysis, and treatment response. In summary, the application of single-cell omics and ST technologies has revealed the levels of cellular heterogeneity and the molecular characteristics of GC, and holds promise for improving diagnostics, personalized treatments and patient outcomes in GC.
Affiliation(s)
- Liping Ren
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, Sichuan 611844, P.R. China
| | - Danni Huang
- Department of Radiology, Central South University Xiangya School of Medicine Affiliated Haikou People's Hospital, Haikou, Hainan 570208, P.R. China
| | - Hongjiang Liu
- School of Computer Science and Technology, Aba Teachers College, Aba, Sichuan 624099, P.R. China
| | - Lin Ning
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, Sichuan 611844, P.R. China
| | - Peiling Cai
- School of Basic Medical Sciences, Chengdu University, Chengdu, Sichuan 610106, P.R. China
| | - Xiaolong Yu
- Hainan Yazhou Bay Seed Laboratory, Sanya Nanfan Research Institute, Material Science and Engineering Institute of Hainan University, Sanya, Hainan 572025, P.R. China
| | - Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan 611137, P.R. China
| | - Nanchao Luo
- School of Computer Science and Technology, Aba Teachers College, Aba, Sichuan 624099, P.R. China
| | - Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, P.R. China
| | - Jinsong Su
- Research Institute of Integrated Traditional Chinese Medicine and Western Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, Sichuan 611137, P.R. China
| | - Yinghui Zhang
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, Sichuan 611844, P.R. China
| |
|
23
|
Zhang ZY, Zhang Z, Ye X, Sakurai T, Lin H. A BERT-based model for the prediction of lncRNA subcellular localization in Homo sapiens. Int J Biol Macromol 2024; 265:130659. [PMID: 38462114 DOI: 10.1016/j.ijbiomac.2024.130659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 02/19/2024] [Accepted: 03/04/2024] [Indexed: 03/12/2024]
Abstract
Understanding the subcellular localization of lncRNAs is crucial for comprehending their regulatory activities. The conventional detection of lncRNA subcellular location usually relies on in situ detection techniques, which are resource intensive. Some machine learning-based algorithms have been proposed for lncRNA subcellular location prediction in mammals. However, due to the low level of conservation of lncRNA sequences, the performance of cross-species models remains unsatisfactory. In this study, we curated a novel dataset containing subcellular location information of lncRNAs in Homo sapiens. Subsequently, based on the BERT pre-trained language algorithm, we developed a model for lncRNA subcellular location prediction. Our model achieved a micro-average area under the receiver operating characteristic curve (AUROC) of 0.791 on the training set and an AUROC of 0.700 on the testing nucleus set. Additionally, we conducted cross-species validation and motif discovery to further investigate underlying patterns. In summary, our study provides valuable guidance and computational analysis tools for exploring the mechanisms of lncRNA subcellular localization and the dynamic spatial changes of RNA in abnormal physiological states.
Affiliation(s)
- Zhao-Yue Zhang
- Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba 3058577, Japan
| | - Zheng Zhang
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan.
| | - Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
| | - Hao Lin
- Center for Information Biology, University of Electronic Science and Technology of China, Chengdu 611731, China.
| |
|
24
|
Gu X, Liu J, Yu Y, Xiao P, Ding Y. MFD-GDrug: multimodal feature fusion-based deep learning for GPCR-drug interaction prediction. Methods 2024; 223:75-82. [PMID: 38286333 DOI: 10.1016/j.ymeth.2024.01.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 01/14/2024] [Accepted: 01/26/2024] [Indexed: 01/31/2024] Open
Abstract
The accurate identification of drug-protein interactions (DPIs) is crucial in drug development, especially concerning G protein-coupled receptors (GPCRs), which are vital targets in drug discovery. However, experimental validation of GPCR-drug pairings is costly, prompting the need for accurate predictive methods. To address this, we propose MFD-GDrug, a multimodal deep learning model. Leveraging the ESM pretrained model, we extract protein features and employ a CNN for protein feature representation. For drugs, we integrated multimodal features of drug molecular structures, including three-dimensional features derived from Mol2vec and the topological information of drug graph structures extracted through Graph Convolutional Neural Networks (GCN). By combining structural characterizations and pretrained embeddings, our model effectively captures GPCR-drug interactions. Our tests on leading GPCR-drug interaction datasets show that MFD-GDrug outperforms other methods, demonstrating superior predictive accuracy.
Affiliation(s)
- Xingyue Gu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
| | - Junkai Liu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Yue Yu
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
| | - Pengfeng Xiao
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China.
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611730, China.
| |
|
25
|
Liu M, Wu T, Li X, Zhu Y, Chen S, Huang J, Zhou F, Liu H. ACPPfel: Explainable deep ensemble learning for anticancer peptides prediction based on feature optimization. Front Genet 2024; 15:1352504. [PMID: 38487252 PMCID: PMC10937565 DOI: 10.3389/fgene.2024.1352504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 02/19/2024] [Indexed: 03/17/2024] Open
Abstract
Background: Cancer is a significant global health problem that continues to cause a high number of deaths worldwide. Traditional cancer treatments often come with risks that can compromise the functionality of vital organs. As a potential alternative to these conventional therapies, anticancer peptides (ACPs) have garnered attention for their small size, high specificity, and reduced toxicity, making them a promising option for cancer treatment. Methods: However, the process of identifying effective ACPs through wet-lab screening experiments is time-consuming and labor-intensive. To overcome this challenge, a deep ensemble learning method is constructed to predict ACPs in this study. To evaluate the reliability of the framework, four different datasets are used for training and testing. During the training process of the model, integration of feature selection methods, feature dimensionality reduction measures, and optimization of the deep ensemble model are carried out. Finally, we explored the interpretability of features that affected the final prediction results and built a web server platform to facilitate anticancer peptide prediction, which can be used by all researchers for further studies. This web server can be accessed at http://lmylab.online:5001/. Results: The proposed method achieves an accuracy of 98.53% and an AUC (area under the curve) value of 0.9972 on the ACPfel dataset, and it shows improvements on the other datasets as well.
Affiliation(s)
- Mingyou Liu
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
| | - Tao Wu
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
| | - Xue Li
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
| | - Yingxue Zhu
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
| | - Sen Chen
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
| | - Jian Huang
- School of Life Science and Technology, University of Electronic Science and Technology, Chengdu, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
| | - Fengfeng Zhou
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| | - Hongmei Liu
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| |
|
26
|
Liang Z, Lin C, Tan G, Li J, He Y, Cai S. A low-cost machine learning framework for predicting drug-drug interactions based on fusion of multiple features and a parameter self-tuning strategy. Phys Chem Chem Phys 2024; 26:6300-6315. [PMID: 38305788 DOI: 10.1039/d4cp00039k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Poly-drug therapy is now recognized as a crucial treatment, and the analysis of drug-drug interactions (DDIs) offers substantial theoretical support and guidance for its implementation. Predicting potential DDIs using intelligent algorithms is an emerging approach in pharmacological research. However, the existing supervised models and deep learning-based techniques still have several limitations. This paper proposes a novel DDI analysis and prediction framework called the Multi-View Semi-supervised Graph-based (MVSG) framework, which provides a comprehensive judgment by integrating multiple DDI features and functions without any time-consuming training process. Unlike conventional approaches, MVSG can search for the most suitable similarity (or distance) measurement among DDI data and construct graph structures for each feature. By employing a parameter self-tuning strategy, MVSG fuses multiple graphs according to the contributions of features' information. The actual anticancer drug data are extracted from the authoritative public database for evaluating the effectiveness of our framework, including 904 drugs, 7730 DDI records and 19 types of drug interactions. Validation results indicate that the prediction is more accurate when multiple features are adopted by our framework. In comparison to conventional machine learning techniques, MVSG can achieve higher performance even with less labeled data and without a training process. Finally, MVSG is employed to narrow down the search for potential valuable combinations.
Affiliation(s)
- Zexiao Liang
- School of Integrated Circuits, Guangdong University of Technology, 100 Waihuan Xi Road, Panyu District, Guangzhou, 510006, Guangdong, China.
| | - Canxin Lin
- School of Computer Science and Technology, Guangdong University of Technology, 100 Waihuan Xi Road, Panyu District, Guangzhou, 510006, Guangdong, China
| | - Guoliang Tan
- School of Automation, Guangdong University of Technology, 100 Waihuan Xi Road, Panyu District, Guangzhou, 510006, Guangdong, China
| | - Jianzhong Li
- School of Integrated Circuits, Guangdong University of Technology, 100 Waihuan Xi Road, Panyu District, Guangzhou, 510006, Guangdong, China.
| | - Yan He
- School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, 100 Waihuan Xi Road, Panyu District, Guangzhou, 510006, Guangdong, China
| | - Shuting Cai
- School of Integrated Circuits, Guangdong University of Technology, 100 Waihuan Xi Road, Panyu District, Guangzhou, 510006, Guangdong, China.
| |
|
27
|
Xie X, Wu C, Hao Y, Wang T, Yang Y, Cai P, Zhang Y, Huang J, Deng K, Yan D, Lin H. Benefits and risks of drug combination therapy for diabetes mellitus and its complications: a comprehensive review. Front Endocrinol (Lausanne) 2023; 14:1301093. [PMID: 38179301 PMCID: PMC10766371 DOI: 10.3389/fendo.2023.1301093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Accepted: 11/27/2023] [Indexed: 01/06/2024] Open
Abstract
Diabetes is a chronic metabolic disease, and its therapeutic goals focus on the effective management of blood glucose and various complications. Drug combination therapy has emerged as a comprehensive treatment approach for diabetes. An increasing number of studies have shown that, compared with monotherapy, combination therapy can bring significant clinical benefits in type 1 diabetes (T1D), type 2 diabetes (T2D) and related complications: it helps control blood glucose, weight, and blood pressure, mitigates damage from certain complications, and delays their progression. This evidence provides strong support for the recommendation of combination therapy for diabetes and highlights the importance of combined treatment. In this review, we first provide a brief overview of the phenotype and pathogenesis of diabetes and discuss several conventional anti-diabetic medications currently used for the treatment of diabetes. We then review several clinical trials and pre-clinical animal experiments on T1D, T2D, and their common complications to evaluate the efficacy and safety of different classes of drug combinations. In general, combination therapy plays a pivotal role in the management of diabetes. Integrating the effectiveness of multiple drugs enables more comprehensive and effective control of blood glucose without increasing the risk of hypoglycemia or other serious adverse events. However, specific treatment regimens should be tailored to individual patients and implemented under the guidance of healthcare professionals.
Affiliation(s)
- Xueqin Xie
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Changchun Wu
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Yuduo Hao
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Tianyu Wang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Yuhe Yang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Peiling Cai
- School of Basic Medical Sciences, Chengdu University, Chengdu, China
| | - Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Jian Huang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Kejun Deng
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Dan Yan
- Beijing Friendship Hospital, Capital Medical University, Beijing, China
| | - Hao Lin
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
28
|
Li J, Zhang H, Wang X, Wang H, Hao J, Bai G. Inpainting Saturation Artifact in Anterior Segment Optical Coherence Tomography. Sensors (Basel) 2023; 23:9439. [PMID: 38067812 PMCID: PMC10708580 DOI: 10.3390/s23239439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Revised: 11/22/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023]
Abstract
The cornea is an important refractive structure in the human eye. Corneal segmentation provides valuable information for clinical diagnoses, such as corneal thickness. Non-contact anterior segment optical coherence tomography (AS-OCT) is a prevalent ophthalmic imaging technique that can visualize the anterior and posterior surfaces of the cornea. Nonetheless, during the imaging process, saturation artifacts are commonly generated at the point where the tangent of the corneal surface is normal to the incident light. This stripe-shaped saturation artifact covers the corneal surface, blurring the corneal edge and reducing the accuracy of corneal segmentation. To address this, an inpainting method that introduces structural similarity and frequency losses is proposed to remove the saturation artifact in AS-OCT images. Specifically, the structural similarity loss reconstructs the corneal structure and restores corneal textural details. The frequency loss combines the spatial domain with the frequency domain to ensure the overall consistency of the image in both domains. Furthermore, the performance of the proposed method in corneal segmentation tasks is evaluated, and the results indicate a significant benefit for subsequent clinical analysis.
Affiliation(s)
| | - He Zhang
- Electronics Information Engineering College, Changchun University, Changchun 130022, China; (J.L.); (X.W.); (H.W.); (J.H.); (G.B.)
| |
|
29
|
Pham NT, Phan LT, Seo J, Kim Y, Song M, Lee S, Jeon YJ, Manavalan B. Advancing the accuracy of SARS-CoV-2 phosphorylation site detection via meta-learning approach. Brief Bioinform 2023; 25:bbad433. [PMID: 38058187 PMCID: PMC10753650 DOI: 10.1093/bib/bbad433] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Grants] [Received: 05/10/2023] [Revised: 10/30/2023] [Accepted: 11/05/2023] [Indexed: 12/08/2023] Open
Abstract
The worldwide appearance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has generated significant concern and posed a considerable challenge to global health. Phosphorylation is a common post-translational modification that affects many vital cellular functions and is closely associated with SARS-CoV-2 infection. Precise identification of phosphorylation sites could provide more in-depth insight into the processes underlying SARS-CoV-2 infection and help alleviate the continuing COVID-19 crisis. Currently, available computational tools for predicting these sites lack accuracy and effectiveness. In this study, we designed an innovative meta-learning model, Meta-Learning for Serine/Threonine Phosphorylation (MeL-STPhos), to precisely identify protein phosphorylation sites. We initially performed a comprehensive assessment of 29 unique sequence-derived features, establishing prediction models for each using 14 renowned machine learning methods, ranging from traditional classifiers to advanced deep learning algorithms. We then selected the most effective model for each feature by integrating the predicted values. Rigorous feature selection strategies were employed to identify the optimal base models and classifier(s) for each cell-specific dataset. To the best of our knowledge, this is the first study to report two cell-specific models and a generic model for phosphorylation site prediction by utilizing an extensive range of sequence-derived features and machine learning algorithms. Extensive cross-validation and independent testing revealed that MeL-STPhos surpasses existing state-of-the-art tools for phosphorylation site prediction. We also developed a publicly accessible platform at https://balalab-skku.org/MeL-STPhos. We believe that MeL-STPhos will serve as a valuable tool for accelerating the discovery of serine/threonine phosphorylation sites and elucidating their role in post-translational regulation.
Affiliation(s)
- Nhat Truong Pham
- Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
| | - Le Thi Phan
- Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
| | - Jimin Seo
- Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
| | - Yeonwoo Kim
- Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
| | - Minkyung Song
- Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
| | - Sukchan Lee
- Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
| | - Young-Jun Jeon
- Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
| | - Balachandran Manavalan
- Department of Integrative Biotechnology and of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
| |
|
30
|
Bai T, Liu B. ncRNALocate-EL: a multi-label ncRNA subcellular locality prediction model based on ensemble learning. Brief Funct Genomics 2023; 22:442-452. [PMID: 37122147 DOI: 10.1093/bfgp/elad007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Revised: 12/31/2022] [Accepted: 01/31/2023] [Indexed: 05/02/2023] Open
Abstract
Subcellular localizations of ncRNAs are associated with specific functions. Currently, an increasing number of biological researchers are focusing on computational approaches to identify subcellular localizations of ncRNAs. However, the performance of the existing computational methods is low and needs further study. First, most prediction models are trained with outdated databases. Second, only a few predictors can identify multiple subcellular localizations simultaneously. In this work, we establish three human ncRNA subcellular datasets based on the latest RNALocate, covering lncRNA, miRNA and snoRNA, and then propose a novel multi-label classification model based on ensemble learning, called ncRNALocate-EL, to identify multi-label subcellular localizations of the three ncRNA types. The results show that ncRNALocate-EL outperforms previous methods. Our method achieved an average precision of 0.709, 0.977 and 0.730 on the three human ncRNA datasets. The web server of ncRNALocate-EL has been established and can be accessed at https://bliulab.net/ncRNALocate-EL.
|
31
|
Zou X, Ren L, Cai P, Zhang Y, Ding H, Deng K, Yu X, Lin H, Huang C. Accurately identifying hemagglutinin using sequence information and machine learning methods. Front Med (Lausanne) 2023; 10:1281880. [PMID: 38020152 PMCID: PMC10644030 DOI: 10.3389/fmed.2023.1281880] [Citation(s) in RCA: 58] [Impact Index Per Article: 29.0] [Received: 08/23/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023] Open
Abstract
Introduction: Hemagglutinin (HA) is responsible for facilitating viral entry and infection by promoting the fusion between the host membrane and the virus. Given its significance in the process of influenza virus infection, HA has garnered attention as a target for influenza drug and vaccine development. Thus, accurately identifying HA is crucial for the development of targeted vaccines and drugs. However, in-silico methods for identifying HA are still lacking. This study aims to design a computational model to identify HA. Methods: In this study, a benchmark dataset comprising 106 HA and 106 non-HA sequences was obtained from UniProt. Various sequence-based features were used to formulate samples. By performing feature optimization and feeding the optimized features into four kinds of machine learning methods, we constructed an integrated classifier model using the stacking algorithm. Results and discussion: The model achieved an accuracy of 95.85% and an area under the receiver operating characteristic (ROC) curve of 0.9863 in 5-fold cross-validation. In the independent test, the model exhibited an accuracy of 93.18% and an area under the ROC curve of 0.9793. The code can be found at https://github.com/Zouxidan/HA_predict.git. The proposed model has excellent prediction performance and will be a convenient tool for biochemical scholars studying HA.
Affiliation(s)
- Xidan Zou
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Liping Ren
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
| | - Peiling Cai
- School of Basic Medical Sciences, Chengdu University, Chengdu, China
| | - Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Hui Ding
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Kejun Deng
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Xiaolong Yu
- School of Materials Science and Engineering, Hainan University, Haikou, China
| | - Hao Lin
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Chengbing Huang
- School of Computer Science and Technology, Aba Teachers University, Aba, China
| |
|
32
|
Wan H, Zhang Y, Huang S. Prediction of thermophilic protein using 2-D general series correlation pseudo amino acid features. Methods 2023; 218:141-148. [PMID: 37604248 DOI: 10.1016/j.ymeth.2023.08.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/03/2023] [Revised: 07/08/2023] [Accepted: 08/18/2023] [Indexed: 08/23/2023] Open
Abstract
The demand for thermophilic proteins has been increasing in protein engineering recently. Many machine-learning methods for identifying thermophilic proteins have emerged during this period. However, most machine learning-based thermophilic protein identification studies have focused only on accuracy. The relationship between the features' meaning and the proteins' physicochemical properties has yet to be studied in depth. In this article, we focused on the relationship between the features and the thermal stability of thermophilic proteins. The method used 2-D general series correlation pseudo amino acid (SC-PseAAC-General) features and achieved an accuracy of 82.76% using the J48 classifier. In addition, this research found higher frequencies of glutamic acid in thermophilic proteins, which help these proteins maintain their thermal stability by forming hydrogen bonds and salt bridges that prevent denaturation at high temperatures.
Affiliation(s)
- Hao Wan
- College of Life Science, Qingdao University, Qingdao 266071, China.
| | - Yanan Zhang
- College of Life Science, Qingdao University, Qingdao 266071, China
| | - Shibo Huang
- Beidahuang Industry Group General Hospital, Harbin 150001, China
| |
|
33
|
Momanyi BM, Zulfiqar H, Grace-Mercure BK, Ahmed Z, Ding H, Gao H, Liu F. CFNCM: Collaborative filtering neighborhood-based model for predicting miRNA-disease associations. Comput Biol Med 2023; 163:107165. [PMID: 37315383 DOI: 10.1016/j.compbiomed.2023.107165] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/14/2023] [Revised: 05/31/2023] [Accepted: 06/08/2023] [Indexed: 06/16/2023]
Abstract
MicroRNAs have a significant role in the emergence of various human disorders. Consequently, it is essential to understand the existing interactions between miRNAs and diseases, as this will help scientists better study and comprehend the diseases' biological mechanisms. Predicted disease-related miRNAs can be employed as biomarkers or drug targets to advance the detection, diagnosis, and treatment of complex human disorders. Because conventional biological experiments are expensive and time-consuming, this study proposed a computational model for predicting potential miRNA-disease associations called the Collaborative Filtering Neighborhood-based Classification Model (CFNCM). The model generated integrated miRNA and disease similarity matrices using the validated associations and miRNA and disease similarity information and used them as the input features for CFNCM. To produce class labels, we first determined the association scores for brand-new pairs using user-based collaborative filtering. With zero as the threshold, associations with scores >0 were labelled 1, indicating a potential positive association; all others were labelled 0. Then, we developed classification models using various machine-learning algorithms. By comparison, we discovered that the support vector machine (SVM) produced the best AUC of 0.96 with 10-fold cross-validation, using the GridSearchCV technique to identify optimal parameter values. In addition, the models were evaluated and verified by analyzing the top 50 breast and lung neoplasm-related miRNAs, of which 46 and 47 associations, respectively, were verified in two authoritative databases, dbDEMC and miR2Disease.
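The labelling step described in this abstract can be illustrated with a minimal sketch. It assumes a binary miRNA-by-disease association matrix, cosine similarity between miRNA rows, and a similarity-weighted average as the score; the abstract does not give the exact scoring formula, so the function names `cf_scores` and `label` and the toy matrix are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cf_scores(assoc):
    """User-based CF (assumed form): score[i][j] is the similarity-weighted
    average of column j over all rows other than i."""
    n, m = len(assoc), len(assoc[0])
    sims = [[cosine(assoc[i], assoc[k]) for k in range(n)] for i in range(n)]
    scores = []
    for i in range(n):
        row = []
        for j in range(m):
            num = sum(sims[i][k] * assoc[k][j] for k in range(n) if k != i)
            den = sum(abs(sims[i][k]) for k in range(n) if k != i)
            row.append(num / den if den else 0.0)
        scores.append(row)
    return scores

def label(scores, threshold=0.0):
    """Scores above the threshold become class 1, everything else class 0."""
    return [[1 if s > threshold else 0 for s in row] for row in scores]

# Toy data: rows are miRNAs, columns are diseases (hypothetical values).
assoc = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
scores = cf_scores(assoc)
labels = label(scores)
```

In this toy matrix, the second miRNA shares a disease with the first, so its unobserved pair with the second disease receives a positive score and is labelled 1, while the third miRNA has no similar neighbours and all of its scores stay at 0.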
Affiliation(s)
- Biffon Manyura Momanyi
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hasan Zulfiqar
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, 313001, China
| | - Bakanina Kissanga Grace-Mercure
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
| | - Zahoor Ahmed
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, 313001, China
| | - Hui Ding
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China.
| | - Hui Gao
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
| | - Fen Liu
- Department of Radiation Oncology, Peking University Cancer Hospital (Inner Mongolia Campus), Affiliated Cancer Hospital of Inner Mongolia Medical University, Inner Mongolia Cancer Hospital, Hohhot, China.
| |
|
34
|
Xu X, Gao L, Yu L. GOLF-Net: Global and local association fusion network for COVID-19 lung infection segmentation. Comput Biol Med 2023; 164:107361. [PMID: 37595522 DOI: 10.1016/j.compbiomed.2023.107361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 07/27/2023] [Accepted: 08/12/2023] [Indexed: 08/20/2023]
Abstract
The global spread of the Corona Virus Disease 2019 (COVID-19) has caused significant health hazards, leading researchers to explore new methods for detecting lung infections that can supplement molecular diagnosis. Computed tomography (CT) has emerged as a promising tool, although accurately segmenting infected areas in COVID-19 CT scans, especially given the limited available data, remains a challenge for deep learning models. To address this issue, we propose a novel segmentation network, the GlObal and Local association Fusion Network (GOLF-Net), that combines global and local features from Convolutional Neural Networks and Transformers, respectively. Our network leverages attention mechanisms to enhance the correlation and representation of local features, improving the accuracy of infected area segmentation. Additionally, we implement transfer learning to pretrain our network parameters, providing a robust solution to the issue of limited COVID-19 CT data. Our experimental results demonstrate that the segmentation performance of our network exceeds that of most existing models, with a Dice coefficient of 95.09% and an IoU of 92.58%.
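For readers unfamiliar with the two segmentation metrics reported here, the following is a minimal sketch of how the Dice coefficient and IoU are computed for a single pair of binary masks. The flattened toy masks are hypothetical, and how the paper aggregates scores across scans is not stated in the abstract.

```python
def dice_and_iou(pred, truth):
    """Dice coefficient and IoU for two flattened binary masks of equal length."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    p_sum = sum(1 for p in pred if p)
    t_sum = sum(1 for t in truth if t)
    union = p_sum + t_sum - inter
    # Two empty masks are treated as a perfect match (a convention, not a rule).
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou

# Toy masks: 1 marks an "infected" pixel, 0 marks background.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, truth)
```

Here two of the three predicted pixels overlap the ground truth, giving a Dice of 2/3 and an IoU of 0.5; Dice is never lower than IoU for the same pair of masks.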
Affiliation(s)
- Xinyu Xu
- School of Computer Science and Technology, Xidian University, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, China.
| |
|
35
|
Teng S, Yin C, Wang Y, Chen X, Yan Z, Cui L, Wei L. MolFPG: Multi-level fingerprint-based Graph Transformer for accurate and robust drug toxicity prediction. Comput Biol Med 2023; 164:106904. [PMID: 37453376 DOI: 10.1016/j.compbiomed.2023.106904] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/23/2023] [Revised: 03/20/2023] [Accepted: 04/10/2023] [Indexed: 07/18/2023]
Abstract
Drug toxicity prediction is essential to drug development, as it can help screen compounds with potential toxicity and reduce the cost and risk of animal experiments and clinical trials. However, traditional handcrafted feature-based and molecular-graph-based approaches are insufficient for molecular representation learning. To address the problem, we developed an innovative molecular fingerprint Graph Transformer framework (MolFPG) with a global-aware module for interpretable toxicity prediction. Our approach encodes compounds using multiple molecular fingerprinting techniques and integrates Graph Transformer-based molecular representation for feature learning and toxicity prediction. Experimental results show that our proposed approach has high accuracy and reliability in predicting drug toxicity. In addition, we explored the relationship between drug features and toxicity through an interpretive analysis approach, which improved the interpretability of the approach. Our results highlight the potential of Graph Transformers and multi-level fingerprints for accelerating the drug discovery process by reliably and effectively flagging drug safety concerns. We believe that our study will provide vital support and reference for further development in the field of drug development and toxicity assessment.
Affiliation(s)
- Saisai Teng
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Chenglin Yin
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Yu Wang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | | | - Zhongmin Yan
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.
| | - Lizhen Cui
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.
| | - Leyi Wei
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China.
| |
|
36
|
Meng C, Pei Y, Zou Q, Yuan L. DP-AOP: A novel SVM-based antioxidant proteins identifier. Int J Biol Macromol 2023; 247:125499. [PMID: 37414318 DOI: 10.1016/j.ijbiomac.2023.125499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 06/01/2023] [Accepted: 06/19/2023] [Indexed: 07/08/2023]
Abstract
The identification of antioxidant proteins is a challenging yet meaningful task, as they can protect against the damage caused by some free radicals. In addition to time-consuming, laborious, and expensive experimental identification methods, efficient identification of antioxidant proteins through machine learning algorithms has become increasingly common. In recent years, researchers have proposed models for identifying antioxidant proteins; unfortunately, although the accuracy of these models is already high, their sensitivity is too low, indicating possible overfitting. Therefore, we developed a new model called DP-AOP for the recognition of antioxidant proteins. We used the SMOTE algorithm to balance the dataset, selected Wei's proposed feature extraction algorithm to obtain 473-dimensional feature vectors, and, based on the sorting function in MRMD, scored and ranked each feature to obtain a feature set ordered by contribution value from high to low. To effectively reduce the feature dimension, we combined the dynamic programming idea to make the local eight features the optimal subset. After obtaining the 36-dimensional feature vectors, we finally selected 17 features through experimental analysis. The SVM classification algorithm was used to implement the model through the libsvm tool. The model achieved satisfactory performance, with an accuracy of 91.076%, SN of 96.4%, SP of 85.8%, MCC of 82.6%, and F1 score of 91.5%. Furthermore, we built a free web server to facilitate researchers' subsequent studies of antioxidant protein recognition. The website is http://112.124.26.17:8003/#/.
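The evaluation metrics named in this abstract (accuracy, SN, SP, MCC, F1) all derive from a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not the paper's data.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.
    Assumes every denominator is non-zero (both classes present and predicted)."""
    sn = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    precision = tp / (tp + fp)
    f1 = 2 * precision * sn / (precision + sn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc, "F1": f1}

# Hypothetical counts: 100 positives and 100 negatives.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

With these counts, SN is 0.9, SP is 0.8, accuracy is 0.85, F1 is about 0.857, and MCC is about 0.704. MCC is reported on a scale from -1 to 1, so a value written as a percentage, as in the abstract, is that coefficient multiplied by 100.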
Affiliation(s)
- Chaolu Meng
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China; Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, China.
| | - Yue Pei
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China.
| | - Lei Yuan
- Department of Hepatobiliary Surgery, Quzhou People's Hospital, China.
| |
|
37
|
Zheng R, Huang Z, Deng L. Large-scale predicting protein functions through heterogeneous feature fusion. Brief Bioinform 2023:bbad243. [PMID: 37401369 DOI: 10.1093/bib/bbad243] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/12/2023] [Revised: 05/18/2023] [Accepted: 06/12/2023] [Indexed: 07/05/2023] Open
Abstract
As the volume of protein sequence and structure data grows rapidly, the functions of the overwhelming majority of proteins cannot be experimentally determined. Automated annotation of protein function at a large scale is becoming increasingly important. Existing computational prediction methods are typically based on expanding the relatively small number of experimentally determined functions to large collections of proteins with various clues, including sequence homology, protein-protein interaction, gene co-expression, etc. Although there has been some progress in protein function prediction in recent years, the development of accurate and reliable solutions still has a long way to go. Here we exploit AlphaFold predicted three-dimensional structural information, together with other non-structural clues, to develop a large-scale approach termed PredGO to annotate Gene Ontology (GO) functions for proteins. We use a pre-trained language model, geometric vector perceptrons and attention mechanisms to extract heterogeneous features of proteins and fuse these features for function prediction. The computational results demonstrate that the proposed method outperforms other state-of-the-art approaches for predicting GO functions of proteins in terms of both coverage and accuracy. Coverage improves both because the number of structures predicted by AlphaFold has greatly increased and because PredGO can make extensive use of non-structural information for functional prediction. Moreover, we show that over 205 000 (~100%) entries in UniProt for human are annotated by PredGO, over 186 000 (~90%) of which are based on predicted structure. The webserver and database are available at http://predgo.denglab.org/.
Affiliation(s)
- Rongtao Zheng
- School of Computer Science and Engineering, Central South University, 410000 Changsha, China
| | - Zhijian Huang
- School of Computer Science and Engineering, Central South University, 410000 Changsha, China
| | - Lei Deng
- School of Computer Science and Engineering, Central South University, 410000 Changsha, China
| |
|
38
|
Deng Y, Ma S, Li J, Zheng B, Lv Z. Using the Random Forest for Identifying Key Physicochemical Properties of Amino Acids to Discriminate Anticancer and Non-Anticancer Peptides. Int J Mol Sci 2023; 24:10854. [PMID: 37446031 PMCID: PMC10341712 DOI: 10.3390/ijms241310854] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/08/2023] [Revised: 06/17/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023] Open
Abstract
Anticancer peptides (ACPs) represent a promising new therapeutic approach in cancer treatment. They can target cancer cells without affecting healthy tissues or altering normal physiological functions. Machine learning algorithms have increasingly been utilized for predicting peptide sequences with potential ACP effects. This study analyzed four benchmark datasets based on a well-established random forest (RF) algorithm. The peptide sequences were converted into 566 physicochemical features extracted from the amino acid index (AAindex) library, which were then subjected to feature selection using four methods: light gradient-boosting machine (LGBM), analysis of variance (ANOVA), chi-squared test (Chi2), and mutual information (MI). Presenting and merging the identified features using Venn diagrams, 19 key amino acid physicochemical properties were identified that can be used to predict the likelihood of a peptide sequence functioning as an ACP. The results were quantified by performance evaluation metrics to determine the accuracy of predictions. This study aims to enhance the efficiency of designing peptide sequences for cancer treatment.
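One way to picture the merging of four feature-selection results is as a vote count over the selected feature sets, which is what the regions of a Venn diagram represent. The sketch below is an assumption about that step, since the abstract does not say whether the 19 properties come from the full intersection or from a looser overlap; the feature identifiers and the helper `consensus_features` are hypothetical.

```python
from collections import Counter

def consensus_features(selections, min_votes):
    """Return features chosen by at least `min_votes` of the selection methods."""
    votes = Counter(f for selected in selections.values() for f in set(selected))
    return sorted(f for f, n in votes.items() if n >= min_votes)

# Hypothetical outputs of the four selectors named in the abstract.
selections = {
    "LGBM": ["f1", "f2", "f3"],
    "ANOVA": ["f2", "f3", "f4"],
    "Chi2": ["f2", "f3", "f5"],
    "MI": ["f3", "f2", "f6"],
}
core = consensus_features(selections, min_votes=4)   # chosen by all four methods
union = consensus_features(selections, min_votes=1)  # chosen by any method
```

Setting `min_votes` to the number of methods gives the central region of the Venn diagram; lowering it admits features that only some methods agree on.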
Affiliation(s)
- Yiting Deng
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China; (Y.D.); (S.M.); (B.Z.)
| | - Shuhan Ma
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China; (Y.D.); (S.M.); (B.Z.)
| | - Jiayu Li
- College of Life Science, Sichuan University, Chengdu 610065, China;
| | - Bowen Zheng
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China; (Y.D.); (S.M.); (B.Z.)
| | - Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China; (Y.D.); (S.M.); (B.Z.)
| |
|
39
|
Deng L, Jiang Y, Hu X, Zheng R, Huang Z, Zhang J. ABLNCPP: Attention Mechanism-Based Bidirectional Long Short-Term Memory for Noncoding RNA Coding Potential Prediction. J Chem Inf Model 2023; 63:3955-3966. [PMID: 37294848 DOI: 10.1021/acs.jcim.3c00366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
With the continuous development of ribosome profiling, sequencing technology, and proteomics, evidence is mounting that noncoding RNA (ncRNA) may be a novel source of peptides or proteins. These peptides and proteins play crucial roles in inhibiting tumor progression and interfering with cancer metabolism and other essential physiological processes. Therefore, identifying ncRNAs with coding potential is vital to ncRNA functional research. Existing studies perform well in classifying ncRNAs and mRNAs, but no method has been explicitly proposed to determine whether ncRNA transcripts have coding potential. For this reason, we propose an attention mechanism-based bidirectional LSTM network called ABLNCPP to assess the coding potential of ncRNA sequences. Considering the sequential information loss in previous methods, we introduce a novel nonoverlapping trinucleotide embedding (NOLTE) method for ncRNAs to obtain embeddings containing sequential features. Extensive evaluations show that ABLNCPP outperforms other state-of-the-art models. In general, ABLNCPP overcomes the bottleneck of ncRNA coding potential prediction and is expected to provide valuable contributions to cancer discovery and treatment in the future. The source code and data sets are freely available at https://github.com/YinggggJ/ABLNCPP.
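The tokenization half of a nonoverlapping trinucleotide embedding can be sketched as follows. This is an illustration of the general idea only: the vocabulary layout, the reserved index 0, the `frame` parameter, and the function name `trinucleotide_ids` are assumptions, and the actual NOLTE implementation is in the authors' repository.

```python
from itertools import product

# 64 trinucleotides over the RNA alphabet; index 0 is reserved for unknown tokens.
VOCAB = {"".join(k): i + 1 for i, k in enumerate(product("ACGU", repeat=3))}

def trinucleotide_ids(seq, frame=0):
    """Split a sequence into non-overlapping 3-mers starting at `frame`
    and map each one to an integer id suitable for an embedding layer."""
    seq = seq.upper().replace("T", "U")   # accept DNA-style input
    ids = []
    for i in range(frame, len(seq) - 2, 3):
        ids.append(VOCAB.get(seq[i:i + 3], 0))
    return ids

ids = trinucleotide_ids("ATGAAAGG")   # trailing "GG" is too short and is dropped
```

Unlike a sliding window, each nucleotide contributes to exactly one token, so a sequence of length n yields about n/3 ids in reading order, which is the sequential signal a bidirectional LSTM would then consume.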
Affiliation(s)
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Ying Jiang
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Xiaowen Hu
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Rongtao Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Zhijian Huang
- School of Computer Science and Engineering, Central South University, Changsha 410018, China
| | - Jingpu Zhang
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan 467000, China
| |
|
40
|
Su W, Qian X, Yang K, Ding H, Huang C, Zhang Z. Recognition of outer membrane proteins using multiple feature fusion. Front Genet 2023; 14:1211020. [PMID: 37351347 PMCID: PMC10284346 DOI: 10.3389/fgene.2023.1211020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Accepted: 05/24/2023] [Indexed: 06/24/2023] Open
Abstract
Introduction: Outer membrane proteins are crucial in maintaining the structural stability and permeability of the outer membrane. Outer membrane proteins exhibit several functions such as antigenicity and strong immunogenicity, which have potential applications in clinical diagnosis and disease prevention. However, wet experiments for studying OMPs are time- and capital-intensive, thereby necessitating the use of computational methods for their identification. Methods: In this study, we developed a computational model to predict outer membrane proteins. The non-redundant dataset consists of a positive set of 208 outer membrane proteins and a negative set of 876 non-outer membrane proteins. We employed the pseudo amino acid composition method to extract feature vectors and subsequently utilized the support vector machine for prediction. Results and Discussion: In the Jackknife cross-validation, the overall accuracy and the area under the receiver operating characteristic curve were 93.19% and 0.966, respectively. These results demonstrate that our model can produce accurate predictions, and could serve as a valuable guide for experimental research on outer membrane proteins.
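Jackknife cross-validation, as used in this abstract, holds out one sample at a time and trains on all the rest. The sketch below shows that loop; to stay self-contained it substitutes a nearest-centroid classifier for the paper's support vector machine, and the one-dimensional toy data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (not the paper's SVM): pick the class whose mean is closest to x."""
    sums, counts = {}, {}
    for features, lab in zip(train_x, train_y):
        acc = sums.setdefault(lab, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1

    def dist2(lab):
        return sum((s / counts[lab] - v) ** 2 for s, v in zip(sums[lab], x))

    return min(sums, key=dist2)

def jackknife_accuracy(xs, ys, predict):
    """Leave-one-out loop: each sample is predicted by a model fit on all the others."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Toy, well-separated data: class 0 near 0.2, class 1 near 1.2.
xs = [[0.0], [0.2], [0.4], [1.0], [1.2], [1.4]]
ys = [0, 0, 0, 1, 1, 1]
acc = jackknife_accuracy(xs, ys, nearest_centroid_predict)
```

Because every sample is tested exactly once and there is no random split, the jackknife estimate is deterministic for a given dataset, at the cost of fitting one model per sample (1084 fits for the dataset size given in the abstract).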
Affiliation(s)
- Wenxia Su
- College of Science, Inner Mongolia Agriculture University, Hohhot, China
| | - Xiaojun Qian
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Keli Yang
- Nonlinear Research Institute, Baoji University of Arts and Sciences, Baoji, China
| | - Hui Ding
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Chengbing Huang
- School of Computer Science and Technology, Aba Teachers University, Aba, China
| | - Zhaoyue Zhang
- School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
| |
|
41
|
Lin Y, Sun M, Zhang J, Li M, Yang K, Wu C, Zulfiqar H, Lai H. Computational identification of promoters in Klebsiella aerogenes by using support vector machine. Front Microbiol 2023; 14:1200678. [PMID: 37250059 PMCID: PMC10215528 DOI: 10.3389/fmicb.2023.1200678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 04/18/2023] [Indexed: 05/31/2023] Open
Abstract
Promoters are the basic functional cis-elements to which RNA polymerase binds to initiate the process of gene transcription. A comprehensive understanding of gene expression and regulation depends on the precise identification of promoters, as they are the most important component of gene expression. This study aimed to develop a machine learning-based model to predict promoters in Klebsiella aerogenes (K. aerogenes). In the prediction model, the promoter sequences in the K. aerogenes genome were encoded by pseudo k-tuple nucleotide composition (PseKNC) and position-correlation scoring function (PCSF). Numerical features were obtained and then optimized using mRMR in combination with a support vector machine (SVM) and 5-fold cross-validation (CV). Subsequently, these optimized features were input into an SVM-based classifier to discriminate promoter sequences from non-promoter sequences in K. aerogenes. Results of 10-fold CV showed that the model could yield an overall accuracy of 96.0% and an area under the ROC curve (AUC) of 0.990. We hope that this model will aid the study of promoters and gene regulation in K. aerogenes.
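The k-tuple composition that PseKNC builds on can be sketched as a normalized k-mer frequency vector. The example below covers only that base component; the pseudo (sequence-order correlation) terms of PseKNC and the PCSF encoding are not reproduced, and the function name `ktuple_composition` is hypothetical.

```python
from itertools import product

def ktuple_composition(seq, k=2):
    """Frequency of every overlapping k-mer over ACGT, in lexicographic order.
    Windows containing other characters (e.g. N) are skipped."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

vec = ktuple_composition("ACGT", k=2)   # 16-dimensional vector for dinucleotides
```

For k = 2 the vector has 16 entries and for k = 3 it has 64; a feature-selection step such as the mRMR ranking mentioned in the abstract would then operate on vectors like this one, concatenated with the other encodings.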
Affiliation(s)
- Yan Lin
- Key Laboratory for Animal Disease-Resistance Nutrition of the Ministry of Agriculture, Animal Nutrition Institute, Sichuan Agricultural University, Chengdu, China
| | - Meili Sun
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Junjie Zhang
- Key Laboratory for Animal Disease-Resistance Nutrition of the Ministry of Agriculture, Animal Nutrition Institute, Sichuan Agricultural University, Chengdu, China
| | - Mingyan Li
- Chifeng Product Quality Inspection and Testing Centre, Chifeng, China
| | - Keli Yang
- Nonlinear Research Institute, Baoji University of Arts and Sciences, Baoji, China
| | - Chengyan Wu
- Baotou Teacher’s College, Inner Mongolia University of Science and Technology, Baotou, China
| | - Hasan Zulfiqar
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China
| | - Hongyan Lai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
| |
|
42
|
Su D, Xiong Y, Wei H, Wang S, Ke J, Liang P, Zhang H, Yu Y, Zuo Y, Yang L. Integrated analysis of ovarian cancer patients from prospective transcription factor activity reveals subtypes of prognostic significance. Heliyon 2023; 9:e16147. [PMID: 37215759 PMCID: PMC10199194 DOI: 10.1016/j.heliyon.2023.e16147] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/17/2023] [Revised: 05/04/2023] [Accepted: 05/07/2023] [Indexed: 05/24/2023] Open
Abstract
Transcription factors are protein molecules that act as regulators of gene expression. Aberrant activity of transcription factors can have a significant impact on tumor progression and metastasis in cancer patients. In this study, 868 immune-related transcription factors were identified from the transcription factor activity profile of 1823 ovarian cancer patients. The prognosis-related transcription factors were identified through univariate Cox analysis and random survival tree analysis, and two distinct clustering subtypes were subsequently derived based on these transcription factors. We assessed the clinical significance and genomic landscape of the two clustering subtypes and found statistically significant differences in prognosis, response to immunotherapy, and response to chemotherapy among ovarian cancer patients with different subtypes. Multi-scale Embedded Gene Co-expression Network Analysis was used to identify differential gene modules between the two clustering subtypes, which allowed us to conduct further analysis of biological pathways that exhibited significant differences between them. Finally, a ceRNA network was constructed to analyze lncRNA-miRNA-mRNA regulatory pairs with differential expression levels between the two clustering subtypes. We expect that our study may provide useful references for stratifying and treating patients with ovarian cancer.
Affiliation(s)
- Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Yuqiang Xiong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Haodong Wei
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Jiawei Ke
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Pengfei Liang
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Haoxin Zhang
- Department of Gastrointestinal Oncology, Harbin Medical University Cancer Hospital, Harbin 150081, China
| | - Yao Yu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| | - Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd., Hohhot, 010010, China
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
| |
|
43
|
Wei W, Su Y. Function of CD8+, conventional CD4+, and regulatory CD4+ T cell identification in lung cancer. Comput Biol Med 2023; 160:106933. [PMID: 37156220 DOI: 10.1016/j.compbiomed.2023.106933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Revised: 04/06/2023] [Accepted: 04/13/2023] [Indexed: 05/10/2023]
Abstract
Lung cancer is the malignant tumor with the highest mortality rate in the world, and there is marked heterogeneity within the tumor. Single-cell sequencing technology enables researchers to obtain cell-level information about cell type, status, subpopulation distribution, and cell-cell communication in the tumor microenvironment. However, owing to limited sequencing depth, some genes with low expression cannot be detected, so most immune cell-specific genes go unrecognized, which leads to defects in the functional identification of immune cells. In this paper, we used single-cell sequencing data of 12346 T cells from 14 treatment-naïve non-small-cell lung cancer patients to identify immune cell-specific genes and infer the function of three types of T cells. The method, named GRAPH-LC, does this using a gene interaction network and graph learning methods. Graph learning methods are used to extract gene features, and a dense neural network is used to identify immune cell-specific genes. In 10-fold cross-validation, the AUROC and AUPR reached at least 0.802 and 0.815, respectively, for identifying cell-specific genes of the three types of T cells. We then performed functional enrichment analysis on the top 15 expressed genes, which yielded 95 GO terms and 39 KEGG pathways related to the three types of T cells. The use of this technology will help to deepen understanding of the mechanisms underlying the occurrence and development of lung cancer, find new diagnostic markers and therapeutic targets, and provide a theoretical reference for the precise treatment of lung cancer patients in the future.
Affiliation(s)
- Wei Wei
- Department of Lung Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin Lung Cancer Center, Tianjin, China
| | - Yanjun Su
- Department of Lung Cancer, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin's Clinical Research Center for Cancer, Tianjin Lung Cancer Center, Tianjin, China.
| |
|
44
|
Wang Y, Zhang Y, Wang J, Xie F, Zheng D, Zou X, Guo M, Ding Y, Wan J, Han K. Prediction of drug-target interactions via neural tangent kernel extraction feature matrix factorization model. Comput Biol Med 2023; 159:106955. [PMID: 37094465 DOI: 10.1016/j.compbiomed.2023.106955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 04/04/2023] [Accepted: 04/16/2023] [Indexed: 04/26/2023]
Abstract
Drug discovery is a complex and lengthy process that often requires years of research and development. It therefore demands substantial investment and resources, as well as professional knowledge, technology, and skills. Predicting drug-target interactions (DTIs) is an important part of drug development. If machine learning is used to predict DTIs, the cost and time of drug development can be significantly reduced, and machine learning methods are now widely used for this purpose. In this study, we propose a neighborhood regularized logistic matrix factorization method based on features extracted from a neural tangent kernel (NTK) to predict DTIs. First, the latent feature matrices of drugs and targets are extracted from the NTK model, and the corresponding Laplacian matrices are constructed from these feature matrices. Next, the Laplacian matrices of the drugs and targets are used as the condition for matrix factorization to obtain two low-dimensional matrices. Finally, the matrix of predicted DTIs is obtained by multiplying these two low-dimensional matrices. On four gold standard datasets, the present method is significantly better than the methods it is compared with, indicating that automatic feature extraction using a deep learning model is competitive with manual feature selection.
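Two of the steps in this abstract, building a Laplacian and forming predictions from two low-dimensional matrices, can be sketched in a few lines. The sketch assumes an unnormalized Laplacian L = D - S computed from a precomputed similarity matrix S, and a logistic link on the product of the factor matrices (suggested by the method's name, not stated in the abstract); the factorization training itself is omitted and all names and numbers are hypothetical.

```python
import math

def laplacian(sim):
    """Unnormalized graph Laplacian L = D - S for a symmetric similarity matrix S,
    where D is the diagonal matrix of row sums."""
    n = len(sim)
    return [
        [(sum(sim[i]) if i == j else 0.0) - sim[i][j] for j in range(n)]
        for i in range(n)
    ]

def predict(U, V):
    """Interaction probabilities P[i][j] = sigmoid(u_i . v_j) from latent
    drug vectors U and latent target vectors V."""
    return [
        [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(u, v)))) for v in V]
        for u in U
    ]

# Toy similarity between two drugs, and toy 2-dimensional latent factors.
L_drug = laplacian([[0.0, 1.0], [1.0, 0.0]])
P = predict(U=[[1.0, 0.0], [0.0, 0.0]], V=[[2.0, 0.0], [0.0, 3.0]])
```

In a neighborhood-regularized factorization, a term such as trace(Uᵀ L U) is typically added to the loss so that similar drugs (or targets) receive similar latent vectors; each row of a Laplacian built this way sums to zero.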
Affiliation(s)
- Yu Wang
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Yu Zhang
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Jianchun Wang
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Fang Xie
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Dequan Zheng
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China
| | - Xiang Zou
- Pharmaceutical Engineering Technology Research Center, Harbin University of Commerce, Harbin, 150076, China
| | - Mian Guo
- Department of Neurosurgery, The Second Affiliated Hospital of Harbin Medical University, 150086, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China.
| | - Jie Wan
- Laboratory for Space Environment and Physical Sciences, Harbin Institute of Technology, Harbin, 150001, China.
| | - Ke Han
- School of Computer and Information Engineering, Heilongjiang Provincial Key Laboratory of Electronic Commerce and Information Processing, Harbin University of Commerce, Harbin, 150028, China; Pharmaceutical Engineering Technology Research Center, Harbin University of Commerce, Harbin, 150076, China.
| |
|
45
|
Zulfiqar H, Ahmed Z, Kissanga Grace-Mercure B, Hassan F, Zhang ZY, Liu F. Computational prediction of promotors in Agrobacterium tumefaciens strain C58 by using the machine learning technique. Front Microbiol 2023; 14:1170785. [PMID: 37125199 PMCID: PMC10133480 DOI: 10.3389/fmicb.2023.1170785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 03/17/2023] [Indexed: 05/02/2023] Open
Abstract
Promoters are genomic regions upstream of genes that are bound by RNA polymerase to start gene transcription. Because they are the most critical elements of gene expression, the recognition of promoters is crucial to understanding the regulation of gene expression. This study aimed to develop a machine learning-based model to predict promoters in Agrobacterium tumefaciens (A. tumefaciens) strain C58. In the model, promoter sequences were encoded by three different kinds of feature descriptors, namely, accumulated nucleotide frequency, k-mer nucleotide composition, and binary encodings. The obtained features were optimized by using correlation and the mRMR-based algorithm. These optimized features were fed into a random forest (RF) classifier to discriminate promoter sequences from non-promoter sequences in A. tumefaciens strain C58. Ten-fold cross-validation showed that the proposed model yields an overall accuracy of 0.837. This model will help the study of promoters in A. tumefaciens strain C58.
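The three descriptors named in this abstract can be sketched as follows. This is an illustrative reading of the commonly used definitions, not the authors' code: the function names, the A/C/G/T ordering, and the handling of unknown bases are assumptions.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed base ordering for all encodings below


def kmer_composition(seq, k=2):
    """Relative frequency of every possible k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    if windows <= 0:
        return [0.0] * len(kmers)
    for i in range(windows):
        word = seq[i:i + k]
        if word in counts:  # windows with unknown bases (e.g. N) are skipped
            counts[word] += 1
    return [counts[m] / windows for m in kmers]


def binary_encoding(seq):
    """One-hot encoding: four bits per position, all zeros for unknown bases."""
    table = {b: [int(b == a) for a in ALPHABET] for b in ALPHABET}
    encoded = []
    for base in seq:
        encoded.extend(table.get(base, [0, 0, 0, 0]))
    return encoded


def accumulated_nucleotide_frequency(seq):
    """For each position i, the share of the prefix seq[:i] equal to seq[i-1]."""
    seen = {}
    densities = []
    for i, base in enumerate(seq, start=1):
        seen[base] = seen.get(base, 0) + 1
        densities.append(seen[base] / i)
    return densities
```

Concatenating the three vectors gives one feature row per sequence, which a feature selection step such as mRMR could then prune before classification.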
Affiliation(s)
- Hasan Zulfiqar
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zahoor Ahmed
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
| | - Bakanina Kissanga Grace-Mercure
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Farwa Hassan
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zhao-Yue Zhang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Fen Liu
- Department of Radiation Oncology, Peking University Cancer Hospital (Inner Mongolia Campus), Affiliated Cancer Hospital of Inner Mongolia Medical University, Inner Mongolia Cancer Hospital, Hohhot, China
| |
|
46
|
Zhang X, Zhu W, Sun H, Ding Y, Liu L. Prediction of CTCF loop anchor based on machine learning. Front Genet 2023; 14:1181956. [PMID: 37077544 PMCID: PMC10106609 DOI: 10.3389/fgene.2023.1181956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 03/24/2023] [Indexed: 04/05/2023] Open
Abstract
Introduction: Various activities in biological cells are affected by three-dimensional genome structure. Insulators play an important role in the organization of higher-order structure. CTCF is a representative mammalian insulator, which can produce barriers that prevent the continuous extrusion of chromatin loops. As a multifunctional protein, CTCF has tens of thousands of binding sites in the genome, but only a portion of them can be used as anchors of chromatin loops. It is still unclear how cells select the anchor in the process of chromatin looping. Methods: In this paper, a comparative analysis is performed to investigate the sequence preference and binding strength of anchor and non-anchor CTCF binding sites. Furthermore, a machine learning model based on the CTCF binding intensity and DNA sequence is proposed to predict which CTCF sites can form chromatin loop anchors. Results: The accuracy of the machine learning model that we constructed for predicting the anchors of CTCF-mediated chromatin loops reached 0.8646. We also find that the formation of loop anchors is mainly influenced by the CTCF binding strength and binding pattern (which can be interpreted as the binding of different zinc fingers). Discussion: In conclusion, our results suggest that the CTCF core motif and its flanking sequence may be responsible for the binding specificity. This work contributes to understanding the mechanism of loop anchor selection and provides a reference for the prediction of CTCF-mediated chromatin loops.
Affiliation(s)
- Xiao Zhang
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
| | - Wen Zhu
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- *Correspondence: Wen Zhu,
| | - Huimin Sun
- School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
| | - Yijie Ding
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
| | - Li Liu
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| |
|
47
|
Yang YH, Ma CY, Gao D, Liu XW, Yuan SS, Ding H. i2OM: Toward a better prediction of 2'-O-methylation in human RNA. Int J Biol Macromol 2023; 239:124247. [PMID: 37003392 DOI: 10.1016/j.ijbiomac.2023.124247] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/22/2022] [Revised: 03/06/2023] [Accepted: 03/22/2023] [Indexed: 04/03/2023]
Abstract
2'-O-methylation (2OM) is an omnipresent post-transcriptional modification in RNAs. It is important for the regulation of RNA stability, mRNA splicing and translation, as well as innate immunity. With the increase in publicly available 2OM data, several computational tools have been developed for the identification of 2OM sites in human RNA. Unfortunately, these tools suffer from the low discriminative power of redundant features, unreasonable dataset construction, or overfitting. To address these issues, based on four types of 2OM data (2OM-adenine (A), cytosine (C), guanine (G), and uracil (U)), we developed a two-step feature selection model to identify 2OM. For each type, one-way analysis of variance (ANOVA) combined with mutual information (MI) was used to rank sequence features and obtain the optimal feature subset. Subsequently, four predictors based on eXtreme Gradient Boosting (XGBoost) or support vector machine (SVM) were built to identify the four types of 2OM sites. Finally, the proposed model produced an overall accuracy of 84.3% on the independent set. For the convenience of users, an online tool called i2OM was constructed and can be freely accessed at i2om.lin-group.cn. The predictor may provide a reference for the study of 2OM.
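The ANOVA half of the feature ranking described here can be sketched with the standard one-way F statistic. This is a minimal illustration under stated assumptions, not the i2OM pipeline: the function names and the toy data layout (rows are samples, columns are features) are hypothetical, and the mutual information step and the subsequent incremental subset search are not shown.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature against class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0  # F is undefined without at least two groups and spare samples
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    between = sum(len(g) * (means[label] - grand_mean) ** 2
                  for label, g in groups.items())
    within = sum((v - means[label]) ** 2
                 for label, g in groups.items() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_features(rows, labels):
    """Feature indices sorted from most to least discriminative by F score."""
    n_features = len(rows[0])
    scores = [anova_f([row[j] for row in rows], labels)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

A feature whose class means are well separated relative to its within-class spread receives a large F value and is ranked first; a second criterion such as MI can then be applied to the ranked list.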
Affiliation(s)
- Yu-He Yang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Cai-Yi Ma
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Dong Gao
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Xiao-Wei Liu
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Shi-Shi Yuan
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
| | - Hui Ding
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China.
| |
|
48
|
Malik A, Shoombuatong W, Kim CB, Manavalan B. GPApred: The first computational predictor for identifying proteins with LPXTG-like motif using sequence-based optimal features. Int J Biol Macromol 2023; 229:529-538. [PMID: 36596370 DOI: 10.1016/j.ijbiomac.2022.12.315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 12/19/2022] [Accepted: 12/28/2022] [Indexed: 01/02/2023]
Abstract
The cell surface proteins of gram-positive bacteria are involved in many important biological functions, including the infection of host cells. Owing to their virulent nature, these proteins are also considered strong candidates for potential drug or vaccine targets. Among the various cell surface proteins of gram-positive bacteria, LPXTG-like proteins form a major class. These proteins have a highly conserved C-terminal cell wall sorting signal, which consists of an LPXTG sequence motif, a hydrophobic domain, and a positively charged tail. These surface proteins are targeted to the cell envelope by a sortase enzyme via transpeptidation. A variety of LPXTG-like proteins have been experimentally characterized; however, their number in public databases has increased owing to extensive bacterial genome sequencing without proper annotation. In the absence of experimental characterization, identifying and annotating these sequences is extremely challenging. Therefore, in this study, we developed the first machine learning-based predictor called GPApred, which can identify LPXTG-like proteins from their primary sequences. Using a newly constructed benchmark dataset, we explored different classifiers and five feature encodings and their hybrids. Optimal features were derived using the recursive feature elimination method, and these features were then trained using a support vector machine algorithm. The performance of different models was evaluated using independent datasets, and a final model (GPApred) was selected based on consistency during cross-validation and independent assessment. GPApred can be an effective tool for predicting LPXTG-like sequences and can be further employed for functional characterization or drug targeting. Availability: https://procarb.org/gpapred/.
Affiliation(s)
- Adeel Malik
- Institute of Intelligence Informatics Technology, Sangmyung University, Seoul 03016, Republic of Korea
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Chang-Bae Kim
- Department of Biotechnology, Sangmyung University, Seoul 03016, Republic of Korea.
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
| |
|
49
|
Constructing discriminative feature space for LncRNA-protein interaction based on deep autoencoder and marginal fisher analysis. Comput Biol Med 2023; 157:106711. [PMID: 36924738 DOI: 10.1016/j.compbiomed.2023.106711] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/16/2022] [Revised: 01/26/2023] [Accepted: 02/26/2023] [Indexed: 03/04/2023]
Abstract
Long non-coding RNAs (lncRNAs) play important roles by regulating proteins in many biological processes and life activities. To uncover the molecular mechanisms of lncRNAs, it is necessary to identify their interactions with proteins. Recently, some machine learning methods were proposed to detect lncRNA-protein interactions according to the distribution of known interactions. The performance of these methods depends largely on (1) how exactly the distribution of known interactions is characterized by the feature space and (2) how discriminative the feature space is for distinguishing lncRNA-protein interactions. Because the known interactions may be multiple and complex, it remains a challenge to construct a discriminative feature space for lncRNA-protein interactions. To resolve this problem, a novel method named DFRPI was developed in this paper based on a deep autoencoder and marginal Fisher analysis. First, initial features of lncRNA-protein interactions were extracted from the primary sequences and secondary structures of lncRNAs and proteins. Second, a deep autoencoder was used to learn encoding parameters of the initial features to describe the known interactions precisely. Next, marginal Fisher analysis was employed to optimize the encoding parameters of the features to characterize a discriminative feature space for lncRNA-protein interactions. Finally, a random forest-based predictor was trained on the discriminative feature space to detect lncRNA-protein interactions. In a series of experiments, our predictor achieved a precision of 0.920, recall of 0.916, accuracy of 0.918, MCC of 0.836, specificity of 0.920, sensitivity of 0.916, and AUC of 0.906, outperforming the compared methods for predicting lncRNA-protein interactions. These results suggest that the proposed method can generate a reasonable and effective feature space for distinguishing lncRNA-protein interactions accurately. The code and data are available at https://github.com/D0ub1e-D/DFRPI.
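For readers comparing the reported figures, the threshold-based metrics listed in this abstract all derive from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the toy counts in the test are hypothetical and are not taken from the paper. AUC is omitted because it requires ranked scores rather than counts.

```python
import math


def _ratio(numerator, denominator):
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    recall = _ratio(tp, tp + fn)
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": _ratio(tp, tp + fp),
        "recall": recall,
        "sensitivity": recall,  # sensitivity is another name for recall
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

Recall and sensitivity are the same quantity, which is why the abstract reports the identical value of 0.916 for both.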
|
50
|
Cheng N, Liu J, Chen C, Zheng T, Li C, Huang J. Prediction of lung cancer metastasis by gene expression. Comput Biol Med 2023; 153:106490. [PMID: 36638618 DOI: 10.1016/j.compbiomed.2022.106490] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/18/2022] [Revised: 12/14/2022] [Accepted: 12/27/2022] [Indexed: 12/31/2022]
Abstract
Tumor metastasis is the main cause of death in cancer patients. Early prediction of tumor metastasis can allow for timely intervention. At present, research on tumor metastasis mainly focuses on manual diagnosis by imaging or diagnosis by computational methods. As a tumor deteriorates, gene expression levels in blood change greatly, so it is feasible to measure the transcripts of key genes to predict whether cancer will metastasize. Therefore, in this paper, we obtained gene expression data from 226 patients from TCGA. These data included 239,322 transcripts. Background screening and LASSO analysis were used to select 31 transcripts as features. Finally, a deep neural network (DNN) was used to determine whether or not lung cancer would metastasize. We compared our method with several other methods and found that it achieved the best precision. In addition, in a previous study, we identified 7 genes that play a vital role in lung cancer. We added those gene transcripts into the DNN and found that the AUC and AUPR of the model increased.
Affiliation(s)
- Nitao Cheng
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China
| | - Junliang Liu
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
| | - Chen Chen
- Department of Biological Repositories, Zhongnan Hospital of Wuhan University, China
| | - Tang Zheng
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China
| | - Changsheng Li
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China
| | - Jingyu Huang
- Department of Thoracic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China.
| |
|