1
Hassan MT, Tayara H, Chong KT. Possum: identification and interpretation of potassium ion inhibitors using probabilistic feature vectors. Arch Toxicol 2025; 99:225-235. [PMID: 39438319 DOI: 10.1007/s00204-024-03888-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Accepted: 10/09/2024] [Indexed: 10/25/2024]
Abstract
The flow of potassium ions through cell membranes plays a crucial role in facilitating various cell processes such as hormone secretion, epithelial function, maintenance of electrochemical gradients, and electrical impulse formation. Potassium ion inhibitors are considered promising alternatives in treating cancer, muscle weakness, renal dysfunction, endocrine disorders, impaired cellular function, and cardiac arrhythmia. Thus, it becomes essential to identify and understand potassium ion inhibitors in order to regulate the ion flow across ion channels. In this study, we created a meta-model, POSSUM, for the identification of potassium ion inhibitors. Two distinct datasets were used for training, testing, and evaluation of the meta-model. We employed seven feature descriptors and five distinctive classifiers to construct 35 baseline models. We used the mean Gini index score to select the optimal base models and classifiers. The POSSUM method was trained on the optimal probabilistic feature vectors. The proposed optimal model, POSSUM, outperforms the baseline models and the existing methods on both datasets. We anticipate POSSUM will be a very useful tool and will be essential in the process of finding and screening possible potassium ion inhibitors.
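The stacking workflow in this abstract (seven descriptors times five classifiers giving 35 baseline models, whose predicted probabilities feed a meta-model) can be sketched roughly as follows. This is only an illustration under stated assumptions: the descriptor and classifier names are placeholders, `toy_probability` is a hypothetical stand-in for a trained baseline model, and the selection of five models is arbitrary rather than the Gini-based selection the authors describe.

```python
from itertools import product

# Placeholder names: the abstract gives only the counts (7 descriptors, 5 classifiers).
DESCRIPTORS = [f"descriptor_{i}" for i in range(1, 8)]
CLASSIFIERS = [f"classifier_{j}" for j in range(1, 6)]

# One baseline model per (descriptor, classifier) pair.
BASELINE_MODELS = list(product(DESCRIPTORS, CLASSIFIERS))


def toy_probability(model, peptide):
    """Hypothetical stand-in for a trained baseline model.

    Returns a deterministic pseudo-probability in [0, 1]. A real baseline
    model would encode the peptide with its descriptor and return the
    fitted classifier's predicted probability.
    """
    descriptor, classifier = model
    score = sum(ord(ch) for ch in descriptor + classifier + peptide)
    return (score % 101) / 100.0


def probabilistic_feature_vector(peptide, selected_models):
    """Collect the selected models' probabilities into one feature vector."""
    return [toy_probability(m, peptide) for m in selected_models]


# Arbitrary selection for illustration only (not the paper's Gini-based ranking).
selected = BASELINE_MODELS[:5]
pfv = probabilistic_feature_vector("ACDEFGHIK", selected)  # toy peptide string
```

The resulting vector `pfv` is what a meta-model would be trained on in a stacking setup; the choice of meta-classifier is left open here because the abstract does not name it.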
Affiliation(s)
- Mir Tanveerul Hassan
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, Jeollabuk-do, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, Jeollabuk-do, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, Jeollabuk-do, South Korea
  - Advanced Electronics and Information Research Centre, Jeonbuk National University, Jeonju, 54896, Jeollabuk-do, South Korea
2
Wang Z, Yuan H, Yan J, Liu J. Identification, characterization, and design of plant genome sequences using deep learning. The Plant Journal: for Cell and Molecular Biology 2025; 121:e17190. [PMID: 39666835 DOI: 10.1111/tpj.17190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 11/11/2024] [Accepted: 11/23/2024] [Indexed: 12/14/2024]
Abstract
Due to its excellent performance in processing large amounts of data and capturing complex non-linear relationships, deep learning has been widely applied in many fields of plant biology. Here we first review the application of deep learning in analyzing genome sequences to predict gene expression, chromatin interactions, and epigenetic features (open chromatin, transcription factor binding sites, and methylation sites) in plants. Then, current motif mining and functional component design and synthesis based on generative adversarial networks, large models, and attention mechanisms are elaborated in detail. The progress of protein structure and function prediction, genomic prediction, and large model applications based on deep learning is also discussed. Finally, this work provides prospects for the future development of deep learning in plants with regard to multiple omics data, algorithm optimization, large language models, sequence design, and intelligent breeding.
Affiliation(s)
- Zhenye Wang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hao Yuan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jianbing Yan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Hongshan Laboratory, Wuhan, 430070, China
- Jianxiao Liu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
  - Hubei Hongshan Laboratory, Wuhan, 430070, China
3
Park S, To Chong K, Tayara H. CpGFuse: a holistic approach for accurate identification of methylation states of DNA CpG sites. Brief Bioinform 2024; 26:bbaf063. [PMID: 39968737 PMCID: PMC11836533 DOI: 10.1093/bib/bbaf063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 12/27/2024] [Accepted: 02/07/2025] [Indexed: 02/20/2025] Open
Abstract
Anomalous DNA methylation has wide-ranging implications, spanning from neurological disorders to cancer and cardiovascular complications. Current methods for single-cell DNA methylation analysis face limitations in coverage, leading to information loss and hampering our understanding of disease associations. The primary goal of this study is the imputation of CpG site methylation states in a given cell by leveraging the CpG states of other cells of the same type. To address this, we introduce CpGFuse, a novel methodology that combines information from diverse genomic features. Leveraging two benchmark datasets, we employed a careful preprocessing approach and conducted a comprehensive ablation study to assess the individual and collective contributions of DNA sequence, intercellular, and intracellular features. Our proposed model, CpGFuse, employs a convolutional neural network with an attention mechanism, surpassing existing models across the HCCs and HepG2 datasets. The results highlight the effectiveness of our approach in enhancing accuracy and providing a robust tool for CpG site prediction in genomics. CpGFuse's success underscores the importance of integrating multiple genomic features for accurate identification of the methylation states of CpG sites.
Affiliation(s)
- Sehi Park
  - Department of Electronics and Information Engineering, Jeonbuk National University, Baekje-daero, Deokjin-gu, Jeonju 54896, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Baekje-daero, Deokjin-gu, Jeonju 54896, South Korea
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Baekje-daero, Deokjin-gu, Jeonju 54896, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Baekje-daero, Deokjin-gu, Jeonju 54896, South Korea
4
Hassan MT, Tayara H, Chong KT. NaII-Pred: An ensemble-learning framework for the identification and interpretation of sodium ion inhibitors by fusing multiple feature representation. Comput Biol Med 2024; 178:108737. [PMID: 38879934 DOI: 10.1016/j.compbiomed.2024.108737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 04/21/2024] [Accepted: 06/08/2024] [Indexed: 06/18/2024]
Abstract
High-affinity ligand peptides for ion channels are essential for controlling the flow of ions across the plasma membrane. These peptides are now being investigated as potential therapeutics for a variety of illnesses, including cancer and cardiovascular disease. The identification and interpretation of ligand peptide inhibitors that control ion flow across cells have therefore become pivotal. In this work, we developed an ensemble-based model, NaII-Pred, for the identification of sodium ion inhibitors. The ensemble model was trained, tested, and evaluated on three different datasets. The NaII-Pred method employs six different descriptors and a hybrid feature set in conjunction with five conventional machine learning classifiers to create 35 baseline models. Through an ensemble approach, the top five baseline models trained on the hybrid feature set were integrated to yield the final predictive model, NaII-Pred. Our proposed model, NaII-Pred, outperforms the baseline models and the current predictors on both datasets. We believe NaII-Pred will play a critical role in screening and identifying potential sodium ion inhibitors and will be an invaluable tool.
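The abstract does not say how the top five baseline models are integrated. One common option is soft voting, where the predicted probabilities are averaged and thresholded. The sketch below assumes that approach; the function name, the threshold of 0.5, and the example probabilities are all hypothetical.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average baseline probabilities and threshold the mean (soft voting).

    Assumption: simple unweighted averaging; the paper may combine its
    baseline models differently.
    """
    if not probabilities:
        raise ValueError("need at least one baseline probability")
    mean_probability = sum(probabilities) / len(probabilities)
    return mean_probability, mean_probability >= threshold


# Hypothetical outputs of five baseline models for one peptide.
top_five = [0.9, 0.8, 0.4, 0.7, 0.6]
mean_probability, is_inhibitor = soft_vote(top_five)
```

Averaging lets one dissenting model (0.4 here) be outvoted without discarding its information entirely, which is the usual motivation for soft over hard voting.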
Affiliation(s)
- Mir Tanveerul Hassan
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
  - Advanced Electronics and Information Research Centre, Jeonbuk National University, Jeonju, 54896, South Korea
5
Khan A, Kandel J, Tayara H, Chong KT. Predicting the bandgap and efficiency of perovskite solar cells using machine learning methods. Mol Inform 2024; 43:e202300217. [PMID: 38050743 DOI: 10.1002/minf.202300217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 11/06/2023] [Accepted: 12/03/2023] [Indexed: 12/06/2023]
Abstract
Rapid and accurate prediction of bandgaps and efficiency of perovskite solar cells is a crucial challenge for various solar cell applications. Existing theoretical and experimental methods often accurately measure these parameters; however, these methods are costly and time-consuming. Machine learning-based approaches offer a promising and computationally efficient method to address this problem. In this study, we trained different machine learning (ML) models using previously reported experimental data. Among the different ML models, the CatBoostRegressor performed better for both bandgap and efficiency approximations. We evaluated the proposed model using k-fold cross-validation and investigated the relative importance of input features using Shapley Additive Explanations (SHAP). SHAP provides valuable insights into how individual features contribute to the predictions of the proposed model. Furthermore, we validated the performance of the proposed model using an independent dataset, demonstrating its robustness and generalizability beyond the training data. Our findings show that machine learning-based approaches, with the aid of SHAP, can provide a promising and computationally efficient method for the accurate and rapid prediction of perovskite solar cell properties. The proposed model is expected to facilitate the discovery of new perovskite materials and is freely available at GitHub (https://github.com/AsadKhanJBNU/perovskite_bandgap_and_efficiency.git) for the perovskite community.
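For readers unfamiliar with the k-fold cross-validation mentioned in this abstract, the index bookkeeping can be sketched as below. This is a generic illustration, not the authors' code: the function name is hypothetical, the folds are contiguous and unshuffled, and the values of `n_samples` and `k` are arbitrary (the abstract does not state k).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds.

    Fold sizes differ by at most one sample. No shuffling is done here;
    in practice the data are usually shuffled first.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size


# Arbitrary example: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
```

Each sample appears in exactly one validation fold, so every data point is used for both training and evaluation across the k rounds.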
Affiliation(s)
- Asad Khan
  - Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju, 54896, South Korea
- Jeevan Kandel
  - Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju, 54896, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea
- Kil To Chong
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju, 54896, South Korea
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
6
Chen K, Zhu X, Wang J, Zhao Z, Hao L, Guo X, Liu Y. MFPred: prediction of ncRNA families based on multi-feature fusion. Brief Bioinform 2023; 24:bbad303. [PMID: 37615358 DOI: 10.1093/bib/bbad303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 07/30/2023] [Accepted: 07/31/2023] [Indexed: 08/25/2023] Open
Abstract
Non-coding RNA (ncRNA) plays a critical role in biology. ncRNAs from the same family usually have similar functions; as a result, it is essential to predict ncRNA families before identifying their functions. There are two primary methods for predicting ncRNA families, namely, traditional biological methods and computational methods. Traditional biological methods require a lot of manpower and resources to predict ncRNA families. Therefore, this paper proposed a new computational ncRNA family prediction method called MFPred. MFPred identified ncRNA families by extracting sequence features of ncRNAs, and it comprised three primary modules: (1) an ncRNA sequence encoding and feature extraction module, which encoded ncRNA sequences and extracted four different features of ncRNA sequences, (2) a dynamic Bi_GRU and feature fusion module, which extracted contextual information features of the ncRNA sequence, and (3) a ResNet_SE module, which extracted local information features of the ncRNA sequence. In this study, MFPred was compared with previously proposed ncRNA family prediction methods using two frequently used public ncRNA datasets, NCY and nRC. The results showed that MFPred outperformed other prediction methods on the two datasets.
Affiliation(s)
- Kai Chen
  - College of Software, Jilin University, Changchun, 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- Xiaodong Zhu
  - College of Software, Jilin University, Changchun, 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Jiahao Wang
  - College of Software, Jilin University, Changchun, 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- Ziqi Zhao
  - College of Software, Jilin University, Changchun, 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- Lei Hao
  - College of Software, Jilin University, Changchun, 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- Xinsheng Guo
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Yuanning Liu
  - College of Software, Jilin University, Changchun, 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
  - College of Computer Science and Technology, Jilin University, Changchun, 130012, China
7
Rehman S, Ahmad Z, Ramakrishnan M, Kalendar R, Zhuge Q. Regulation of plant epigenetic memory in response to cold and heat stress: towards climate resilient agriculture. Funct Integr Genomics 2023; 23:298. [PMID: 37700098 DOI: 10.1007/s10142-023-01219-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/03/2023] [Revised: 08/18/2023] [Accepted: 08/23/2023] [Indexed: 09/14/2023]
Abstract
Plants have evolved to adapt and grow in hot and cold climatic conditions. Some also adapt to daily and seasonal temperature changes. Epigenetic modifications play an important role in regulating plant tolerance under such conditions. DNA methylation and post-translational modifications of histone proteins influence gene expression during plant developmental stages and under stress conditions, including cold and heat stress. While short-term modifications are common, some modifications may persist and result in stress memory that can be inherited by subsequent generations. Understanding the mechanisms of epigenomes responding to stress and the factors that trigger stress memory is crucial for developing climate-resilient agriculture, but such an integrated view is currently limited. This review focuses on the plant epigenetic stress memory during cold and heat stress. It also discusses the potential of machine learning to modify stress memory through epigenetics to develop climate-resilient crops.
Affiliation(s)
- Shamsur Rehman
  - Co-Innovation Center for Sustainable Forestry in Southern China, Key Laboratory of Forest Genetics and Biotechnology, College of Biology and the Environment, Nanjing Forestry University, Ministry of Education, Nanjing, China
- Zishan Ahmad
  - Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, 210037, China
  - Bamboo Research Institute, Nanjing Forestry University, Nanjing, 210037, China
- Muthusamy Ramakrishnan
  - Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, 210037, China
  - Bamboo Research Institute, Nanjing Forestry University, Nanjing, 210037, China
- Ruslan Kalendar
  - Helsinki Institute of Life Science HiLIFE, Biocenter 3, Viikinkaari 1, FI-00014 University of Helsinki, Helsinki, Finland
  - Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
- Qiang Zhuge
  - Co-Innovation Center for Sustainable Forestry in Southern China, Key Laboratory of Forest Genetics and Biotechnology, College of Biology and the Environment, Nanjing Forestry University, Ministry of Education, Nanjing, China
8
Alakuş TB. A Novel Repetition Frequency-Based DNA Encoding Scheme to Predict Human and Mouse DNA Enhancers with Deep Learning. Biomimetics (Basel) 2023; 8:218. [PMID: 37366813 DOI: 10.3390/biomimetics8020218] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/24/2023] [Revised: 05/18/2023] [Accepted: 05/22/2023] [Indexed: 06/28/2023] Open
Abstract
Recent studies have shown that DNA enhancers have an important role in the regulation of gene expression. They are responsible for different important biological elements and processes such as development, homeostasis, and embryogenesis. However, experimental prediction of these DNA enhancers is time-consuming and costly as it requires laboratory work. Therefore, researchers started to look for alternatives and began applying computation-based deep learning algorithms to this field. Yet, the inconsistent and poor prediction performance of computational approaches across various cell lines led to the investigation of these approaches as well. Therefore, in this study, a novel DNA encoding scheme was proposed to address these problems, and DNA enhancers were predicted with BiLSTM. The study consisted of four different stages for two scenarios. In the first stage, DNA enhancer data were obtained. In the second stage, DNA sequences were converted to numerical representations by both the proposed encoding scheme and various DNA encoding schemes including EIIP, integer number, and atomic number. In the third stage, the BiLSTM model was designed, and the data were classified. In the final stage, the performance of DNA encoding schemes was determined by accuracy, precision, recall, F1-score, CSI, MCC, G-mean, Kappa coefficient, and AUC scores. In the first scenario, it was determined whether the DNA enhancers belonged to humans or mice. The highest performance was achieved with the proposed DNA encoding scheme, with an accuracy of 92.16% and an AUC score of 0.85. The closest accuracy score to the proposed scheme was obtained with the EIIP DNA encoding scheme, at 89.14%. The AUC score of this scheme was measured as 0.87. Among the remaining DNA encoding schemes, the atomic number showed an accuracy score of 86.61%, while this rate decreased to 76.96% with the integer scheme. The AUC values of these schemes were 0.84 and 0.82, respectively. In the second scenario, it was determined whether there was a DNA enhancer and, if so, to which species this enhancer belonged. In this scenario, the highest accuracy score was obtained with the proposed DNA encoding scheme and the result was 84.59%. Moreover, the AUC score of the proposed scheme was determined as 0.92. EIIP and integer DNA encoding schemes showed accuracy scores of 77.80% and 73.68%, respectively, while their AUC scores were close to 0.90. The least effective prediction was obtained with the atomic number scheme, with an accuracy score of 68.27% and an AUC score of 0.81. At the end of the study, it was observed that the proposed DNA encoding scheme was successful and effective in predicting DNA enhancers.
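The three comparison encodings named in this abstract (EIIP, integer number, atomic number) each map a nucleotide to a single number. A small sketch of those mappings follows. The EIIP values and the atomic-number values (the sum of atomic numbers of the atoms in each base) are the commonly cited ones; the integer mapping is just one possible convention and is an assumption here, as is the helper name `encode`. The paper's proposed repetition-frequency scheme is not reproduced because the abstract does not define it.

```python
# Electron-ion interaction pseudopotential values commonly cited for DNA bases.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

# Sum of atomic numbers of the atoms in each base
# (A: C5H5N5, C: C4H5N3O, G: C5H5N5O, T: C5H6N2O2).
ATOMIC_NUMBER = {"A": 70, "C": 58, "G": 78, "T": 66}

# Assumption: one arbitrary integer convention; papers differ on the ordering.
INTEGER = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(sequence, table):
    """Map each base of a DNA sequence to its numeric value in `table`."""
    try:
        return [table[base] for base in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported base: {err.args[0]!r}") from None


example = encode("ACGT", ATOMIC_NUMBER)
```

The encoded list would then be fed to a sequence model such as the BiLSTM the abstract mentions; only the lookup table changes between schemes.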
Affiliation(s)
- Talha Burak Alakuş
  - Department of Software Engineering, Faculty of Engineering, Kırklareli University, 39100 Kırklareli, Turkey
9
Nguyen-Vo TH, Trinh QH, Nguyen L, Nguyen-Hoang PU, Rahardja S, Nguyen BP. i4mC-GRU: Identifying DNA N4-Methylcytosine sites in mouse genomes using bidirectional gated recurrent unit and sequence-embedded features. Comput Struct Biotechnol J 2023; 21:3045-3053. [PMID: 37273848 PMCID: PMC10238585 DOI: 10.1016/j.csbj.2023.05.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/03/2022] [Revised: 05/12/2023] [Accepted: 05/12/2023] [Indexed: 06/06/2023] Open
Abstract
N4-methylcytosine (4mC) is one of the most common DNA methylation modifications found in both prokaryotic and eukaryotic genomes. Since the 4mC has various essential biological roles, determining its location helps reveal unexplored physiological and pathological pathways. In this study, we propose an effective computational method called i4mC-GRU using a gated recurrent unit and duplet sequence-embedded features to predict potential 4mC sites in mouse (Mus musculus) genomes. To fairly assess the performance of the model, we compared our method with several state-of-the-art methods using two different benchmark datasets. Our results showed that i4mC-GRU achieved area under the receiver operating characteristic curve values of 0.97 and 0.89 and area under the precision-recall curve values of 0.98 and 0.90 on the first and second benchmark datasets, respectively. Briefly, our method outperformed existing methods in predicting 4mC sites in mouse genomes. Also, we deployed i4mC-GRU as an online web server, supporting users in genomics studies.
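The "duplet sequence-embedded features" in this abstract suggest that the sequence is tokenised into dinucleotides before an embedding layer. A minimal sketch of such a tokenisation is shown below, assuming overlapping duplets with a stride of one and an arbitrary alphabetical id assignment; both choices, and the names `DUPLET_TO_ID` and `duplet_ids`, are assumptions rather than details taken from the paper.

```python
from itertools import product

# All 16 dinucleotides, numbered in alphabetical order (AA=0, AC=1, ..., TT=15).
DUPLET_TO_ID = {"".join(pair): i for i, pair in enumerate(product("ACGT", repeat=2))}


def duplet_ids(sequence):
    """Split a DNA sequence into overlapping duplets and map them to ids.

    A sequence of length n yields n - 1 ids. Characters outside ACGT
    raise KeyError.
    """
    seq = sequence.upper()
    return [DUPLET_TO_ID[seq[i:i + 2]] for i in range(len(seq) - 1)]


ids = duplet_ids("ACGT")  # duplets: AC, CG, GT
```

The integer ids would index rows of a learned embedding matrix, whose output a recurrent layer such as a GRU can then consume.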
Affiliation(s)
- Thanh-Hoang Nguyen-Vo
  - School of Mathematics and Statistics, Victoria University of Wellington, Wellington 6140, New Zealand
  - School of Innovation, Design and Technology, Wellington Institute of Technology, Wellington 5012, New Zealand
- Quang H. Trinh
  - School of Information and Communication Technology, Hanoi University of Science and Technology, Hanoi 100000, Vietnam
- Loc Nguyen
  - School of Mathematics and Statistics, Victoria University of Wellington, Wellington 6140, New Zealand
- Phuong-Uyen Nguyen-Hoang
  - Computational Biology Center, International University - VNU HCMC, Ho Chi Minh City 700000, Vietnam
- Susanto Rahardja
  - School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China
  - Infocomm Technology Cluster, Singapore Institute of Technology, Singapore 138683, Singapore
- Binh P. Nguyen
  - School of Mathematics and Statistics, Victoria University of Wellington, Wellington 6140, New Zealand
10
Shujaat M, Kim H, Tayara H, Chong KT. iProm-Sigma54: A CNN Base Prediction Tool for σ54 Promoters. Cells 2023; 12:cells12060829. [PMID: 36980170 PMCID: PMC10047130 DOI: 10.3390/cells12060829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 03/11/2023] Open
Abstract
The sigma (σ) factor of RNA holoenzymes is essential for identifying and binding to promoter regions during gene transcription in prokaryotes. σ54 promoters are involved in various ancillary and environmentally responsive processes; therefore, it is crucial to accurately identify σ54 promoter sequences to comprehend the underlying process of gene regulation. Herein, we propose a convolutional neural network (CNN) based prediction tool named “iProm-Sigma54” for the prediction of σ54 promoters. The CNN consists of two one-dimensional convolutional layers, which are followed by max pooling layers and dropout layers. A one-hot encoding scheme was used to generate the input matrix. To determine the prediction performance of iProm-Sigma54, we employed four assessment metrics and five-fold cross-validation; performance was measured using a benchmark dataset and a test dataset. According to the findings of this comparison, iProm-Sigma54 outperformed existing methodologies for identifying σ54 promoters. Additionally, a publicly accessible web server was constructed.
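The one-hot input matrix mentioned in this abstract can be illustrated with a short sketch. The column order A, C, G, T and the function name are assumptions; the abstract does not state either, and the example sequence is arbitrary.

```python
BASES = "ACGT"  # assumed column order


def one_hot(sequence):
    """Return a len(sequence) x 4 matrix with a single 1 per row."""
    rows = []
    for base in sequence.upper():
        if base not in BASES:
            raise ValueError(f"unsupported base: {base!r}")
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


matrix = one_hot("TGGCA")  # arbitrary example sequence
```

Each row is one sequence position and each column one nucleotide, which is the shape a one-dimensional convolutional layer typically expects (positions by channels).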
Affiliation(s)
- Muhammad Shujaat
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Hoonjoo Kim
  - School of Pharmacy, Jeonbuk National University, Jeonju 54896, Republic of Korea
  - Correspondence: (H.K.); (K.T.C.)
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
  - Correspondence: (H.K.); (K.T.C.)
11
Ahmad W, Tayara H, Chong KT. Attention-Based Graph Neural Network for Molecular Solubility Prediction. ACS Omega 2023; 8:3236-3244. [PMID: 36713733 PMCID: PMC9878542 DOI: 10.1021/acsomega.2c06702] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 10/18/2022] [Accepted: 12/23/2022] [Indexed: 06/18/2023]
Abstract
Drug discovery (DD) research is aimed at the discovery of new medications. Solubility is an important physicochemical property in drug development. Active pharmaceutical ingredients (APIs) are essential substances for high drug efficacy. During DD research, aqueous solubility (AS) is a key physicochemical attribute required for API characterization. High-precision in silico solubility prediction reduces the experimental cost and time of drug development. Several artificial tools have been employed for solubility prediction using machine learning and deep learning techniques. This study aims to create different deep learning models that can predict the solubility of a wide range of molecules using the largest currently available solubility data set. Simplified molecular-input line-entry system (SMILES) strings were used as the molecular representation, and models were developed using simple graph convolution, graph isomorphism network, graph attention network, and AttentiveFP network. Based on the performance of the models, the AttentiveFP-based network model was finally selected. The model was trained and tested on 9943 compounds. The model outperformed on 62 anticancer compounds with Pearson correlation R² and root-mean-square error values of 0.52 and 0.61, respectively. AS prediction can be improved by improving the graph algorithm or adding more molecular properties.
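The two metrics reported in this abstract can be computed as in the sketch below. It assumes that "Pearson correlation R²" means the squared Pearson correlation coefficient between measured and predicted solubility (R² is sometimes used for the coefficient of determination instead, which can differ). The function names and example numbers are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation coefficient (assumed meaning of R^2 here)."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_t * var_p)


# Illustrative numbers, not data from the paper.
measured = [1.0, 2.0, 3.0]
predicted = [2.0, 4.0, 6.0]
r2 = pearson_r2(measured, predicted)
error = rmse(measured, predicted)
```

The example shows why both metrics are reported together: the predictions are perfectly correlated with the measurements, yet the RMSE is far from zero because they are off by a scale factor.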
Affiliation(s)
- Waqar Ahmad
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
12
Nabeel Asim M, Ali Ibrahim M, Fazeel A, Dengel A, Ahmed S. DNA-MP: a generalized DNA modifications predictor for multiple species based on powerful sequence encoding method. Brief Bioinform 2023; 24:6931721. [PMID: 36528802 DOI: 10.1093/bib/bbac546] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/14/2022] [Revised: 11/06/2022] [Accepted: 11/12/2022] [Indexed: 12/23/2022] Open
Abstract
Accurate prediction of deoxyribonucleic acid (DNA) modifications is essential to explore and discern the process of cell differentiation, gene expression and epigenetic regulation. Several computational approaches have been proposed for particular type-specific DNA modification prediction. Two recent generalized computational predictors are capable of detecting three different types of DNA modifications; however, type-specific and generalized modifications predictors produce limited performance across multiple species mainly due to the use of ineffective sequence encoding methods. This paper presents a generalized computational approach "DNA-MP" that can more precisely predict three different DNA modifications across multiple species. The proposed DNA-MP approach makes use of a powerful encoding method "position specific nucleotides occurrence based on modification and non-modification class densities normalized difference" (POCD-ND) to generate the statistical representations of DNA sequences and a deep forest classifier for modifications prediction. The POCD-ND encoder generates statistical representations by extracting position specific distributional information of nucleotides in the DNA sequences. We perform a comprehensive intrinsic and extrinsic evaluation of the proposed encoder and compare its performance with the 32 most widely used encoding methods on 17 benchmark DNA modifications prediction datasets of 12 different species using 10 different machine learning classifiers. Overall, with all classifiers, the proposed POCD-ND encoder outperforms the existing 32 encoders. Furthermore, combined over 5-fold cross validation benchmark datasets and independent test sets, the proposed DNA-MP predictor outperforms state-of-the-art type-specific and generalized modifications predictors by an average accuracy of 7% across 4mC datasets, 1.35% across 5hmC datasets and 10% for 6mA datasets. To facilitate the scientific community, the DNA-MP web application is available at https://sds_genetic_analysis.opendfki.de/DNA_Modifications/.
Affiliation(s)
- Muhammad Nabeel Asim: Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
- Muhammad Ali Ibrahim: Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
- Ahtisham Fazeel: Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
- Andreas Dengel: Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
- Sheraz Ahmed: German Research Center for Artificial Intelligence GmbH, Kaiserslautern 67663, Germany
|
13
|
Yu L, Zhang Y, Xue L, Liu F, Chen Q, Luo J, Jing R. Systematic Analysis and Accurate Identification of DNA N4-Methylcytosine Sites by Deep Learning. Front Microbiol 2022; 13:843425. [PMID: 35401453 PMCID: PMC8989013 DOI: 10.3389/fmicb.2022.843425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2021] [Accepted: 02/21/2022] [Indexed: 11/13/2022]
Abstract
DNA N4-methylcytosine (4mC) is a pivotal epigenetic modification that plays an essential role in DNA replication, repair, expression and differentiation. To gain insight into the biological functions of 4mC, it is critical to identify its modification sites in the genome. Deep learning has become increasingly popular in recent years and is frequently employed for 4mC site identification. However, a systematic analysis of how to build predictive models using deep learning techniques is still lacking. In this work, we first summarized all existing deep learning-based predictors and systematically analyzed their models, features and datasets. Then, using a typical standard dataset with three species (A. thaliana, C. elegans, and D. melanogaster), we assessed the contribution of different model architectures, encoding methods and the attention mechanism in establishing a deep learning-based model for 4mC site prediction. After a series of optimizations, a convolutional-recurrent neural network architecture using one-hot encoding and the attention mechanism achieved the best overall prediction performance. Extensive comparison experiments were conducted based on the same dataset. This work will be helpful for researchers who would like to build 4mC prediction models using deep learning in the future.
Affiliation(s)
- Lezheng Yu: School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China
- Yonglin Zhang: Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China
- Li Xue: School of Public Health, Southwest Medical University, Luzhou, China
- Fengjuan Liu: School of Geography and Resources, Guizhou Education University, Guiyang, China
- Qi Chen: Department of Endocrinology and Metabolism, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Jiesi Luo: Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China; Department of Pharmacy, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Runyu Jing: School of Cyber Science and Engineering, Sichuan University, Chengdu, China
|
14
|
Recognition of mRNA N4 Acetylcytidine (ac4C) by Using Non-Deep vs. Deep Learning. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12031344] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 01/04/2023]
Abstract
Deep learning models have been successfully applied in a wide range of fields. The creation of a deep learning framework for analyzing high-performance sequence data has piqued the research community’s interest. N4-acetylcytidine (ac4C) is a post-transcriptional modification in mRNA that plays an important role in mRNA stability control and translation. Identifying ac4C modifications in mRNA with conventional laboratory experiments is still neither simple, fast, nor cost-effective. As a result, we developed DL-ac4C, a CNN-based deep learning model for ac4C recognition. In the alternative scenario, the model families are well-suited to working in large datasets with a large number of available samples, especially in biological domains. In this study, the DL-ac4C method (deep learning) is compared to non-deep learning (machine learning) methods, regression, and support vector machine. The results show that DL-ac4C is more advanced than previously used approaches. The proposed model improves the accuracy recall area by 9.6 percent and 9.8 percent, respectively, for cross-validation and independent tests. More nuanced methods of incorporating prior biological knowledge into the estimation procedure of deep learning models are required to achieve better results in terms of predictive efficiency and cost-effectiveness. Based on an experiment’s acetylated dataset, the DL-ac4C sequence-based predictor for acetylation sites in mRNA can predict whether query sequences have potential acetylation motifs.
|