1
Ansar Khawaja S, Alturise F, Alkhalifah T, Khan SA, Khan YD. Gluconeogenesis unraveled: A proteomic Odyssey with machine learning. Methods 2024; 232:29-42. [PMID: 39276958 DOI: 10.1016/j.ymeth.2024.09.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 08/05/2024] [Accepted: 09/01/2024] [Indexed: 09/17/2024] Open
Abstract
Gluconeogenesis, the metabolic pathway that produces glucose from non-carbohydrate substrates, is essential for maintaining balanced blood sugar levels during fasting. Accurate prediction of gluconeogenesis rates is important for recognizing metabolic disorders and devising efficient treatment strategies. The use of deep learning and machine learning methods to forecast complex biological processes has gained popularity in recent years. Understanding both the regulation of the pathway and the possible therapeutic applications of proteins depends on accurately identifying their gluconeogenesis-related patterns. This article analyzes the use of machine learning and deep learning models to predict gluconeogenesis efficiency. The study also discusses the challenges posed by restricted data availability and limited model interpretability, as well as possible applications in personalized healthcare, metabolic disease treatment, and drug discovery. The predictor applies statistical moments to the structures of gluconeogenesis and their enzymes, while Random Forest is used as the classifier to ensure the model's accuracy. The method was validated using an independent test, a self-consistency test, 10-fold cross-validation, and a jackknife test, which achieved 92.33%, 91.87%, 87.88%, and 87.02%, respectively. Accurate prediction of gluconeogenesis has significant implications for understanding metabolic disorders and developing targeted therapies. This study contributes to the growing field of predictive biology by combining deep learning and machine learning algorithms with metabolic pathways.
Affiliation(s)
- Seher Ansar Khawaja: Department of Computer Science, University of Management and Technology, Lahore, Pakistan
- Fahad Alturise: Department of Cybersecurity, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Tamim Alkhalifah: Department of Computer Engineering, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Sher Afzal Khan: Department of Computer Sciences, Abdul Wali Khan University, Mardan, Pakistan
- Yaser Daanial Khan: Department of Computer Science, University of Management and Technology, Lahore, Pakistan
2
Feng C, Wu J, Wei H, Xu L, Zou Q. CRCF: A Method of Identifying Secretory Proteins of Malaria Parasites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2149-2157. [PMID: 34061749 DOI: 10.1109/tcbb.2021.3085589] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Malaria is a mosquito-borne disease that results in millions of cases and deaths annually. The development of a fast computational method that identifies secretory proteins of the malaria parasite is important for research on antimalarial drugs and vaccines. Thus, a method was developed to identify the secretory proteins of malaria parasites. In this method, a reduced alphabet was selected to recode the original protein sequence. A feature synthesis method was used to synthesise three different types of feature information. Finally, the random forest method was used as a classifier to identify the secretory proteins. In addition, a web server was developed to share the proposed algorithm. Experiments using the benchmark dataset demonstrated that the overall accuracy achieved by the proposed method was greater than 97.8 percent using the 10-fold cross-validation method. Furthermore, the reduced schemes and characteristic performance analyses are discussed.
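The recoding step described in this abstract can be illustrated with a minimal sketch. The six-group alphabet below is a hypothetical grouping by broad physicochemical class, chosen only for illustration; it is not the reduced scheme selected in the paper, and the function names `recode` and `composition` are likewise assumptions. The resulting composition vector is the kind of feature that could then be passed to a random forest classifier.

```python
from collections import Counter

# Hypothetical six-group reduced alphabet (an assumption for illustration,
# NOT the scheme selected in the CRCF paper). Each key is the code letter
# that replaces every residue in its group.
REDUCED_GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine and proline
}

# Invert the grouping into a residue -> code lookup table.
RESIDUE_TO_CODE = {
    residue: code
    for code, members in REDUCED_GROUPS.items()
    for residue in members
}


def recode(sequence):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids are mapped to 'X'.
    """
    return "".join(RESIDUE_TO_CODE.get(res, "X") for res in sequence.upper())


def composition(recoded, alphabet):
    """Return the relative frequency of each alphabet letter in `recoded`."""
    counts = Counter(recoded)
    total = len(recoded) or 1  # avoid division by zero for empty input
    return [counts[letter] / total for letter in alphabet]


if __name__ == "__main__":
    reduced = recode("MKDGW")
    print(reduced)
    print(composition(reduced, "".join(REDUCED_GROUPS)))
```

A smaller alphabet shortens the feature vector and pools residues with similar properties, which is the usual motivation for recoding before feature extraction.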
3
Bonidia RP, Domingues DS, Sanches DS, de Carvalho ACPLF. MathFeature: feature extraction package for DNA, RNA and protein sequences based on mathematical descriptors. Brief Bioinform 2022; 23:bbab434. [PMID: 34750626 PMCID: PMC8769707 DOI: 10.1093/bib/bbab434] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 06/29/2021] [Revised: 09/18/2021] [Accepted: 09/20/2021] [Indexed: 12/24/2022] Open
Abstract
One of the main challenges in applying machine learning algorithms to biological sequence data is how to represent a sequence as a numeric input vector. Feature extraction techniques capable of extracting numerical information from biological sequences have been reported in the literature. However, many of these techniques, such as mathematical descriptors, are not available in existing packages. This paper presents a new package, MathFeature, which implements mathematical descriptors able to extract relevant numerical information from biological sequences, i.e. DNA, RNA and proteins (prediction of structural features along the primary sequence of amino acids). MathFeature makes available 20 numerical feature extraction descriptors based on approaches found in the literature, e.g. multiple numeric mappings, genomic signal processing, chaos game theory, entropy and complex networks. MathFeature also allows the extraction of alternative features, complementing the existing packages. To ensure that our descriptors are robust and to assess their relevance, experimental results are presented in nine case studies. According to these results, the features extracted by MathFeature showed high performance (accuracy of 0.6350-0.9897), both when applying only mathematical descriptors and when hybridizing them with well-known descriptors from the literature. Finally, through MathFeature, we outperformed several studies on eight benchmark datasets, exemplifying the robustness and viability of the proposed package. MathFeature advances the area by bringing descriptors not available in other packages, as well as allowing non-experts to use feature extraction techniques.
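As a rough illustration of one descriptor family named in this abstract (entropy), the sketch below computes the Shannon entropy of the k-mer distribution of a sequence. It is written from the textbook definition and does not use or reproduce MathFeature's own interface; the function name `kmer_shannon_entropy` and its parameters are assumptions.

```python
import math
from collections import Counter


def kmer_shannon_entropy(sequence, k=1):
    """Shannon entropy (in bits) of the overlapping k-mer distribution.

    Illustrative helper only; this is not MathFeature's API.
    """
    # Collect all overlapping windows of length k.
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    # H = -sum(p * log2(p)) over the observed k-mers.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    for seq in ("AAAA", "ACGT", "ACGTACGTTTGA"):
        print(seq, kmer_shannon_entropy(seq, k=1))
```

A single number per value of k gives a fixed-length feature regardless of sequence length, which is what makes descriptors of this kind convenient as classifier inputs.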
Affiliation(s)
- Robson P Bonidia: Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
- Douglas S Domingues: Group of Genomics and Transcriptomes in Plants, Institute of Biosciences, São Paulo State University (UNESP), Rio Claro 13506-900, Brazil
- Danilo S Sanches: Department of Computer Science, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio 86300-000, Brazil
- André C P L F de Carvalho: Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
4
Prediction for understanding the effectiveness of antiviral peptides. Comput Biol Chem 2021; 95:107588. [PMID: 34655913 DOI: 10.1016/j.compbiolchem.2021.107588] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/12/2021] [Revised: 10/01/2021] [Accepted: 10/02/2021] [Indexed: 11/20/2022]
Abstract
The low efficacy of current antivirals, together with the resistance of viruses to existing antiviral drugs, has created demand for novel antiviral agents. Antiviral peptides (AVPs) are bioactive peptides with virucidal activity that can be developed into promising antiviral drugs. They are short peptides able to halt the progression of viral infections. The use of antiviral peptides in therapeutics has recently attracted the attention of the research community. The development and identification of AVPs is imperative for the discovery of novel therapeutics for viral infections. In the present work, a meta-classifier (stacking) based approach is implemented for the prediction of IC50 (half maximal inhibitory concentration) and pIC50 (negative log of half maximal inhibitory concentration) values. The best prediction model, with evolutionary information and local alignment scores as features, achieved correlation coefficients of 0.670 and 0.753 on the training and testing sets, respectively, for IC50. Further, the prediction of pIC50 reached correlation coefficients of 0.797 and 0.789 for the training and testing sets, respectively. For the development of machine learning models that predict IC50, the use of pIC50 rather than IC50 as the target variable is recommended. Further, a systematic comparison of AVPs with high IC50 values and low IC50 values reveals that a higher mean charge and tiny amino acids are preferred, and greater length and consecutive hydrophilic amino acids are avoided, in the former.
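The relationship between the two target variables, and the correlation coefficient used to score predictions, can be sketched as follows. The sketch assumes the common convention that pIC50 is the negative base-10 logarithm of IC50 expressed in molar units, and that input IC50 values are given in micromolar; the abstract does not state the units, so both are assumptions, as are the function names.

```python
import math


def pic50_from_micromolar(ic50_um):
    """Convert an IC50 given in micromolar to pIC50.

    Assumes pIC50 = -log10(IC50 in molar), so for micromolar input
    pIC50 = 6 - log10(IC50 in micromolar).
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return 6.0 - math.log10(ic50_um)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up IC50 values (micromolar), used only to show the transform.
    ic50_values = [0.5, 1.0, 10.0, 250.0]
    print([round(pic50_from_micromolar(v), 3) for v in ic50_values])
```

The log transform compresses values that span several orders of magnitude, which is one plausible reason a regression model scores better on pIC50 than on raw IC50.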
5
Perpetuo L, Klein J, Ferreira R, Guedes S, Amado F, Leite-Moreira A, Silva AMS, Thongboonkerd V, Vitorino R. How can artificial intelligence be used for peptidomics? Expert Rev Proteomics 2021; 18:527-556. [PMID: 34343059 DOI: 10.1080/14789450.2021.1962303] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023]
Abstract
INTRODUCTION: Peptidomics is an emerging field of omics sciences using advanced isolation, analysis, and computational techniques that enable qualitative and quantitative analyses of various peptides in biological samples. Peptides can act as useful biomarkers and as therapeutic molecules for diseases. AREAS COVERED: The use of therapeutic peptides can be predicted quickly and efficiently using data-driven computational methods, particularly artificial intelligence (AI) approaches. Various AI approaches are useful for peptide-based drug discovery, such as support vector machine, random forest, extremely randomized trees, and other more recently developed deep learning methods. AI methods are relatively new to the development of peptide-based therapies, but these techniques have already become essential tools in protein science by dissecting novel therapeutic peptides and their functions (Figure 1). EXPERT OPINION: Researchers have shown that AI models can facilitate the development of peptidomics and selective peptide therapies in the field of peptide science. Biopeptide prediction is important for the discovery and development of successful peptide-based drugs. Due to their ability to predict therapeutic roles based on sequence details, many AI-dependent prediction tools have been developed (Figure 1).
Affiliation(s)
- Luís Perpetuo: iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro
- Julie Klein: Institut National de la Santé et de la Recherche Médicale (INSERM), U1297, Institute of Cardiovascular and Metabolic Disease, Université Toulouse III, Toulouse, France
- Rita Ferreira: LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Sofia Guedes: LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Francisco Amado: LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Adelino Leite-Moreira: UnIC, Departamento de Cirurgia e Fisiologia, Faculdade de Medicina da Universidade do Porto, Porto
- Artur M S Silva: LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro
- Visith Thongboonkerd: Medical Proteomics Unit, Office for Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok 10700, Thailand
- Rui Vitorino: iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro; LAQV/REQUIMTE, Department of Chemistry, University of Aveiro, Aveiro; UnIC, Departamento de Cirurgia e Fisiologia, Faculdade de Medicina da Universidade do Porto, Porto
6
Bartas M, Červeň J, Guziurová S, Slychko K, Pečinka P. Amino Acid Composition in Various Types of Nucleic Acid-Binding Proteins. Int J Mol Sci 2021; 22:ijms22020922. [PMID: 33477647 PMCID: PMC7831508 DOI: 10.3390/ijms22020922] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/29/2020] [Revised: 01/15/2021] [Accepted: 01/16/2021] [Indexed: 12/20/2022] Open
Abstract
Nucleic acid-binding proteins are traditionally divided into two categories: those able to bind DNA and those able to bind RNA. In the light of new knowledge, this categorization should be abandoned because a large proportion of proteins can bind both DNA and RNA. Another, even more important, feature of nucleic acid-binding proteins is their so-called sequence or structure specificity. Proteins able to bind nucleic acids in a sequence-specific manner usually contain one or more well-defined structural motifs (zinc fingers, leucine zipper, helix-turn-helix, or helix-loop-helix). In contrast, many proteins do not recognize nucleic acid sequence but rather local DNA or RNA structures (G-quadruplexes, i-motifs, triplexes, cruciforms, left-handed DNA/RNA form, and others). Finally, there are also proteins that recognize both the sequence and the local structural properties of nucleic acids (e.g., the famous tumor suppressor p53). In this mini-review, we aim to summarize current knowledge about the amino acid composition of various types of nucleic acid-binding proteins, with a special focus on significant enrichment and/or depletion in each category.
7
Vishnoi S, Matre H, Garg P, Pandey SK. Artificial intelligence and machine learning for protein toxicity prediction using proteomics data. Chem Biol Drug Des 2020; 96:902-920. [DOI: 10.1111/cbdd.13701] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/25/2019] [Revised: 04/23/2020] [Accepted: 04/26/2020] [Indexed: 12/13/2022]
Affiliation(s)
- Shubham Vishnoi: Department of Physics, Bernal Institute, University of Limerick, Limerick, Ireland
- Himani Matre: Department of Biotechnology, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, India
- Prabha Garg: Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Mohali, India
- Shubham Kumar Pandey: Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, Mohali, India
8
Abstract
During the last three decades or so, many efforts have been made to study the protein cleavage sites of some disease-causing enzymes, such as HIV (Human Immunodeficiency Virus) protease and SARS (Severe Acute Respiratory Syndrome) coronavirus main proteinase. It has become increasingly clear via this mini-review that the motivation driving the aforementioned studies is quite wise, and that the results acquired through these studies are very rewarding, particularly for developing peptide drugs.
Affiliation(s)
- Kuo-Chen Chou: Gordon Life Science Institute, Boston, MA 02478, United States
9
Some illuminating remarks on molecular genetics and genomics as well as drug development. Mol Genet Genomics 2020; 295:261-274. [PMID: 31894399 DOI: 10.1007/s00438-019-01634-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/21/2019] [Accepted: 12/05/2019] [Indexed: 02/07/2023]
Abstract
Facing the explosive growth of biological sequences unearthed in the post-genomic age, one of the most important but also most difficult problems in computational biology is how to express a biological sequence as a discrete model or a vector while still retaining considerable sequence-order information or its special pattern. To deal with this challenging problem, the ideas of "pseudo amino acid components" and "pseudo K-tuple nucleotide composition" have been proposed. These ideas and their approaches have further stimulated the birth of the "distorted key theory" and the "wenxing diagram", substantially strengthened the power to treat multi-label systems, and led to the establishment of the famous "5-steps rule". All these logical developments are quite natural and are very useful not only for theoretical scientists but also for experimental scientists conducting genetics/genomics analysis and drug development. This review paper also presents their future perspectives; i.e., their impacts will become even more significant and profound.
10
Shao YT, Liu XX, Lu Z, Chou KC. pLoc_Deep-mHum: Predict Subcellular Localization of Human Proteins by Deep Learning. ACTA ACUST UNITED AC 2020. [DOI: 10.4236/ns.2020.127042] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/20/2022]
11
Shao Y, Chou KC. pLoc_Deep-mEuk: Predict Subcellular Localization of Eukaryotic Proteins by Deep Learning. ACTA ACUST UNITED AC 2020. [DOI: 10.4236/ns.2020.126034] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/20/2022]