1
Qi R, Zou Q. Special Protein or RNA Molecules Computational Identification. Int J Mol Sci 2023; 24:11312. [PMID: 37511072 PMCID: PMC10379736 DOI: 10.3390/ijms241411312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 06/15/2023] [Accepted: 07/04/2023] [Indexed: 07/30/2023]
Abstract
The identification of special protein or RNA molecules via computational methods is of great importance in understanding their biological functions and developing new treatments for diseases [...].
Affiliation(s)
- Ren Qi
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
2
Aslam S, Rehman HM, Sarwar MZ, Ahmad A, Ahmed N, Amirzada MI, Rehman HM, Yasmin H, Nadeem T, Bashir H. Computational Modeling, High-Level Soluble Expression and In Vitro Cytotoxicity Assessment of Recombinant Pseudomonas aeruginosa Azurin: A Promising Anti-Cancer Therapeutic Candidate. Pharmaceutics 2023; 15:1825. [PMID: 37514012 PMCID: PMC10383417 DOI: 10.3390/pharmaceutics15071825] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2023] [Revised: 05/31/2023] [Accepted: 06/13/2023] [Indexed: 07/30/2023]
Abstract
Azurin is a natural protein produced by Pseudomonas aeruginosa that exhibits potential anti-tumor, anti-HIV, and anti-parasitic properties. The current study aimed to investigate the potential of azurin against breast cancer using both in silico and in vitro analyses. The amino acid sequence of azurin was used to predict its secondary and tertiary structures, along with its physicochemical properties, using online software. The resulting structure was validated using Ramachandran plots and ERRAT2. The mature azurin protein comprises 128 amino acids and has a predicted molecular weight of 14 kDa; the top-ranked structure obtained from I-TASSER had an ERRAT2 quality factor of 100%, with 87.4% of residues in the favored region of the Ramachandran plot. Docking and simulation studies of azurin were conducted using HDOCK and Desmond, respectively. The analysis revealed that azurin docked against p53 and the EphB2 receptor showed the highest binding affinity, indicating its potential to induce apoptosis. The recombinant azurin gene was successfully cloned and expressed in the BL21 (DE3) strain using a pET20b expression vector with the pelB leader sequence, followed by IPTG induction. The azurin protein was purified by affinity chromatography, with a yield of 70 mg/L. An in vitro cytotoxicity assay in MCF-7 cells revealed significant cytotoxicity of azurin at 105 µg/mL. These findings highlight the potential of azurin as an anticancer drug candidate.
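As a rough consistency check on the reported size, the common rule of thumb of about 110 Da per amino acid residue (an approximation, not a value from the study) applied to the 128-residue mature chain gives:

$$
M \approx 128 \times 110\ \text{Da} = 14{,}080\ \text{Da} \approx 14\ \text{kDa}
$$

which agrees with the 14 kDa stated in the abstract.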
Affiliation(s)
- Shakira Aslam
- Centre for Applied Molecular Biology, University of the Punjab, Lahore 54590, Pakistan
- Hafiz Muzzammel Rehman
- School of Biochemistry and Biotechnology, University of the Punjab, Lahore 54590, Pakistan
- Department of Human Genetics and Molecular Biology, University of Health Science, Lahore 54600, Pakistan
- Ajaz Ahmad
- Department of Clinical Pharmacy, College of Pharmacy, King Saud University, Riyadh 11451, Saudi Arabia
- Nadeem Ahmed
- Centre of Excellence in Molecular Biology, University of the Punjab, Lahore 54000, Pakistan
- International Center for Genetic Engineering and Biotechnology, Galleria Padriciano, 99, 34149 Trieste, TS, Italy
- Muhammad Imran Amirzada
- Department of Pharmacy, COMSATS University Islamabad, Abbottabad Campus, Abbottabad 22010, Pakistan
- School of Pharmaceutical Sciences, Jiangnan University, Wuxi 214082, China
- Hafiz Muhammad Rehman
- Centre for Applied Molecular Biology, University of the Punjab, Lahore 54590, Pakistan
- University Institute of Medical Laboratory Technology, Faculty of Allied Health Sciences, The University of Lahore, Lahore 54000, Pakistan
- Humaira Yasmin
- Department of Infectious Diseases, Faculty of Medicine, South Kensington Campus, Imperial College, London W2 1NY, UK
- Department of Biosciences, COMSATS University Islamabad, Islamabad 54000, Pakistan
- Tariq Nadeem
- Centre of Excellence in Molecular Biology, University of the Punjab, Lahore 54000, Pakistan
- Hamid Bashir
- Centre for Applied Molecular Biology, University of the Punjab, Lahore 54590, Pakistan
3
Papakonstantinou E, Io Diakou K, Mitsis T, Dragoumani K, Bacopoulou F, Megalooikonomou V, Kossida S, Chrousos GP, Vlachakis D. Molecular fusion events in carcinogenic organisms: a bioinformatics study for the detection of fused proteins between viruses, bacteria and eukaryotes. EMBNET.JOURNAL 2022; 27:e1004. [PMID: 35464257 PMCID: PMC9029568 DOI: 10.14806/ej.27.0.1004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Molecular fusion events play a prominent role in the initial steps of carcinogenesis. In this study, a bioinformatics analysis was performed across four organisms known to induce cancer development in humans: two viruses (Human Herpesvirus 4 and Human T-cell leukaemia virus), one bacterium (Helicobacter pylori), and one trematode (Schistosoma mansoni). The annotated proteomes of these organisms were analysed using the SAFE software to identify protein fusion events, which may provide insight into protein function similarities and possible merging events during evolution. Five fused proteins with very similar functions were detected, whereas no fusions were found between proteins with different functions that might act in the same molecular complex or biochemical pathway. Thus, this study analysed these four well-known cancer-related organisms with de novo bioinformatics programs and provided useful information on protein fusion events, hopefully leading to a deeper understanding of carcinogenesis.
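The abstract names SAFE but does not describe how it scores fusions. As a generic illustration only (not SAFE's algorithm), the "Rosetta Stone" idea behind fusion detection can be sketched: a protein in one proteome is a fusion candidate when two non-overlapping regions of it align to two different, mutually unrelated proteins in another proteome. The alignment search is assumed to have been run already; the input record format, the `max_overlap` threshold and all IDs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_fusion_candidates(hits, max_overlap=10, homologous_pairs=frozenset()):
    """Flag proteins whose separate regions align to two different proteins of another proteome.

    hits: (composite_id, component_id, start, end) records, with start/end as
    1-based inclusive coordinates on the composite protein (hypothetical format,
    e.g. parsed from a tabular alignment report).
    homologous_pairs: frozensets {a, b} of components known to be homologous to
    each other; such pairs are skipped, because a Rosetta Stone-style fusion
    assumes the two components are unrelated.
    """
    regions = defaultdict(list)
    for composite, component, start, end in hits:
        regions[composite].append((component, start, end))

    candidates = set()
    for composite, matched in regions.items():
        for (a, s1, e1), (b, s2, e2) in combinations(matched, 2):
            if a == b or frozenset((a, b)) in homologous_pairs:
                continue
            # Residues shared by the two aligned regions (<= 0 means they are disjoint).
            overlap = min(e1, e2) - max(s1, s2) + 1
            if overlap <= max_overlap:
                candidates.add((composite, *sorted((a, b))))
    return candidates


# Hypothetical alignment hits, not data from the study.
hits = [
    ("F1", "A", 1, 120), ("F1", "B", 130, 260),  # disjoint regions: fusion candidate
    ("G1", "C", 1, 100), ("G1", "D", 50, 150),   # 51-residue overlap: rejected
]
print(find_fusion_candidates(hits))  # {('F1', 'A', 'B')}
```

A real pipeline would also require each hit to pass an alignment-significance cutoff before it enters `hits`.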
Affiliation(s)
- Eleni Papakonstantinou
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- Kalliopi Io Diakou
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- Thanasis Mitsis
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- Konstantina Dragoumani
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- Flora Bacopoulou
- University Research Institute of Maternal and Child Health & Precision Medicine, and UNESCO Chair on Adolescent Health Care, National and Kapodistrian University of Athens, "Aghia Sophia" Children's Hospital, Athens, Greece
- Vasilis Megalooikonomou
- Computer Engineering and Informatics Department, School of Engineering, University of Patras, Patras, Greece
- Sophia Kossida
- IMGT, The International ImMunoGeneTics Information System, Université de Montpellier, Laboratoire d'ImmunoGénétique Moléculaire and Institut de Génétique Humaine, University of Montpellier, Montpellier, France
- George P Chrousos
- University Research Institute of Maternal and Child Health & Precision Medicine, and UNESCO Chair on Adolescent Health Care, National and Kapodistrian University of Athens, "Aghia Sophia" Children's Hospital, Athens, Greece
- Division of Endocrinology and Metabolism, Center of Clinical, Experimental Surgery and Translational Research, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
- Dimitrios Vlachakis
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
- University Research Institute of Maternal and Child Health & Precision Medicine, and UNESCO Chair on Adolescent Health Care, National and Kapodistrian University of Athens, "Aghia Sophia" Children's Hospital, Athens, Greece
- Division of Endocrinology and Metabolism, Center of Clinical, Experimental Surgery and Translational Research, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
4
Predicting Protein-Protein Interactions via Random Ferns with Evolutionary Matrix Representation. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:7191684. [PMID: 35242211 PMCID: PMC8888042 DOI: 10.1155/2022/7191684] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/03/2021] [Revised: 01/15/2022] [Accepted: 01/18/2022] [Indexed: 11/27/2022]
Abstract
Protein-protein interactions (PPIs) play a crucial role in understanding disease pathogenesis and genetic mechanisms, in guiding drug design, and in other biochemical processes; thus, the identification of PPIs is of great importance. With the rapid development of high-throughput sequencing technology, a large amount of PPI sequence data has accumulated. Researchers have designed many experimental methods to detect PPIs, and the prediction of PPIs from sequence data has become a research hotspot in proteomics. However, because traditional experimental methods are both time-consuming and costly, it is difficult to analyze and predict the massive amount of PPI data quickly and accurately. To address these issues, many machine-learning-based computational systems have been applied to PPI prediction, improving the overall recognition rate. In this paper, a novel and efficient computational approach is presented that implements a protein interaction prediction system using only protein sequence information. First, the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST) was employed to generate a position-specific scoring matrix (PSSM) containing evolutionary information from each protein sequence. Second, a novel feature representation scheme, MatFLDA, was used to extract the essential information from the PSSM of each protein sequence, and five training and five testing datasets were obtained by five-fold cross-validation. Finally, the random fern (RFs) classifier was employed to infer the interactions among proteins, yielding a model called MatFLDA_RFs. The proposed MatFLDA_RFs model achieved good prediction performance, with 95.03% average accuracy on the Yeast dataset and 85.35% average accuracy on the H. pylori dataset, outperforming other existing computational methods.
The experimental results indicate that the proposed method yields better PPI predictions, providing an effective tool for detecting new PPIs and for in-depth proteomics studies. In addition, we developed a web server for the proposed model to predict protein-protein interactions, which is freely accessible online at http://120.77.11.78:5001/webserver/MatFLDA_RFs.
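The abstract names a random fern classifier but does not give its settings. Below is a minimal pure-Python sketch of a random-fern classifier under stated assumptions: real-valued input vectors (standing in for MatFLDA descriptors, which are not reproduced here), single-feature threshold tests drawn from the training range, Laplace smoothing, and a uniform class prior. The class name, its parameters and the toy data are hypothetical, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


class RandomFerns:
    """Minimal random-fern classifier: an ensemble of semi-naive Bayes "ferns".

    Assumptions (not from the paper): each fern holds `depth` random tests of the
    form x[f] > t; their binary outcomes index one of 2**depth leaves; each fern
    stores per-leaf class counts; prediction is the class with the largest summed
    log-likelihood over all ferns, with a uniform class prior.
    """

    def __init__(self, n_ferns=10, depth=3, seed=0):
        self.n_ferns = n_ferns
        self.depth = depth
        self.rng = random.Random(seed)

    @staticmethod
    def _leaf(tests, x):
        # Pack the binary test outcomes into an integer leaf index.
        idx = 0
        for f, t in tests:
            idx = (idx << 1) | int(x[f] > t)
        return idx

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_totals = {c: y.count(c) for c in self.classes}
        n_features = len(X[0])
        lo = [min(row[f] for row in X) for f in range(n_features)]
        hi = [max(row[f] for row in X) for f in range(n_features)]
        self.ferns = []
        for _ in range(self.n_ferns):
            tests = []
            for _ in range(self.depth):
                # Random feature and a threshold drawn from that feature's training range.
                f = self.rng.randrange(n_features)
                tests.append((f, self.rng.uniform(lo[f], hi[f])))
            counts = defaultdict(lambda: defaultdict(int))
            for x, label in zip(X, y):
                counts[self._leaf(tests, x)][label] += 1
            self.ferns.append((tests, counts))
        return self

    def predict(self, X):
        n_leaves = 2 ** self.depth
        predictions = []
        for x in X:
            scores = dict.fromkeys(self.classes, 0.0)
            for tests, counts in self.ferns:
                leaf_counts = counts.get(self._leaf(tests, x), {})
                for c in self.classes:
                    # Laplace-smoothed P(leaf | class) = (N_leaf,c + 1) / (N_c + 2**depth)
                    p = (leaf_counts.get(c, 0) + 1) / (self.class_totals[c] + n_leaves)
                    scores[c] += math.log(p)
            predictions.append(max(scores, key=scores.get))
        return predictions


# Toy data (not from the paper): two well-separated clusters in 2-D.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(50)] + \
    [[9 + rng.random(), 9 + rng.random()] for _ in range(50)]
y = [0] * 50 + [1] * 50
model = RandomFerns(n_ferns=10, depth=3, seed=0).fit(X, y)
predictions = model.predict([[0.5, 0.5], [9.5, 9.5]])
```

Each fern is only a lookup table over 2**depth outcomes, so training is a single counting pass; adding ferns or deepening them trades memory for accuracy.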
5
Hossain MS, Tonmoy MIQ, Fariha A, Islam MS, Roy AS, Islam MN, Kar K, Alam MR, Rahaman MM. Prediction of the Effects of Variants and Differential Expression of Key Host Genes ACE2, TMPRSS2, and FURIN in SARS-CoV-2 Pathogenesis: An In Silico Approach. Bioinform Biol Insights 2021; 15:11779322211054684. [PMID: 34720581 PMCID: PMC8554545 DOI: 10.1177/11779322211054684] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/10/2021] [Accepted: 10/02/2021] [Indexed: 12/15/2022]
Abstract
A new strain of beta coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is solely responsible for the ongoing coronavirus disease 2019 (COVID-19) pandemic. Although several studies suggest that the spike protein of this virus interacts with the cell surface receptor angiotensin-converting enzyme 2 (ACE2) and is subsequently cleaved by TMPRSS2 and FURIN to enter the host cell, conclusive insight into the interaction patterns of the variants of these proteins is still lacking. Thus, in this study, we analyzed the functional association among the spike protein, ACE2, TMPRSS2, and FURIN in viral pathogenesis, as well as the effects of mutations in these proteins, using several bioinformatics approaches. Analysis of the intermolecular interactions revealed that the T27A (ACE2), G476S (receptor-binding domain [RBD] of the spike protein), C297T (TMPRSS2), and P812S (cleavage site for TMPRSS2) coding variants may confer resistance to viral infection, whereas Q493L (RBD), S477I (RBD), P681R (cleavage site for FURIN), and P683W (cleavage site for FURIN) may increase viral infection. Genotype-specific expression analysis predicted several genetic variants of ACE2 (rs2158082, rs2106806, rs4830971, and rs4830972), TMPRSS2 (rs458213, rs468444, rs4290734, and rs6517666), and FURIN (rs78164913 and rs79742014) that significantly alter their normal expression, which might affect viral spread. Furthermore, we found that ACE2, TMPRSS2, and FURIN are functionally correlated with each other, and that several genes highly co-expressed with them might be involved in viral pathogenesis. This study will thus help future genomics and proteomics studies of SARS-CoV-2 and provide an opportunity to understand the underlying molecular mechanism of SARS-CoV-2 pathogenesis.
Affiliation(s)
- Md. Shahadat Hossain
- Department of Biotechnology & Genetic Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh
- Atqiya Fariha
- Department of Biotechnology & Genetic Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh
- Md. Sajedul Islam
- Department of Biochemistry & Biotechnology, University of Barishal, Barishal, Bangladesh
- Arpita Singha Roy
- Department of Biotechnology & Genetic Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh
- Md. Nur Islam
- Department of Biotechnology & Genetic Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh
- Kumkum Kar
- Department of Biotechnology & Genetic Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh
- Mohammad Rahanur Alam
- Department of Food Technology & Nutrition Science, Noakhali Science and Technology University, Noakhali, Bangladesh
6
Han Y, Cheng L, Sun W. Analysis of Protein-Protein Interaction Networks through Computational Approaches. Protein Pept Lett 2020; 27:265-278. [PMID: 31692419 DOI: 10.2174/0929866526666191105142034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/29/2019] [Revised: 05/08/2019] [Accepted: 09/26/2019] [Indexed: 01/02/2023]
Abstract
The interactions among proteins and genes are extremely important for cellular functions. Molecular interactions at the protein or gene level can be used to construct interaction networks in which the interacting species are categorized based on direct interactions or functional similarities. Compared with the limited experimental techniques, various computational tools make it possible to analyze, filter, and combine interaction data to obtain comprehensive information about biological pathways. By efficiently integrating experimental PPI discovery with computational prediction techniques, researchers have gained a wealth of valuable PPI data, including several advanced databases. Moreover, many useful tools and visualization programs enable researchers to build, annotate, and analyze biological networks. Here, we review and list the computational methods, databases, and tools for protein-protein interaction prediction.
Affiliation(s)
- Ying Han
- Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Weiju Sun
- Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
7
Handling Noise in Protein Interaction Networks. BIOMED RESEARCH INTERNATIONAL 2019; 2019:8984248. [PMID: 31828144 PMCID: PMC6885184 DOI: 10.1155/2019/8984248] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/17/2019] [Accepted: 09/23/2019] [Indexed: 12/22/2022]
Abstract
Protein-protein interactions (PPIs) can be conveniently represented as networks, allowing the use of graph theory for their study. Network topology studies may reveal patterns associated with specific organisms. Here, we propose a new methodology, the organization measurement (OM) method, to denoise PPI networks and predict missing links based solely on network topology. The OM methodology was applied to denoise the PPI networks of two Saccharomyces cerevisiae datasets (Yeast and CS2007) and one Homo sapiens dataset (Human). Two strategies were used to evaluate its denoising capabilities: the first compared its application to random networks and to the reference set networks, while the second perturbed the networks by gradually adding and removing edges at random. Applied to the Yeast and Human reference sets, the OM methodology achieved AUCs of 0.95 and 0.87, respectively. Randomly removing 80% of the Yeast and Human reference set interactions resulted in AUCs of 0.71 and 0.62, whereas randomly adding 80% more interactions resulted in AUCs of 0.75 and 0.72, respectively. Applying the OM methodology to the CS2007 dataset yielded an AUC of 0.99. We also perturbed the CS2007 network by randomly inserting and removing edges in the same proportions described above. The proportion of false positives identified and removed from the network ranged from 97%, when 20% more edges were inserted, to 89%, when 80% more edges were inserted. The proportion of true positives identified and reinserted into the network ranged from 95%, when 20% of the edges were removed, to 40%, after the random deletion of 80% of the edges. The OM methodology is sensitive to the topological structure of biological networks. These results suggest that the approach can be used efficiently to denoise PPI networks.
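Every OM result above is reported as an AUC. A minimal sketch of how AUC can be computed from link scores, assuming each candidate edge has a real-valued score and a 0/1 label (1 = edge present in the reference set); it uses the pairwise Mann-Whitney formulation, which is quadratic and meant for clarity rather than for large networks. The scores in the usage line are made up.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a correct ordering (Mann-Whitney U formulation).
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical link scores: 1 = interaction in the reference set, 0 = non-edge.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For networks with many candidate edges, sorting the scores once and summing the ranks of the positives gives the same value in O(n log n).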
8
Exploring the Potential of Spherical Harmonics and PCVM for Compounds Activity Prediction. Int J Mol Sci 2019; 20:ijms20092175. [PMID: 31052500 PMCID: PMC6539940 DOI: 10.3390/ijms20092175] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/21/2019] [Revised: 04/14/2019] [Accepted: 04/29/2019] [Indexed: 01/11/2023]
Abstract
Biologically active chemical compounds may provide remedies for several diseases. Machine learning techniques applied to drug discovery, which are cheaper and faster than wet-lab experiments, can identify molecules with the expected pharmacological activity more effectively. Therefore, it is urgent and essential to develop more representative descriptors and reliable classification methods to accurately predict molecular activity. In this paper, we investigate the potential of a novel representation based on Spherical Harmonics, fed into a Probabilistic Classification Vector Machines classifier (together named SHPCVM), for the compound activity prediction task. We use representation learning to acquire features that describe the molecules as precisely as possible. To verify the performance of SHPCVM, ten-fold cross-validation tests were performed on twenty-one G protein-coupled receptors (GPCRs). The experimental outcomes, assessed by classification accuracy, precision, recall, Matthews' Correlation Coefficient, and Cohen's kappa (accuracy of 0.86), show that our relatively short Spherical Harmonics-based representation combined with Probabilistic Classification Vector Machines achieves very satisfactory performance on GPCRs.
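The SHPCVM abstract evaluates with accuracy, precision, recall, Matthews' Correlation Coefficient and Cohen's kappa. A small sketch computing all five from a binary confusion matrix, using their standard definitions; the function name is ours and the counts in the usage line are illustrative, not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, MCC and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Chance agreement: predicted-positive rate x actual-positive rate, plus the same for negatives.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "mcc": mcc, "kappa": kappa}


# Illustrative counts only; they are not taken from the study.
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

MCC and kappa both correct for class imbalance, which is why they are often reported alongside raw accuracy for activity datasets with few actives.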
9
An JY, You ZH, Zhou Y, Wang DF. Sequence-based Prediction of Protein-Protein Interactions Using Gray Wolf Optimizer-Based Relevance Vector Machine. Evol Bioinform Online 2019; 15:1176934319844522. [PMID: 31080346 PMCID: PMC6498782 DOI: 10.1177/1176934319844522] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 03/07/2019] [Accepted: 03/20/2019] [Indexed: 12/18/2022]
Abstract
Protein-protein interactions (PPIs) are essential to a number of biological processes. Identifying PPIs through biological experiments is both time-consuming and expensive. Therefore, many computational methods have been proposed to identify PPIs. However, most of these methods are limited because they are computationally demanding and rely on a large number of homologous proteins. Accordingly, it is urgent to develop effective computational methods to detect PPIs using only protein sequence information. The kernel parameter of the relevance vector machine (RVM) is usually set empirically, which may not yield the optimal solution and thus affects the prediction performance of the RVM. In this work, we present a novel computational approach called GWORVM-BIG, which uses Bi-gram (BIG) features computed on a position-specific scoring matrix (PSSM) to represent protein sequences and a GWORVM classifier to predict PPIs. More specifically, the proposed GWORVM model obtains the optimal kernel parameters using the gray wolf optimizer, which has the advantages of fewer control parameters, strong global optimization ability, and ease of implementation compared with other optimization algorithms. The experimental results on yeast and human datasets demonstrated the good accuracy and efficiency of the proposed GWORVM-BIG method. The results showed that the proposed GWORVM classifier can significantly improve prediction performance compared with RVM models tuned by other optimization algorithms, including grid search (GS), genetic algorithm (GA), and particle swarm optimization (PSO). In addition, the proposed method was compared with other existing algorithms, and the experimental results further indicated that the GWORVM-BIG model yields excellent prediction performance. To facilitate future proteomics research, the GWORVMBIG server is freely available for academic use at http://219.219.62.123:8888/GWORVMBIG.
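To illustrate the kernel-parameter search described above, here is a sketch of the standard gray wolf optimizer update (leaders alpha, beta and delta; control parameter `a` decreasing linearly from 2 to 0). It does not implement an RVM: `validation_error` is a hypothetical stand-in for the cross-validated error one would compute for a given kernel width, and the search range, pack size, iteration count and elitist leader selection are our assumptions.

```python
import random


def gray_wolf_optimize(objective, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimize `objective` over a box using the standard gray wolf optimizer update.

    bounds: list of (low, high) pairs, one per dimension. Returns (best_position, best_value).
    """
    rng = random.Random(seed)

    def clip(position):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(position, bounds)]

    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    alpha, beta, delta = sorted(wolves, key=objective)[:3]

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # control parameter, decreases linearly from 2 toward 0
        moved = []
        for wolf in wolves:
            position = []
            for d in range(len(bounds)):
                guided = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a  # |A| > 1 explores, |A| < 1 exploits
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    guided.append(leader[d] - A * D)
                position.append(sum(guided) / 3.0)  # average of the three leader-guided moves
            moved.append(clip(position))
        wolves = moved
        # Elitism (our assumption): leaders are re-selected from the new pack plus the old leaders.
        alpha, beta, delta = sorted(wolves + [alpha, beta, delta], key=objective)[:3]

    return alpha, objective(alpha)


# Hypothetical stand-in for cross-validated error as a function of log10(kernel width);
# a real run would train and validate an RVM here instead.
def validation_error(params):
    log_width = params[0]
    return (log_width + 1.0) ** 2


best, err = gray_wolf_optimize(validation_error, bounds=[(-3.0, 3.0)])
```

Searching in log space is a common choice for kernel widths because plausible values span several orders of magnitude.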
Affiliation(s)
- Ji-Yong An
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Mine Digitization Engineering Research Center of Ministry of Education of the People's Republic of China
- Zhu-Hong You
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Yong Zhou
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Mine Digitization Engineering Research Center of Ministry of Education of the People's Republic of China
- Da-Fu Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Mine Digitization Engineering Research Center of Ministry of Education of the People's Republic of China
10
Chen ZH, Li LP, He Z, Zhou JR, Li Y, Wong L. An Improved Deep Forest Model for Predicting Self-Interacting Proteins From Protein Sequence Using Wavelet Transformation. Front Genet 2019; 10:90. [PMID: 30881376 PMCID: PMC6405691 DOI: 10.3389/fgene.2019.00090] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/11/2018] [Accepted: 01/29/2019] [Indexed: 12/23/2022]
Abstract
Self-interacting proteins (SIPs), in which two or more copies of the same protein can interact with each other, play significant roles in understanding cellular processes and cell functions. Although a number of experimental methods have been designed to detect SIPs, they remain extremely time-consuming, expensive, and challenging. Therefore, there is an urgent need to develop computational methods for predicting SIPs. In this study, we propose a deep forest-based predictor for accurate prediction of SIPs using protein sequence information. More specifically, a novel feature representation method that integrates the position-specific scoring matrix (PSSM) with the wavelet transform is introduced. To evaluate the performance of the proposed method, cross-validation tests were performed on two widely used benchmark datasets. The proposed model achieved high accuracies of 95.43% and 93.65% on the human and yeast datasets, respectively, with AUC values of 0.9586 and 0.9203, respectively. To further show the advantage of the proposed method, it was compared with several existing methods; the results demonstrate that the proposed model outperforms other SIP prediction methods. This work offers biologists an effective architecture for detecting new SIPs.
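The abstract combines the PSSM with a wavelet transform but does not specify the wavelet family, the decomposition depth, or how coefficients become a fixed-length vector. As one possible sketch, assuming the Haar wavelet and a hypothetical per-column summary (mean absolute detail coefficient per level plus the mean of the final approximation), a variable-length L x 20 PSSM can be mapped to a fixed-length descriptor for a downstream classifier such as a deep forest (not shown). Function names and the toy matrix are hypothetical.

```python
import math
import random


def haar_step(signal):
    """One level of the orthonormal Haar DWT. Odd-length input is padded by repeating its last value."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def pssm_wavelet_features(pssm, levels=2):
    """Hypothetical fixed-length descriptor for an L x 20 PSSM given as a list of rows.

    Each column is treated as a 1-D signal along the sequence; per decomposition
    level we keep the mean absolute detail coefficient, then the mean of the final
    approximation, giving 20 * (levels + 1) values regardless of L.
    """
    features = []
    for j in range(len(pssm[0])):
        signal = [row[j] for row in pssm]
        for _ in range(levels):
            signal, detail = haar_step(signal)
            features.append(sum(abs(d) for d in detail) / len(detail))
        features.append(sum(signal) / len(signal))
    return features


# Toy 7 x 20 "PSSM" with random integer scores (a real one would come from PSI-BLAST).
rng = random.Random(0)
toy_pssm = [[rng.randint(-5, 5) for _ in range(20)] for _ in range(7)]
features = pssm_wavelet_features(toy_pssm, levels=2)
print(len(features))  # 60
```

The orthonormal scaling keeps each level energy-preserving, so coefficient magnitudes remain comparable across proteins of different lengths.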
Affiliation(s)
- Zhan-Heng Chen
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Li-Ping Li
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Zhou He
- College of Engineering and Applied Science, University of Colorado Boulder, Boulder, CO, United States
- Ji-Ren Zhou
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Yangming Li
- ECTET, Rochester Institute of Technology, Rochester, NY, United States
- Leon Wong
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- University of Chinese Academy of Sciences, Beijing, China
11
Zhan ZH, Jia LN, Zhou Y, Li LP, Yi HC. BGFE: A Deep Learning Model for ncRNA-Protein Interaction Predictions Based on Improved Sequence Information. Int J Mol Sci 2019; 20:E978. [PMID: 30813451 PMCID: PMC6412311 DOI: 10.3390/ijms20040978] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/01/2019] [Revised: 02/19/2019] [Accepted: 02/20/2019] [Indexed: 11/26/2022]
Abstract
The interactions between ncRNAs and proteins are critical for regulating various cellular processes, such as gene expression. However, because current experimental methods for detecting ncRNA-protein interactions are limited by their financial and material costs, an innovative and practical approach with convincing prediction accuracy is needed. In this study, we propose an effective deep learning method based on sequence information, named BGFE, to predict ncRNA-protein interactions. Protein sequences are represented by a bi-gram probability feature extraction method applied to the Position Specific Scoring Matrix (PSSM), and ncRNA sequences are represented by k-mer sparse matrices. Furthermore, to extract hidden high-level feature information, a stacked auto-encoder network is employed with a stacked ensemble integration strategy. We evaluated the performance of the proposed method on three datasets using five-fold cross-validation, with the features classified by a random forest classifier. The experimental results demonstrate the effectiveness and prediction accuracy of our approach. In general, the proposed method is helpful for ncRNA-protein interaction prediction and provides useful guidance for future biological research.
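BGFE's two sequence encodings can be sketched as follows, under assumptions. For proteins, `bigram_pssm` uses the bi-gram formulation common in the PSSM literature, B(m, n) = sum over i of P(i, m) * P(i+1, n), which gives 400 values for a 20-column PSSM; the abstract does not state BGFE's exact probability normalization, so none is applied here. For ncRNAs, reading "k-mer sparse matrix" as a one-hot matrix with one row per possible k-mer and one column per sequence position is our interpretation. The stacked auto-encoder and random forest stages are not shown, and the toy inputs are hypothetical (a 2-column matrix stands in for a 20-column PSSM).

```python
from itertools import product


def bigram_pssm(pssm):
    """Bi-gram features B[m][n] = sum_i P[i][m] * P[i+1][n], flattened row-major.

    For an L x 20 PSSM this yields 20 * 20 = 400 values independent of L.
    Assumption: no normalization of the raw PSSM scores is applied here.
    """
    n_cols = len(pssm[0])
    feats = [[0.0] * n_cols for _ in range(n_cols)]
    for cur, nxt in zip(pssm, pssm[1:]):  # consecutive residue pairs (i, i + 1)
        for m in range(n_cols):
            for n in range(n_cols):
                feats[m][n] += cur[m] * nxt[n]
    return [v for row in feats for v in row]


def kmer_sparse_matrix(rna, k=3, alphabet="ACGU"):
    """One-hot k-mer occurrence matrix of shape (len(alphabet)**k, L - k + 1), stored sparsely.

    Returns (shape, nonzeros), where nonzeros lists the (kmer_row, position) cells equal to 1.
    Windows containing symbols outside `alphabet` are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    n_pos = len(rna) - k + 1
    nonzeros = [(index[rna[j:j + k]], j) for j in range(n_pos) if rna[j:j + k] in index]
    return (len(index), max(n_pos, 0)), nonzeros


print(bigram_pssm([[1, 2], [3, 4], [5, 6]]))  # [18.0, 22.0, 26.0, 32.0]
print(kmer_sparse_matrix("ACGU", k=2))  # ((16, 3), [(1, 0), (6, 1), (11, 2)])
```

Both encodings turn variable-length sequences into fixed-size (or fixed-height) inputs, which is what allows a stacked auto-encoder to be trained on them.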
Affiliation(s)
- Zhao-Hui Zhan
- China University of Mining and Technology, Xuzhou 221116, China.
- Li-Na Jia
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, Shandong, China.
- Yong Zhou
- China University of Mining and Technology, Xuzhou 221116, China.
- Li-Ping Li
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.
- Hai-Cheng Yi
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China.