1. Wang M, Ali H, Xu Y, Xie J, Xu S. BiPSTP: Sequence feature encoding method for identifying different RNA modifications with bidirectional position-specific trinucleotides propensities. J Biol Chem 2024; 300:107140. [PMID: 38447795 PMCID: PMC10997841 DOI: 10.1016/j.jbc.2024.107140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/17/2024] [Accepted: 02/25/2024] [Indexed: 03/08/2024]
Abstract
RNA modification, a posttranscriptional regulatory mechanism, significantly influences RNA biogenesis and function. The accurate identification of modification sites is paramount for investigating their biological implications. Methods for encoding RNA sequences into numerical data play a crucial role in developing robust models for predicting modification sites. However, existing techniques suffer from limitations, including inadequate information representation, challenges in effectively integrating positional and sequential information, and the generation of irrelevant or redundant features when combining multiple approaches. These deficiencies hinder the effectiveness of machine learning models in addressing the performance challenges associated with predicting RNA modification sites. Here, we introduce a novel RNA sequence feature representation method, named BiPSTP, which utilizes bidirectional trinucleotide position-specific propensities. We employ the parameter ξ to denote the interval between the current nucleotide and its adjacent forward or backward dinucleotide, enabling the extraction of positional and sequential information from RNA sequences. Leveraging the BiPSTP method, we have developed the prediction model mRNAPred, which uses a support vector machine classifier to identify multiple types of RNA modification sites. We evaluate the performance of our BiPSTP method and mRNAPred model across 12 distinct RNA modification types. Our experimental results demonstrate the superiority of the mRNAPred model compared to state-of-the-art models in the domain of RNA modification site identification. Importantly, our BiPSTP method enhances the robustness and generalization performance of prediction models. Notably, it can be applied to feature extraction from DNA sequences to predict other biological modification sites.
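The abstract describes the encoding only in outline, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a feature at position i is the difference between how often a gapped trinucleotide occurs at i in positive versus negative training sequences, where the trinucleotide joins nucleotide i with the dinucleotide lying ξ positions ahead (forward) or behind (backward). The function names, the frequency-difference definition of "propensity", and the zero fill at sequence edges are all assumptions made for illustration.

```python
from collections import Counter


def gapped_triplet(seq, i, xi, direction):
    """Join nucleotide i with the dinucleotide xi positions away; None at the edges."""
    if direction == "forward":
        j = i + xi + 1  # start of the forward dinucleotide
        if j + 1 >= len(seq):
            return None
        return seq[i] + seq[j:j + 2]
    j = i - xi - 2  # start of the backward dinucleotide
    if j < 0:
        return None
    return seq[j:j + 2] + seq[i]


def position_frequencies(seqs, xi, direction):
    """Per-position relative frequency of each gapped triplet (equal-length sequences)."""
    tables = []
    for i in range(len(seqs[0])):
        counts = Counter()
        for s in seqs:
            t = gapped_triplet(s, i, xi, direction)
            if t is not None:
                counts[t] += 1
        tables.append({t: c / len(seqs) for t, c in counts.items()})
    return tables


def fit_propensities(positives, negatives, xi):
    """Assumed propensity: positive-set frequency minus negative-set frequency."""
    model = {}
    for direction in ("forward", "backward"):
        fp = position_frequencies(positives, xi, direction)
        fn = position_frequencies(negatives, xi, direction)
        model[direction] = [
            {t: p.get(t, 0.0) - n.get(t, 0.0) for t in set(p) | set(n)}
            for p, n in zip(fp, fn)
        ]
    return model


def encode(seq, model, xi):
    """Concatenate forward and backward propensities; edges and unseen triplets get 0.0."""
    features = []
    for direction in ("forward", "backward"):
        for i, table in enumerate(model[direction]):
            t = gapped_triplet(seq, i, xi, direction)
            features.append(table.get(t, 0.0) if t is not None else 0.0)
    return features


# Toy data, invented for illustration only.
positives = ["ACGUA", "ACGUA"]
negatives = ["UUUUU", "UUUUU"]
model = fit_propensities(positives, negatives, xi=0)
vector = encode("ACGUA", model, xi=0)
```

Under these assumptions a sequence of length L yields a 2L-dimensional vector, which could then be passed to a classifier such as the support vector machine mentioned in the abstract.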
Affiliation(s)
- Mingzhao Wang
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Haider Ali
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yandi Xu
  - School of Computer Science, Shaanxi Normal University, Xi'an, China; College of Life Sciences, Shaanxi Normal University, Xi'an, China
- Juanying Xie
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Shengquan Xu
  - College of Life Sciences, Shaanxi Normal University, Xi'an, China

2. Omer A. MicroRNAs as powerful tool against COVID-19: Computational perspective. WIREs Mech Dis 2023; 15:e1621. [PMID: 37345625 DOI: 10.1002/wsbm.1621] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/15/2022] [Revised: 04/13/2023] [Accepted: 05/23/2023] [Indexed: 06/23/2023]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the virus responsible for the current pandemic, COVID-19. MiRNAs, a component of RNAi technology, belong to the family of short, noncoding ssRNAs and may be crucial in the battle against this global threat, since they are involved in regulating complex biochemical pathways and may prevent viral proliferation, translation, and host expression. The complicated metabolic pathways are modulated by the activity of many proteins, mRNAs, and miRNAs working together in miRNA-mediated genetic control. The amount of omics data has increased dramatically in recent years. This massive, linked, yet complex metabolic regulatory network data offers a wealth of opportunity for iterative analysis; hence, extensive, in-depth, but time-efficient screening is necessary to acquire fresh discoveries, and this is readily performed with the use of bioinformatics. We have reviewed the literature on microRNAs, bioinformatics, and COVID-19 infection to summarize (1) the function of miRNAs in combating COVID-19, (2) the use of computational methods in combating COVID-19 in certain noteworthy studies, and (3) the computational tools used by these studies against COVID-19 for several purposes. This article is categorized under: Infectious Diseases > Computational Models.
Affiliation(s)
- Ankur Omer
  - Government College Silodi, MPHED, Katni, Madhya Pradesh, India

3. Chiu CC, Wu CM, Chien TN, Kao LJ, Li C, Chu CM. Integrating Structured and Unstructured EHR Data for Predicting Mortality by Machine Learning and Latent Dirichlet Allocation Method. Int J Environ Res Public Health 2023; 20:4340. [PMID: 36901354 PMCID: PMC10001457 DOI: 10.3390/ijerph20054340] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/16/2023] [Revised: 02/22/2023] [Accepted: 02/24/2023] [Indexed: 06/18/2023]
Abstract
An ICU is a critical care unit that provides advanced medical support and continuous monitoring for patients with severe illnesses or injuries. Predicting the mortality rate of ICU patients can not only improve patient outcomes, but also optimize resource allocation. Many studies have attempted to create scoring systems and models that predict the mortality of ICU patients using large amounts of structured clinical data. However, unstructured clinical data recorded during patient admission, such as notes made by physicians, is often overlooked. This study used the MIMIC-III database to predict mortality in ICU patients. In the first part of the study, only eight structured variables were used, including the six basic vital signs, the GCS, and the patient's age at admission. In the second part, unstructured predictor variables were extracted from the initial diagnosis made by physicians when the patients were admitted to the hospital and analyzed using Latent Dirichlet Allocation techniques. The structured and unstructured data were combined using machine learning methods to create a mortality risk prediction model for ICU patients. The results showed that combining structured and unstructured data improved the accuracy of the prediction of clinical outcomes in ICU patients over time. The model achieved an AUROC of 0.88, indicating accurate prediction of patient vital status. Additionally, the model was able to predict patient clinical outcomes over time, successfully identifying important variables. This study demonstrated that a small number of easily collectible structured variables, combined with unstructured data and analyzed using LDA topic modeling, can significantly improve the predictive performance of a mortality risk prediction model for ICU patients. These results suggest that initial clinical observations and diagnoses of ICU patients contain valuable information that can aid ICU medical and nursing staff in making important clinical decisions.
Affiliation(s)
- Chih-Chou Chiu
  - Department of Business Management, National Taipei University of Technology, Taipei 106, Taiwan
- Chung-Min Wu
  - Department of Business Management, National Taipei University of Technology, Taipei 106, Taiwan
- Te-Nien Chien
  - College of Management, National Taipei University of Technology, Taipei 106, Taiwan
- Ling-Jing Kao
  - Department of Business Management, National Taipei University of Technology, Taipei 106, Taiwan
- Chengcheng Li
  - College of Management, National Taipei University of Technology, Taipei 106, Taiwan
- Chuan-Mei Chu
  - College of Management, National Taipei University of Technology, Taipei 106, Taiwan

4. Saini S, Khurana S, Saini D, Rajput S, Thakur CJ, Singh J, Jaswal A, Kapoor Y, Kumar V, Saini A. In silico analysis of genomic landscape of SARS-CoV-2 and its variant of concerns (Delta and Omicron) reveals changes in the coding potential of miRNAs and their target genes. Gene 2023; 853:147097. [PMID: 36470485 PMCID: PMC9721428 DOI: 10.1016/j.gene.2022.147097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Revised: 11/24/2022] [Accepted: 11/29/2022] [Indexed: 12/12/2022]
Abstract
COVID-19 related morbidities and mortalities continue due to the emergence of new variants of SARS-CoV-2. In the last few years, viral miRNAs have been the centre of study to understand the disease pathophysiology. In this work, we aimed to predict the change in coding potential of the viral miRNAs in SARS-CoV-2's VOCs, Delta and Omicron, compared to the Reference (Wuhan origin) strain using bioinformatics tools. After ab initio based screening by the Vmir tool and validation, we retrieved 22, 6, and 6 pre-miRNAs for Reference, Delta, and Omicron. Most of the predicted unique pre-miRNAs of Delta and Omicron were found to be encoded from the terminal and origin of the genomic sequence, respectively. Mature miRNAs identified by MatureBayes from the unique pre-miRNAs were used for target identification using miRDB. A total of 1786, 216, and 143 high-confidence target genes were captured for GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis. The GO and KEGG pathway term analysis revealed the involvement of genes targeted by Delta miRNAs in pathways such as Human cytomegalovirus infection, Breast cancer, Apoptosis, Neurotrophin signaling, and Axon guidance, whereas the Sphingolipid signaling pathway was found for Omicron. Furthermore, we focussed our analysis on target genes that were validated through GEO's (Gene Expression Omnibus) DEGs (Differentially Expressed Genes) dataset, in which FGL2, TNSF12, OGN, GDF11, and BMP11 target genes were found to be down-regulated by Reference miRNAs and YAE1 and RSU1 by Delta. A few genes were also validated in the up-regulated gene set of the GEO dataset, in which MMP14, TNFRSF21, SGMS1, and TMEM192 were related to Reference, whereas ZEB2 was detected in all three strains. This study thus provides an in silico analysis that deciphered the unique pre-miRNAs in Delta and Omicron compared to Reference. However, the findings need future wet lab studies for validation.
Affiliation(s)
- Sandeep Saini
  - Department of Bioinformatics, Goswami Ganesh Dutta Sanatan Dharma College, Sector 32, Chandigarh 160030, India; Department of Biophysics, Panjab University, Sector 25, Chandigarh 160014, India
- Savi Khurana
  - Department of Bioinformatics, Goswami Ganesh Dutta Sanatan Dharma College, Sector 32, Chandigarh 160030, India
- Dikshant Saini
  - Department of Bioinformatics, Goswami Ganesh Dutta Sanatan Dharma College, Sector 32, Chandigarh 160030, India
- Saru Rajput
  - Department of Bioinformatics, Goswami Ganesh Dutta Sanatan Dharma College, Sector 32, Chandigarh 160030, India
- Chander Jyoti Thakur
  - Department of Bioinformatics, Goswami Ganesh Dutta Sanatan Dharma College, Sector 32, Chandigarh 160030, India
- Jeevisha Singh
  - Department of Bioinformatics, Goswami Ganesh Dutta Sanatan Dharma College, Sector 32, Chandigarh 160030, India
- Akanksha Jaswal
  - Department of Bioinformatics, Goswami Ganesh Dutta Sanatan Dharma College, Sector 32, Chandigarh 160030, India
- Yogesh Kapoor
  - Department of Engineering and Technology, Shoolini University, Solan, Himachal Pradesh, India
- Varinder Kumar
  - Department of Bioinformatics, Goswami Ganesh Dutta Sanatan Dharma College, Sector 32, Chandigarh 160030, India
- Avneet Saini
  - Department of Biophysics, Panjab University, Sector 25, Chandigarh 160014, India

5. Zhang S, Wang J, Li X, Liang Y. M6A-GSMS: Computational identification of N6-methyladenosine sites with GBDT and stacking learning in multiple species. J Biomol Struct Dyn 2022; 40:12380-12391. [PMID: 34459713 DOI: 10.1080/07391102.2021.1970628] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022]
Abstract
N6-methyladenosine (m6A) is one of the most abundant forms of RNA methylation modification currently known. It is involved in a wide range of biological processes, including degradation, stability, and alternative splicing. Therefore, the development of convenient and efficient m6A prediction technologies is urgent. In this work, a novel predictor based on GBDT and stacking learning, called M6A-GSMS, is developed to identify m6A sites. To achieve accurate prediction, we explore RNA sequence information from four aspects: correlation, structure, physicochemical properties, and pseudo ribonucleic acid composition. After using the GBDT algorithm for feature selection, a stacking model is constructed by combining seven basic classifiers. Compared with other state-of-the-art methods, the results show that M6A-GSMS achieves excellent performance in identifying m6A sites. The prediction accuracy for A. thaliana, D. melanogaster, M. musculus, S. cerevisiae, and Human reaches 88.4%, 60.8%, 80.5%, 92.4%, and 61.8%, respectively. This method provides an effective prediction for the investigation of m6A sites. In addition, all the datasets and codes are currently available at https://github.com/Wang-Jinyue/M6A-GSMS. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
- Jinyue Wang
  - School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
- Xinjie Li
  - School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
- Yunyun Liang
  - School of Science, Xi'an Polytechnic University, Xi'an, P. R. China

6. Shahrear S, Zinnia MA, Ahmed T, Islam ABMMK. Deciphering the role of predicted miRNAs of polyomaviruses in carcinogenesis. Biochim Biophys Acta Mol Basis Dis 2022; 1868:166537. [PMID: 36089125 DOI: 10.1016/j.bbadis.2022.166537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 08/13/2022] [Accepted: 09/01/2022] [Indexed: 11/20/2022]
Abstract
Human polyomaviruses are relatively common in the general population. Polyomaviruses maintain a persistent infection after initial infection in childhood, acting as an opportunistic pathogen in immunocompromised populations and their association has been linked to carcinogenesis. A comprehensive understanding of the underlying molecular mechanisms of carcinogenesis in consequence of polyomavirus infection remains elusive. However, the critical role of viral miRNAs and their potential targets in modifying the transcriptome profile of the host remains largely unknown. Polyomavirus-derived miRNAs have the potential to play a substantial role in carcinogenesis. Employing computational approaches, putative viral miRNAs along with their target genes have been predicted and possible roles of the targeted genes in many significant biological processes have been obtained. Polyomaviruses have been observed to target intracellular signal transduction pathways through miRNA-mediated epigenetic regulation, which may contribute to cancer development. In addition, BKPyV-infected human renal cell microarray data was coupled with predicted target genes and analysis of the downregulated genes indicated that viruses target multiple signaling pathways (e.g. MAPK signaling pathway, PI3K-Akt signaling pathway, PPAR signaling pathway) in the host as well as turning off several tumor suppression genes (e.g. FGGY, EPHX2, CACNA2D3, CDH16) through miRNA-induced mechanisms, assuring cell transformation. This study provides a conceptual framework for the underlying molecular mechanisms involved in the course of carcinogenesis upon polyomavirus infection.
Affiliation(s)
- Sazzad Shahrear
  - Department of Genetic Engineering & Biotechnology, University of Dhaka, Dhaka, Bangladesh
- Tasnim Ahmed
  - Department of Genetic Engineering & Biotechnology, University of Dhaka, Dhaka, Bangladesh

7. Wang M, Li F, Wu H, Liu Q, Li S. PredPromoter-MF(2L): A Novel Approach of Promoter Prediction Based on Multi-source Feature Fusion and Deep Forest. Interdiscip Sci 2022; 14:697-711. [PMID: 35488998 DOI: 10.1007/s12539-022-00520-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/19/2021] [Revised: 04/05/2022] [Accepted: 04/05/2022] [Indexed: 12/12/2022]
Abstract
Promoters, short DNA sequences, play vital roles in initiating gene transcription. However, it remains a challenge to identify promoters using conventional experimental techniques in a high-throughput manner. To this end, several computational predictors based on machine learning models have been developed, but their performance is unsatisfactory. In this study, we proposed a novel two-layer predictor, called PredPromoter-MF(2L), based on multi-source feature fusion and ensemble learning. PredPromoter-MF(2L) was developed based on various deep features learned by a pre-trained deep learning network model and sequence-derived features. Feature selection based on XGBoost was applied to reduce the dimensionality of the fused features, and a cascade deep forest model was trained on the selected feature subset for promoter prediction. The results of both fivefold cross-validation and an independent test demonstrated that PredPromoter-MF(2L) outperformed state-of-the-art methods.
Affiliation(s)
- Miao Wang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, Shanxi, China
- Fuyi Li
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, VIC, 3000, Australia
- Hao Wu
  - School of Software, Shandong University, Jinan, 250100, Shandong, China
- Quanzhong Liu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, Shanxi, China
- Shuqin Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, Shanxi, China

8. DeepRHD: An efficient Hybrid feature Extraction technique for protein remote homology detection using Deep learning strategies. Comput Biol Chem 2022; 100:107749. [DOI: 10.1016/j.compbiolchem.2022.107749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Revised: 07/28/2022] [Accepted: 07/30/2022] [Indexed: 11/19/2022]

9. Chiu CC, Wu CM, Chien TN, Kao LJ, Qiu JT. Predicting the Mortality of ICU Patients by Topic Model with Machine-Learning Techniques. Healthcare (Basel) 2022; 10:healthcare10061087. [PMID: 35742138 PMCID: PMC9222812 DOI: 10.3390/healthcare10061087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 06/07/2022] [Accepted: 06/08/2022] [Indexed: 11/16/2022]
Abstract
Predicting clinical patients’ vital signs is a leading critical issue in studies related to intensive care units (ICUs). Early prediction of the mortality of ICU patients can reduce the overall mortality and cost of complication treatment. Some studies have predicted mortality based on electronic health record (EHR) data by using machine learning models. However, semi-structured data (i.e., patients’ diagnosis data and inspection reports) is rarely used in these models. This study utilized data from the Medical Information Mart for Intensive Care III. We used a Latent Dirichlet Allocation (LDA) model to classify text in the semi-structured data into particular topics, and we established and compared classification and regression trees (CART), logistic regression (LR), multivariate adaptive regression splines (MARS), random forest (RF), and gradient boosting (GB) models. A total of 46,520 ICU patients were included, with 11.5% mortality in the Medical Information Mart for Intensive Care III group. Our results revealed that the semi-structured data (diagnosis data and inspection reports) of ICU patients contain useful information that can assist clinical doctors in making critical clinical decisions. In addition, in our comparison of five machine learning models (CART, LR, MARS, RF, and GB), the GB model showed the best performance with the highest area under the receiver operating characteristic curve (AUROC) (0.9280), specificity (93.16%), and sensitivity (83.25%). The RF, LR, and MARS models showed better performance (AUROCs of 0.9096, 0.8987, and 0.8935, respectively) than the CART model (0.8511). The GB model showed better performance than the other machine learning models (CART, LR, MARS, and RF) in predicting the mortality of patients in the intensive care unit. The analysis results could be used to develop a clinically useful decision support system.
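The models above are compared by AUROC, specificity, and sensitivity. As a reminder of what those three numbers measure, here is a small self-contained sketch. It is not the study's evaluation code; the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions chosen for illustration.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP) at a fixed threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 1 = died, 0 = survived; scores are hypothetical predicted risks.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
area = auroc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores)
```

AUROC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is why the three figures are usually reported together.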
Affiliation(s)
- Chih-Chou Chiu
  - Department of Business Management, National Taipei University of Technology, Taipei 106, Taiwan
- Chung-Min Wu
  - Department of Business Management, National Taipei University of Technology, Taipei 106, Taiwan
- Te-Nien Chien
  - College of Management, National Taipei University of Technology, Taipei 106, Taiwan
  - Correspondence: Tel.: +886-2-2771-2171 (ext. 3403)
- Ling-Jing Kao
  - Department of Business Management, National Taipei University of Technology, Taipei 106, Taiwan
- Jiantai Timothy Qiu
  - Department of Obstetrics and Gynecology, Taipei Medical University Hospital, Taipei 110, Taiwan
  - College of Medicine, Taipei Medical University, Taipei 110, Taiwan

10. Min H, Xin XH, Gao CQ, Wang L, Du PF. XGEM: Predicting Essential miRNAs by the Ensembles of Various Sequence-Based Classifiers With XGBoost Algorithm. Front Genet 2022; 13:877409. [PMID: 35419029 PMCID: PMC8996062 DOI: 10.3389/fgene.2022.877409] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/16/2022] [Accepted: 03/07/2022] [Indexed: 01/27/2023]
Abstract
MicroRNAs (miRNAs) play vital roles in gene expression regulations. Identification of essential miRNAs is of fundamental importance in understanding their cellular functions. Experimental methods for identifying essential miRNAs are always costly and time-consuming. Therefore, computational methods are considered as alternative approaches. Currently, only a handful of studies are focused on predicting essential miRNAs. In this work, we proposed to predict essential miRNAs using the XGBoost framework with CART (Classification and Regression Trees) on various types of sequence-based features. We named this method as XGEM (XGBoost for essential miRNAs). The prediction performance of XGEM is promising. In comparison with other state-of-the-art methods, XGEM performed the best, indicating its potential in identifying essential miRNAs.
Affiliation(s)
- Hui Min
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Xiao-Hong Xin
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Chu-Qiao Gao
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Likun Wang
  - Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center of Life Sciences, Peking University Health Science Center, Beijing, China
- Pu-Feng Du
  - College of Intelligence and Computing, Tianjin University, Tianjin, China

11. Samami E, Pourali G, Arabpour M, Fanipakdel A, Shahidsales S, Javadinia SA, Hassanian SM, Mohammadparast S, Avan A. The Potential Diagnostic and Prognostic Value of Circulating MicroRNAs in the Assessment of Patients With Prostate Cancer: Rational and Progress. Front Oncol 2022; 11:716831. [PMID: 35186706 PMCID: PMC8855122 DOI: 10.3389/fonc.2021.716831] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/29/2021] [Accepted: 12/31/2021] [Indexed: 12/20/2022]
Abstract
Prostate cancer (P.C.) is one of the most frequently diagnosed cancers among men and the first leading cause of death, with an annual incidence of 1.4 million worldwide. Prostate-specific antigen is used for screening/diagnosis of prostate disease, although it is associated with several limitations. Thus, identification of novel biomarkers is warranted for diagnosis of patients at earlier stages. MicroRNAs (miRNAs) have recently emerged as potential biomarkers. It has been shown that these small molecules can circulate in body fluids and prognosticate the risk of developing P.C. Several miRNAs, including miR-20a, miR-21, miR-375, miR-378, and miR-141, have been proposed to be expressed in prostate cancer. This review summarizes the current knowledge about possible molecular mechanisms and the potential application of tissue-specific and circulating microRNAs as diagnostic, prognostic, and therapeutic targets in prostate cancer.
Affiliation(s)
- Elham Samami
  - Network of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Tehran University of Medical Sciences, Tehran, Iran
- Ghazaleh Pourali
  - Cancer Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Mahla Arabpour
  - Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Azar Fanipakdel
  - Cancer Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Seyed Alireza Javadinia
  - Vasei Clinical Research Development Unit, Sabzevar University of Medical Sciences, Sabzevar, Iran
- Seyed Mahdi Hassanian
  - Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Saeid Mohammadparast
  - Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham, Birmingham, AL, United States
- Amir Avan
  - Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Basic Medical Sciences Institute, Mashhad University of Medical Sciences, Mashhad, Iran
  - Correspondence: Amir Avan

12. Gharbi S, Mohammadi Z, Dezaki MS, Dokanehiifard S, Dabiri S, Korsching E. Characterization of the first microRNA in human CDH1 that affects cell cycle and apoptosis and indicates breast cancers progression. J Cell Biochem 2022; 123:657-672. [PMID: 34997630 DOI: 10.1002/jcb.30211] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/27/2021] [Revised: 11/26/2021] [Accepted: 12/21/2021] [Indexed: 11/12/2022]
Abstract
The E-cadherin protein (Cadherin 1, gene: CDH1), a master regulator of human epithelial homeostasis, contributes to the epithelial-mesenchymal transition (EMT), which confers migratory features on cells. The EMT is central to many pathophysiological changes in cancer. Therefore, a better understanding of this regulatory scenario is beneficial for therapeutic regimens. The CDH1 gene is approximately 100 kbp long and consists of 16 exons with a relatively large second intron. Since no microRNA (miRNA) has been identified in CDH1 up to now, we screened the CDH1 gene for promising miRNA hairpin structures in silico. Out of the 27 hairpin structures we identified, one stable RNA fold with a promising sequence motif was selected for experimental verification. The exogenous validation of the hairpin sequence was performed by transfection of HEK293T cells, and the mature miRNA sequences could be verified by quantitative polymerase chain reaction. The endogenous expression of the mature miRNA, provisionally named CDH1-i2-miR-1, could be confirmed in two normal (HEK293T, HUVEK) and five cancer cell lines (MCF7, MDA-MB-231, SW480, HT-29, A549). The functional characterization by the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide assay showed a suppression of HEK293T cell proliferation. A flow cytometry-based approach showed the ability of CDH1-i2-miR-1 to arrest transfected cells in a G2/M state, while annexin staining exemplified an apoptotic effect. BAX and PTEN expression levels were affected following overexpression of the new miRNA. The in vivo expression level was assessed in 35 breast tumor tissues and their paired nonmalignant marginal parts. A fourfold downregulation in the tumor specimens compared to their marginal controls could be observed. It can be concluded that the sequence of the hub gene CDH1 harbors at least one miRNA, and possibly more, relevant to the pathophysiology of breast cancer.
Affiliation(s)
- Sedigheh Gharbi
  - Department of Biology, Faculty of Sciences, Shahid Bahonar University of Kerman, Kerman, Iran
- Zahra Mohammadi
  - Department of Biology, Faculty of Sciences, Shahid Bahonar University of Kerman, Kerman, Iran
- Maryam Saedi Dezaki
  - Department of Biology, Faculty of Sciences, Shahid Bahonar University of Kerman, Kerman, Iran
- Sadat Dokanehiifard
  - Department of Human Genetics, Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, Florida, USA
- Shahriar Dabiri
  - Department of Pathology, Pathology and Stem Cell Research Center, Kerman University of Medical Sciences, Kerman, Iran
- Eberhard Korsching
  - Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany

13. Rahaman M, Komanapalli J, Mukherjee M, Byram PK, Sahoo S, Chakravorty N. Decrypting the role of predicted SARS-CoV-2 miRNAs in COVID-19 pathogenesis: A bioinformatics approach. Comput Biol Med 2021; 136:104669. [PMID: 34320442 PMCID: PMC8294073 DOI: 10.1016/j.compbiomed.2021.104669] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/21/2021] [Revised: 07/18/2021] [Accepted: 07/18/2021] [Indexed: 12/14/2022]
Abstract
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) is a highly transmissible virus causing the ongoing global pandemic, COVID-19. Evidence suggests that viral and host microRNAs play pivotal roles in the progression of such infections. The decisive impact of viral miRNAs and their putative targets in modulating the transcriptomic profile of the host, however, remains unexplored. We hypothesized that SARS-CoV-2 derived miRNAs can potentially play a contributory role in its pathogenicity and aid in its survival. A series of computational tools predicted 34 SARS-CoV-2 encoded miRNAs and their putative targets in the host. Immune and apoptotic pathways were identified as the most enriched pathways. Further investigation using a dataset of SARS-CoV-2 infected cells (available from a public repository, GSE150392) revealed that 46 genes related to immune and apoptosis-related functions were deregulated. Of these 46 genes, 42 were identified to be significantly up-regulated and 4 were down-regulated. In silico analysis revealed all of these significantly down-regulated genes to be putative targets of 9 out of 34 of our predicted viral miRNAs. Overall, 123 out of 324 genes that are differentially regulated in SARS-CoV-2 infected cells, and also identified as putative targets of viral miRNAs, were found to be significantly down-regulated. KEGG pathway analysis using these genes revealed p53 signaling as the most enriched pathway, a pathway that is known to influence immune responses. This study thus provides the theoretical foundation for the underlying molecular mechanisms involved in the progression of viral pathogenesis.
Affiliation(s)
- Motiur Rahaman
- School of Medical Science and Technology, IIT Kharagpur, Kharagpur, Paschim Medinipur, West Bengal, 721302, India
| | - Jaikrishna Komanapalli
- Department of Biotechnology, IIT Kharagpur, Kharagpur, Paschim Medinipur, West Bengal, 721302, India
| | - Mandrita Mukherjee
- School of Medical Science and Technology, IIT Kharagpur, Kharagpur, Paschim Medinipur, West Bengal, 721302, India
| | - Prasanna Kumar Byram
- School of Medical Science and Technology, IIT Kharagpur, Kharagpur, Paschim Medinipur, West Bengal, 721302, India
| | - Sunanda Sahoo
- School of Medical Science and Technology, IIT Kharagpur, Kharagpur, Paschim Medinipur, West Bengal, 721302, India
| | - Nishant Chakravorty
- School of Medical Science and Technology, IIT Kharagpur, Kharagpur, Paschim Medinipur, West Bengal, 721302, India.
|
14
|
iDRP-PseAAC: Identification of DNA Replication Proteins Using General PseAAC and Position Dependent Features. Int J Pept Res Ther 2021; 27:1315-1329. [PMID: 33584161 PMCID: PMC7869428 DOI: 10.1007/s10989-021-10170-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/18/2021] [Indexed: 10/25/2022]
Abstract
DNA replication is a fundamental process in all living organisms, particularly eukaryotes. The prevalence of DNA replication is significant for an evolutionary transition at the beginning of life. DNA replication proteins are those proteins that support the process of replication, and they are also reported to be important in drug design and discovery. DNA replication proteins therefore play a very important role in the human body; however, their identification is necessary before their mechanism can be studied. Experimental identification is time-consuming, costly, and laborious. A computational methodology is therefore required for predicting these proteins, yet no prior method exists. This study constructs a novel prediction model for this purpose. The model is based on an artificial neural network trained on position-relative features and sequence statistical moments integrated into PseAAC. The highest overall accuracies achieved through tenfold cross-validation and jackknife testing were 96.22% and 98.56%, respectively. Our experimental results demonstrate that the proposed predictor surpasses existing models and can serve as a time- and cost-effective strategy for designing novel drugs against contemporary bacterial infections.
|
15
|
Guan ZX, Li SH, Zhang ZM, Zhang D, Yang H, Ding H. A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods. Curr Genomics 2020; 21:11-25. [PMID: 32655294 PMCID: PMC7324890 DOI: 10.2174/1389202921666200214125102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2019] [Revised: 01/24/2020] [Accepted: 01/30/2020] [Indexed: 11/22/2022] Open
Abstract
MicroRNAs (miRNAs), a group of short non-coding RNA molecules, can regulate gene expression. Many diseases are associated with abnormal expression of miRNAs. Therefore, accurate identification of miRNA precursors (pre-miRNAs) is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental and comparative genomics methods have disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. Therefore, this review summarizes the current advances in pre-miRNA recognition based on computational methods, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide information about the predictors currently available. Finally, we give future perspectives on the identification of pre-miRNAs. The review provides scholars with the full background of pre-miRNA identification using machine learning methods, which can help researchers gain a clear understanding of the progress of research in this field.
Affiliation(s)
- Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Hao Li
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
16
|
Chou KC. An Insightful 10-year Recollection Since the Emergence of the 5-steps Rule. Curr Pharm Des 2020; 25:4223-4234. [PMID: 31782354 DOI: 10.2174/1381612825666191129164042] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2019] [Accepted: 11/25/2019] [Indexed: 11/22/2022]
Abstract
OBJECTIVE: One of the most challenging problems is how to formulate a biological sequence as a vector while still retaining considerable sequence-order information. METHODS: To address this problem, the approach of Pseudo Amino Acid Components, or PseAAC, was developed. RESULTS AND CONCLUSION: The 10-year recollection makes it increasingly clear that this proposal has indeed been very powerful.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, Massachusetts 02478, United States.,Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
17
|
|
18
|
Liu B. BioSeq-Analysis: a platform for DNA, RNA and protein sequence analysis based on machine learning approaches. Brief Bioinform 2020; 20:1280-1294. [PMID: 29272359 DOI: 10.1093/bib/bbx165] [Citation(s) in RCA: 194] [Impact Index Per Article: 38.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2017] [Revised: 11/08/2017] [Indexed: 01/07/2023] Open
Abstract
With the avalanche of biological sequences generated in the post-genomic age, one of the most challenging problems is how to computationally analyze their structures and functions. Machine learning techniques are playing key roles in this field. Typically, predictors based on machine learning techniques contain three main steps: feature extraction, predictor construction and performance evaluation. Although several Web servers and stand-alone tools have been developed to facilitate the biological sequence analysis, they only focus on individual step. In this regard, in this study a powerful Web server called BioSeq-Analysis (http://bioinformatics.hitsz.edu.cn/BioSeq-Analysis/) has been proposed to automatically complete the three main steps for constructing a predictor. The user only needs to upload the benchmark data set. BioSeq-Analysis can generate the optimized predictor based on the benchmark data set, and the performance measures can be reported as well. Furthermore, to maximize user's convenience, its stand-alone program was also released, which can be downloaded from http://bioinformatics.hitsz.edu.cn/BioSeq-Analysis/download/, and can be directly run on Windows, Linux and UNIX. Applied to three sequence analysis tasks, experimental results showed that the predictors generated by BioSeq-Analysis even outperformed some state-of-the-art methods. It is anticipated that BioSeq-Analysis will become a useful tool for biological sequence analysis.
|
19
|
Liu H, Ren G, Chen H, Liu Q, Yang Y, Zhao Q. Predicting lncRNA–miRNA interactions based on logistic matrix factorization with neighborhood regularized. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2019.105261] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
|
20
|
Shao Y, Chou KC. pLoc_Deep-mEuk: Predict Subcellular Localization of Eukaryotic Proteins by Deep Learning. Natural Science 2020. [DOI: 10.4236/ns.2020.126034] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
21
|
Ru X, Cao P, Li L, Zou Q. Selecting Essential MicroRNAs Using a Novel Voting Method. Mol Ther Nucleic Acids 2019; 18:16-23. [PMID: 31479921 PMCID: PMC6727015 DOI: 10.1016/j.omtn.2019.07.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Revised: 06/20/2019] [Accepted: 07/08/2019] [Indexed: 02/06/2023]
Abstract
Among the large number of known microRNAs (miRNAs), some miRNAs play negligible roles in cell regulation. Therefore, selecting essential miRNAs is an important initial step for a deeper understanding of miRNAs and their functions. In this study, we generated 60 classification models by combining 12 representative feature extraction methods and 5 commonly used classification algorithms. The optimal model for essential miRNA classification that we obtained is based on the Mismatch feature extraction method combined with the random forest algorithm. The F-Measure, area under the curve, and accuracy values of this model were 93.2%, 96.7%, and 93.0%, respectively. We also found that the distribution of the positive and negative examples of the first few features greatly influenced the classification results. The feature extraction methods performed best when the differences between the positive and negative examples were obvious, and this led to better classification of essential miRNAs. Because each classifier's predictions for the same sample may be different, we employed a novel voting method to improve the accuracy of the classification of essential miRNAs. The performance results showed that the best classification results were obtained when five classification models were used in the voting. The five classification models were constructed based on the Mismatch, pseudo-distance structure status pair composition, Subsequence, Kmer, and Triplet feature extraction methods. The voting result was 95.3%. Our results suggest that the voting method can be an important tool for selecting essential miRNAs.
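The idea of combining five classifiers by voting, as described in the abstract above, can be illustrated with a plain majority vote. This is only a minimal sketch: the abstract does not spell out the voting rule, so simple majority voting is an assumption here, and the function name `majority_vote` and the example predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by majority vote.

    `predictions` is a list with one entry per model; each entry is a list of
    class labels, one per sample. (Assumed rule: the most frequent label wins.)
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    for sample_votes in zip(*predictions):
        # With an odd number of voters and two classes there can be no tie.
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of five models on four miRNAs
# (1 = essential, 0 = non-essential).
model_predictions = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
voted = majority_vote(model_predictions)
print(voted)  # [1, 0, 1, 1]
```

Using an odd number of voters, as in the five-model setting the abstract reports, avoids ties in a two-class problem.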
Affiliation(s)
- Xiaoqing Ru
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Peigang Cao
- Department of Cardiology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Lihong Li
- School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
|
22
|
Chou KC. Impacts of Pseudo Amino Acid Components and 5-steps Rule to Proteomics and Proteome Analysis. Curr Top Med Chem 2019; 19:2283-2300. [DOI: 10.2174/1568026619666191018100141] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2019] [Revised: 08/18/2019] [Accepted: 08/26/2019] [Indexed: 01/27/2023]
Abstract
Stimulated by the 5-steps rule during the last decade or so, computational proteomics has achieved remarkable progress in the following three areas: (1) protein structural class prediction; (2) protein subcellular location prediction; (3) post-translational modification (PTM) site prediction. The results obtained by these predictions are very useful not only for an in-depth study of the functions of proteins and their biological processes in a cell, but also for developing novel drugs against major diseases such as cancers, Alzheimer’s, and Parkinson’s. Moreover, since the targets to be predicted may have the multi-label feature, two sets of metrics are introduced: one for inspecting the global prediction quality, the other for the local prediction quality. All the predictors covered in this review have a user-friendly web-server, through which the majority of experimental scientists can easily obtain their desired data without the need to go through the complicated mathematics.
Affiliation(s)
- Kuo-Chen Chou
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
|
23
|
Li CC, Liu B. MotifCNN-fold: protein fold recognition based on fold-specific features extracted by motif-based convolutional neural networks. Brief Bioinform 2019; 21:2133-2141. [PMID: 31774907 DOI: 10.1093/bib/bbz133] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2019] [Revised: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 12/31/2022] Open
Abstract
Protein fold recognition is one of the most critical tasks to explore the structures and functions of the proteins based on their primary sequence information. The existing protein fold recognition approaches rely on features reflecting the characteristics of protein folds. However, the feature extraction methods are still the bottleneck of the performance improvement of these methods. In this paper, we proposed two new feature extraction methods called MotifCNN and MotifDCNN to extract more discriminative fold-specific features based on structural motif kernels to construct the motif-based convolutional neural networks (CNNs). The pairwise sequence similarity scores calculated based on fold-specific features are then fed into support vector machines to construct the predictor for fold recognition, and a predictor called MotifCNN-fold has been proposed. Experimental results on the benchmark dataset showed that MotifCNN-fold obviously outperformed all the other competing methods. In particular, the fold-specific features extracted by MotifCNN and MotifDCNN are more discriminative than the fold-specific features extracted by other deep learning techniques, indicating that incorporating the structural motifs into the CNN is able to capture the characteristics of protein folds.
Affiliation(s)
- Chen-Chen Li
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China.,School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China.,Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
|
24
|
Chou KC. Advances in Predicting Subcellular Localization of Multi-label Proteins and its Implication for Developing Multi-target Drugs. Curr Med Chem 2019; 26:4918-4943. [PMID: 31060481 DOI: 10.2174/0929867326666190507082559] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2018] [Revised: 01/29/2019] [Accepted: 01/31/2019] [Indexed: 12/16/2022]
Abstract
The smallest unit of life is a cell, which contains numerous protein molecules. Most of the functions critical to the cell’s survival are performed by these proteins located in its different organelles, usually called “subcellular locations”. Information on the subcellular localization of a protein can provide useful clues about its function. To reveal the intricate pathways at the cellular level, knowledge of the subcellular localization of proteins in a cell is a prerequisite. Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine the subcellular locations of proteins in an entire cell. It is also indispensable for prioritizing and selecting the right targets for drug development. Unfortunately, it is both time-consuming and costly to determine the subcellular locations of proteins purely based on experiments. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop computational methods for rapidly and effectively identifying the subcellular locations of uncharacterized proteins based on their sequence information alone. Considerable progress has been achieved in this regard. This review focuses on those methods that have the capacity to deal with multi-label proteins that may simultaneously exist in two or more subcellular location sites. Protein molecules with this kind of characteristic are vitally important for finding multi-target drugs, a current hot trend in drug development. The review also focuses on those methods that have user-friendly web-servers established so that the majority of experimental scientists can use them to get the desired results without the need to go through the detailed mathematics involved.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
|
25
|
|
26
|
Wong L, Huang YA, You ZH, Chen ZH, Cao MY. LNRLMI: Linear neighbour representation for predicting lncRNA-miRNA interactions. J Cell Mol Med 2019; 24:79-87. [PMID: 31568653 PMCID: PMC6933323 DOI: 10.1111/jcmm.14583] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Revised: 06/23/2019] [Accepted: 07/13/2019] [Indexed: 12/14/2022] Open
Abstract
LncRNA and miRNA are key molecules in the mechanism of competing endogenous RNAs (ceRNA), and their interactions have been found to play important roles in gene regulation. As a supplement to the identification of lncRNA‐miRNA interactions from CLIP‐seq experiments, in silico prediction can select the most promising candidates for experimental validation. Although developing computational tools for predicting lncRNA‐miRNA interactions is of great importance for deciphering the ceRNA mechanism, little effort has been made in this direction. In this paper, we propose an approach based on linear neighbour representation to predict lncRNA‐miRNA interactions (LNRLMI). Specifically, we first constructed a bipartite network by combining the known interaction network and similarities based on expression profiles of lncRNAs and miRNAs. Based on this data integration, a linear neighbour representation method was introduced to construct a prediction model. To evaluate the prediction performance of the proposed model, k‐fold cross validations were implemented. As a result, LNRLMI yielded average AUCs of 0.8475 ± 0.0032, 0.8960 ± 0.0015 and 0.9069 ± 0.0014 on 2‐fold, 5‐fold and 10‐fold cross validation, respectively. A series of comparison experiments with other methods were also conducted, and the results showed that our method was feasible and effective for predicting lncRNA‐miRNA interactions via a combination of different types of useful side information. It is anticipated that LNRLMI could be a useful tool for predicting the non‐coding RNA regulation networks in which lncRNA and miRNA are involved.
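The evaluation protocol mentioned in the abstract above, k-fold cross validation reported as a mean AUC with a spread, can be sketched generically as follows. This is an illustrative sketch, not the LNRLMI pipeline: the stratified fold assignment, the one-feature toy "model", the toy data, and all function names are assumptions made for the example.

```python
import random
from statistics import mean, stdev


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(features, labels, k, seed=0):
    """Return (mean, sample standard deviation) of the per-fold AUC."""
    fold_aucs = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        # Toy "model": orient the single feature so that positives score
        # higher, using the training folds only.
        pos_mean = mean(features[i] for i in train_idx if labels[i] == 1)
        neg_mean = mean(features[i] for i in train_idx if labels[i] == 0)
        sign = 1.0 if pos_mean >= neg_mean else -1.0
        fold_aucs.append(
            auc([labels[i] for i in test_idx], [sign * features[i] for i in test_idx])
        )
    return mean(fold_aucs), stdev(fold_aucs)


# Hypothetical toy data: 10 negatives and 10 positives with overlapping scores.
toy_labels = [0] * 10 + [1] * 10
toy_features = [float(v) for v in range(10)] + [float(v) for v in range(6, 16)]
avg_auc, sd_auc = cross_validated_auc(toy_features, toy_labels, k=5)
print(f"5-fold AUC: {avg_auc:.4f} ± {sd_auc:.4f}")
```

Stratifying the folds keeps both classes present in every held-out fold, which the AUC computation requires.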
Affiliation(s)
- Leon Wong
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Yu-An Huang
- Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong
| | - Zhu-Hong You
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Zhan-Heng Chen
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China.,University of Chinese Academy of Sciences, Beijing, China
|
27
|
Systematic large-scale meta-analysis identifies miRNA-429/200a/b and miRNA-141/200c clusters as biomarkers for necrotizing enterocolitis in newborn. Biosci Rep 2019; 39:BSR20191503. [PMID: 31383782 PMCID: PMC6757181 DOI: 10.1042/bsr20191503] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Revised: 07/18/2019] [Accepted: 08/01/2019] [Indexed: 12/14/2022] Open
Abstract
Necrotizing enterocolitis (NEC) is a critical neonatal disease with high mortality. The possibility that miRNAs may play an important role in NEC has attracted great attention. Hence, the present study identified biomarkers that affect the progression of NEC in newborns through miRNA and gene expression profile analysis. miRNA chip GSE68054 and gene chip GSE46619 of NEC in newborns were analyzed to screen out differentially expressed miRNAs and differentially expressed genes (DEGs). Next, target genes of the differentially expressed miRNAs were predicted, and a differentially expressed miRNA-DEG regulatory network was constructed to select key miRNAs. After Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analysis on target genes of key miRNAs, the target genes enriched in pathways were extracted to establish a differentially expressed miRNA-DEG-disease gene network for gene interaction analysis. The targeting relationship between miRNAs and target genes was verified. A total of 15 miRNAs were differentially expressed in NEC in newborns, amongst which the miR-429/200a/b and miR-141/200c clusters were poorly expressed and might play a significant role in NEC in newborns. Besides, target genes of the miR-429/200a/b and miR-141/200c clusters were enriched in 11 signaling pathways. Vascular endothelial growth factor (VEGFA), E-selectin (SELE), kinase insert domain receptor (KDR), fms-related tyrosine kinase 1 (FLT1), and hepatocyte growth factor (HGF) were highly expressed in NEC in newborns; they were negatively regulated by the miR-429/200a/b and miR-141/200c clusters and shared a close association with disease genes. The miR-429/200a/b and miR-141/200c clusters are poorly expressed while their target genes (VEGFA, SELE, KDR, FLT1, and HGF) are highly expressed in NEC in newborns, and they might be identified as important biomarkers for this disease.
|
28
|
Touati R, Oueslati AE, Messaoudi I, Lachiri Z. The Helitron family classification using SVM based on Fourier transform features applied on an unbalanced dataset. Med Biol Eng Comput 2019; 57:2289-2304. [PMID: 31422557 DOI: 10.1007/s11517-019-02027-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2018] [Accepted: 08/02/2019] [Indexed: 02/07/2023]
Abstract
Helitrons are mobile sequences which belong to class 2 of eukaryotic transposons. Their specificity resides in their mechanism of transposition: the rolling circle mechanism. They play an important role in remodeling proteomes due to their ability to modify existing genes and introduce new ones. A major difficulty in identifying and classifying Helitron families comes from the complex structure, the unspecified length, and the unbalanced number of occurrences of each Helitron type. Helitron recognition is still not solved in the literature. The purpose of this paper is to characterize and classify Helitron types using spectral features and the support vector machine (SVM) classification technique. Thus, the helitronic DNA is transformed into a numerical form using the FCGS2 coding technique. Then, a set of spectral features is extracted from the smoothed Fourier transform applied to the FCGS2 signals. Based on the spectral signature and the classification's confusion matrix, we demonstrated that some specific classes which do not show similarities, such as HelitronY2 and NDNAX3, are easily discriminated with high accuracy rates exceeding 90%. However, some Helitron types have great similarities, such as Helitron1, HelitronY1, HelitronY1A, and HelitronY4. Our system is also able to predict them with promising values reaching 70%. Graphical abstract: The Helitron recognizer based on features extracted from smoothed Fourier transform.
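The general pipeline in the abstract above (numerical coding of DNA, Fourier transform, smoothing) can be sketched roughly as follows. The abstract does not define FCGS2 or the smoothing step, so this sketch makes two loud assumptions: the coding assigns each dinucleotide its relative frequency within the sequence, and the smoothing is a moving average over the magnitude spectrum. All function names are hypothetical.

```python
import cmath
from collections import Counter


def dinucleotide_frequency_signal(seq):
    """Map each overlapping dinucleotide to its relative frequency in `seq`.

    Assumed stand-in for the FCGS2 coding, which the abstract does not define.
    """
    seq = seq.upper()
    dimers = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(dimers)
    total = len(dimers)
    return [counts[d] / total for d in dimers]


def dft_magnitudes(signal):
    """Magnitude of the discrete Fourier transform for frequencies 0..n//2."""
    n = len(signal)
    mean_value = sum(signal) / n
    centered = [x - mean_value for x in signal]  # remove the constant (DC) part
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(centered)))
        for k in range(n // 2 + 1)
    ]


def moving_average(values, window=3):
    """Smooth a list with a centered moving average (assumed smoothing step)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical toy sequence; real Helitron sequences are far longer.
signal = dinucleotide_frequency_signal("ACACAC")
spectrum = moving_average(dft_magnitudes(signal))
print(signal)
print(spectrum)
```

The smoothed spectrum values would then serve as candidate features for a classifier such as an SVM; which spectral features the paper actually selects is not stated in the abstract.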
Affiliation(s)
- Rabeb Touati
- LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), University of Tunis El Manar, Tunis, Tunisia.
- SITI Laboratory, National School of Engineers of Tunis (ENIT), University Tunis El Manar, BP 37, le Belvédère, 1002, Tunis, Tunisia.
| | - Afef Elloumi Oueslati
- SITI Laboratory, National School of Engineers of Tunis (ENIT), University Tunis El Manar, BP 37, le Belvédère, 1002, Tunis, Tunisia
- Electrical Engineering Department, National School of Engineers of Carthage (ENICarthage), University of Carthage, Carthage, Tunisia
| | - Imen Messaoudi
- SITI Laboratory, National School of Engineers of Tunis (ENIT), University Tunis El Manar, BP 37, le Belvédère, 1002, Tunis, Tunisia
- Industrial Computing Department, Higher Institute of Information Technologies and Communications (ISTIC), University of Carthage, Carthage, Tunisia
| | - Zied Lachiri
- SITI Laboratory, National School of Engineers of Tunis (ENIT), University Tunis El Manar, BP 37, le Belvédère, 1002, Tunis, Tunisia
|
29
|
Peng LX, Liu XH, Lu B, Liao SM, Zhou F, Huang JM, Chen D, Troy FA, Zhou GP, Huang RB. The Inhibition of Polysialyltranseferase ST8SiaIV Through Heparin Binding to Polysialyltransferase Domain (PSTD). Med Chem 2019; 15:486-495. [PMID: 30569872 DOI: 10.2174/1573406415666181218101623] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2018] [Revised: 10/23/2018] [Accepted: 12/12/2018] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Polysialic acid (polySia) is a unique carbohydrate polymer produced on the surface of the Neuronal Cell Adhesion Molecule (NCAM) in a number of cancer cells, and it strongly correlates with the migration and invasion of tumor cells and with aggressive, metastatic disease and poor clinical prognosis. Its synthesis is catalyzed by two polysialyltransferases (polySTs), ST8SiaIV (PST) and ST8SiaII (STX). Selective inhibition of polySTs, therefore, presents a therapeutic opportunity to inhibit tumor invasion and metastasis due to NCAM polysialylation. Heparin has been found to be effective in inhibiting ST8SiaIV activity, but with no clear molecular rationale. Previous studies have found that the polysialyltransferase domain (PSTD) in polyST plays a significant role in influencing polyST activity, and thus it is critical for NCAM polysialylation. OBJECTIVE: To determine whether three different types of heparin (unfractionated heparin (UFH), low molecular weight heparin (LMWH) and heparin tetrasaccharide (DP4)) bind to the PSTD and, if so, which residues of the PSTD are critical for these binding complexes. METHODS: Fluorescence quenching analysis, Circular Dichroism (CD) spectroscopy, and NMR spectroscopy were used to determine and analyze the interactions of PSTD-UFH, PSTD-LMWH, and PSTD-DP4. RESULTS: The fluorescence quenching analysis indicates that the PSTD-UFH binding is the strongest and the PSTD-DP4 binding is the weakest among these three types of binding; the CD spectra showed that the PSTD-heparin interactions mainly caused a reduction in signal intensity but no marked decrease in α-helix content; the NMR data of the PSTD-DP4 and the PSTD-LMWH interactions showed that the different types of heparin shared 12 common binding sites at N247, V251, R252, T253, S257, R265, Y267, W268, L269, V273, I275, and K276, which were mainly distributed in the long α-helix of the PSTD and the short 3-residue loop of the C-terminal PSTD. In addition, three residues, K246, K250 and A254, were bound to LMWH but not to DP4. This suggests that the PSTD-LMWH binding is stronger than the PSTD-DP4 binding, and that LMWH is a more effective inhibitor than DP4. CONCLUSION: The findings in the present study demonstrate that the PSTD domain is a potential target of heparin and may provide new insights into the molecular rationale of heparin-inhibiting NCAM polysialylation.
Affiliation(s)
- Li-Xin Peng
  - Life Science and Technology College, Guangxi University, Nanning, Guangxi 530004, China; Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; National Engineering Research Center for Non-food Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi 530007, China
- Xue-Hui Liu
  - Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Bo Lu
  - National Engineering Research Center for Non-food Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi 530007, China
- Si-Ming Liao
  - National Engineering Research Center for Non-food Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi 530007, China
- Feng Zhou
  - National Engineering Research Center for Non-food Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi 530007, China
- Ji-Min Huang
  - National Engineering Research Center for Non-food Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi 530007, China
- Dong Chen
  - National Engineering Research Center for Non-food Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi 530007, China
- Frederic A Troy
  - Department of Biochemistry and Molecular Medicine, University of California School of Medicine, Davis, CA, United States
- Guo-Ping Zhou
  - National Engineering Research Center for Non-food Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi 530007, China; Gordon Life Science Institute, 53 South Cottage Road, Belmont, MA 02478, United States
- Ri-Bo Huang
  - Life Science and Technology College, Guangxi University, Nanning, Guangxi 530004, China; Institute of Biophysics, Chinese Academy of Sciences, Beijing, China; National Engineering Research Center for Non-food Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi 530007, China
|
30
|
|
31
|
Xiao X, Cheng X, Chen G, Mao Q, Chou KC. pLoc_bal-mVirus: Predict Subcellular Localization of Multi-Label Virus Proteins by Chou's General PseAAC and IHTS Treatment to Balance Training Dataset. Med Chem 2019; 15:496-509. [DOI: 10.2174/1573406415666181217114710] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 09/20/2018] [Revised: 10/23/2018] [Accepted: 12/12/2018] [Indexed: 12/17/2022]
Abstract
Background/Objective: Knowledge of protein subcellular localization is vitally important for both basic research and drug development. Facing the avalanche of protein sequences emerging in the post-genomic age, it is urgent to develop computational tools that identify their subcellular localization quickly and effectively from sequence information alone. Recently, a predictor called “pLoc-mVirus” was developed for identifying the subcellular localization of virus proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, known as “multiplex proteins”, may simultaneously occur in, or move between, two or more subcellular location sites. Although it is a very powerful predictor, more effort is needed to improve it further, because pLoc-mVirus was trained on an extremely skewed dataset in which some subsets were over 10 times the size of others. Accordingly, it cannot avoid the bias caused by such an uneven training dataset. Methods: Using Chou's general PseAAC (Pseudo Amino Acid Composition) approach and the IHTS (Inserting Hypothetical Training Samples) treatment to balance the training dataset, we have developed a new predictor called “pLoc_bal-mVirus” for predicting the subcellular localization of multi-label virus proteins. Results: Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mVirus, the existing state-of-the-art predictor for the same purpose. Conclusion: Its user-friendly web-server is available at http://www.jci-bioinfo.cn/pLoc_balmVirus/, by which the majority of experimental scientists can easily get their desired results without needing to go through the detailed mathematics. Accordingly, pLoc_bal-mVirus will become a very useful tool for designing multi-target drugs and for in-depth understanding of the biological processes in a cell.
Affiliation(s)
- Xuan Xiao
  - Gordon Life Science Institute, Boston, MA 02478, United States
- Xiang Cheng
  - Gordon Life Science Institute, Boston, MA 02478, United States
- Genqiang Chen
  - College of Chemistry, Chemical Engineering and Biotechnology, Donghua University, Shanghai 201620, China
- Qi Mao
  - College of Information Science and Technology, Donghua University, Shanghai, China
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, United States
|
32
|
Yang W, Zhu XJ, Huang J, Ding H, Lin H. A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181113131415] [Citation(s) in RCA: 111] [Impact Index Per Article: 18.5] [Indexed: 01/08/2023]
Abstract
Background: The location of proteins in a cell can provide important clues to their functions in various biological processes. Thus, the application of machine learning methods to the prediction of protein subcellular localization has become a hotspot in bioinformatics. As one of the key organelles, the Golgi apparatus is in charge of protein storage, packaging, and distribution. Objective: The identification of protein location in the Golgi apparatus will provide in-depth insights into protein function. Thus, machine learning-based methods for predicting protein location in the Golgi apparatus have been extensively explored. The development of protein sub-Golgi localization prediction should be reviewed to provide a complete background for the field. Method: The benchmark datasets, feature extraction, machine learning methods, and published results were summarized. Results: We briefly introduced recent progress in protein sub-Golgi localization prediction using machine learning methods and discussed their advantages and disadvantages. Conclusion: We pointed out the perspectives of machine learning methods in protein sub-Golgi localization prediction.
Affiliation(s)
- Wuritu Yang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
- Xiao-Juan Zhu
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
- Jian Huang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
|
33
|
Zhang J, Liu B. A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods. Curr Bioinform 2019. [DOI: 10.2174/1574893614666181212102749] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Indexed: 11/22/2022]
Abstract
Background: Proteins play a crucial role in life activities, such as catalyzing metabolic reactions, DNA replication, and responding to stimuli. Identification of protein structures and functions is critical for both basic research and applications. Because traditional experiments for studying the structures and functions of proteins are expensive and time consuming, computational approaches are highly desired. The key for computational methods is how to efficiently extract features from protein sequences. During the last decade, many powerful feature extraction algorithms have been proposed, significantly promoting the study of protein structures and functions. Objective: To help researchers catch up with recent developments in this important field, this study gives an updated review focusing on sequence-based feature extraction for protein sequences. Method: These sequence-based features of proteins were grouped into three categories: composition-based features, autocorrelation-based features, and profile-based features. The features in each group were introduced in detail, and their advantages and disadvantages were discussed. Some useful tools for generating these features were also introduced. Results: Generally, autocorrelation-based features outperform composition-based features, and profile-based features outperform autocorrelation-based features. The reason is that profile-based features consider evolutionary information, which is useful for identifying protein structures and functions. However, profile-based features are more time consuming to compute, because a multiple sequence alignment is required. Conclusion: In this study, some recently proposed sequence-based features were introduced and discussed, such as basic k-mers, PseAAC, auto-cross covariance, and top-n-gram. These features have made great contributions to the development of protein sequence analysis. Future studies can focus on exploring combinations of these features. In addition, techniques from other fields, such as signal processing, natural language processing (NLP), and image processing, would also contribute to this field, because natural languages (such as English) and protein sequences share some similarities. Proteins can therefore be treated as documents, and features such as k-mers, top-n-grams, and motifs can be treated as the words of the language. Techniques from these fields will offer new ideas and strategies for extracting features from proteins.
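The "k-mers as words" idea in this abstract can be illustrated with a minimal Python sketch of a composition-based feature vector. This is not the reviewed authors' code: the function name `kmer_features`, the 20-letter alphabet constant, and the choice to normalize counts to frequencies are assumptions made here for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_features(sequence, k=2, alphabet=AMINO_ACIDS):
    """Return normalized k-mer frequencies in a fixed vocabulary order.

    The vocabulary is every length-k string over `alphabet`, so the vector
    always has len(alphabet) ** k entries regardless of the input sequence.
    """
    sequence = sequence.upper()
    # Slide a window of width k over the sequence and count each "word".
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    # Only k-mers made of known letters contribute to the total.
    total = sum(counts[word] for word in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[word] / total for word in vocabulary]


features = kmer_features("ACAC", k=2)
```

For `"ACAC"` the windows are `AC`, `CA`, `AC`, so the vector has 400 entries with only the `AC` and `CA` positions non-zero. Autocorrelation-based and profile-based features, which the abstract reports as generally stronger, need extra inputs (physicochemical indices or alignments) and are not covered by this sketch.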
Affiliation(s)
- Jun Zhang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, Guangdong 518055, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, Guangdong 518055, China
|
34
|
Fu X, Zhu W, Cai L, Liao B, Peng L, Chen Y, Yang J. Improved Pre-miRNAs Identification Through Mutual Information of Pre-miRNA Sequences and Structures. Front Genet 2019; 10:119. [PMID: 30858864 PMCID: PMC6397858 DOI: 10.3389/fgene.2019.00119] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 10/15/2018] [Accepted: 02/04/2019] [Indexed: 11/30/2022] Open
Abstract
Playing critical roles as post-transcriptional regulators, microRNAs (miRNAs) are a family of short non-coding RNAs that are derived from longer transcripts called precursor miRNAs (pre-miRNAs). Experimental methods to identify pre-miRNAs are expensive and time-consuming, which creates a need for computational alternatives. In recent years, the accuracy of computational methods for predicting pre-miRNAs has increased significantly. However, several drawbacks remain. First, these methods usually consider only base frequencies or sequence information while ignoring the information between bases. Second, feature extraction methods based on secondary structures usually consider only global characteristics while ignoring the mutual influence of local structures. Third, methods that integrate high-dimensional feature information are computationally inefficient. In this study, we have proposed a novel mutual information-based feature representation algorithm for pre-miRNA sequences and secondary structures, which is capable of capturing the interactions between sequence bases and local features of the RNA secondary structure. In addition, the feature space is smaller than that of most popular methods, which makes our method computationally more efficient than its competitors. Finally, we applied these features to train a support vector machine model to predict pre-miRNAs and compared the results with other popular predictors. Our method outperforms the others on both 5-fold cross-validation and the jackknife test.
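To make "information between bases" concrete, the following Python sketch estimates the mutual information between each base and its immediate successor in a single sequence. It only illustrates the general quantity; the abstract does not specify the paper's actual feature definition, so the function name `adjacent_mutual_information` and the restriction to adjacent positions are assumptions.

```python
import math
from collections import Counter


def adjacent_mutual_information(sequence):
    """Estimate I(X; Y) in bits, where X is a base and Y is the next base.

    Probabilities are plain empirical frequencies over the adjacent pairs
    of the given sequence (no pseudocounts).
    """
    pairs = [(sequence[i], sequence[i + 1]) for i in range(len(sequence) - 1)]
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)                  # counts of (x, y)
    first = Counter(x for x, _ in pairs)    # marginal counts of x
    second = Counter(y for _, y in pairs)   # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = first[x] / n
        p_y = second[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


alternating = adjacent_mutual_information("AUAUAUAU")
constant = adjacent_mutual_information("AAAA")
```

A strictly alternating sequence gives a high value because each base determines the next, while a constant sequence gives zero because there is nothing to predict. A base-frequency feature alone cannot tell apart sequences that share composition but differ in such dependencies.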
Affiliation(s)
- Xiangzheng Fu
  - College of Information Science and Engineering, Hunan University, Changsha, China
- Wen Zhu
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Lijun Cai
  - College of Information Science and Engineering, Hunan University, Changsha, China
- Bo Liao
  - College of Information Science and Engineering, Hunan University, Changsha, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yifan Chen
  - College of Information Science and Engineering, Hunan University, Changsha, China
- Jialiang Yang
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
|
35
|
Adaboost-SVM-based probability algorithm for the prediction of all mature miRNA sites based on structured-sequence features. Sci Rep 2019; 9:1521. [PMID: 30728425 PMCID: PMC6365589 DOI: 10.1038/s41598-018-38048-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/26/2018] [Accepted: 12/18/2018] [Indexed: 02/07/2023] Open
Abstract
The significant role of microRNAs (miRNAs) in various biological processes and diseases has been widely studied and reported in recent years. Several computational methods for mature miRNA identification suffer from various limitations involving the extraction of canonical biological features, class imbalance, and classifier performance. The proposed classifier, miRFinder, is an accurate alternative for the identification of mature miRNAs. Structured-sequence features were proposed to precisely extract miRNA biological features, and three algorithms were selected to obtain the canonical features based on classifier performance. Moreover, center of mass near distance training based on K-means was provided to mitigate the class imbalance problem. In particular, the AdaBoost-SVM algorithm was used to construct the classifier. The classifier training process focuses on incorrectly classified samples, and the integrated results use the common decision strategies of the weak classifiers with different weights. In addition, all mature miRNA sites were predicted by different classifiers based on the features of the different sites. Compared with other methods, the classifiers show a high degree of efficacy in identifying mature miRNAs. MiRFinder is freely available at https://github.com/wangying0128/miRFinder .
|
36
|
Jia J, Li X, Qiu W, Xiao X, Chou KC. iPPI-PseAAC(CGR): Identify protein-protein interactions by incorporating chaos game representation into PseAAC. J Theor Biol 2019; 460:195-203. [DOI: 10.1016/j.jtbi.2018.10.021] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 08/05/2018] [Revised: 09/16/2018] [Accepted: 10/08/2018] [Indexed: 01/11/2023]
|
37
|
Ma Y, Yu Z, Han G, Li J, Anh V. Identification of pre-microRNAs by characterizing their sequence order evolution information and secondary structure graphs. BMC Bioinformatics 2018; 19:521. [PMID: 30598066 PMCID: PMC6311913 DOI: 10.1186/s12859-018-2518-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND Distinguishing pre-microRNAs (precursor microRNAs) from pseudo pre-microRNAs of similar length can reveal more about the regulatory mechanisms of RNA biological processes. Machine learning techniques have been widely applied to this challenging problem. However, most of them focus mainly on the secondary structure information of pre-microRNAs while ignoring sequence-order information and sequence evolution information. RESULTS We use new features for the machine learning algorithms to improve classification performance by characterizing both sequence order evolution information and secondary structure graphs. We developed three steps to extract these features of pre-microRNAs. We first extract features from PSI-BLAST profiles and Hilbert-Huang transforms, which contain rich sequence evolution information and sequence-order information, respectively. We then obtain properties of small molecular networks of pre-microRNAs, which contain refined secondary structure information. These structural features are carefully generated so that they can depict both global and local characteristics of pre-microRNAs. In total, our feature space covers 591 features. The maximum relevance and minimum redundancy (mRMR) feature selection method is applied before a support vector machine (SVM) is used as the classifier. The constructed classification model is named MicroRNA-NHPred. The performance of MicroRNA-NHPred is high and stable, and better than that of state-of-the-art methods, achieving an accuracy of up to 94.83% on the same benchmark datasets. CONCLUSIONS The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the sequences and secondary structures, which is capable of characterizing the sequence evolution information and sequence-order information, as well as the global and local information of pre-microRNA secondary structures. MicroRNA-NHPred is a valuable method for pre-microRNA identification. The source code of our method can be downloaded from https://github.com/myl446/MicroRNA-NHPred .
Affiliation(s)
- Yuanlin Ma
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105 China
- Zuguo Yu
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105 China
  - School of Electrical Engineering and Computer Science, Queensland University of Technology, GPO Box 2434, Brisbane, Q4001 Australia
- Guosheng Han
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105 China
- Jinyan Li
  - Advanced Analytics Institute, Faculty of Engineering & IT, University of Technology Sydney, P.O Box 123, Broadway, NSW 2007 Australia
- Vo Anh
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan, 411105 China
  - School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, Q4001 Australia
|
38
|
Zhang W, Yue X, Tang G, Wu W, Huang F, Zhang X. SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions. PLoS Comput Biol 2018; 14:e1006616. [PMID: 30533006 PMCID: PMC6331124 DOI: 10.1371/journal.pcbi.1006616] [Citation(s) in RCA: 108] [Impact Index Per Article: 15.4] [Received: 06/19/2018] [Revised: 01/14/2019] [Accepted: 11/02/2018] [Indexed: 01/12/2023] Open
Abstract
LncRNA-protein interactions play important roles in post-transcriptional gene regulation, poly-adenylation, splicing and translation. Identification of lncRNA-protein interactions helps to understand lncRNA-related activities. Existing computational methods utilize multiple lncRNA features or multiple protein features to predict lncRNA-protein interactions, but these features are not available for all lncRNAs or proteins, and most existing methods cannot predict interacting proteins (or lncRNAs) for new lncRNAs (or proteins) that have no known interactions. In this paper, we propose the sequence-based feature projection ensemble learning method, “SFPEL-LPI”, to predict lncRNA-protein interactions. First, SFPEL-LPI extracts lncRNA sequence-based features and protein sequence-based features. Second, SFPEL-LPI calculates multiple lncRNA-lncRNA similarities and protein-protein similarities by using lncRNA sequences, protein sequences and known lncRNA-protein interactions. Then, SFPEL-LPI combines the multiple similarities and multiple features within a feature projection ensemble learning framework. In computational experiments, SFPEL-LPI accurately predicts lncRNA-protein associations and outperforms other state-of-the-art methods. More importantly, SFPEL-LPI can be applied to new lncRNAs (or proteins). The case studies demonstrate that our method can find novel lncRNA-protein interactions, which are confirmed by the literature. Finally, we construct a user-friendly web server, available at http://www.bioinfotech.cn/SFPEL-LPI/.
LncRNA-protein interactions play important roles in post-transcriptional gene regulation, poly-adenylation, splicing and translation. Identification of lncRNA-protein interactions helps to understand lncRNA-related activities. In this paper, we propose a novel computational method, “SFPEL-LPI”, to predict lncRNA-protein interactions. SFPEL-LPI makes use of lncRNA sequences, protein sequences and known lncRNA-protein associations to extract features and calculate similarities for lncRNAs and proteins, and then combines them within a feature projection ensemble learning framework. SFPEL-LPI can predict unobserved interactions between lncRNAs and proteins, and can also make predictions for new lncRNAs (or proteins) that have no known interactions with any proteins (or lncRNAs). SFPEL-LPI achieves high accuracy on the benchmark dataset when evaluated by five-fold cross validation, and outperforms state-of-the-art methods. The case studies demonstrate that SFPEL-LPI can find novel associations, which are confirmed by the literature. To facilitate lncRNA-protein interaction prediction, we develop a user-friendly web server, available at http://www.bioinfotech.cn/SFPEL-LPI/.
Affiliation(s)
- Wen Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
  - School of Computer Science, Wuhan University, Wuhan, China
- Xiang Yue
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, United States of America
- Guifeng Tang
  - School of Computer Science, Wuhan University, Wuhan, China
- Wenjian Wu
  - Electronic Information School, Wuhan University, Wuhan, China
- Feng Huang
  - School of Computer Science, Wuhan University, Wuhan, China
- Xining Zhang
  - School of Computer Science, Wuhan University, Wuhan, China
|
39
|
A three-lncRNA expression signature predicts survival in head and neck squamous cell carcinoma (HNSCC). Biosci Rep 2018; 38:BSR20181528. [PMID: 30355656 PMCID: PMC6246764 DOI: 10.1042/bsr20181528] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 09/03/2018] [Revised: 10/08/2018] [Accepted: 10/19/2018] [Indexed: 12/19/2022] Open
Abstract
Increasing evidence has shown that long non-coding RNAs (lncRNAs) have important biological functions and can be used as prognostic biomarkers in human cancers. However, investigation of the prognostic value of lncRNAs in head and neck squamous cell carcinoma (HNSCC) is in its infancy. In the present study, we analyzed lncRNA expression data from a large number of HNSCC patients (n=425) derived from The Cancer Genome Atlas (TCGA) to identify an lncRNA expression signature for improving the prognosis of HNSCC. Three lncRNAs were identified as significantly associated with survival in the training dataset using Cox regression analysis. These three lncRNAs were integrated to construct an lncRNA expression signature that could stratify patients in the training dataset into a high-risk group and a low-risk group with significantly different survival times (median survival 1.85 years vs. 5.48 years; P=0.0018, log-rank test). The prognostic value of this three-lncRNA signature was confirmed in the testing dataset and in the entire dataset. Further analysis by multivariate Cox regression and stratified analysis revealed that the prognostic power of the three-lncRNA signature was independent of clinical features. Functional enrichment analysis showed that these three lncRNAs were significantly associated with known genetic and epigenetic events. Therefore, our results indicate that the three-lncRNA expression signature can predict the survival of HNSCC patients.
|
40
|
Mei J, Fu Y, Zhao J. Analysis and prediction of ion channel inhibitors by using feature selection and Chou's general pseudo amino acid composition. J Theor Biol 2018; 456:41-48. [DOI: 10.1016/j.jtbi.2018.07.040] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 05/13/2018] [Revised: 07/20/2018] [Accepted: 07/29/2018] [Indexed: 12/23/2022]
|
41
|
Huang YA, Chan KCC, You ZH. Constructing prediction models from expression profiles for large scale lncRNA-miRNA interaction profiling. Bioinformatics 2018; 34:812-819. [PMID: 29069317 PMCID: PMC6192210 DOI: 10.1093/bioinformatics/btx672] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 07/18/2017] [Accepted: 10/19/2017] [Indexed: 12/14/2022] Open
Abstract
Motivation: The interaction of miRNA and lncRNA is known to be important for gene regulation. However, few computational approaches have been developed to analyze known interactions and predict unknown ones. Given the growing evidence that lncRNA–miRNA interactions are closely related to their relative expression levels in the form of a titration mechanism, we analyzed the patterns in large-scale expression profiles of known lncRNA–miRNA interactions. From these patterns, we noticed that lncRNAs tend to interact collaboratively with miRNAs of similar expression profiles, and vice versa.
Results: By representing known interactions between lncRNAs and miRNAs as a bipartite graph, we propose a technique, called EPLMI, to construct a prediction model from such a graph. EPLMI is based on the assumption that lncRNAs that are highly similar to each other tend to have similar interaction or non-interaction patterns with miRNAs, and vice versa. The effectiveness of the resulting prediction model has been evaluated using the latest dataset of lncRNA–miRNA interactions. The results show that the prediction model achieves AUCs of 0.8522 and 0.8447 ± 0.0017 under leave-one-out cross validation and 5-fold cross validation, respectively. Using this model, we show that lncRNA–miRNA interactions can be reliably predicted. We also show that it can be used to select the most likely lncRNA targets that specific miRNAs would interact with. We believe that the prediction models discovered by EPLMI can yield great insights for further research on ceRNA regulation networks. To the best of our knowledge, EPLMI is the first technique developed for large-scale lncRNA–miRNA interaction profiling.
Availability and implementation: Matlab codes and dataset are available at https://github.com/yahuang1991polyu/EPLMI/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yu-An Huang
  - Department of Computing, Hong Kong Polytechnic University, Hong Kong
- Keith C C Chan
  - Department of Computing, Hong Kong Polytechnic University, Hong Kong
- Zhu-Hong You
  - Department of Computing, Hong Kong Polytechnic University, Hong Kong
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Ürümqi 830011, China
|
42
|
Xu Y, Yang Y, Ding J, Li C. iGlu-Lys: A Predictor for Lysine Glutarylation Through Amino Acid Pair Order Features. IEEE Trans Nanobioscience 2018; 17:394-401. [DOI: 10.1109/tnb.2018.2848673] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 11/09/2022]
|
43
|
Chen W, Ding H, Zhou X, Lin H, Chou KC. iRNA(m6A)-PseDNC: Identifying N6-methyladenosine sites using pseudo dinucleotide composition. Anal Biochem 2018; 561-562:59-65. [PMID: 30201554 DOI: 10.1016/j.ab.2018.09.002] [Citation(s) in RCA: 132] [Impact Index Per Article: 18.9] [Received: 07/26/2018] [Revised: 08/31/2018] [Accepted: 09/03/2018] [Indexed: 01/28/2023]
Abstract
As a prevalent post-transcriptional modification, N6-methyladenosine (m6A) plays key roles in a series of biological processes. Although experimental technologies have been developed and applied to identify m6A sites, they are still cost-ineffective for transcriptome-wide detection of m6A. As good complements to the experimental techniques, some computational methods have been proposed to identify m6A sites. However, their performance remains unsatisfactory. In this study, we first proposed a Euclidean distance-based method to construct a high-quality benchmark dataset. By encoding the RNA sequences using pseudo nucleotide composition, a new predictor called iRNA(m6A)-PseDNC was developed to identify m6A sites in the Saccharomyces cerevisiae genome. The 10-fold cross validation test demonstrated that the performance of iRNA(m6A)-PseDNC is superior to that of the existing methods. For the convenience of most experimental scientists, its web-server has been established at http://lin-group.cn/server/iRNA(m6A)-PseDNC.php, by which users can easily get their desired results without needing to go through the detailed mathematics. It is anticipated that iRNA(m6A)-PseDNC will become a useful high-throughput tool for identifying m6A sites in the S. cerevisiae genome.
Affiliation(s)
- Wei Chen
  - School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, 063000, China; Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, 611730, China; Gordon Life Science Institute, Boston, MA, 02478, USA.
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China.
- Xu Zhou
  - School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, 063000, China.
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Gordon Life Science Institute, Boston, MA, 02478, USA.
- Kuo-Chen Chou
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Gordon Life Science Institute, Boston, MA, 02478, USA.
|
44
|
Akbar S, Hayat M. iMethyl-STTNC: Identification of N6-methyladenosine sites by extending the idea of SAAC into Chou's PseAAC to formulate RNA sequences. J Theor Biol 2018; 455:205-211. [PMID: 30031793 DOI: 10.1016/j.jtbi.2018.07.018] [Citation(s) in RCA: 86] [Impact Index Per Article: 12.3] [Received: 05/02/2018] [Revised: 07/14/2018] [Accepted: 07/17/2018] [Indexed: 11/17/2022]
Abstract
N6-methyladenosine (m6A) is a vital post-transcriptional modification, which adds another layer of epigenetic regulation at the RNA level. It chemically modifies mRNA, which affects protein expression. RNA sequences contain many genetic code motifs (GAC). Among these codes, identifying whether a GAC motif is methylated or not is indispensable. However, with the large number of RNA sequences generated in the post-genomic era, characterizing these sequences accurately and quickly has become a challenging task. In view of this, the concept of an intelligent is incorporated with a computational model that truly and fast reflects the motif of the desired classes. An intelligent computational model, "iMethyl-STTNC", is proposed for the identification of methyladenosine sites in RNA. In the proposed study, four feature extraction techniques, namely pseudo-dinucleotide composition, pseudo-trinucleotide composition, split-trinucleotide composition, and split-tetra-nucleotide composition (STTNC), are utilized to obtain genuine numerical descriptors. Three different classification algorithms, namely probabilistic neural network, support vector machine (SVM), and K-nearest neighbor, are adopted for prediction. After examining the outcomes of the prediction model on each feature space, SVM using the STTNC feature space reported the highest accuracy of 69.84% and 91.84% on dataset1 and dataset2, respectively. The reported results show that our proposed predictor has achieved encouraging results compared to the present approaches. It is finally reckoned that our developed model might be beneficial for in-depth analysis of genomes and drug development.
Affiliation(s)
- Shahid Akbar
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
- Maqsood Hayat
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
|
45
|
Pan Y, Gao H, Lin H, Liu Z, Tang L, Li S. Identification of Bacteriophage Virion Proteins Using Multinomial Naïve Bayes with g-Gap Feature Tree. Int J Mol Sci 2018; 19:E1779. [PMID: 29914091 PMCID: PMC6032154 DOI: 10.3390/ijms19061779] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 05/24/2018] [Revised: 06/12/2018] [Accepted: 06/12/2018] [Indexed: 01/29/2023] Open
Abstract
Bacteriophages, which are tremendously important to the ecology and evolution of bacteria, play a key role in the development of genetic engineering. Bacteriophage virion proteins are essential materials of the infectious viral particles and are responsible for several biological functions. The correct identification of bacteriophage virion proteins is of great importance for understanding both life at the molecular level and genetic evolution. However, few computational methods are available for identifying bacteriophage virion proteins. In this paper, we proposed a new method to predict bacteriophage virion proteins using a Multinomial Naïve Bayes classification model based on discrete features generated from the g-gap feature tree. The accuracy of the proposed model reaches 98.37% with MCC of 96.27% in 10-fold cross-validation. This result suggests that the proposed method can be a useful approach for identifying bacteriophage virion proteins from sequence information. For the convenience of experimental scientists, a web server (PhagePred) that implements the proposed predictor is available and can be freely accessed on the Internet.
Affiliation(s)
- Yanyuan Pan
  - School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Hui Gao
  - School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Zhen Liu
  - School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Lixia Tang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
- Songtao Li
  - School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
|
46
|
Sabooh MF, Iqbal N, Khan M, Khan M, Maqbool HF. Identifying 5-methylcytosine sites in RNA sequence using composite encoding feature into Chou's PseKNC. J Theor Biol 2018; 452:1-9. [PMID: 29727634 DOI: 10.1016/j.jtbi.2018.04.037] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 02/03/2018] [Revised: 04/24/2018] [Accepted: 04/27/2018] [Indexed: 02/02/2023]
Abstract
This study examines an accurate and efficient computational method for the identification of 5-methylcytosine sites in RNA. The occurrence of 5-methylcytosine (m5C) plays a vital role in a number of biological processes. To better understand its biological functions and mechanism, it is necessary to recognize m5C sites in RNA precisely. Laboratory techniques and procedures are available to identify m5C sites in RNA, but they require a lot of time and resources. This study develops a new computational method for extracting the features of RNA sequences. In this method, the RNA sequence is first encoded via a composite feature vector; then, to select discriminative features, the minimum-redundancy-maximum-relevance algorithm is used. Next, classification is performed with a support vector machine, evaluated by the jackknife cross-validation test. The suggested method efficiently distinguishes m5C sites from non-m5C sites, and the outcome of the suggested algorithm is 93.33% with sensitivity of 90.0% and specificity of 96.66% on benchmark datasets. The results show that the proposed algorithm achieves significant identification performance compared to the existing computational techniques. This study extends the knowledge about the occurrence sites of RNA modification, which paves the way for better comprehension of its biological uses and mechanism.
Affiliation(s)
- M Fazli Sabooh
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
- Nadeem Iqbal
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
- Mukhtaj Khan
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
- Muslim Khan
  - Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
- H F Maqbool
  - University of Engineering & Technology Lahore, Pakistan.
|
47
|
Khan A, Shah S, Wahid F, Khan FG, Jabeen S. Identification of microRNA precursors using reduced and hybrid features. Mol Biosyst 2017; 13:1640-1645. [PMID: 28686281 DOI: 10.1039/c7mb00115k] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022]
Abstract
MicroRNAs (also called miRNAs) are a group of short non-coding RNA molecules. They play a vital role in the gene expression of transcriptional and post-transcriptional processes. However, abnormality of their expression has been observed in cancer, heart diseases and nervous system disorders. Therefore, for basic research and microRNA-based therapy, it is imperative to separate real pre-miRNAs from false ones (hairpin sequences similar to pre-miRNA stem loops). Different conservation and machine learning methods have been applied for the identification of miRNAs. However, machine learning algorithms have gained more popularity than conservation-based algorithms in terms of sensitivity and overall performance. Due to the avalanche of RNA sequences discovered in the post-genomic age, it is necessary to construct a predictor for the identification of pre-microRNAs in humans. We have developed a predictor called MicroR-Pred in which the RNA sequences are formulated by a hybrid feature vector. The novelty of the new predictor is in the use of the partial least squares technique followed by the Random Forest (RF) and SVM (Support Vector Machine) algorithms for dimension reduction and classification. The performance of the MicroR-Pred model is quite promising compared to other state-of-the-art miRNA predictors. It has achieved 88.40% and 93.90% accuracies for RF and SVM, respectively.
Affiliation(s)
- Asad Khan
  - Department of Computer Science, COMSATS Institute of IT, Abbottabad 22060, Pakistan.
- Sajid Shah
  - Department of Computer Science, COMSATS Institute of IT, Abbottabad 22060, Pakistan.
- Fazli Wahid
  - Department of Environmental Sciences, COMSATS Institute of IT, Abbottabad 22060, Pakistan.
- Fiaz Gul Khan
  - Department of Computer Science, COMSATS Institute of IT, Abbottabad 22060, Pakistan.
- Saima Jabeen
  - Department of Computer Science, COMSATS Institute of IT, Abbottabad 22060, Pakistan.
|
48
|
Qiu WR, Sun BQ, Xiao X, Xu ZC, Chou KC. iHyd-PseCp: Identify hydroxyproline and hydroxylysine in proteins by incorporating sequence-coupled effects into general PseAAC. Oncotarget 2016; 7:44310-44321. [PMID: 27322424 PMCID: PMC5190098 DOI: 10.18632/oncotarget.10027] [Citation(s) in RCA: 141] [Impact Index Per Article: 20.1] [Received: 04/20/2016] [Accepted: 05/29/2016] [Indexed: 12/30/2022] Open
Abstract
Protein hydroxylation is a posttranslational modification (PTM), in which a CH group in Pro (P) or Lys (K) residue has been converted into a COH group, or a hydroxyl group (−OH) is converted into an organic compound. Closely associated with cellular signaling activities, this type of PTM is also involved in some major diseases, such as stomach cancer and lung cancer. Therefore, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence containing many residues of P or K, which ones can be hydroxylated, and which ones cannot? With the explosive growth of protein sequences in the post-genomic age, the problem has become even more urgent. To address such a problem, we have developed a predictor called iHyd-PseCp by incorporating the sequence-coupled information into the general pseudo amino acid composition (PseAAC) and introducing the “Random Forest” algorithm to operate the calculation. Rigorous jackknife tests indicated that the new predictor remarkably outperformed the existing state-of-the-art prediction method for the same purpose. For the convenience of most experimental scientists, a user-friendly web-server for iHyd-PseCp has been established at http://www.jci-bioinfo.cn/iHyd-PseCp, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
Affiliation(s)
- Wang-Ren Qiu
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; Department of Computer Science and Bond Life Science Center, University of Missouri, Columbia, MO, USA.
- Bi-Qian Sun
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.
- Xuan Xiao
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; Gordon Life Science Institute, Boston, MA, USA.
- Zhao-Chun Xu
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia; Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.
|
49
|
iOri-Human: identify human origin of replication by incorporating dinucleotide physicochemical properties into pseudo nucleotide composition. Oncotarget 2016; 7:69783-69793. [PMID: 27626500 PMCID: PMC5342515 DOI: 10.18632/oncotarget.11975] [Citation(s) in RCA: 157] [Impact Index Per Article: 22.4] [Received: 07/06/2016] [Accepted: 09/06/2016] [Indexed: 02/07/2023] Open
Abstract
The initiation of replication is an extremely important process in the DNA life cycle. Given an uncharacterized DNA sequence, can we identify where its origin of replication (ORI) is located? This is no doubt a fundamental problem in genome analysis. In particular, with the rapid development of genome sequencing technology resulting in a huge amount of sequence data, it is highly desirable to develop computational methods for rapidly and effectively identifying the ORIs in these genomes. Unfortunately, existing computational methods, such as sequence alignment or k-mer strategies, can hardly achieve decent success rates. To address this problem, we developed a predictor called “iOri-Human”. Rigorous jackknife tests have shown that its overall accuracy and stability in identifying human ORIs are over 75% and 50%, respectively. In the predictor, it is through the pseudo nucleotide composition (an extension of pseudo amino acid composition) that 96 physicochemical properties for the 16 possible constituent dinucleotides have been incorporated to reflect the global sequence patterns in DNA as well as its local sequence patterns. Moreover, a user-friendly web-server for iOri-Human has been established at http://lin.uestc.edu.cn/server/iOri-Human.html, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
|
50
|
Qiu WR, Xiao X, Xu ZC, Chou KC. iPhos-PseEn: identifying phosphorylation sites in proteins by fusing different pseudo components into an ensemble classifier. Oncotarget 2016; 7:51270-51283. [PMID: 27323404 PMCID: PMC5239474 DOI: 10.18632/oncotarget.9987] [Citation(s) in RCA: 132] [Impact Index Per Article: 18.9] [Received: 04/05/2016] [Accepted: 05/23/2016] [Indexed: 11/26/2022] Open
Abstract
Protein phosphorylation is a posttranslational modification (PTM or PTLM), where a phosphoryl group is added to the residue(s) of a protein molecule. The most commonly phosphorylated amino acids occur at serine (S), threonine (T), and tyrosine (Y). Protein phosphorylation plays a significant role in a wide range of cellular processes; meanwhile its dysregulation is also involved with many diseases. Therefore, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence containing many residues of S, T, or Y, which ones can be phosphorylated, and which ones cannot? To address this problem, we have developed a predictor called iPhos-PseEn by fusing four different pseudo component approaches (amino acids’ disorder scores, nearest neighbor scores, occurrence frequencies, and position weights) into an ensemble classifier via a voting system. Rigorous cross-validations indicated that the proposed predictor remarkably outperformed its existing counterparts. For the convenience of most experimental scientists, a user-friendly web-server for iPhos-PseEn has been established at http://www.jci-bioinfo.cn/iPhos-PseEn, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
Affiliation(s)
- Wang-Ren Qiu
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; Department of Computer Science and Bond Life Science Center, University of Missouri, Columbia, MO, USA.
- Xuan Xiao
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; Gordon Life Science Institute, Boston, MA, USA.
- Zhao-Chun Xu
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia; Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.
|