251
|
Liu B, Wu H, Chou KC. Pse-in-One 2.0: An Improved Package of Web Servers for Generating Various Modes of Pseudo Components of DNA, RNA, and Protein Sequences. Natural Science 2017. [DOI: 10.4236/ns.2017.94007] [Citation(s) in RCA: 91] [Impact Index Per Article: 11.4] [Indexed: 11/20/2022]
|
252
|
He W, Jia C. EnhancerPred2.0: predicting enhancers and their strength based on position-specific trinucleotide propensity and electron–ion interaction potential feature selection. Molecular BioSystems 2017; 13:767-774. [DOI: 10.1039/c7mb00054e] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 12/16/2022]
Abstract
Enhancers are cis-acting elements that play major roles in upregulating eukaryotic gene expression by providing binding sites for transcription factors and their complexes.
Affiliation(s)
- Wenying He
- Department of Mathematics, Dalian Maritime University, Dalian 116026, China
- Cangzhi Jia
- Department of Mathematics, Dalian Maritime University, Dalian 116026, China
|
253
|
Chen W, Lin H. Recent Advances in Identification of RNA Modifications. Noncoding RNA 2016; 3:ncrna3010001. [PMID: 29657273 PMCID: PMC5831996 DOI: 10.3390/ncrna3010001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 11/22/2016] [Revised: 12/19/2016] [Accepted: 12/23/2016] [Indexed: 12/18/2022] Open
Abstract
RNA modifications are involved in a broad spectrum of biological and physiological processes. To reveal the functions of RNA modifications, it is important to accurately predict their positions. Although high-throughput experimental techniques have been proposed, they are cost-ineffective. As good complements of experiments, many computational methods have been proposed to predict RNA modification sites in recent years. In this review, we will summarize the existing computational approaches directed at predicting RNA modification sites. We will also discuss the challenges and future perspectives in developing reliable methods for predicting RNA modification sites.
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China.
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
|
254
|
Jia C, He W. EnhancerPred: a predictor for discovering enhancers based on the combination and selection of multiple features. Sci Rep 2016; 6:38741. [PMID: 27941893 PMCID: PMC5150536 DOI: 10.1038/srep38741] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 09/30/2016] [Accepted: 11/11/2016] [Indexed: 12/31/2022] Open
Abstract
Enhancers are cis elements that play an important role in regulating gene expression by enhancing it. Recent study of modifications revealed that enhancers are a large group of functional elements with many different subgroups, which have different biological activities and regulatory effects on target genes. As powerful auxiliary tools, several computational methods have been proposed to distinguish enhancers from other regulatory elements, but only one method has been considered to clustering them into subgroups. In this study, we developed a predictor (called EnhancerPred) to distinguish between enhancers and nonenhancers and to determine enhancers' strength. A two-step wrapper-based feature selection method was applied in high dimension feature vector from bi-profile Bayes and pseudo-nucleotide composition. Finally, the combination of 104 features from bi-profile Bayes, 1 feature from nucleotide composition and 9 features from pseudo-nucleotide composition yielded the best performance for identifying enhancers and nonenhancers, with overall Acc of 77.39%. The combination of 89 features from bi-profile Bayes and 10 features from pseudo-nucleotide composition yielded the best performance for identifying strong and weak enhancers, with overall Acc of 68.19%. The process and steps of feature optimization illustrated that it is necessary to construct a particular model for identifying strong enhancers and weak enhancers.
Affiliation(s)
- Cangzhi Jia
- Department of Mathematics, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China
- Wenying He
- Department of Mathematics, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China
|
255
|
Wei L, Bowen Z, Zhiyong C, Gao X, Liao M. Exploring local discriminative information from evolutionary profiles for cytokine–receptor interaction prediction. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.02.078] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/08/2023]
|
256
|
Behbahani M, Mohabatkar H, Nosrati M. Analysis and comparison of lignin peroxidases between fungi and bacteria using three different modes of Chou’s general pseudo amino acid composition. J Theor Biol 2016; 411:1-5. [DOI: 10.1016/j.jtbi.2016.09.001] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Received: 05/25/2016] [Revised: 07/27/2016] [Accepted: 09/01/2016] [Indexed: 02/02/2023]
|
257
|
Cai L, Yuan W, Zhang Z, He L, Chou KC. In-depth comparison of somatic point mutation callers based on different tumor next-generation sequencing depth data. Sci Rep 2016; 6:36540. [PMID: 27874022 PMCID: PMC5118795 DOI: 10.1038/srep36540] [Citation(s) in RCA: 75] [Impact Index Per Article: 8.3] [Received: 04/06/2016] [Accepted: 10/17/2016] [Indexed: 12/26/2022] Open
Abstract
Four popular somatic single nucleotide variant (SNV) calling methods (Varscan, SomaticSniper, Strelka and MuTect2) were carefully evaluated on the real whole exome sequencing (WES, depth of ~50X) and ultra-deep targeted sequencing (UDT-Seq, depth of ~370X) data. The four tools returned poor consensus on candidates (only 20% of calls were with multiple hits by the callers). For both WES and UDT-Seq, MuTect2 and Strelka obtained the largest proportion of COSMIC entries as well as the lowest rate of dbSNP presence and high-alternative-alleles-in-control calls, demonstrating their superior sensitivity and accuracy. Combining different callers does increase reliability of candidates, but narrows the list down to very limited range of tumor read depth and variant allele frequency. Calling SNV on UDT-Seq data, which were of much higher read-depth, discovered additional true-positive variations, despite an even more tremendous growth in false positive predictions. Our findings not only provide valuable benchmark for state-of-the-art SNV calling methods, but also shed light on the access to more accurate SNV identification in the future.
Affiliation(s)
- Lei Cai
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No.13dz2260500), Shanghai Jiao Tong University, Shanghai, 200030, China
- Gordon Life Science Institute, Boston, Massachusetts, 02478, USA
- Wei Yuan
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No.13dz2260500), Shanghai Jiao Tong University, Shanghai, 200030, China
- Zhou Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No.13dz2260500), Shanghai Jiao Tong University, Shanghai, 200030, China
- Institute of Biliary Tract Disease, Xinhua Hospital, Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, 200092, China
- Lin He
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No.13dz2260500), Shanghai Jiao Tong University, Shanghai, 200030, China
- Women's Hospital School Of Medicine Zhejiang University, Hangzhou, 310006, China
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, Massachusetts, 02478, USA
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
|
258
|
Amiri S, Dinov ID. Comparison of genomic data via statistical distribution. J Theor Biol 2016; 407:318-327. [PMID: 27460589 PMCID: PMC5361063 DOI: 10.1016/j.jtbi.2016.07.032] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/07/2016] [Revised: 06/22/2016] [Accepted: 07/20/2016] [Indexed: 11/28/2022]
Abstract
Sequence comparison has become an essential tool in bioinformatics, because highly homologous sequences usually imply significant functional or structural similarity. Traditional sequence analysis techniques are based on preprocessing and alignment, which facilitate measuring and quantitative characterization of genetic differences, variability and complexity. However, recent developments of next generation and whole genome sequencing technologies give rise to new challenges that are related to measuring similarity and capturing rearrangements of large segments contained in the genome. This work is devoted to illustrating different methods recently introduced for quantifying sequence distances and variability. Most of the alignment-free methods rely on counting words, which are small contiguous fragments of the genome. Our approach considers the locations of nucleotides in the sequences and relies more on appropriate statistical distributions. The results of this technique for comparing sequences, by extracting information and comparing matching fidelity and location regularization information, are very encouraging, specifically to classify mutation sequences.
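The word-counting idea that the abstract attributes to most alignment-free methods can be illustrated with a minimal sketch. It is not the authors' location-based approach; it only shows the conventional k-mer frequency comparison that the abstract contrasts against. The function names and the choice of Euclidean distance are illustrative assumptions.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies in a fixed order over the alphabet.

    Words containing symbols outside the alphabet (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def word_distance(seq_a, seq_b, k=2):
    """Euclidean distance between the k-mer frequency vectors of two sequences."""
    fa = kmer_frequencies(seq_a, k)
    fb = kmer_frequencies(seq_b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

Identical sequences give a distance of zero, and the distance grows as the word composition diverges. Because only counts are used, the measure is blind to where in the sequence each word occurs, which is the limitation a location-aware method would address.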
Affiliation(s)
- Saeid Amiri
- University of Wisconsin-Green Bay, Department of Natural and Applied Sciences, Green Bay, WI, USA.
- Ivo D Dinov
- Statistics Online Computational Resource (SOCR), Michigan Institute for Data Science (MIDAS), School of Nursing, University of Michigan, Ann Arbor, MI 49109, USA.
|
259
|
Predicting the Organelle Location of Noncoding RNAs Using Pseudo Nucleotide Compositions. Interdiscip Sci 2016; 9:540-544. [PMID: 27739055 DOI: 10.1007/s12539-016-0193-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 05/20/2016] [Revised: 09/28/2016] [Accepted: 10/06/2016] [Indexed: 11/27/2022]
Abstract
Noncoding RNAs (ncRNAs) are implicated in various biological processes. Recent findings have demonstrated that the function of ncRNAs correlates with their provenance. Therefore, the recognition of ncRNAs from different organelle genomes will be helpful to understand their molecular functions. However, the weakness of experimental techniques limits the progress toward studying organellar ncRNAs and their functional relevance. As a complement to experiments, computational methods provide an important option for identifying ncRNAs in different organelles. Thus, a computational model was developed to identify ncRNAs from kinetoplast and mitochondrion organelle genomes. In this model, RNA sequences are encoded by "pseudo dinucleotide composition." The jackknife test showed that the overall success rate achieved by the proposed model was 90.08%. We hope that the proposed method will be helpful in predicting ncRNA organellar locations.
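The "pseudo dinucleotide composition" encoding named in the abstract can be sketched as follows. This is a minimal illustration under the commonly cited PseDNC definition (16 normalized dinucleotide frequencies followed by λ sequence-order correlation terms), not the authors' implementation. The table `TOY_PROPERTIES`, the function name `pse_dnc`, and the defaults `lam=2` and `weight=0.5` are hypothetical; a real application would substitute measured physicochemical values.

```python
from itertools import product

ALPHABET = "ACGU"
DINUCS = ["".join(p) for p in product(ALPHABET, repeat=2)]

# PLACEHOLDER values only: two made-up "properties" per dinucleotide.
# They stand in for real physicochemical tables, which are not given here.
TOY_PROPERTIES = {d: [(i % 4) / 3.0, (i // 4) / 3.0] for i, d in enumerate(DINUCS)}


def pse_dnc(seq, lam=2, weight=0.5, props=TOY_PROPERTIES):
    """Return a (16 + lam)-dimensional PseDNC-style feature vector."""
    seq = seq.upper().replace("T", "U")
    if set(seq) - set(ALPHABET):
        raise ValueError("sequence contains symbols outside ACGU")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if len(pairs) <= lam:
        raise ValueError("sequence too short for the requested lambda")

    # Local information: normalized dinucleotide frequencies.
    freqs = [pairs.count(d) / len(pairs) for d in DINUCS]

    # Sequence-order information: mean squared property difference
    # between dinucleotides that are j steps apart, for j = 1..lam.
    thetas = []
    for j in range(1, lam + 1):
        diffs = []
        for i in range(len(pairs) - j):
            a, b = props[pairs[i]], props[pairs[i + j]]
            diffs.append(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
        thetas.append(sum(diffs) / len(diffs))

    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

By construction the components are non-negative and sum to one, so `weight` controls how much of the vector's mass is given to sequence-order terms relative to plain composition.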
|
260
|
Chen W, Feng P, Ding H, Lin H. PAI: Predicting adenosine to inosine editing sites by using pseudo nucleotide compositions. Sci Rep 2016; 6:35123. [PMID: 27725762 PMCID: PMC5057124 DOI: 10.1038/srep35123] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 08/23/2016] [Accepted: 09/20/2016] [Indexed: 12/24/2022] Open
Abstract
Adenosine-to-inosine (A-to-I) editing is the most prevalent kind of RNA editing and is involved in many biological processes. Accurate identification of A-to-I editing sites is invaluable for better understanding their biological functions. Owing to the limitations of experimental methods, the present study proposes a support vector machine-based model, called PAI, to identify A-to-I editing sites in D. melanogaster. In this model, RNA sequences are encoded by "pseudo dinucleotide composition", into which six RNA physicochemical properties were incorporated. PAI achieves promising performance in the jackknife test and the independent dataset test, indicating that it has high potential to become a useful tool for identifying A-to-I editing sites. For the convenience of experimental scientists, a web-server was constructed for PAI and it is freely accessible at http://lin.uestc.edu.cn/server/PAI.
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, 063000, China
- Pengmian Feng
- School of Public Health, North China University of Science and Technology, Tangshan, 063000, China
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
261
|
Fan GL, Liu YL, Wang H. Identification of thermophilic proteins by incorporating evolutionary and acid dissociation information into Chou's general pseudo amino acid composition. J Theor Biol 2016; 407:138-142. [DOI: 10.1016/j.jtbi.2016.07.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 02/16/2016] [Revised: 06/24/2016] [Accepted: 07/07/2016] [Indexed: 10/21/2022]
|
262
|
Characterize the relationship between essential and TATA-containing genes for S. cerevisiae by network topologies in the perturbation sensitivity network. Genomics 2016; 108:177-183. [PMID: 27613113 DOI: 10.1016/j.ygeno.2016.09.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/26/2016] [Revised: 09/01/2016] [Accepted: 09/01/2016] [Indexed: 01/11/2023]
|
263
|
Liu B, Liu Y, Jin X, Wang X, Liu B. iRSpot-DACC: a computational predictor for recombination hot/cold spots identification based on dinucleotide-based auto-cross covariance. Sci Rep 2016; 6:33483. [PMID: 27641752 PMCID: PMC5027590 DOI: 10.1038/srep33483] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 03/17/2016] [Accepted: 08/25/2016] [Indexed: 01/01/2023] Open
Abstract
Meiotic recombination presents an uneven distribution across the genome. Genomic regions that exhibit at relatively high frequencies of recombination are called hotspots, whereas those with relatively low frequencies of recombination are called coldspots. Therefore, hotspots and coldspots would provide useful information for the study of the mechanism of recombination. In this study, we proposed a computational predictor called iRSpot-DACC to predict hot/cold spots across the yeast genome. It combined Support Vector Machines (SVMs) and a feature called dinucleotide-based auto-cross covariance (DACC), which is able to incorporate the global sequence-order information and fifteen local DNA properties into the predictor. Combined with Principal Component Analysis (PCA), its performance was further improved. Experimental results on a benchmark dataset showed that iRSpot-DACC can achieve an accuracy of 82.7%, outperforming some highly related methods.
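The dinucleotide-based auto-cross covariance feature named in the abstract can be sketched as below. This assumes a commonly used definition (mean-centred property values of dinucleotides multiplied at a fixed lag and averaged along the sequence) and is not taken from the iRSpot-DACC source. `TOY_PROPS` holds two placeholder properties instead of the fifteen DNA properties the abstract mentions, and the name `dacc` and the default `max_lag=2` are hypothetical.

```python
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]

# PLACEHOLDER values only: two made-up properties per dinucleotide,
# standing in for real DNA physicochemical tables.
TOY_PROPS = {d: [(i % 4) / 3.0, (i // 4) / 3.0] for i, d in enumerate(DINUCS)}


def dacc(seq, max_lag=2, props=TOY_PROPS):
    """Return auto and cross covariance features for lags 1..max_lag.

    Terms with u1 == u2 are auto covariance; terms with u1 != u2 are
    cross covariance. The output length is n_props * n_props * max_lag.
    """
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains symbols outside ACGT")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if len(pairs) <= max_lag:
        raise ValueError("sequence too short for the requested lag")

    n_props = len(next(iter(props.values())))
    # One numeric series per property, indexed by dinucleotide position.
    series = [[props[p][u] for p in pairs] for u in range(n_props)]
    means = [sum(s) / len(s) for s in series]

    features = []
    for u1 in range(n_props):
        for u2 in range(n_props):
            for lag in range(1, max_lag + 1):
                n = len(pairs) - lag
                total = sum(
                    (series[u1][i] - means[u1]) * (series[u2][i + lag] - means[u2])
                    for i in range(n)
                )
                features.append(total / n)
    return features
```

The vector length depends only on the number of properties and the maximum lag, not on sequence length, which is what lets sequences of different lengths feed a fixed-input classifier such as an SVM.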
Affiliation(s)
- Bingquan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150080, China
- Yumeng Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Xiaopeng Jin
- School of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China
- Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
|
264
|
Li D, Luo L, Zhang W, Liu F, Luo F. A genetic algorithm-based weighted ensemble method for predicting transposon-derived piRNAs. BMC Bioinformatics 2016; 17:329. [PMID: 27578422 PMCID: PMC5006569 DOI: 10.1186/s12859-016-1206-3] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 03/18/2016] [Accepted: 08/24/2016] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND: Predicting piwi-interacting RNA (piRNA) is an important topic in the small non-coding RNAs, which provides clues for understanding the generation mechanism of gamete. To the best of our knowledge, several machine learning approaches have been proposed for the piRNA prediction, but there is still room for improvements. RESULTS: In this paper, we develop a genetic algorithm-based weighted ensemble method for predicting transposon-derived piRNAs. We construct datasets for three species: Human, Mouse and Drosophila. For each species, we compile the balanced dataset and imbalanced dataset, and thus obtain six datasets to build and evaluate prediction models. In the computational experiments, the genetic algorithm-based weighted ensemble method achieves 10-fold cross validation AUC of 0.932, 0.937 and 0.995 on the balanced Human dataset, Mouse dataset and Drosophila dataset, respectively, and achieves AUC of 0.935, 0.939 and 0.996 on the imbalanced datasets of three species. Further, we use the prediction models trained on the Mouse dataset to identify piRNAs of other species, and the models demonstrate the good performances in the cross-species prediction. CONCLUSIONS: Compared with other state-of-the-art methods, our method can lead to better performances. In conclusion, the proposed method is promising for the transposon-derived piRNA prediction. The source codes and datasets are available in https://github.com/zw9977129/piRNAPredictor.
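The weighted-ensemble step and the AUC metric reported in the abstract can be sketched as follows. The genetic-algorithm search for weights is deliberately omitted; the sketch only shows what a candidate weight vector is applied to and how it would be scored. The function names and the toy scores in the usage note are illustrative assumptions and do not come from the piRNAPredictor repository.

```python
def weighted_ensemble(score_lists, weights):
    """Combine per-model scores using weights normalized to sum to 1."""
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]
    # zip(*score_lists) yields, for each sample, the tuple of model scores.
    return [sum(w * s for w, s in zip(norm, sample)) for sample in zip(*score_lists)]


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With labels `[1, 1, 0, 0]`, toy scores `[0.9, 0.4, 0.5, 0.1]` from one model and `[0.35, 0.8, 0.3, 0.4]` from another, each model alone reaches an AUC of 0.75, while the equal-weight combination ranks every positive above every negative. A weight search, by genetic algorithm or otherwise, would use `auc` on validation data as its fitness function.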
Affiliation(s)
- Dingfang Li
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430072 China
- Longqiang Luo
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430072 China
- Wen Zhang
- State Key Lab of Software Engineering, Wuhan University, Wuhan, 430072 China
- School of Computer, Wuhan University, Wuhan, 430072 China
- Feng Liu
- International School of Software, Wuhan University, Wuhan, 430072 China
- Fei Luo
- State Key Lab of Software Engineering, Wuhan University, Wuhan, 430072 China
- School of Computer, Wuhan University, Wuhan, 430072 China
|
265
|
ProFold: Protein Fold Classification with Additional Structural Features and a Novel Ensemble Classifier. BioMed Research International 2016; 2016:6802832. [PMID: 27660761 PMCID: PMC5021882 DOI: 10.1155/2016/6802832] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/01/2016] [Revised: 07/15/2016] [Accepted: 08/07/2016] [Indexed: 11/17/2022]
Abstract
Protein fold classification plays an important role in both protein functional analysis and drug design. The number of proteins in PDB is very large, but only a very small part is categorized and stored in the SCOPe database. Therefore, it is necessary to develop an efficient method for protein fold classification. In recent years, a variety of classification methods have been used in many protein fold classification studies. In this study, we propose a novel classification method called proFold. We import protein tertiary structure in the period of feature extraction and employ a novel ensemble strategy in the period of classifier training. Compared with existing similar ensemble classifiers using the same widely used dataset (DD-dataset), proFold achieves 76.2% overall accuracy. Another two commonly used datasets, EDD-dataset and TG-dataset, are also tested, of which the accuracies are 93.2% and 94.3%, higher than the existing methods. ProFold is available to the public as a web-server.
|
266
|
Identifying the Types of Ion Channel-Targeted Conotoxins by Incorporating New Properties of Residues into Pseudo Amino Acid Composition. BioMed Research International 2016; 2016:3981478. [PMID: 27631006 PMCID: PMC5008028 DOI: 10.1155/2016/3981478] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/13/2016] [Accepted: 07/31/2016] [Indexed: 12/31/2022]
Abstract
Conotoxins are a kind of neurotoxin which can specifically interact with potassium, sodium type, and calcium channels. They have become potential drug candidates to treat diseases such as chronic pain, epilepsy, and cardiovascular diseases. Thus, correctly identifying the types of ion channel-targeted conotoxins will provide important clue to understand their function and find potential drugs. Based on this consideration, we developed a new computational method to rapidly and accurately predict the types of ion-targeted conotoxins. Three kinds of new properties of residues were proposed to use in pseudo amino acid composition to formulate conotoxins samples. The support vector machine was utilized as classifier. A feature selection technique based on F-score was used to optimize features. Jackknife cross-validated results showed that the overall accuracy of 94.6% was achieved, which is higher than other published results, demonstrating that the proposed method is superior to published methods. Hence the current method may play a complementary role to other existing methods for recognizing the types of ion-target conotoxins.
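The F-score feature ranking mentioned in the abstract can be sketched as below. This uses a widely used multi-class form of the F-score (between-class scatter of the feature means divided by the summed within-class sample variances) and is an assumption about the general technique, not the authors' exact formula. The function names are hypothetical.

```python
def f_scores(X, y):
    """Return one F-score per feature column of X given class labels y.

    X is a list of equal-length numeric rows; every class needs at least
    two samples so that its sample variance is defined.
    """
    classes = sorted(set(y))
    scores = []
    for i in range(len(X[0])):
        column = [row[i] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for c in classes:
            vals = [row[i] for row, label in zip(X, y) if label == c]
            if len(vals) < 2:
                raise ValueError("each class needs at least two samples")
            mean_c = sum(vals) / len(vals)
            between += (mean_c - overall_mean) ** 2
            within += sum((v - mean_c) ** 2 for v in vals) / (len(vals) - 1)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def rank_features(X, y):
    """Feature indices ordered from most to least discriminative."""
    scores = f_scores(X, y)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

A typical use is to add features to the classifier in ranked order and keep the prefix that gives the best cross-validated accuracy. The score evaluates each feature in isolation, so it cannot detect features that are informative only in combination.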
|
267
|
Liu B, Wang S, Long R, Chou KC. iRSpot-EL: identify recombination spots with an ensemble learning approach. Bioinformatics 2016; 33:35-41. [DOI: 10.1093/bioinformatics/btw539] [Citation(s) in RCA: 268] [Impact Index Per Article: 29.8] [Received: 04/01/2016] [Revised: 08/01/2016] [Accepted: 08/11/2016] [Indexed: 11/13/2022] Open
|
268
|
Chen W, Feng P, Tang H, Ding H, Lin H. RAMPred: identifying the N(1)-methyladenosine sites in eukaryotic transcriptomes. Sci Rep 2016; 6:31080. [PMID: 27511610 PMCID: PMC4980636 DOI: 10.1038/srep31080] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 05/18/2016] [Accepted: 07/12/2016] [Indexed: 12/23/2022] Open
Abstract
N(1)-methyladenosine (m(1)A) is a prominent RNA modification involved in many biological processes. Accurate identification of m(1)A site is invaluable for better understanding the biological functions of m(1)A. However, limitations in experimental methods preclude the progress towards the identification of m(1)A site. As an excellent complement of experimental methods, a support vector machine based-method called RAMPred is proposed to identify m(1)A sites in H. sapiens, M. musculus and S. cerevisiae genomes for the first time. In this method, RNA sequences are encoded by using nucleotide chemical property and nucleotide compositions. RAMPred achieves promising performances in jackknife tests, cross cell line tests and cross species tests, indicating that RAMPred holds very high potential to become a useful tool for identifying m(1)A sites. For the convenience of experimental scientists, a web-server based on the proposed model was constructed and could be freely accessible at http://lin.uestc.edu.cn/server/RAMPred.
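An encoding based on nucleotide chemical properties and composition, as the abstract describes, can be sketched as follows. The three binary coordinates follow a commonly used convention (ring structure, functional group, hydrogen-bond strength); whether RAMPred uses exactly these codes is an assumption. The fourth value per position, a running frequency of the current nucleotide, is likewise an assumed way of adding composition information, and `encode_rna` is a hypothetical name.

```python
# Assumed convention for (ring structure, functional group, hydrogen bonding):
#   ring structure:   purine (A, G) = 1, pyrimidine (C, U) = 0
#   functional group: amino (A, C) = 1, keto (G, U) = 0
#   hydrogen bonding: weak (A, U) = 1, strong (C, G) = 0
CHEMICAL_CODE = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode_rna(seq):
    """Encode each position as 3 chemical bits plus a running frequency.

    The running frequency at position i is the number of times the current
    nucleotide has appeared in seq[0..i] divided by (i + 1).
    """
    seq = seq.upper().replace("T", "U")
    features = []
    seen = {}
    for pos, base in enumerate(seq, start=1):
        if base not in CHEMICAL_CODE:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        seen[base] = seen.get(base, 0) + 1
        features.extend(CHEMICAL_CODE[base])
        features.append(seen[base] / pos)
    return features
```

For fixed-length sequence windows centred on a candidate site, this yields a vector of four values per position that can be passed directly to a classifier such as a support vector machine.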
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China
- Pengmian Feng
- School of Public Health, North China University of Science and Technology, Tangshan, 063000, China
- Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics and Center for Information in Biomedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics and Center for Information in Biomedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
269
|
Periwal V. A comprehensive overview of computational resources to aid in precision genome editing with engineered nucleases. Brief Bioinform 2016; 18:698-711. [DOI: 10.1093/bib/bbw052] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 11/17/2015] [Indexed: 12/26/2022] Open
|
270
|
Identification of DNA binding proteins using evolutionary profiles position specific scoring matrix. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.03.025] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Indexed: 11/20/2022]
|
271
|
Jia J, Zhang L, Liu Z, Xiao X, Chou KC. pSumo-CD: predicting sumoylation sites in proteins with covariance discriminant algorithm by incorporating sequence-coupled effects into general PseAAC. Bioinformatics 2016; 32:3133-3141. [DOI: 10.1093/bioinformatics/btw387] [Citation(s) in RCA: 160] [Impact Index Per Article: 17.8] [Received: 05/12/2016] [Accepted: 06/15/2016] [Indexed: 11/13/2022] Open
|
272
|
Qiu WR, Sun BQ, Xiao X, Xu ZC, Chou KC. iPTM-mLys: identifying multiple lysine PTM sites and their different types. Bioinformatics 2016; 32:3116-3123. [DOI: 10.1093/bioinformatics/btw380] [Citation(s) in RCA: 216] [Impact Index Per Article: 24.0] [Received: 05/23/2016] [Accepted: 06/13/2016] [Indexed: 11/13/2022] Open
|
273
|
Improving N(6)-methyladenosine site prediction with heuristic selection of nucleotide physical-chemical properties. Anal Biochem 2016; 508:104-13. [PMID: 27293216 DOI: 10.1016/j.ab.2016.06.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 03/27/2016] [Revised: 05/31/2016] [Accepted: 06/01/2016] [Indexed: 12/28/2022]
Abstract
N(6)-methyladenosine (m(6)A) is one of the most common and abundant post-transcriptional RNA modifications found in viruses and most eukaryotes. m(6)A plays an essential role in many vital biological processes to regulate gene expression. Because of its widespread distribution across the genomes, the identification of m(6)A sites from RNA sequences is of significant importance for better understanding the regulatory mechanism of m(6)A. Although progress has been achieved in m(6)A site prediction, challenges remain. This article aims to further improve the performance of m(6)A site prediction by introducing a new heuristic nucleotide physical-chemical property selection (HPCS) algorithm. The proposed HPCS algorithm can effectively extract an optimized subset of nucleotide physical-chemical properties under the prescribed feature representation for encoding an RNA sequence into a feature vector. We demonstrate the efficacy of the proposed HPCS algorithm under different feature representations, including pseudo dinucleotide composition (PseDNC), auto-covariance (AC), and cross-covariance (CC). Based on the proposed HPCS algorithm, we implemented an m(6)A site predictor, called M6A-HPCS, which is freely available at http://csbio.njust.edu.cn/bioinf/M6A-HPCS. Experimental results over rigorous jackknife tests on benchmark datasets demonstrated that the proposed M6A-HPCS achieves higher success rates and outperforms existing state-of-the-art sequence-based m(6)A site predictors.
|
274
|
Application of Euclidean distance measurement and principal component analysis for gene identification. Gene 2016; 583:112-120. [DOI: 10.1016/j.gene.2016.02.015] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 08/17/2015] [Revised: 11/27/2015] [Accepted: 02/07/2016] [Indexed: 11/22/2022]
|
275
|
Prediction of aptamer-protein interacting pairs using an ensemble classifier in combination with various protein sequence attributes. BMC Bioinformatics 2016; 17:225. [PMID: 27245069 PMCID: PMC4888498 DOI: 10.1186/s12859-016-1087-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 01/07/2016] [Accepted: 05/17/2016] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND: Aptamer-protein interacting pairs play a variety of physiological functions and therapeutic potentials in organisms. Rapidly and effectively predicting aptamer-protein interacting pairs is significant to design aptamers binding to certain interested proteins, which will give insight into understanding mechanisms of aptamer-protein interacting pairs and developing aptamer-based therapies. RESULTS: In this study, an ensemble method is presented to predict aptamer-protein interacting pairs with hybrid features. The features for aptamers are extracted from Pseudo K-tuple Nucleotide Composition (PseKNC) while the features for proteins incorporate Discrete Cosine Transformation (DCT), disorder information, and bi-gram Position Specific Scoring Matrix (PSSM). We investigate predictive capabilities of various feature spaces. The proposed ensemble method obtains the best performance with Youden’s Index of 0.380, using the hybrid feature space of PseKNC, DCT, bi-gram PSSM, and disorder information by 10-fold cross validation. The Relief-Incremental Feature Selection (IFS) method is adopted to obtain the optimal feature set. Based on the optimal feature set, the proposed method achieves a balanced performance with a sensitivity of 0.753 and a specificity of 0.725 on the training dataset, which indicates that this method can solve the imbalanced data problem effectively. To evaluate the prediction performance objectively, an independent testing dataset is used to evaluate the proposed method. Encouragingly, our proposed method performs better than previous study with a sensitivity of 0.738 and a Youden’s Index of 0.451. CONCLUSIONS: These results suggest that the proposed method can be a potential candidate for aptamer-protein interacting pair prediction, which may contribute to finding novel aptamer-protein interacting pairs and understanding the relationship between aptamers and proteins.
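The Youden's Index values quoted in the abstract follow from the standard definition J = sensitivity + specificity - 1. A short worked sketch shows how the reported figures relate; the function names are illustrative, and the specificity computed for the independent test set is derived here rather than stated in the abstract.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: 0 means chance level, 1 means perfect."""
    return sensitivity + specificity - 1.0


def specificity_from(sensitivity, youden):
    """Solve J = sensitivity + specificity - 1 for specificity."""
    return youden - sensitivity + 1.0


# Training set with the optimal feature set (values from the abstract).
j_train = youden_index(0.753, 0.725)          # approximately 0.478

# Independent test set: sensitivity and J are reported, specificity is implied.
spec_test = specificity_from(0.738, 0.451)    # approximately 0.713
```

Because J weights sensitivity and specificity equally and ignores class proportions, it is a convenient summary for imbalanced data, where raw accuracy can be dominated by the majority class.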
Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1087-5) contains supplementary material, which is available to authorized users.
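The sensitivity, specificity, and Youden's Index values quoted in this abstract are linked by the standard definition J = sensitivity + specificity - 1. The sketch below is not taken from the paper; it only applies that definition to the reported figures, and the function names are illustrative.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's Index: J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def specificity_from_youden(j: float, sensitivity: float) -> float:
    """Rearranged definition: specificity = J - sensitivity + 1."""
    return j - sensitivity + 1.0


# Training-set figures reported in the abstract.
train_j = youden_index(sensitivity=0.753, specificity=0.725)

# Independent-test figures reported in the abstract (J and sensitivity only).
test_specificity = specificity_from_youden(j=0.451, sensitivity=0.738)

print(f"training J = {train_j:.3f}")
print(f"implied test specificity = {test_specificity:.3f}")
```

Under this definition the training-set figures correspond to a J of about 0.478, and the independent-test figures imply a specificity of about 0.713, a value the abstract does not state directly.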
|
276
|
Qiu WR, Sun BQ, Xiao X, Xu D, Chou KC. iPhos-PseEvo: Identifying Human Phosphorylated Proteins by Incorporating Evolutionary Information into General PseAAC via Grey System Theory. Mol Inform 2016; 36. [DOI: 10.1002/minf.201600010] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 01/21/2016] [Accepted: 04/05/2016] [Indexed: 01/04/2023]
Affiliation(s)
- Wang-Ren Qiu
- Computer Department; Jingdezhen Ceramic Institute; Jingdezhen 333403 China
- Department of Computer Science and Bond Life Science Center; University of Missouri; Columbia, MO USA
| | - Bi-Qian Sun
- Computer Department; Jingdezhen Ceramic Institute; Jingdezhen 333403 China
| | - Xuan Xiao
- Computer Department; Jingdezhen Ceramic Institute; Jingdezhen 333403 China
- Gordon Life Science Institute, Boston; Massachusetts 02478 USA
| | - Dong Xu
- Department of Computer Science and Bond Life Science Center; University of Missouri; Columbia, MO USA
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston; Massachusetts 02478 USA
- Center of Excellence in Genomic Medicine Research (CEGMR); King Abdulaziz University; Jeddah 21589 Saudi Arabia
| |
|
277
|
Prediction of Golgi-resident protein types using general form of Chou's pseudo-amino acid compositions: Approaches with minimal redundancy maximal relevance feature selection. J Theor Biol 2016; 402:38-44. [PMID: 27155042 DOI: 10.1016/j.jtbi.2016.04.032] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 03/19/2016] [Revised: 04/19/2016] [Accepted: 04/26/2016] [Indexed: 11/20/2022]
Abstract
Recently, several efforts have been made to predict Golgi-resident proteins. However, identifying the type of a Golgi-resident protein remains a challenging task. Precise prediction of the type of a Golgi-resident protein plays a key role in understanding its molecular functions in various biological processes. In this paper, we propose a mutual information based feature selection scheme, combined with the general form of Chou's pseudo-amino acid compositions, to predict Golgi-resident protein types. Positional specific physicochemical properties were applied in Chou's pseudo-amino acid compositions. We achieved 91.24% prediction accuracy in a jackknife test with 49 selected features, the best performance among existing predictors. This result indicates that our computational model can be useful in identifying Golgi-resident protein types.
|
278
|
Iqbal M, Hayat M. "iSS-Hyb-mRMR": Identification of splicing sites using hybrid space of pseudo trinucleotide and pseudo tetranucleotide composition. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2016; 128:1-11. [PMID: 27040827 DOI: 10.1016/j.cmpb.2016.02.006] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 08/24/2015] [Accepted: 02/16/2016] [Indexed: 06/05/2023]
Abstract
BACKGROUND AND OBJECTIVES: Gene splicing is a vital source of protein diversity. Precise removal of introns and joining of exons is a key task in eukaryotic gene expression, as exons are usually interrupted by introns. Identification of splicing sites through experimental techniques is a complicated and time-consuming task. With the avalanche of genome sequences generated in the post-genomic age, it remains challenging to develop an automatic, robust and reliable computational method for fast and effective identification of splicing sites. METHODS: In this study, a hybrid model, "iSS-Hyb-mRMR", is proposed for quick and accurate identification of splicing sites. Two sample representation methods, namely pseudo trinucleotide composition (PseTNC) and pseudo tetranucleotide composition (PseTetraNC), were used to extract numerical descriptors from DNA sequences. The hybrid model was developed by concatenating PseTNC and PseTetraNC. To select highly discriminative features, the minimum redundancy maximum relevance algorithm was applied to the hybrid feature space. The performance of these feature representation methods was tested using various classification algorithms, including K-nearest neighbor, probabilistic neural network, general regression neural network, and fitting network. The jackknife test was used to evaluate performance on two benchmark datasets, S1 and S2. RESULTS: The predictor proposed in the current study achieved an accuracy of 93.26%, a sensitivity of 88.77%, and a specificity of 97.78% for S1, and an accuracy of 94.12%, a sensitivity of 87.14%, and a specificity of 98.64% for S2. CONCLUSION: The performance of the proposed model is higher than that of the existing methods in the literature so far, and the model should be useful for studying the mechanism of RNA splicing and for related research.
Affiliation(s)
- Muhammad Iqbal
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan.
| |
|
279
|
Fang S, Zhang Y, Xu M, Xue C, He L, Cai L, Xing X. Identification of Damaging nsSNVs in Human ERCC2 Gene. Chem Biol Drug Des 2016; 88:441-50. [PMID: 27085493 DOI: 10.1111/cbdd.12772] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/14/2016] [Revised: 03/29/2016] [Accepted: 03/30/2016] [Indexed: 01/05/2023]
Abstract
The hERCC2 gene is an important DNA repair gene associated with the initiation of cutaneous melanoma (CM). It is therefore advisable to study the possible functional SNVs in hERCC2. To this end, we collected a total of 2,253 SNVs in hERCC2 from the EMBL website, of which 303 are non-synonymous single nucleotide variants (nsSNVs). SIFT and PolyPhen were then used to predict the damaging nsSNVs, and four nsSNVs (rs368866996, rs377739017, rs370819591, and rs121913022) were suggested to be damaging mutations. Since I-Mutant2.0 showed a decrease in stability for the mutants containing each of the four nsSNVs, a 3D protein structure was modeled. Based on a comparison of the energy after minimization, the RMSD, and the stabilizing residues between the native and mutant protein structures, rs121913022 was proposed to be the most damaging variant among the nsSNVs in the hERCC2 gene, acting by decreasing the stability of the protein. The G713R mutant of the hERCC2 protein caused by rs121913022 was found to have a lower expression level than native hERCC2 protein in melanoma cells. These results suggest that rs121913022 may have potentially important clinical and drug target implications.
Affiliation(s)
- Shuo Fang
- Department of Plastic and Reconstruction, Shanghai Changhai Hospital Affiliated to Second Military Medical University, Shanghai, 200433, China; Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No. 13dz2260500), Bio-X Institutes, Shanghai Jiaotong University, Shanghai, 200240, China
| | - Yuntong Zhang
- Department of Orthopedics, Shanghai Changhai Hospital Affiliated to Second Military Medical University, Shanghai, 200433, China
| | - Miao Xu
- Department of Plastic and Reconstruction Surgery, Xinhua Hospital, Shanghai Jiaotong University, Shanghai, 200092, China
| | - Chunyu Xue
- Department of Plastic and Reconstruction, Shanghai Changhai Hospital Affiliated to Second Military Medical University, Shanghai, 200433, China
| | - Lin He
- Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No. 13dz2260500), Bio-X Institutes, Shanghai Jiaotong University, Shanghai, 200240, China
| | - Lei Cai
- Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No. 13dz2260500), Bio-X Institutes, Shanghai Jiaotong University, Shanghai, 200240, China
| | - Xin Xing
- Department of Plastic and Reconstruction, Shanghai Changhai Hospital Affiliated to Second Military Medical University, Shanghai, 200433, China
| |
|
280
|
Liu B, Long R, Chou KC. iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework. Bioinformatics 2016; 32:2411-8. [PMID: 27153623 DOI: 10.1093/bioinformatics/btw186] [Citation(s) in RCA: 174] [Impact Index Per Article: 19.3] [Received: 01/22/2016] [Accepted: 04/03/2016] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Regulatory DNA elements are associated with DNase I hypersensitive sites (DHSs). Accordingly, identification of DHSs will provide useful insights for in-depth investigation into the function of noncoding genomic regions. RESULTS: In this study, using an ensemble learning framework, we proposed a new predictor called iDHS-EL for identifying the location of DHSs in the human genome. It was formed by fusing three individual Random Forest (RF) classifiers into an ensemble predictor. The three RF operators were respectively based on three special modes of the general pseudo nucleotide composition (PseKNC): (i) kmer, (ii) reverse complement kmer and (iii) pseudo dinucleotide composition. It has been demonstrated that the new predictor remarkably outperforms the relevant state-of-the-art methods in both accuracy and stability. AVAILABILITY AND IMPLEMENTATION: For the convenience of most experimental scientists, a web server for iDHS-EL is established at http://bioinformatics.hitsz.edu.cn/iDHS-EL, which is the first web-server predictor ever established for identifying DHSs, and by which users can easily get their desired results without the need to go through the mathematical details. We anticipate that iDHS-EL will become a very useful high-throughput tool for genome analysis. CONTACT: bliu@gordonlifescience.org or bliu@insun.hit.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
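As a rough illustration of the first two feature modes named in this abstract, the sketch below computes a kmer composition and a reverse complement kmer composition for a DNA sequence. It is an assumption-laden sketch, not the iDHS-EL implementation: the function names are hypothetical, counts are normalised to frequencies by assumption, and the pseudo dinucleotide composition and the Random Forest ensemble are not shown.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_composition(seq: str, k: int) -> dict:
    """Frequency of every possible k-mer over the alphabet ACGT.

    Windows containing other symbols (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    valid = set(kmers)
    counts = Counter(
        seq[i:i + k] for i in range(len(seq) - k + 1) if seq[i:i + k] in valid
    )
    total = sum(counts.values())
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


def revc_kmer_composition(seq: str, k: int) -> dict:
    """Merge each k-mer with its reverse complement into a single feature.

    The lexicographically smaller of the two strings is used as the key.
    """
    merged = {}
    for kmer, freq in kmer_composition(seq, k).items():
        key = min(kmer, reverse_complement(kmer))
        merged[key] = merged.get(key, 0.0) + freq
    return merged


features = revc_kmer_composition("ACGTTGCAAC", k=2)
print(len(features))  # number of merged dinucleotide features
```

For k = 2 the plain kmer mode yields 16 features, while merging reverse complements reduces this to 10, because four dinucleotides (AT, TA, CG, GC) are their own reverse complements.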
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Gordon Life Science Institute, Belmont, MA 02478, USA
| | - Ren Long
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Belmont, MA 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
| |
|
281
|
Jia J, Liu Z, Xiao X, Liu B, Chou KC. pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach. J Theor Biol 2016; 394:223-230. [DOI: 10.1016/j.jtbi.2016.01.020] [Citation(s) in RCA: 231] [Impact Index Per Article: 25.7] [Received: 11/20/2015] [Revised: 01/06/2016] [Accepted: 01/07/2016] [Indexed: 10/22/2022]
|
282
|
Liu Z, Xiao X, Yu DJ, Jia J, Qiu WR, Chou KC. pRNAm-PC: Predicting N6-methyladenosine sites in RNA sequences via physical–chemical properties. Anal Biochem 2016; 497:60-7. [DOI: 10.1016/j.ab.2015.12.017] [Citation(s) in RCA: 225] [Impact Index Per Article: 25.0] [Received: 10/30/2015] [Revised: 12/02/2015] [Accepted: 12/23/2015] [Indexed: 11/28/2022]
|
283
|
Zhang Q, Li H, Zhao X, Zheng Y, Meng H, Jia Y, Xue H, Bo S. Analysis on the preference for sequence matching between mRNA sequences and the corresponding introns in ribosomal protein genes. J Theor Biol 2016; 392:113-21. [DOI: 10.1016/j.jtbi.2015.12.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/23/2015] [Accepted: 12/10/2015] [Indexed: 10/22/2022]
|
284
|
iSuc-PseOpt: Identifying lysine succinylation sites in proteins by incorporating sequence-coupling effects into pseudo components and optimizing imbalanced training dataset. Anal Biochem 2016; 497:48-56. [DOI: 10.1016/j.ab.2015.12.009] [Citation(s) in RCA: 230] [Impact Index Per Article: 25.6] [Received: 11/10/2015] [Revised: 12/02/2015] [Accepted: 12/11/2015] [Indexed: 11/18/2022]
|
285
|
Liu B, Fang L. WITHDRAWN: Identification of microRNA precursor based on gapped n-tuple structure status composition kernel. Comput Biol Chem 2016:S1476-9271(16)30036-6. [PMID: 26935400 DOI: 10.1016/j.compbiolchem.2016.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2016] [Accepted: 02/01/2016] [Indexed: 10/22/2022]
Abstract
This article has been withdrawn at the request of the author(s) and/or editor. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy.
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China.
| | - Longyun Fang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China.
| |
|
286
|
An estimator for local analysis of genome based on the minimal absent word. J Theor Biol 2016; 395:23-30. [PMID: 26829314 DOI: 10.1016/j.jtbi.2016.01.023] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/12/2015] [Revised: 01/17/2016] [Accepted: 01/19/2016] [Indexed: 11/22/2022]
Abstract
This study presents an alternative alignment-free relative feature analysis method based on the minimal absent word, which has potential advantages over the local alignment method in local analysis. A smooth local-analysis curve and a similarity distribution are constructed for fast, efficient, and visual comparison. Moreover, when multi-sequence comparison is needed, the local-analysis curves can highlight zones of interest.
|
287
|
iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets. Molecules 2016; 21:E95. [PMID: 26797600 PMCID: PMC6274413 DOI: 10.3390/molecules21010095] [Citation(s) in RCA: 136] [Impact Index Per Article: 15.1] [Received: 11/18/2015] [Revised: 12/18/2015] [Accepted: 01/07/2016] [Indexed: 12/25/2022] Open
Abstract
Knowledge of protein-protein interactions and their binding sites is indispensable for in-depth understanding of the networks in living cells. With the avalanche of protein sequences generated in the postgenomic age, it is critical to develop computational methods for identifying, in a timely fashion, the protein-protein binding sites (PPBSs) based on sequence information alone, because the information obtained in this way can be used for both biomedical research and drug development. To address this challenge, we have proposed a new predictor, called iPPBS-Opt, in which we have used: (1) the K-Nearest Neighbors Cleaning (KNNC) and Inserting Hypothetical Training Samples (IHTS) treatments to optimize the training dataset; (2) the ensemble voting approach to select the most relevant features; and (3) the stationary wavelet transform to formulate the statistical samples. Cross-validation tests targeting the experiment-confirmed results have demonstrated that the new predictor is very promising, implying that the aforementioned practices are indeed very effective. In particular, the approach of using wavelets to express protein/peptide sequences might be the key to grasping the problem's essence, fully consistent with the findings that many important biological functions of proteins can be elucidated with their low-frequency internal motions. To maximize the convenience of most experimental scientists, we have provided a step-by-step guide on how to use the predictor's web server (http://www.jci-bioinfo.cn/iPPBS-Opt) to get the desired results without the need to go through the complicated mathematical equations involved.
|
288
|
Aftabi Y, Colagar AH, Mehrnejad F. An in silico approach to investigate the source of the controversial interpretations about the phenotypic results of the human AhR-gene G1661A polymorphism. J Theor Biol 2016; 393:1-15. [PMID: 26776670 DOI: 10.1016/j.jtbi.2016.01.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 09/19/2015] [Revised: 12/11/2015] [Accepted: 01/01/2016] [Indexed: 12/21/2022]
Abstract
Aryl hydrocarbon receptor (AhR) acts as an enhancer-binding, ligand-activated intracellular receptor. Chromatin remodeling components and general transcription factors such as TATA-binding protein (TBP) are recruited to AhR-target genes by interaction with its flexible transactivation domain (TAD). The AhR-G1661A single nucleotide polymorphism (SNP: rs2066853) causes an arginine to lysine substitution in the acidic sub-domain of the TAD at position 554 (R554K). Although numerous studies associate the SNP with abnormalities such as cancer, other reliable investigations reject these associations. Consequently, the interpretation of the phenotypic results of the G1661A transition has been controversial. In this study, an in silico analysis was performed to investigate the possible effects of the transition on AhR mRNA, protein structure, interaction properties and modifications. The analysis revealed that the R554K substitution affects the secondary structure and solvent accessibility of adjacent residues. It also decreases AhR stability, alters the hydropathy features of the local sequence, and changes the pattern of the residues at the binding site of the TAD acidic sub-domain. New sites for ubiquitination and acetylation were predicted for the AhR-K554 variant at positions 544 and 560, respectively. Our findings strengthen the idea that the AhR-G1661A transition may affect AhR-TAD interactions, especially with TBP, which influence the expression of AhR-target genes. However, the previously reported flexibility of the modular TAD could act as an intervening factor, moderating the SNP effects and causing distinct outcomes in different individuals and tissues.
Affiliation(s)
- Younes Aftabi
- Department of Molecular and Cell Biology, Faculty of Basic Sciences, University of Mazandaran, Babolsar, Post Code: 47416-95447, Mazandaran, Iran
| | - Abasalt Hosseinzadeh Colagar
- Department of Molecular and Cell Biology, Faculty of Basic Sciences, University of Mazandaran, Babolsar, Post Code: 47416-95447, Mazandaran, Iran.
| | - Faramarz Mehrnejad
- Department of Life Science Engineering, Faculty of New Sciences & Technologies, University of Tehran, P.O. Box: 14395-1561, Tehran, Iran
| |
|
289
|
Jiao YS, Du PF. Predicting Golgi-resident protein types using pseudo amino acid compositions: Approaches with positional specific physicochemical properties. J Theor Biol 2015; 391:35-42. [PMID: 26702543 DOI: 10.1016/j.jtbi.2015.11.009] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 08/11/2015] [Revised: 11/17/2015] [Accepted: 11/19/2015] [Indexed: 11/24/2022]
Abstract
Knowing the type of a Golgi-resident protein is an important step in understanding its molecular functions as well as its role in biological processes. In this paper, we developed a novel computational method to predict Golgi-resident protein types using positional specific physicochemical properties and analysis of variance based feature selection methods. Our method achieved 86.9% prediction accuracy in leave-one-out cross-validations with only 59 features. Our method has the potential to be applied in predicting a wide range of protein attributes.
Affiliation(s)
- Ya-Sen Jiao
- School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
| | - Pu-Feng Du
- School of Computer Science and Technology, Tianjin University, Tianjin 300072, China.
| |
|
290
|
Heras J, Domínguez C, Mata E, Pascual V. Surveying and benchmarking techniques to analyse DNA gel fingerprint images. Brief Bioinform 2015; 17:912-925. [PMID: 26634918 DOI: 10.1093/bib/bbv102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2015] [Revised: 10/20/2015] [Indexed: 11/13/2022] Open
Abstract
DNA fingerprinting is a genetic typing technique that allows the analysis of the genomic relatedness between samples, and the comparison of DNA patterns. The analysis of DNA gel fingerprint images usually consists of five consecutive steps: image pre-processing, lane segmentation, band detection, normalization and fingerprint comparison. In this article, we first survey the main methods that have been applied in the literature at each of these stages. Second, we focus on lane-segmentation and band-detection algorithms, as these are the steps that usually require user intervention, and identify the seven core algorithms used for both tasks. Subsequently, we present a benchmark that includes a data set of images, the gold standards associated with those images and the tools to measure the performance of lane-segmentation and band-detection algorithms. Finally, we implement the core algorithms used for both lane segmentation and band detection, and evaluate their performance using our benchmark. We conclude from that study that the average profile algorithm is the best starting point for lane segmentation and band detection.
|
291
|
Chen W, Feng P, Ding H, Lin H, Chou KC. iRNA-Methyl: Identifying N6-methyladenosine sites using pseudo nucleotide composition. Anal Biochem 2015; 490:26-33. [DOI: 10.1016/j.ab.2015.08.021] [Citation(s) in RCA: 254] [Impact Index Per Article: 25.4] [Received: 07/13/2015] [Revised: 08/13/2015] [Accepted: 08/16/2015] [Indexed: 10/23/2022]
|
292
|
Ju Z, Cao JZ, Gu H. iLM-2L: A two-level predictor for identifying protein lysine methylation sites and their methylation degrees by incorporating K-gap amino acid pairs into Chou's general PseAAC. J Theor Biol 2015; 385:50-7. [DOI: 10.1016/j.jtbi.2015.07.030] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 05/15/2015] [Revised: 07/06/2015] [Accepted: 07/23/2015] [Indexed: 10/23/2022]
|
293
|
Ahmad S, Kabir M, Hayat M. Identification of Heat Shock Protein families and J-protein types by incorporating Dipeptide Composition into Chou's general PseAAC. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2015; 122:165-174. [PMID: 26233307 DOI: 10.1016/j.cmpb.2015.07.005] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 04/20/2015] [Revised: 06/21/2015] [Accepted: 07/13/2015] [Indexed: 06/04/2023]
Abstract
Heat Shock Proteins (HSPs) are essential for cell growth and viability and are found in all living organisms. HSPs manage the folding and unfolding of proteins, control the quality of newly synthesized proteins, and protect cellular homeostatic processes from environmental stress. On the basis of functionality, HSPs are categorized into six major families, namely: (i) HSP20 or sHSP, (ii) HSP40 or J-protein types, (iii) HSP60 or GroEL/ES, (iv) HSP70, (v) HSP90 and (vi) HSP100. Identification of HSP families and sub-families through conventional approaches is expensive and laborious. It is therefore highly desirable to establish an automatic, robust and accurate computational method for predicting HSPs quickly and reliably. In this regard, a computational model is developed for the prediction of HSP families. In this model, protein sequences are formulated using three discrete methods, namely Split Amino Acid Composition, Pseudo Amino Acid Composition, and Dipeptide Composition. Several learning algorithms are evaluated to choose the best one for a high-throughput computational model. The leave-one-out test is applied to assess the performance of the proposed model. The empirical results showed that the support vector machine achieved quite promising results using the Dipeptide Composition feature space. The proposed model achieves 90.7% accuracy on the HSPs dataset and 97.04% accuracy for J-protein types, which are higher than those of existing methods in the literature so far.
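The Dipeptide Composition feature space mentioned in this abstract is commonly built by counting adjacent residue pairs. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it assumes the 20 standard amino acids (giving a 400-dimensional vector), normalises by the number of valid dipeptides, and uses a hypothetical function name and example sequence.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumption)
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(protein: str) -> list:
    """Return a 400-dimensional vector of dipeptide frequencies.

    Overlapping adjacent residue pairs are counted; pairs containing a
    non-standard symbol are skipped.
    """
    protein = protein.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(protein) - 1):
        pair = protein[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in DIPEPTIDES]


vector = dipeptide_composition("MKVLAAGIVALL")  # hypothetical sequence
print(len(vector))
```

A vector of this form could then be passed to a classifier such as a support vector machine, which is the learner the abstract reports as performing best on this feature space.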
Affiliation(s)
- Saeed Ahmad
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
| | - Muhammad Kabir
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan.
| |
|
294
|
Liu B, Fang L, Wang S, Wang X, Li H, Chou KC. Identification of microRNA precursor with the degenerate K-tuple or Kmer strategy. J Theor Biol 2015; 385:153-9. [DOI: 10.1016/j.jtbi.2015.08.025] [Citation(s) in RCA: 131] [Impact Index Per Article: 13.1] [Received: 07/16/2015] [Revised: 08/21/2015] [Accepted: 08/24/2015] [Indexed: 10/23/2022]
|
295
|
Jia J, Liu Z, Xiao X, Liu B, Chou KC. Identification of protein-protein binding sites by incorporating the physicochemical properties and stationary wavelet transforms into pseudo amino acid composition. J Biomol Struct Dyn 2015; 34:1946-61. [PMID: 26375780 DOI: 10.1080/07391102.2015.1095116] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.8] [Indexed: 10/23/2022]
Abstract
With the explosive growth of protein sequences entering protein data banks in the post-genomic era, there is high demand for automated methods that rapidly and effectively identify protein-protein binding sites (PPBSs) based on sequence information alone. To address this problem, we proposed a predictor called iPPBS-PseAAC, in which each amino acid residue site of the proteins concerned was treated as a 15-tuple peptide segment generated by sliding a window along the protein chain with its center aligned with the target residue. The working peptide segment is further formulated by a general form of pseudo amino acid composition via the following procedures: (1) it is converted into a numerical series via the physicochemical properties of amino acids; (2) the numerical series is subsequently converted into a 20-D feature vector by means of the stationary wavelet transform technique. Formed by many individual "Random Forest" classifiers, the operation engine that runs the prediction is a two-layer ensemble classifier, with the first layer voting out the best training dataset from many bootstrap systems and the second layer voting out the most relevant one from seven physicochemical properties. Cross-validation tests indicate that the new predictor is very promising, meaning that many important key features, which are deeply hidden in complicated protein sequences, can be extracted via the wavelet transform approach, quite consistent with the fact that many important biological functions of proteins can be elucidated with their low-frequency internal motions. The web server of iPPBS-PseAAC is accessible at http://www.jci-bioinfo.cn/iPPBS-PseAAC, by which users can easily acquire their desired results without the need to follow the complicated mathematical equations involved.
Affiliation(s)
- Jianhua Jia
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
| | - Zi Liu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
| | - Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China; Gordon Life Science Institute, Boston, MA 02478, USA
| | - Bingxiang Liu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
| | - Kuo-Chen Chou
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia; Gordon Life Science Institute, Boston, MA 02478, USA
| |
|
296
|
Liu B, Fang L, Long R, Lan X, Chou KC. iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition. Bioinformatics 2015; 32:362-9. [PMID: 26476782 DOI: 10.1093/bioinformatics/btv604] [Citation(s) in RCA: 274] [Impact Index Per Article: 27.4] [Received: 09/25/2015] [Accepted: 10/12/2015] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Enhancers are short regulatory DNA elements. They can be bound by proteins (activators) to activate transcription of a gene, and hence play a critical role in promoting gene transcription in eukaryotes. With the avalanche of DNA sequences generated in the post-genomic age, it is a challenging task to develop computational methods for the timely identification of enhancers from extremely complicated DNA sequences. Although some efforts have been made in this regard, they were limited to identifying whether or not a query DNA element is an enhancer. According to their distinct levels of biological activity and regulatory effects on target genes, however, enhancers should be further classified into strong and weak ones. RESULTS: In view of this, a two-layer predictor called 'iEnhancer-2L' was proposed by formulating DNA elements with the 'pseudo k-tuple nucleotide composition', into which six DNA local parameters were incorporated. To the best of our knowledge, it is the first computational predictor ever established for identifying not only enhancers, but also their strength. Rigorous cross-validation tests have indicated that iEnhancer-2L holds very high potential to become a useful tool for genome analysis. AVAILABILITY AND IMPLEMENTATION: For the convenience of most experimental scientists, a web server for the two-layer predictor was established at http://bioinformatics.hitsz.edu.cn/iEnhancer-2L/, by which users can easily get their desired results without the need to go through the mathematical details. CONTACT: bliu@gordonlifescience.org, bliu@insun.hit.edu.cn, xlan@stanford.edu, kcchou@gordonlifescience.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bin Liu
  - School of Computer Science and Technology, Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Computational Biology, Gordon Life Science Institute, Belmont, MA 02478, USA
- Ren Long
  - School of Computer Science and Technology
- Xun Lan
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Kuo-Chen Chou
  - Computational Biology, Gordon Life Science Institute, Belmont, MA 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
297
|
Using weighted features to predict recombination hotspots in Saccharomyces cerevisiae. J Theor Biol 2015; 382:15-22. [DOI: 10.1016/j.jtbi.2015.06.030] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 02/17/2015] [Revised: 06/04/2015] [Accepted: 06/20/2015] [Indexed: 01/06/2023]
|
298
|
Chen W, Feng P, Ding H, Lin H, Chou KC. Benchmark data for identifying N(6)-methyladenosine sites in the Saccharomyces cerevisiae genome. Data Brief 2015; 5:376-8. [PMID: 26958595 PMCID: PMC4773366 DOI: 10.1016/j.dib.2015.09.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 08/25/2015] [Revised: 08/30/2015] [Accepted: 09/10/2015] [Indexed: 11/19/2022] Open
Abstract
This data article contains the benchmark dataset for training and testing iRNA-Methyl, a web-server predictor for identifying N(6)-methyladenosine sites in RNA (Chen et al., 2015 [15]). It can also be used to develop other predictors for identifying N(6)-methyladenosine sites in the Saccharomyces cerevisiae genome.
Affiliation(s)
- Wei Chen
  - Department of Physics, School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063009, China; Gordon Life Science Institute, Belmont, MA, United States
- Pengmian Feng
  - School of Public Health, North China University of Science and Technology, Tangshan 063000, China
- Hui Ding
  - School of Public Health, North China University of Science and Technology, Tangshan 063000, China
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics and Center for Information in Biomedicine, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; Gordon Life Science Institute, Belmont, MA, United States
- Kuo-Chen Chou
  - Gordon Life Science Institute, Belmont, MA, United States; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
299
|
Rodríguez DC, Ocampo M, Reyes C, Arévalo‐Pinzón G, Munoz M, Patarroyo MA, Patarroyo ME. Cell‐Peptide Specific Interaction Can Inhibit Mycobacterium tuberculosis H37Rv Infection. J Cell Biochem 2015; 117:946-58. [DOI: 10.1002/jcb.25379] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/05/2015] [Accepted: 09/14/2015] [Indexed: 11/10/2022]
Affiliation(s)
- Deisy Carolina Rodríguez
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad del Rosario, Bogotá, Colombia
- Marisol Ocampo
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad del Rosario, Bogotá, Colombia
- Cesar Reyes
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad del Rosario, Bogotá, Colombia
- Gabriela Arévalo‐Pinzón
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad del Rosario, Bogotá, Colombia
- Marina Munoz
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad del Rosario, Bogotá, Colombia
- Manuel Alfonso Patarroyo
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad del Rosario, Bogotá, Colombia
- Manuel Elkin Patarroyo
  - Fundacion Instituto de Inmunología de Colombia (FIDIC), Bogotá, Colombia
  - Universidad Nacional de Colombia, Bogotá, Colombia
|
300
|
Rare k-mer DNA: Identification of sequence motifs and prediction of CpG island and promoter. J Theor Biol 2015; 387:88-100. [PMID: 26427337 DOI: 10.1016/j.jtbi.2015.09.014] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 01/01/2015] [Revised: 09/10/2015] [Accepted: 09/15/2015] [Indexed: 12/20/2022]
Abstract
Empirical analysis of k-mer DNA has proven to be an effective tool for finding unique patterns in DNA sequences, which can lead to the discovery of potential sequence motifs. In an extensive study of empirical k-mer DNA in hundreds of organisms, the researchers found that unique multi-modal k-mer spectra occur only in the genomes of organisms from the tetrapod clade, which includes all mammals. The multi-modality is caused by the formation of the two lowest modes, and the k-mers under them are referred to as the rare k-mers. The suppression of the two lowest modes (or the rare k-mers) can be attributed to the CG dinucleotide inclusions in them. Apart from that, the rare k-mers are selectively distributed in certain genomic features: CpG island (CGI), promoter, 5' UTR, and exon. We correlated the rare k-mers with hundreds of annotated features using several bioinformatic tools, performed further intrinsic rare k-mer analyses within the correlated features, and modeled the elucidated rare k-mer clustering feature into a classifier to predict the correlated CGI and promoter features. Our correlation results show that rare k-mers are highly associated with several annotated features: CGI, promoter, 5' UTR, and open chromatin regions. Our intrinsic results show that rare k-mers have several unique topological, compositional, and clustering properties in CGI and promoter features. Finally, the performance of our RWC (rare-word clustering) method in predicting the CGI and promoter features ranked among the top three in eight of the CGI and promoter evaluations, among eight of the benchmarked datasets.
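To make the terminology concrete, the following sketch counts overlapping k-mers in a sequence and builds a k-mer spectrum, taken here to mean the number of distinct k-mers observed at each occurrence count. It also reports the fraction of low-count k-mers that contain a CG dinucleotide. This is a toy under stated assumptions, not the RWC method: the function names, the toy sequence, and the "count at most `max_count`" cutoff for calling a k-mer rare are all hypothetical, whereas the abstract defines rare k-mers by the two lowest modes of a genome-wide spectrum.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count overlapping k-mers made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map each occurrence count to the number of distinct k-mers with that count.

    Only k-mers that occur at least once are included; unobserved k-mers
    (count zero) are not represented in this simplified spectrum.
    """
    return Counter(counts.values())


def cg_fraction_of_rare(counts, max_count):
    """Fraction of k-mers seen at most max_count times that contain 'CG'.

    The fixed cutoff is a hypothetical stand-in for a mode-based definition.
    """
    rare = [kmer for kmer, n in counts.items() if n <= max_count]
    if not rare:
        return 0.0
    return sum("CG" in kmer for kmer in rare) / len(rare)


# Toy sequence (an assumption, not data from the paper).
counts = count_kmers("ATATATATCGAT", k=2)
spectrum = kmer_spectrum(counts)
rare_cg = cg_fraction_of_rare(counts, max_count=1)
```

On a real genome the spectrum would be computed for larger k and plotted as a histogram, where separate peaks correspond to the modes the abstract refers to.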
|