Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zien A, Rätsch G, Mika S, Schölkopf B, Lengauer T, Müller KR. Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics 2000;16:799-807. [PMID: 11108702 DOI: 10.1093/bioinformatics/16.9.799] [Citation(s) in RCA: 145] [Impact Index Per Article: 6.0] [Indexed: 11/13/2022] Open



Cited by Other Article(s)

Klauschen F, Dippel J, Keyl P, Jurmeister P, Bockmayr M, Mock A, Buchstab O, Alber M, Ruff L, Montavon G, Müller KR. Toward Explainable Artificial Intelligence for Precision Pathology. ANNUAL REVIEW OF PATHOLOGY 2024;19:541-570. [PMID: 37871132 DOI: 10.1146/annurev-pathmechdis-051222-113147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]

Affiliation(s)

Frederick Klauschen: Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany; Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany; German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
Jonas Dippel: Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany; Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany
Philipp Keyl: Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
Philipp Jurmeister: Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany; German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
Michael Bockmayr: Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany; Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany; Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
Andreas Mock: Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany; German Cancer Consortium, German Cancer Research Center (DKTK/DKFZ), Munich Partner Site, Munich, Germany
Oliver Buchstab: Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
Maximilian Alber: Institute of Pathology, Charité Universitätsmedizin Berlin, Berlin, Germany; Aignostics, Berlin, Germany
Lukas Ruff: Aignostics, Berlin, Germany
Grégoire Montavon: Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany; Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany; Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
Klaus-Robert Müller: Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany; Machine Learning Group, Department of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany; Department of Artificial Intelligence, Korea University, Seoul, Korea; Max Planck Institute for Informatics, Saarbrücken, Germany


Nordin NI, Mustafa WA, Lola MS, Madi EN, Kamil AA, Nasution MD, K. Abdul Hamid AA, Zainuddin NH, Aruchunan E, Abdullah MT. Enhancing COVID-19 Classification Accuracy with a Hybrid SVM-LR Model. Bioengineering (Basel) 2023;10:1318. [PMID: 38002441 PMCID: PMC10669812 DOI: 10.3390/bioengineering10111318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 10/03/2023] [Accepted: 10/09/2023] [Indexed: 11/26/2023] Open

Affiliation(s)

Noor Ilanie Nordin: Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, Kuala Nerus 21030, Terengganu, Malaysia; Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA Kelantan, Bukit Ilmu, Machang 18500, Kelantan, Malaysia
Wan Azani Mustafa: Faculty of Electrical Engineering & Technology, Pauh Putra Campus, Universiti Malaysia Perlis (UniMAP), Arau 02600, Perlis, Malaysia; Centre of Excellence for Advanced Computing, Pauh Putra Campus, Universiti Malaysia Perlis (UniMAP), Arau 02600, Perlis, Malaysia
Muhamad Safiih Lola: Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, Kuala Nerus 21030, Terengganu, Malaysia; Special Interest Group on Modeling and Data Analytics (SIGMDA), Universiti Malaysia Terengganu, Kuala Nerus 21030, Terengganu, Malaysia
Elissa Nadia Madi: Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin (UniSZA), Besut Campus, Besut 22200, Terengganu, Malaysia
Anton Abdulbasah Kamil: Faculty of Economics, Administrative and Social Sciences, Istanbul Gelisim University, Cihangir Mah. Şehit Jandarma Komando Er Hakan Öner Sk. No:1 Avcılar, İstanbul 34310, Turkey
Marah Doly Nasution: Faculty of Teacher and Education, University Muhammadiyah Sumatera Utara, Jl. Kapten Muchtar Basri No.3, Glugur Darat II, Kec. Medan Tim., Kota Medan 20238, Sumatera Utara, Indonesia
Abdul Aziz K. Abdul Hamid: Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, Kuala Nerus 21030, Terengganu, Malaysia; Special Interest Group on Applied Informatics and Intelligent Applications (AINIA), Universiti Malaysia Terengganu, Kuala Nerus 21030, Terengganu, Malaysia
Nurul Hila Zainuddin: Mathematics Department, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, Tanjong Malim 53900, Perak Darul Ridzuan, Malaysia
Elayaraja Aruchunan: Department of Decision Science, Faculty of Business and Economics, University Malaya, Kuala Lumpur 50603, Malaysia
Mohd Tajuddin Abdullah: Fellow Academy of Sciences Malaysia, Level 20, West Wing Tingkat 20, Menara MATRADE, Jalan Sultan Haji Ahmad Shah, Kuala Lumpur 50480, Malaysia


Ditz JC, Reuter B, Pfeifer N. Inherently interpretable position-aware convolutional motif kernel networks for biological sequencing data. Sci Rep 2023;13:17216. [PMID: 37821530 PMCID: PMC10567796 DOI: 10.1038/s41598-023-44175-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 10/04/2023] [Indexed: 10/13/2023] Open

Barbero-Aparicio JA, Cuesta-Lopez S, García-Osorio CI, Pérez-Rodríguez J, García-Pedrajas N. Nonlinear physics opens a new paradigm for accurate transcription start site prediction. BMC Bioinformatics 2022;23:565. [PMID: 36585618 PMCID: PMC9801560 DOI: 10.1186/s12859-022-05129-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 12/27/2022] [Indexed: 12/31/2022] Open

Eberle O, Buttner J, Krautli F, Muller KR, Valleriani M, Montavon G. Building and Interpreting Deep Similarity Models. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022;44:1149-1161. [PMID: 32870784 DOI: 10.1109/tpami.2020.3020738] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]

Jankovic B, Gojobori T. From shallow to deep: some lessons learned from application of machine learning for recognition of functional genomic elements in human genome. Hum Genomics 2022;16:7. [PMID: 35180894 PMCID: PMC8855580 DOI: 10.1186/s40246-022-00376-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/26/2021] [Accepted: 01/02/2022] [Indexed: 11/25/2022] Open

Abstract

Identification of genomic signals as indicators for functional genomic elements is one of the areas that received early and widespread application of machine learning methods. With time, the methods applied grew in variety and generally exhibited a tendency to improve their ability to identify some major genomic and transcriptomics signals. The evolution of machine learning in genomics followed a similar path to applications of machine learning in other fields. These were impacted in a major way by three dominant developments, namely an enormous increase in availability and quality of data, a significant increase in computational power available to machine learning applications, and finally, new machine learning paradigms, of which deep learning is the most well-known example. It is not easy in general to distinguish factors leading to improvements in results of applications of machine learning. This is even more so in the field of genomics, where the advent of next-generation sequencing and the increased ability to perform functional analysis of raw data have had a major effect on the applicability of machine learning in OMICS fields. In this paper, we survey the results from a subset of published work in application of machine learning in the recognition of genomic signals and regions in human genome and summarize some lessons learnt from this endeavor. There is no doubt that a significant progress has been made both in terms of accuracy and reliability of models. Questions remain however whether the progress has been sufficient and what these developments bring to the field of genomics in general and human genomics in particular. Improving usability, interpretability and accuracy of models remains an important open challenge for current and future research in application of machine learning and more generally of artificial intelligence methods in genomics.


Vinayagam A, Othman ML, Veerasamy V, Saravan Balaji S, Ramaiyan K, Radhakrishnan P, Raman MD, Abdul Wahab NI. A random subspace ensemble classification model for discrimination of power quality events in solar PV microgrid power network. PLoS One 2022;17:e0262570. [PMID: 35085307 PMCID: PMC8794120 DOI: 10.1371/journal.pone.0262570] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/22/2021] [Accepted: 12/29/2021] [Indexed: 11/18/2022] Open

Perez-Rodriguez J, de Haro-Garcia A, Garcia-Pedrajas N. Floating Search Methodology for Combining Classification Models for Site Recognition in DNA Sequences. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;18:2471-2482. [PMID: 32078558 DOI: 10.1109/tcbb.2020.2974221] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]

Keith JA, Vassilev-Galindo V, Cheng B, Chmiela S, Gastegger M, Müller KR, Tkatchenko A. Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems. Chem Rev 2021;121:9816-9872. [PMID: 34232033 PMCID: PMC8391798 DOI: 10.1021/acs.chemrev.1c00107] [Citation(s) in RCA: 186] [Impact Index Per Article: 62.0] [Received: 02/04/2021] [Indexed: 12/23/2022]

ActTRANS: Functional classification in active transport proteins based on transfer learning and contextual representations. Comput Biol Chem 2021;93:107537. [PMID: 34217007 DOI: 10.1016/j.compbiolchem.2021.107537] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/14/2020] [Revised: 05/09/2021] [Accepted: 06/26/2021] [Indexed: 01/08/2023]

Abstract

MOTIVATION

Primary and secondary active transport are two types of active transport that use energy to move substances. Active transport mechanisms use proteins to assist in transport and play essential roles in regulating the traffic of ions or small molecules across a cell membrane against the concentration gradient. In this study, the two main types of proteins involved in such transport are classified from transmembrane transport proteins. We propose a Support Vector Machine (SVM) with contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) to represent protein sequences. BERT is a powerful model in transfer learning, a deep learning language representation model developed by Google and one of the highest performing pre-trained models for Natural Language Processing (NLP) tasks. The idea of transfer learning with a pre-trained BERT model is applied to extract fixed feature vectors from the hidden layers and learn contextual relations between amino acids in the protein sequence. The contextualized word representations of proteins are therefore introduced to effectively model complex structures of amino acids in the sequence and the variations of these amino acids in context. By generating context information, we capture multiple meanings for the same amino acid to reveal the importance of specific residues in the protein sequence.

RESULTS

The performance of the proposed method is evaluated using five-fold cross-validation and an independent test. The proposed method achieves an accuracy of 85.44%, 88.74% and 92.84% for Class-1, Class-2, and Class-3, respectively. Experimental results show that this approach can outperform other feature extraction methods using context information, effectively classify two types of active transport and improve the overall performance.


Karollus A, Avsec Ž, Gagneur J. Predicting mean ribosome load for 5'UTR of any length using deep learning. PLoS Comput Biol 2021;17:e1008982. [PMID: 33970899 PMCID: PMC8136849 DOI: 10.1371/journal.pcbi.1008982] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/14/2020] [Revised: 05/20/2021] [Accepted: 04/19/2021] [Indexed: 01/07/2023] Open

Abstract

The 5’ untranslated region plays a key role in regulating mRNA translation and consequently protein abundance. Therefore, accurate modeling of 5’UTR regulatory sequences shall provide insights into translational control mechanisms and help interpret genetic variants. Recently, a model was trained on a massively parallel reporter assay to predict mean ribosome load (MRL)—a proxy for translation rate—directly from 5’UTR sequence with a high degree of accuracy. However, this model is restricted to sequence lengths investigated in the reporter assay and therefore cannot be applied to the majority of human sequences without a substantial loss of information. Here, we introduced frame pooling, a novel neural network operation that enabled the development of an MRL prediction model for 5’UTRs of any length. Our model shows state-of-the-art performance on fixed length randomized sequences, while offering better generalization performance on longer sequences and on a variety of translation-related genome-wide datasets. Variant interpretation is demonstrated on a 5’UTR variant of the gene HBB associated with beta-thalassemia. Frame pooling could find applications in other bioinformatics predictive tasks. Moreover, our model, released open source, could help pinpoint pathogenic genetic variants.

The human genome carries a complex code. It consists of genes, which provide blueprints to assemble proteins, and regulatory elements, which control when, where, and how often particular genes are transcribed and translated into protein. To read the genome correctly and specifically to find the causes of inherited diseases, we need to be able to find and interpret these regulatory elements. Here, we focus on particular regions of the genome, the so-called 5’ untranslated regions, which play an important role in determining how often a transcribed gene is translated into protein. We develop deep learning models which can quantitatively interpret regulatory elements in human 5’ untranslated regions and use this information to predict a proxy of the translation efficiency. Our model generalizes a previous model to 5’ untranslated regions of any length, just as they are encountered in natural human genes. Because this model requires only the sequence as input, it can give estimates for the impact of mutations in the sequence, even if these particular mutations are very rare or entirely novel. Such estimates could help pinpoint mutations that disrupt the normal functioning of gene regulation, which could be used to better diagnose patients suffering from rare genetic disorders.


Wei C, Zhang J, Yuan X, He Z, Liu G, Wu J. NeuroTIS: Enhancing the prediction of translation initiation sites in mRNA sequences via a hybrid dependency network and deep learning framework. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106459] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]

Goel N, Singh S, Aseri TC. Global sequence features based translation initiation site prediction in human genomic sequences. Heliyon 2020;6:e04825. [PMID: 32964155 PMCID: PMC7490824 DOI: 10.1016/j.heliyon.2020.e04825] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/20/2019] [Revised: 05/25/2020] [Accepted: 08/26/2020] [Indexed: 11/26/2022] Open

Kao HJ, Nguyen VN, Huang KY, Chang WC, Lee TY. SuccSite: Incorporating Amino Acid Composition and Informative k-spaced Amino Acid Pairs to Identify Protein Succinylation Sites. GENOMICS PROTEOMICS & BIOINFORMATICS 2020;18:208-219. [PMID: 32592791 PMCID: PMC7647693 DOI: 10.1016/j.gpb.2018.10.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/08/2018] [Revised: 10/01/2018] [Accepted: 10/11/2018] [Indexed: 12/14/2022]

Yin T, König S. Genomic predictions of growth curves in Holstein dairy cattle based on parameter estimates from nonlinear models combined with different kernel functions. J Dairy Sci 2020;103:7222-7237. [PMID: 32534925 DOI: 10.3168/jds.2019-18010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/04/2019] [Accepted: 04/06/2020] [Indexed: 11/19/2022]

Abstract

Availability of longitudinal body weight (BW) records allows the application of nonlinear models (NLINM) to predict phenotypic and genomic growth curves in dairy cattle. In this regard, we considered a data set including 31,722 BW records from 4,952 female Holstein cattle, during the period from birth (mo 0) to approximately age at first calving (mo 24). Parameters of the growth curves were estimated using 3 NLINM: the logistic (LOG), the Gompertz (GOM), and the Richards (RICH) functions. Residuals for the growth curve parameters from the NLINM applications were used as pseudo-phenotypes in the ongoing genomic analyses with different similarity matrices, including 2 genomic relationship matrices (G1 and G2), a combined pedigree and genomic relationship matrix (H), and 3 kernel matrices. The kernels were a weighted "alike by state" kernel function (K1), an exponential dissimilarity kernel (K2), and a Gaussian kernel (K3). On the basis of G1 and G2 matrices, genomic heritabilities for the growth curve parameters birth weight (W₀), mature weight (W_m), and growth rate (k), and the shape parameter (m; only available from RICH) were moderate to large, in the range from 0.29 (m from RICH) to 0.46 (k from RICH). Fitting the similarity matrices based on kernel functions contributed to an increase of the ratio of the variance explained by the similarity matrix in relation to the total variance (compared with the heritability when modeling G1 or G2). Genetic correlations between W₀, W_m, and k were always positive (>0.30), especially for the same growth curve parameters estimated from different NLINM (>0.90). The shape parameter m from RICH was negatively correlated with other growth curve parameters, from -0.29 to -0.95. In a next step, estimated genomic breeding values for growth curve parameters were input data for the respective NLINM, aiming to construct genomic growth curves. 
Prediction accuracies were correlations between genomic growth curves and genomic breeding values from random regression models for sires and female cattle. Considering all genotyped female cattle with pseudo-phenotypes, prediction accuracies were larger from RICH than from LOG and GOM. However, differences in prediction accuracies from the NLINM × similarity matrix combinations were quite small. Accordingly, in 5-fold cross-validations using heifer groups with masked phenotypes, very similar prediction accuracies across modeling approaches were identified. Especially for specific age months, genomic growth curve predictions were more accurate for sires than for female cattle, indicating that the relationships between animals in training and validation sets are more important than the selection of specific NLINM × similarity matrix combinations.


milRNApredictor: Genome-free prediction of fungi milRNAs by incorporating k-mer scheme and distance-dependent pair potential. Genomics 2019;112:2233-2240. [PMID: 31884158 DOI: 10.1016/j.ygeno.2019.12.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/09/2019] [Revised: 12/05/2019] [Accepted: 12/25/2019] [Indexed: 11/22/2022]

Sun S, Wang C, Ding H, Zou Q. Machine learning and its applications in plant molecular studies. Brief Funct Genomics 2019;19:40-48. [DOI: 10.1093/bfgp/elz036] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 06/12/2019] [Revised: 09/06/2019] [Accepted: 09/15/2019] [Indexed: 01/16/2023] Open

Huang KY, Hsu JBK, Lee TY. Characterization and Identification of Lysine Succinylation Sites based on Deep Learning Method. Sci Rep 2019;9:16175. [PMID: 31700141 PMCID: PMC6838336 DOI: 10.1038/s41598-019-52552-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/09/2019] [Accepted: 10/18/2019] [Indexed: 12/14/2022] Open

Abstract

Succinylation is a type of protein post-translational modification (PTM), which can play important roles in a variety of cellular processes. Due to an increasing number of site-specific succinylated peptides obtained from high-throughput mass spectrometry (MS), various tools have been developed for computationally identifying succinylated sites on proteins. However, most of these tools predict succinylation sites based on traditional machine learning methods. Hence, this work aimed to carry out the succinylation site prediction based on a deep learning model. The abundance of MS-verified succinylated peptides enabled the investigation of substrate site specificity of succinylation sites through sequence-based attributes, such as position-specific amino acid composition, the composition of k-spaced amino acid pairs (CKSAAP), and position-specific scoring matrix (PSSM). Additionally, the maximal dependence decomposition (MDD) was adopted to detect the substrate signatures of lysine succinylation sites by dividing all succinylated sequences into several groups with conserved substrate motifs. According to the results of ten-fold cross-validation, the deep learning model trained using PSSM and informative CKSAAP attributes can reach the best predictive performance and also perform better than traditional machine-learning methods. Moreover, an independent testing dataset that truly did not exist in the training dataset was used to compare the proposed method with six existing prediction tools. The testing dataset comprised of 218 positive and 2621 negative instances, and the proposed model could yield a promising performance with 84.40% sensitivity, 86.99% specificity, 86.79% accuracy, and an MCC value of 0.489. Finally, the proposed method has been implemented as a web-based prediction tool (CNN-SuccSite), which is now freely accessible at http://csb.cse.yzu.edu.tw/CNN-SuccSite/.


Xu H, He L, Zhong B, Qiu J, Tu J. Classification and prediction of inertial cavitation activity induced by pulsed high-intensity focused ultrasound. ULTRASONICS SONOCHEMISTRY 2019;56:77-83. [PMID: 31101291 DOI: 10.1016/j.ultsonch.2019.03.031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/19/2018] [Revised: 02/02/2019] [Accepted: 03/31/2019] [Indexed: 06/09/2023]

Wahba MA, Ashour AS, Guo Y, Napoleon SA, Elnaby MMA. A novel cumulative level difference mean based GLDM and modified ABCD features ranked using eigenvector centrality approach for four skin lesion types classification. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018;165:163-174. [PMID: 30337071 DOI: 10.1016/j.cmpb.2018.08.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/18/2018] [Revised: 07/20/2018] [Accepted: 08/08/2018] [Indexed: 06/08/2023]

Abstract

BACKGROUND AND OBJECTIVE

Melanoma is one of the major causes of death, while basal cell carcinoma (BCC) is the most frequently occurring skin lesion type. At their early stages, medical experts may confuse both types with benign nevus and pigmented benign keratoses (BKL). This inspired the current study to develop an accurate, automated, user-friendly skin lesion identification system.

METHODS

The current work targets a novel discrimination technique for the four aforementioned skin lesion classes. A novel texture feature, named cumulative level-difference mean (CLDM) and based on the gray-level difference method (GLDM), is extracted. Asymmetry, border irregularity, color variation and diameter are summarized in the ABCD rule feature vector, which was originally used to distinguish melanoma from benign lesions. The proposed method extends the ABCD rule to also classify BCC and BKL by using a modified ABCD feature vector. In the modified set of ABCD features, each border feature, such as compact index, fractal dimension, and edge abruptness, is considered a separate feature. Then, the composite feature vector containing the aforementioned features is ranked using the Eigenvector Centrality (ECFS) feature ranking method. The ranked features are then classified by a cubic support vector machine for different numbers of selected features.

RESULTS

The proposed CLDM texture features combined with the ranked ABCD features achieved outstanding performance in classifying the four targeted classes (melanoma, BCC, nevi and BKL). The results report 100% sensitivity, accuracy and specificity for each class, compared to other features, when using the seven highest-ranked features.

CONCLUSIONS

The proposed system established that Melanoma, BCC, nevus and BKL are efficiently classified using cubic SVM with the new feature set. In addition, the comparative studies proved the superiority of the cubic SVM to classify the four classes.


Jamalabadi H, Alizadeh S, Schönauer M, Leibold C, Gais S. Multivariate classification of neuroimaging data with nested subclasses: Biased accuracy and implications for hypothesis testing. PLoS Comput Biol 2018;14:e1006486. [PMID: 30260958 PMCID: PMC6177201 DOI: 10.1371/journal.pcbi.1006486] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/11/2017] [Revised: 10/09/2018] [Accepted: 09/03/2018] [Indexed: 11/29/2022] Open

Abstract

Biological data sets are typically characterized by high dimensionality and low effect sizes. A powerful method for detecting systematic differences between experimental conditions in such multivariate data sets is multivariate pattern analysis (MVPA), particularly pattern classification. However, in virtually all applications, data from the classes that correspond to the conditions of interest are not homogeneous but contain subclasses. Such subclasses can for example arise from individual subjects that contribute multiple data points, or from correlations of items within classes. We show here that in multivariate data that have subclasses nested within its class structure, these subclasses introduce systematic information that improves classifiability beyond what is expected by the size of the class difference. We analytically prove that this subclass bias systematically inflates correct classification rates (CCRs) of linear classifiers depending on the number of subclasses as well as on the portion of variance induced by the subclasses. In simulations, we demonstrate that subclass bias is highest when between-class effect size is low and subclass variance high. This bias can be reduced by increasing the total number of subclasses. However, we can account for the subclass bias by using permutation tests that explicitly consider the subclass structure of the data. We illustrate our result in several experiments that recorded human EEG activity, demonstrating that parametric statistical tests as well as typical trial-wise permutation fail to determine significance of classification outcomes correctly.

When data are analyzed using multivariate pattern classification, any systematic similarities between subsets of trials (e.g. shared physical properties among a subgroup of stimuli, trials belonging to the same session or subject, etc.) form distinct nested subclasses within each class. Pattern classification is sensitive to this kind of structure in the data and uses such groupings to increase classification accuracies even when data from both conditions are sampled from the same distribution, i.e. the null hypothesis is true. Here, we show that the bias is higher for larger subclass variances and that it is directly related to the number of subclasses and the intraclass correlation (ICC). Because the increased classification accuracy in such data sets is not based on class differences, the null distribution should be adjusted to account for this type of bias. To do so, we propose to use blocked permutation testing on subclass levels and show that it can confine the false positive rate to the predefined α-levels.
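The idea of permuting at the subclass level can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each trial carries a class label and a subclass identifier (for example a subject or session) and that every subclass belongs to exactly one class. The function name `blocked_permutation` and the toy data are hypothetical. Class labels are shuffled across whole subclasses, so trials from the same subclass always move together.

```python
import random


def blocked_permutation(labels, blocks, rng):
    """Shuffle class labels across whole blocks (subclasses).

    labels[i] is the class of trial i and blocks[i] its subclass id.
    Assumes all trials within one block share the same class label.
    """
    block_label = {}
    for lab, blk in zip(labels, blocks):
        # Record the first label seen per block and reject mixed blocks.
        if block_label.setdefault(blk, lab) != lab:
            raise ValueError(f"block {blk!r} mixes class labels")
    block_ids = list(block_label)
    shuffled = [block_label[b] for b in block_ids]
    rng.shuffle(shuffled)  # permute labels between blocks, not between trials
    new_label = dict(zip(block_ids, shuffled))
    return [new_label[b] for b in blocks]


# Toy data (hypothetical): 4 subclasses of 3 trials each, 2 per class.
labels = ["A"] * 6 + ["B"] * 6
blocks = ["s1"] * 3 + ["s2"] * 3 + ["s3"] * 3 + ["s4"] * 3
rng = random.Random(0)
permuted = blocked_permutation(labels, blocks, rng)
print(permuted)
```

A null distribution would then be obtained by repeating the permutation many times and recomputing the classification accuracy on each permuted label set; a trial-wise shuffle, by contrast, would break up the subclasses and ignore the structure that causes the bias.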

Bioinformatics and Translation Elongation. BIOINFORMATICS AND THE CELL 2018:197-238. [PMCID: PMC7121122 DOI: 10.1007/978-3-319-90684-3_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/11/2023]

Artificial intelligence used in genome analysis studies. EUROBIOTECH JOURNAL 2018. [DOI: 10.2478/ebtj-2018-0012] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/15/2022]

Abnormal neural activity as a potential biomarker for drug-naive first-episode adolescent-onset schizophrenia with coherence regional homogeneity and support vector machine analyses. Schizophr Res 2018;192:408-415. [PMID: 28476336 DOI: 10.1016/j.schres.2017.04.028] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 07/19/2016] [Revised: 04/12/2017] [Accepted: 04/14/2017] [Indexed: 12/19/2022]

Liu Y, Guo W, Zhang Y, Lv L, Hu F, Wu R, Zhao J. Decreased Resting-State Interhemispheric Functional Connectivity Correlated with Neurocognitive Deficits in Drug-Naive First-Episode Adolescent-Onset Schizophrenia. Int J Neuropsychopharmacol 2017;21:33-41. [PMID: 29228204 PMCID: PMC5795351 DOI: 10.1093/ijnp/pyx095] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 05/25/2017] [Accepted: 10/19/2017] [Indexed: 12/11/2022] Open

Abstract

BACKGROUND

Given that adolescence is a critical epoch in the onset of schizophrenia, studying aberrant brain changes in adolescent-onset schizophrenia, particularly in patients with drug-naive first-episode schizophrenia, is important to understand the biological mechanism of this disorder. Previous resting-state functional magnetic resonance imaging studies have shown abnormal functional connectivity in separate hemispheres in patients with adult-onset schizophrenia. Studying adolescent-onset schizophrenia can provide clues to the early aetiology of schizophrenia.

METHOD

A total of 48 drug-naïve, first-episode, adolescent-onset schizophrenia outpatients and 31 healthy controls underwent resting-state functional magnetic resonance imaging scans. Data were subjected to voxel-mirrored homotopic connectivity and support vector machine analyses.

RESULTS

Compared with the healthy controls, the adolescent-onset schizophrenia group showed significantly lower voxel-mirrored homotopic connectivity values in different brain regions, including the fusiform gyrus, superior temporal gyrus/insula, precentral gyrus, and precuneus. Decreased voxel-mirrored homotopic connectivity values in the superior temporal gyrus/insula were significantly correlated with Trail-Making Test: Part A performance (r = -0.437, P = .002). A combination of the voxel-mirrored homotopic connectivity values in the precentral gyrus and precuneus may be used to discriminate patients with adolescent-onset schizophrenia from controls with satisfactory classification results, which showed sensitivity of 100%, specificity of 87.09%, and accuracy of 94.93%.
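The three rates reported above follow from a 2x2 confusion matrix. The sketch below uses counts that are not given in the abstract: they are inferred, as an assumption, from the group sizes in the METHOD paragraph (48 patients, 31 controls) together with the reported sensitivity and specificity, which would correspond to 0 missed patients and 4 misclassified controls. With those assumed counts the computed values agree with the reported 87.09% and 94.93% to within rounding in the second decimal place.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)          # patients correctly identified
    specificity = tn / (tn + fp)          # controls correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts inferred for illustration (assumption, not stated in the abstract):
# 48 patients all detected, 27 of 31 controls correctly classified.
sens, spec, acc = classification_metrics(tp=48, fn=0, tn=27, fp=4)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} accuracy={acc:.2%}")
```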

CONCLUSION

Our findings highlight resting-state interhemispheric FC abnormalities within the sensorimotor network of patients with adolescent-onset schizophrenia and confirm the relationship between adolescent-onset schizophrenia and adult-onset schizophrenia. These findings suggest that reduced interhemispheric connectivity within the sensorimotor network has a pivotal role in the pathogenesis of schizophrenia.

Affiliation(s)

Yi Liu Department of Psychiatry, the Second Xiangya Hospital, Central South University, Changsha, Hunan; Mental Health Institute of the Second Xiangya Hospital, Central South University, Changsha, Hunan, China; National Clinical Research Center on Mental Disorders, Changsha, Hunan, China; National Technology Institute on Mental Disorders, Changsha, Hunan, China; Hunan Key Laboratory of Psychiatry and Mental Health, Changsha, Hunan, China
Wenbin Guo Department of Psychiatry, the Second Xiangya Hospital, Central South University, Changsha, Hunan; Mental Health Institute of the Second Xiangya Hospital, Central South University, Changsha, Hunan, China; National Clinical Research Center on Mental Disorders, Changsha, Hunan, China; National Technology Institute on Mental Disorders, Changsha, Hunan, China; Hunan Key Laboratory of Psychiatry and Mental Health, Changsha, Hunan, China
Yan Zhang Henan Key Laboratory of Biological Psychiatry, Henan Mental Hospital, Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, China
Luxian Lv Henan Key Laboratory of Biological Psychiatry, Henan Mental Hospital, Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, China
Feihu Hu Department of Psychiatry, the Second Xiangya Hospital, Central South University, Changsha, Hunan; Mental Health Institute of the Second Xiangya Hospital, Central South University, Changsha, Hunan, China; National Clinical Research Center on Mental Disorders, Changsha, Hunan, China; National Technology Institute on Mental Disorders, Changsha, Hunan, China; Hunan Key Laboratory of Psychiatry and Mental Health, Changsha, Hunan, China
Renrong Wu Department of Psychiatry, the Second Xiangya Hospital, Central South University, Changsha, Hunan; Mental Health Institute of the Second Xiangya Hospital, Central South University, Changsha, Hunan, China; National Clinical Research Center on Mental Disorders, Changsha, Hunan, China; National Technology Institute on Mental Disorders, Changsha, Hunan, China; Hunan Key Laboratory of Psychiatry and Mental Health, Changsha, Hunan, China
Jingping Zhao Department of Psychiatry, the Second Xiangya Hospital, Central South University, Changsha, Hunan; Henan Key Laboratory of Biological Psychiatry, Henan Mental Hospital, Second Affiliated Hospital of Xinxiang Medical University, Xinxiang, China; Mental Health Institute of the Second Xiangya Hospital, Central South University, Changsha, Hunan, China; National Clinical Research Center on Mental Disorders, Changsha, Hunan, China; National Technology Institute on Mental Disorders, Changsha, Hunan, China; Hunan Key Laboratory of Psychiatry and Mental Health, Changsha, Hunan, China; Correspondence: Jingping Zhao, MD, Department of Psychiatry, the Second Xiangya Hospital, Central South University, Changsha, Hunan 410011, China

Wahba MA, Ashour AS, Napoleon SA, Abd Elnaby MM, Guo Y. Combined empirical mode decomposition and texture features for skin lesion classification using quadratic support vector machine. Health Inf Sci Syst 2017;5:10. [PMID: 29142740 DOI: 10.1007/s13755-017-0033-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 08/07/2017] [Accepted: 10/16/2017] [Indexed: 11/30/2022] Open

Zhang S, Hu H, Jiang T, Zhang L, Zeng J. TITER: predicting translation initiation sites by deep learning. Bioinformatics 2017;33:i234-i242. [PMID: 28881981 PMCID: PMC5870772 DOI: 10.1093/bioinformatics/btx247] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Indexed: 11/12/2022] Open

Abstract

MOTIVATION

Translation initiation is a key step in the regulation of gene expression. In addition to the annotated translation initiation sites (TISs), the translation process may also start at multiple alternative TISs (including both AUG and non-AUG codons), which makes it challenging to predict TISs and study the underlying regulatory mechanisms. Meanwhile, the advent of several high-throughput sequencing techniques for profiling initiating ribosomes at single-nucleotide resolution, e.g. GTI-seq and QTI-seq, provides abundant data for systematically studying the general principles of translation initiation and for developing computational methods for TIS identification.

METHODS

We have developed a deep learning-based framework, named TITER, for accurately predicting TISs on a genome-wide scale based on QTI-seq data. TITER extracts the sequence features of translation initiation from the surrounding sequence contexts of TISs using a hybrid neural network and further integrates the prior preference of TIS codon composition into a unified prediction framework.

RESULTS

Extensive tests demonstrated that TITER can greatly outperform the state-of-the-art prediction methods in identifying TISs. In addition, TITER was able to identify important sequence signatures for individual types of TIS codons, including a Kozak-sequence-like motif for AUG start codon. Furthermore, the TITER prediction score can be related to the strength of translation initiation in various biological scenarios, including the repressive effect of the upstream open reading frames on gene expression and the mutational effects influencing translation initiation efficiency.

AVAILABILITY AND IMPLEMENTATION

TITER is available as open-source software and can be downloaded from https://github.com/zhangsaithu/titer .

CONTACT

lzhang20@mail.tsinghua.edu.cn or zengjy321@tsinghua.edu.cn.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Nunes Pinto CL, Nobre CN, Zárate LE. Transductive learning as an alternative to translation initiation site identification. BMC Bioinformatics 2017;18:81. [PMID: 28152994 PMCID: PMC5290616 DOI: 10.1186/s12859-017-1502-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/25/2016] [Accepted: 01/28/2017] [Indexed: 11/23/2022] Open

Abstract

Background

Correct identification of protein-coding regions is an important and still open problem in molecular biology. The problem is challenging because knowledge of the underlying biological systems is limited and the conserved characteristics of messenger RNA (mRNA) are poorly understood. It is therefore essential to investigate computational methods that support pattern discovery for identifying Translation Initiation Sites (TIS). In Bioinformatics, machine learning methods based on inductive inference, such as the Inductive Support Vector Machine (ISVM), have been widely applied. By contrast, little attention has been given to machine learning methods based on transductive inference, such as the Transductive Support Vector Machine (TSVM). Transductive inference performs well for problems in which the number of unlabeled sequences is considerably greater than the number of labeled ones. TIS prediction may likewise benefit from transductive methods, because the number of new sequences grows rapidly as genome projects enable the study of new organisms. Consequently, this work investigates transductive learning for TIS identification and compares the results with those obtained with the inductive method.

Results

Transductive inference yields better F-measure and sensitivity than the inductive method for predicting the TIS. It also has the lowest failure rate for identifying the TIS, producing fewer False Negatives (FN) than the ISVM. The ISVM and TSVM methods were validated with the molecules from the most representative organisms contained in the RefSeq database: Rattus norvegicus, Mus musculus, Homo sapiens, Drosophila melanogaster and Arabidopsis thaliana. The transductive method achieved F-measure and sensitivity higher than 90%, and also higher than the results obtained with ISVM. The ISVM and TSVM approaches were implemented in the TransduTIS tool, as TransduTIS-I and TransduTIS-T respectively, available through a web interface. These approaches were compared with the TISHunter, TIS Miner and NetStart tools, with satisfactory results.

Conclusions

In relation to precision, the results are similar for the ISVM and TSVM classifiers. However, the results show that applying the TSVM approach brought an improvement, especially in F-measure and sensitivity. Moreover, a potential application of TSVM was identified: organisms in the initial study phase with few identified sequences in the databases.
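How similar precision and higher sensitivity combine into a higher F-measure can be illustrated with the standard definitions. The sketch below is not taken from the paper: the counts are made up purely for illustration, and the function name is hypothetical. It shows that a classifier with fewer false negatives obtains a higher F-measure even when precision is almost unchanged.

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F_beta); recall is the sensitivity."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Illustrative counts only (assumption): similar precision, fewer FN
# for the second classifier.
inductive = precision_recall_f(tp=80, fp=8, fn=20)
transductive = precision_recall_f(tp=92, fp=9, fn=8)
print("inductive   :", inductive)
print("transductive:", transductive)
```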

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1502-6) contains supplementary material, which is available to authorized users.

Xia X. Bioinformatics and Drug Discovery. Curr Top Med Chem 2017;17:1709-1726. [PMID: 27848897 PMCID: PMC5421137 DOI: 10.2174/1568026617666161116143440] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 06/17/2016] [Revised: 09/11/2016] [Accepted: 09/21/2016] [Indexed: 02/07/2023]

Al Bataineh M, Al-qudah Z. A novel gene identification algorithm with Bayesian classification. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2016.07.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]

Lai CM, Yeh WC, Chang CY. Gene selection using information gain and improved simplified swarm optimization. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.08.089] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 10/21/2022]

Pérez-Rodríguez J, García-Pedrajas N. Stepwise approach for combining many sources of evidence for site-recognition in genomic sequences. BMC Bioinformatics 2016;17:117. [PMID: 26945666 PMCID: PMC4779560 DOI: 10.1186/s12859-016-0968-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/19/2015] [Accepted: 02/22/2016] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Recognizing the different functional parts of genes, such as promoters, translation initiation sites, donors, acceptors and stop codons, is a fundamental task of many current studies in Bioinformatics. Currently, the most successful methods use powerful classifiers, such as support vector machines with various string kernels. However, with the rapid evolution of our ability to collect genomic information, it has been shown that combining many sources of evidence is fundamental to the success of any recognition task. With the advent of next-generation sequencing, the number of available genomes is increasing very rapidly. Thus, methods for making use of such large amounts of information are needed.

RESULTS

In this paper, we present a methodology for combining tens or even hundreds of different classifiers for improved performance. Our approach can include an almost limitless number of sources of evidence. We can use the evidence for the prediction of sites in a certain species, such as human, or other species as needed. This approach can be used for any of the functional recognition tasks cited above. However, to provide the necessary focus, we have tested our approach in two functional recognition tasks: translation initiation site and stop codon recognition. We have used the entire human genome as a target and another 20 species as sources of evidence and tested our method on five different human chromosomes. The proposed method achieves better accuracy than the best state-of-the-art method both in terms of the geometric mean of the specificity and sensitivity and the area under the receiver operating characteristic and precision recall curves. Furthermore, our approach shows a more principled way for selecting the best genomes to be combined for a given recognition task.

CONCLUSIONS

Our approach has proven to be a powerful tool for improving the performance of functional site recognition, and it is a useful method for combining many sources of evidence for any recognition task in Bioinformatics. The results also show that the common approach of heuristically choosing the species to be used as sources of evidence can be improved because the best combinations of genomes for recognition were those not usually selected. Although the experiments were performed for translation initiation site and stop codon recognition, any other recognition task may benefit from our methodology.

Koyano H, Hayashida M, Akutsu T. Maximum margin classifier working in a set of strings. Proc Math Phys Eng Sci 2016;472:20150551. [PMID: 27118908 PMCID: PMC4841474 DOI: 10.1098/rspa.2015.0551] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/10/2015] [Accepted: 02/02/2016] [Indexed: 11/12/2022] Open

Herndon N, Caragea D. A Study of Domain Adaptation Classifiers Derived From Logistic Regression for the Task of Splice Site Prediction. IEEE Trans Nanobioscience 2016;15:75-83. [PMID: 26849871 DOI: 10.1109/tnb.2016.2522400] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 11/06/2022]

Meher PK, Sahu TK, Rao AR. Prediction of donor splice sites using random forest with a new sequence encoding approach. BioData Min 2016;9:4. [PMID: 26807151 PMCID: PMC4724119 DOI: 10.1186/s13040-016-0086-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 03/02/2015] [Accepted: 01/19/2016] [Indexed: 11/10/2022] Open

A Comprehensive Review of Emerging Computational Methods for Gene Identification. JOURNAL OF INFORMATION PROCESSING SYSTEMS 2016. [DOI: 10.3745/jips.04.0023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/03/2022]

Vidovic MMC, Görnitz N, Müller KR, Rätsch G, Kloft M. SVM2Motif--Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor. PLoS One 2015;10:e0144782. [PMID: 26690911 PMCID: PMC4686957 DOI: 10.1371/journal.pone.0144782] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/28/2015] [Accepted: 11/22/2015] [Indexed: 12/02/2022] Open

Kabir M, Iqbal M, Ahmad S, Hayat M. iTIS-PseKNC: Identification of Translation Initiation Site in human genes using pseudo k-tuple nucleotides composition. Comput Biol Med 2015;66:252-7. [PMID: 26433457 DOI: 10.1016/j.compbiomed.2015.09.010] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 09/13/2015] [Accepted: 09/14/2015] [Indexed: 10/23/2022]

Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet 2015;16:321-32. [PMID: 25948244 PMCID: PMC5204302 DOI: 10.1038/nrg3920] [Citation(s) in RCA: 806] [Impact Index Per Article: 89.6] [Indexed: 01/18/2023]

Kumar R, Srivastava A, Kumari B, Kumar M. Prediction of β-lactamase and its class by Chou’s pseudo-amino acid composition and support vector machine. J Theor Biol 2015;365:96-103. [DOI: 10.1016/j.jtbi.2014.10.008] [Citation(s) in RCA: 125] [Impact Index Per Article: 13.9] [Received: 08/05/2014] [Revised: 10/01/2014] [Accepted: 10/06/2014] [Indexed: 01/01/2023]

An improved poly(A) motifs recognition method based on decision level fusion. Comput Biol Chem 2014;54:49-56. [PMID: 25594576 DOI: 10.1016/j.compbiolchem.2014.12.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/16/2014] [Revised: 11/27/2014] [Accepted: 12/27/2014] [Indexed: 01/07/2023]

Abstract

Polyadenylation is the process of adding a poly(A) tail to mRNA 3' ends. Identification of motifs controlling polyadenylation plays an essential role in improving genome annotation accuracy and in better understanding the mechanisms governing gene regulation. Bioinformatics methods for poly(A) motif recognition have demonstrated that information extracted from the sequences surrounding candidate motifs can greatly help to differentiate true motifs from false ones. However, these methods depend on either domain features or string kernels. To date, methods combining information from different sources have not been reported. Here, we propose an improved poly(A) motif recognition method that combines different sources based on decision level fusion. First, two novel prediction methods were proposed based on support vector machine (SVM): one uses domain-specific features and principal component analysis (PCA) to eliminate redundancy (PCA-SVM); the other is based on the Oligo string kernel (Oligo-SVM). Then we proposed a novel machine-learning method for poly(A) motif prediction by combining four poly(A) motif recognition methods, including two state-of-the-art methods (Random Forest (RF) and HMM-SVM) and the two newly proposed methods (PCA-SVM and Oligo-SVM). A decision level information fusion method was employed to combine the decision values of the different classifiers by applying the DS evidence theory. We evaluated our method on a comprehensive poly(A) dataset that consists of 14,740 samples on 12 variants of poly(A) motifs and 2750 samples containing none of these motifs. Our method achieved accuracy up to 86.13%. Compared with the four classifiers, our evidence theory based method reduces the average error rate by about 30%, 27%, 26% and 16%, respectively. The experimental results suggest that the proposed method is more effective for poly(A) motif recognition.
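Decision level fusion with DS evidence theory, read here as Dempster-Shafer theory, can be sketched with Dempster's rule of combination. The abstract does not say how classifier decision values are converted into belief masses, so the mass values below are purely illustrative assumptions, as are the names `dempster_combine`, `TRUE`, `FALSE` and `EITHER`. The sketch fuses two classifiers over the two hypotheses "true motif" and "false motif", with some mass left on the undecided set.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) by Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # agreeing evidence accumulates on the intersection
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # contradictory evidence is collected as conflict
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # renormalise by the non-conflicting mass
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


TRUE = frozenset({"true_motif"})
FALSE = frozenset({"false_motif"})
EITHER = TRUE | FALSE  # mass left uncommitted

# Illustrative masses only (assumption), one dict per classifier.
clf_a = {TRUE: 0.6, FALSE: 0.1, EITHER: 0.3}
clf_b = {TRUE: 0.5, FALSE: 0.2, EITHER: 0.3}

fused = dempster_combine(clf_a, clf_b)
decision = max((TRUE, FALSE), key=lambda h: fused.get(h, 0.0))
print(fused, decision)
```

Because the rule is associative and commutative, further classifiers can be folded in one at a time with repeated calls.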

Meher PK, Sahu TK, Rao AR, Wahi SD. A statistical approach for 5' splice site prediction using short sequence motifs and without encoding sequence data. BMC Bioinformatics 2014;15:362. [PMID: 25420551 PMCID: PMC4702320 DOI: 10.1186/s12859-014-0362-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/01/2014] [Accepted: 10/24/2014] [Indexed: 11/17/2022] Open

Chen W, Feng PM, Deng EZ, Lin H, Chou KC. iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition. Anal Biochem 2014;462:76-83. [PMID: 25016190 DOI: 10.1016/j.ab.2014.06.022] [Citation(s) in RCA: 218] [Impact Index Per Article: 21.8] [Received: 05/20/2014] [Revised: 06/26/2014] [Accepted: 06/27/2014] [Indexed: 01/25/2023]

Pérez-Rodríguez J, Arroyo-Peña AG, García-Pedrajas N. Improving translation initiation site and stop codon recognition by using more than two classes. Bioinformatics 2014;30:2702-8. [PMID: 24903421 DOI: 10.1093/bioinformatics/btu369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

The recognition of translation initiation sites and stop codons is a fundamental part of any gene recognition program. Currently, the most successful methods use powerful classifiers, such as support vector machines with various string kernels. These methods all use two classes, one of positive instances and another one of negative instances that are constructed using sequences from the whole genome. However, the features of the negative sequences differ depending on the position of the negative samples in the gene. There are differences depending on whether they are from exons, introns, intergenic regions or any other functional part of the genome. Thus, the positive class is fairly homogeneous, as all its sequences come from the same part of the gene, but the negative class is composed of different instances. The classifier suffers from this problem. In this article, we propose the training of different classifiers with different negative, more homogeneous, classes and the combination of these classifiers for improved accuracy.

RESULTS

The proposed method achieves better accuracy than the best state-of-the-art method, both in terms of the geometric mean of the specificity and sensitivity and the area under the receiver operating characteristic and precision recall curves. The method is tested on the whole human genome. The results for recognizing both translation initiation sites and stop codons indicated improvements in the rates of both false-negative results (FN) and false-positive results (FP). On average, for translation initiation site recognition, the FN ratio was reduced by 30.2% and the FP ratio by 10.9%. For stop codon prediction, FP were reduced by 41.4% and FN by 31.7%.
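The geometric mean of specificity and sensitivity and the relative error reductions quoted above can be computed as in the following sketch. The formulas are the standard definitions; the function names are hypothetical and the counts in the usage lines are placeholders chosen for illustration, not figures from the paper.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def relative_reduction(before, after):
    """Fraction by which an error count or rate fell, e.g. 0.302 for 30.2%."""
    return (before - after) / before


# Placeholder counts (assumption) for a baseline and an improved classifier.
baseline = g_mean(tp=80, fn=20, tn=90, fp=10)
improved = g_mean(tp=86, fn=14, tn=91, fp=9)
print(baseline, improved, relative_reduction(20, 14))
```

The geometric mean is a common choice for site recognition because negatives vastly outnumber true sites, so plain accuracy would be dominated by the negative class.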

AVAILABILITY AND IMPLEMENTATION

The source code is licensed under the General Public License and is thus freely available. The datasets and source code can be obtained from http://cib.uco.es/site-recognition.

CONTACT

npedrajas@uco.es.

Dameh TA, Abd-Almageed W, Hefeeda M. Distributed Kernel Matrix Approximation and Implementation Using Message Passing Interface. 2013 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS 2013. [DOI: 10.1109/icmla.2013.17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]

Xie HL, Fu L, Nie XD. Using ensemble SVM to identify human GPCRs N-linked glycosylation sites based on the general form of Chou's PseAAC. Protein Eng Des Sel 2013;26:735-42. [PMID: 24048266 DOI: 10.1093/protein/gzt042] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.6] [Indexed: 01/22/2023] Open

Chang CCH, Song J, Tey BT, Ramanan RN. Bioinformatics approaches for improved recombinant protein production in Escherichia coli: protein solubility prediction. Brief Bioinform 2013;15:953-62. [DOI: 10.1093/bib/bbt057] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Indexed: 11/13/2022] Open

McEachern A, Ashlock D, Schonfeld J. Sequence classification with side effect machines evolved via ring optimization. Biosystems 2013;113:9-27. [PMID: 23603215 DOI: 10.1016/j.biosystems.2013.03.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/08/2011] [Revised: 03/29/2013] [Accepted: 03/31/2013] [Indexed: 10/26/2022]

Xia X. Position weight matrix, gibbs sampler, and the associated significance tests in motif characterization and prediction. SCIENTIFICA 2012;2012:917540. [PMID: 24278755 PMCID: PMC3820676 DOI: 10.6064/2012/917540] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 08/22/2012] [Accepted: 10/11/2012] [Indexed: 05/31/2023]

Li JL, Wang LF, Wang HY, Bai LY, Yuan ZM. High-accuracy splice site prediction based on sequence component and position features. GENETICS AND MOLECULAR RESEARCH 2012;11:3432-51. [PMID: 23079837 DOI: 10.4238/2012.september.25.12] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 11/03/2022]