51
Abstract
Ever since the signal hypothesis was proposed in 1971, the exact nature of signal peptides has been a focal point of research. The prediction of signal peptides and protein subcellular location from amino acid sequences has been an important problem in bioinformatics since the dawn of the field, involving many statistical and machine learning technologies. In this review, we provide a historical account of how position-weight matrices, artificial neural networks, hidden Markov models, support vector machines and, lately, deep learning techniques have been used in attempts to predict where proteins go. Because the secretory pathway was the first to be studied both experimentally and through bioinformatics, our main focus is on the historical development of prediction methods for signal peptides that target proteins for secretion; prediction methods that identify targeting signals for other cellular compartments are treated in less detail.
Affiliation(s)
- Henrik Nielsen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
- Konstantinos D Tsirigos
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
- Søren Brunak
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Kgs. Lyngby, Denmark
  - Faculty of Health and Medical Sciences, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Gunnar von Heijne
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
  - Science for Life Laboratory, Stockholm University, Solna, Sweden
52
Costache R. Flash-Flood Potential assessment in the upper and middle sector of Prahova river catchment (Romania). A comparative approach between four hybrid models. The Science of the Total Environment 2019; 659:1115-1134. [PMID: 31096326 DOI: 10.1016/j.scitotenv.2018.12.397] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/11/2018] [Revised: 12/13/2018] [Accepted: 12/25/2018] [Indexed: 06/09/2023]
Abstract
An accurate assessment of Flash-Flood Potential for specific areas is essential for improving flash-flood forecasts and warnings. The main aim of the present study is to calculate the Flash-Flood Potential Index (FFPI) within the upper and middle sectors of the Prahova river catchment (Romania) using four hybrid models: Logistic Regression-Frequency Ratio (LR-FR), Logistic Regression-Weights of Evidence (LR-WoE), Support Vector Machine-Frequency Ratio (SVM-FR) and Support Vector Machine-Weights of Evidence (SVM-WoE). The first step was to identify the areas affected by torrential phenomena. These areas, with a total surface of 260 km², were divided into training areas (70%) and validation areas (30%). By means of a Linear Support Vector Machine (LSVM) model, 10 flash-flood conditioning factors were selected and then used for the Flash-Flood Potential assessment. The FR and WoE coefficients were calculated from the spatial relationship between the areas affected by torrential phenomena and the characteristics of the flash-flood conditioning factors. These values were standardized so that they could be integrated into the Logistic Regression and Support Vector Machine (RBF) analyses. According to the results of the four hybrid models used for FFPI calculation, high and very high Flash-Flood Potential values are spread over 33% of the study area. Model performance assessment and results validation were carried out by means of three methods: i) relative frequency distribution of torrential-phenomena pixels within FFPI classes; ii) ROC curve (Success Rate and Prediction Rate) and AUC value; iii) the statistical measures Sensitivity, Specificity and Accuracy.
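The three statistical measures named at the end of this abstract follow directly from confusion-matrix counts. The sketch below is only an illustration of the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: affected pixels predicted as affected / not affected
    tn/fp: unaffected pixels predicted as unaffected / affected
    """
    sensitivity = tp / (tp + fn)            # share of affected pixels found
    specificity = tn / (tn + fp)            # share of unaffected pixels found
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec, acc = classification_metrics(tp=80, fp=10, tn=90, fn=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

With these invented counts, sensitivity is 80/100, specificity is 90/100 and accuracy is 170/200.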
Affiliation(s)
- Romulus Costache
  - Research Institute of the University of Bucharest, 36-46 Bd. M. Kogalniceanu, 5th District, 050107 Bucharest, Romania; National Institute of Hydrology and Water Management, București-Ploiești Road, 97E, 1st District, 013686 Bucharest, Romania
53
Zhang Y, Allem JP, Unger JB, Boley Cruz T. Automated Identification of Hookahs (Waterpipes) on Instagram: An Application in Feature Extraction Using Convolutional Neural Network and Support Vector Machine Classification. J Med Internet Res 2018; 20:e10513. [PMID: 30452385 PMCID: PMC6282010 DOI: 10.2196/10513] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 03/27/2018] [Revised: 07/30/2018] [Accepted: 08/07/2018] [Indexed: 12/24/2022] Open
Abstract
Background: Instagram, with millions of posts per day, can be used to inform public health surveillance targets and policies. However, current research on image-based data often relies on hand coding of images, which is time-consuming and costly, ultimately limiting the scope of the study. Current best practices in automated image classification (eg, support vector machine (SVM), backpropagation neural network, and artificial neural network) are limited in their capacity to accurately distinguish between objects within images.
Objective: This study aimed to demonstrate how a convolutional neural network (CNN) can be used to extract unique features within an image and how SVM can then be used to classify the image.
Methods: Images of waterpipes or hookah (an emerging tobacco product with harms similar to those of cigarettes) were collected from Instagram and used in the analyses (N=840). A CNN was used to extract unique features from images identified to contain waterpipes. An SVM classifier was built to distinguish between images with and without waterpipes. Methods for image classification were then compared to show how a CNN+SVM classifier could improve accuracy.
Results: As the number of validated training images increased, the total number of extracted features increased. In addition, as the number of features learned by the SVM classifier increased, the average level of accuracy increased. Overall, 99.5% (418/420) of images classified were correctly identified as either hookah or nonhookah images. This level of accuracy was an improvement over earlier methods that used SVM, CNN, or bag-of-features alone.
Conclusions: A CNN extracts more features of images, allowing an SVM classifier to be better informed, resulting in higher accuracy compared with methods that extract fewer features. Future research can use this method to grow the scope of image-based studies. The methods presented here might help detect increases in the popularity of certain tobacco products over time on social media. By taking images of waterpipes from Instagram, we place our methods in a context that can be utilized to inform health researchers analyzing social media to understand user experience with emerging tobacco products and inform public health surveillance targets and policies.
Affiliation(s)
- Youshan Zhang
  - Department of Computer Science, Lehigh University, Bethlehem, PA, United States
- Tess Boley Cruz
  - Keck School of Medicine of USC, Los Angeles, CA, United States
54
An Y, Wang J, Li C, Leier A, Marquez-Lago T, Wilksch J, Zhang Y, Webb GI, Song J, Lithgow T. Comprehensive assessment and performance improvement of effector protein predictors for bacterial secretion systems III, IV and VI. Brief Bioinform 2018; 19:148-161. [PMID: 27777222 DOI: 10.1093/bib/bbw100] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 07/30/2016] [Indexed: 11/15/2022] Open
Abstract
Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in the ML methods used but also in their curated data sets, feature selection and prediction performance. Here, we provide a comprehensive survey and benchmarking of currently available tools for the prediction of effector proteins of bacterial types III, IV and VI secretion systems (T3SS, T4SS and T6SS, respectively). We review core algorithms, feature selection techniques, tool availability and applicability, and evaluate the prediction performance on carefully curated independent test data sets. In an effort to improve predictive performance, we constructed three ensemble models based on ML algorithms by integrating the output of all individual predictors reviewed. Our benchmarks demonstrate that these ensemble models outperform all the reviewed tools for the prediction of effector proteins of T3SS and T4SS. The webserver of the proposed ensemble methods for T3SS and T4SS effector protein prediction is freely available at http://tbooster.erc.monash.edu/index.jsp. We anticipate that this survey will serve as a useful guide for interested users and that the new ensemble predictors will stimulate research into host-pathogen relationships and inspire the development of new bioinformatics tools for predicting effector proteins of T3SS, T4SS and T6SS.
55
Punjabi M, Bharadvaja N, Sachdev A, Krishnan V. Molecular characterization, modeling, and docking analysis of late phytic acid biosynthesis pathway gene, inositol polyphosphate 6-/3-/5-kinase, a potential candidate for developing low phytate crops. 3 Biotech 2018; 8:344. [PMID: 30073129 PMCID: PMC6064606 DOI: 10.1007/s13205-018-1343-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/08/2018] [Accepted: 07/06/2018] [Indexed: 01/08/2023] Open
Abstract
The coding sequence of the inositol polyphosphate 6-/3-/5-kinase (GmIPK2) gene was identified and cloned from the popular Indian soybean cultivar Pusa-16. The clone was predicted to encode a 279-amino-acid, 30.97 kDa protein. Multiple sequence alignment revealed an inositol phosphate-binding motif, PxxxDxKxG, across the IPK2 sequences, along with other motifs unique to the inositol phosphate kinase superfamily. Eight α-helices and eight β-strands arranged in antiparallel β-sheets were predicted in the secondary structure of GmIPK2. Temporal analysis of GmIPK2 revealed maximum expression in the seed tissues during later stages of development, while spatially the transcript levels were lowest in leaf and stem tissues. Endosperm-specific cis-regulatory motifs (GCN4 and Skn_1), which support high levels of expression as observed in the developing seeds, were detected in its promoter region. The protein structure of GmIPK2 was modeled on the crystal structure of inositol polyphosphate multikinase from Arabidopsis thaliana (PDB: 4FRF) and subsequently docked with inositol phosphate ligands (PDB: 5GUG-I3P and PDB: 4A69-I0P). Molecular dynamics (MD) simulation established the structural stability of both the modeled enzyme and the ligand-bound complexes. Docking in combination with trajectory analysis of a 50 ns MD run confirmed the participation of residues Lys105, Lys126 and Arg153 in a network of hydrogen bonds that stabilizes the ligand-receptor interaction. The results of the present study thus provide valuable information on structural and functional aspects of GmIPK2, which will assist our long-term goal of reducing phytic acid in soybean by genetic modification of its biosynthetic pathway to develop a nutritionally enhanced crop.
Affiliation(s)
- Mansi Punjabi
  - Department of Biotechnology, Delhi Technological University (Formerly Delhi College of Engineering), New Delhi, 110042 India
  - Division of Biochemistry, Indian Agricultural Research Institute, New Delhi, 110012 India
- Navneeta Bharadvaja
  - Department of Biotechnology, Delhi Technological University (Formerly Delhi College of Engineering), New Delhi, 110042 India
- Archana Sachdev
  - Division of Biochemistry, Indian Agricultural Research Institute, New Delhi, 110012 India
- Veda Krishnan
  - Division of Biochemistry, Indian Agricultural Research Institute, New Delhi, 110012 India
56
Abstract
Codon usage depends on mutation bias, tRNA-mediated selection, and the need for high efficiency and accuracy in translation. One codon in a synonymous codon family is often strongly over-used, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many different codon usage indices have been proposed to measure codon usage and codon adaptation. Sense codons can be misread by release factors and stop codons by tRNAs, which also contributes to codon usage in rare cases. This chapter outlines the conceptual framework of codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index of codon adaptation that accounts for background mutation bias (Index of Translation Elongation) is presented and contrasted with the codon adaptation index (CAI), which does not consider background mutation bias. Both are used to reanalyze data from a recent paper claiming that translation elongation efficiency matters little in protein production. The reanalysis disproves the claim.
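As a rough illustration of the gene-specific index mentioned above, the sketch below computes the classic CAI as the geometric mean of each codon's relative adaptiveness (its count in a reference set divided by the count of the most-used synonymous codon). It does not reproduce the chapter's Index of Translation Elongation. The two-family codon table and the reference counts are hypothetical, and skipping unknown or zero-weight codons is a simplification assumed here.

```python
import math

# Hypothetical subset of synonymous codon families; a real analysis would
# cover the full genetic code.
FAMILIES = {
    "Lys": ["AAA", "AAG"],
    "Glu": ["GAA", "GAG"],
}


def relative_adaptiveness(reference_counts):
    """w = codon count / count of the most-used codon in its family."""
    weights = {}
    for codons in FAMILIES.values():
        top = max(reference_counts.get(c, 0) for c in codons)
        if top == 0:
            continue  # family absent from the reference set
        for c in codons:
            weights[c] = reference_counts.get(c, 0) / top
    return weights


def cai(sequence, weights):
    """Geometric mean of w over the codons of an in-frame coding sequence."""
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    # Simplification: codons without a positive weight are skipped.
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Invented counts from a hypothetical set of highly expressed genes.
reference = {"AAA": 80, "AAG": 20, "GAA": 50, "GAG": 50}
w = relative_adaptiveness(reference)
print(cai("AAAGAA", w), cai("AAGGAG", w))
```

Because the weights come only from the reference set, this index is blind to background mutation bias, which is the limitation the abstract attributes to CAI.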
57
da Costa WLO, Araújo CLDA, Dias LM, Pereira LCDS, Alves JTC, Araújo FA, Folador EL, Henriques I, Silva A, Folador ARC. Functional annotation of hypothetical proteins from the Exiguobacterium antarcticum strain B7 reveals proteins involved in adaptation to extreme environments, including high arsenic resistance. PLoS One 2018; 13:e0198965. [PMID: 29940001 PMCID: PMC6016940 DOI: 10.1371/journal.pone.0198965] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 01/12/2018] [Accepted: 05/28/2018] [Indexed: 02/07/2023] Open
Abstract
Exiguobacterium antarcticum strain B7 is a psychrophilic Gram-positive bacterium that possesses enzymes that can be used for several biotechnological applications. However, many proteins encoded in its genome are considered hypothetical proteins (HPs). These functionally unknown proteins may play important parts in the biology of this bacterium, and bioinformatics tools can improve the biological understanding of this organism through functional annotation analysis. Our study therefore aimed to assign functions to proteins previously described as HPs in the genome of E. antarcticum B7. We used an extensive in silico workflow combining several bioinformatics tools for function annotation, sub-cellular localization and physicochemical characterization, three-dimensional structure determination, and protein-protein interactions. The genome contains 2772 genes, of which 765 CDS were annotated as HPs. The amino acid sequences of all HPs were submitted to our workflow, and we successfully attributed function to 132 HPs. We identified 11 proteins that play important roles in the mechanisms of adaptation to adverse environments, such as flagellar biosynthesis, biofilm formation, carotenoid biosynthesis, and others. In addition, three predicted HPs are possibly related to arsenic tolerance. Through an in vitro assay, we verified that E. antarcticum B7 can grow at high concentrations of arsenic. The approach was important for precisely assigning function to proteins from diverse classes and for inferring relationships with proteins whose functions are already described in the literature. It aims to produce a better understanding of the mechanism by which this bacterium adapts to extreme environments and to identify targets of biotechnological interest.
Affiliation(s)
- Wana Lailan Oliveira da Costa
  - Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
- Carlos Leonardo de Aragão Araújo
  - Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
- Larissa Maranhão Dias
  - Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
- Lino César de Sousa Pereira
  - Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
- Jorianne Thyeska Castro Alves
  - Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
- Fabrício Almeida Araújo
  - Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
- Edson Luiz Folador
  - Biotechnology Center, Federal University of Paraiba, João Pessoa, Paraíba, Brazil
- Isabel Henriques
  - Biology Department & CESAM, University of Aveiro, Aveiro, Portugal
- Artur Silva
  - Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
- Adriana Ribeiro Carneiro Folador
  - Laboratory of Genomic and Bioinformatics, Center of Genomics and System Biology, Institute of Biological Science, Federal University of Para, Belém, Pará, Brazil
58
Intelligent Classifier: a Tool to Impel Drug Technology Transfer from Academia to Industry. J Pharm Innov 2018. [DOI: 10.1007/s12247-018-9332-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/26/2022]
59
Abstract
The GO-Cellular Component (GO-CC) ontology provides a controlled vocabulary for the consistent description of the subcellular compartments or macromolecular complexes where proteins may act. Current machine learning-based methods for the automated GO-CC annotation of proteins suffer from inconsistency among individual GO-CC term predictions. Here, we present FGGA-CC+, a class of hierarchical graph-based classifiers for the consistent GO-CC annotation of protein-coding genes at the subcellular compartment or macromolecular complex levels. To boost the accuracy of GO-CC predictions, we make use of the protein localization knowledge in the GO-Biological Process (GO-BP) annotations. As a result, FGGA-CC+ classifiers are built from annotation data in both the GO-CC and GO-BP ontologies. Due to their graph-based design, FGGA-CC+ classifiers are fully interpretable and their predictions are amenable to expert analysis. Promising results were obtained on protein annotation data from five model organisms. In addition, the classifiers were successfully validated on the annotation of a challenging subset of tandem duplicated genes in tomato, a non-model organism. Overall, these results suggest that FGGA-CC+ classifiers can indeed help satisfy the huge demand for GO-CC annotation arising from ubiquitous high-throughput sequencing and proteomic projects.
60
Harnessing the evolutionary information on oxygen binding proteins through Support Vector Machines based modules. BMC Res Notes 2018; 11:290. [PMID: 29751818 PMCID: PMC5948687 DOI: 10.1186/s13104-018-3383-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/16/2018] [Accepted: 04/30/2018] [Indexed: 02/06/2023] Open
Abstract
Objectives: With the arrival of free oxygen on the globe, aerobic life became possible. Oxygen-binding proteins are widespread in the biosphere and are found in all groups of organisms, both prokaryotes and eukaryotes, including fungi, plants, and animals. The exponential growth and availability of freshly annotated protein sequences in the databases motivated us to develop an improved version of “Oxypred” for identifying oxygen-binding proteins.
Results: In this study, we propose a method for identifying oxy-proteins at two sequence similarity cutoffs, 50% and 90%. Several amino-acid-composition-based Support Vector Machine models were developed, including models that use evolutionary profiles in the form of a position-specific scoring matrix (PSSM). Fivefold cross-validation was applied to evaluate prediction performance. Existing methods show nearly 97% recognition, whereas our newly developed models recognized almost 99.99% and 100% in the oxy-50% and oxy-90% similarity models, respectively. Our results show that our approaches are faster and achieve better prediction performance than the existing methods. The web server Oxypred2 was developed as an alternative method for identifying oxy-proteins, with additional modules including PSSM, and is available at http://bioinfo.imtech.res.in/servers/muthu/oxypred2/home.html.
Electronic supplementary material: The online version of this article (10.1186/s13104-018-3383-9) contains supplementary material, which is available to authorized users.
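The amino acid composition feature that such SVM models take as input can be sketched as a 20-dimensional vector of residue frequencies. This is a generic illustration, not the Oxypred2 implementation; the function name and the example sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard symbols (e.g. X, B, Z) are ignored, which is an assumption
    made here for simplicity.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


# Hypothetical short peptide; real inputs would be full-length proteins.
features = amino_acid_composition("MKVLAAGK")
print(len(features), round(sum(features), 6))
```

Each protein becomes one fixed-length vector regardless of its sequence length, which is what allows a standard SVM to be trained on it.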
61
Pang X, Xu C, Xu Y. Scaling KNN multi-class twin support vector machine via safe instance reduction. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.02.018] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 10/18/2022]
62
Popli K, Maries V, Afacan A, Liu Q, Prasad V. Development of a vision-based online soft sensor for oil sands flotation using support vector regression and its application in the dynamic monitoring of bitumen extraction. Can J Chem Eng 2018. [DOI: 10.1002/cjce.23164] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]
Affiliation(s)
- Khushaal Popli
  - Department of Chemical and Materials Engineering; University of Alberta; Edmonton, AB T6G 1H9 Canada
- Victor Maries
  - Canadian Natural Resources Limited; P.O. Bag 4025, Fort McMurray, AB T9H 3H5 Canada
- Artin Afacan
  - Department of Chemical and Materials Engineering; University of Alberta; Edmonton, AB T6G 1H9 Canada
- Qi Liu
  - Department of Chemical and Materials Engineering; University of Alberta; Edmonton, AB T6G 1H9 Canada
- Vinay Prasad
  - Department of Chemical and Materials Engineering; University of Alberta; Edmonton, AB T6G 1H9 Canada
63
Deo P, Chow SH, Hay ID, Kleifeld O, Costin A, Elgass KD, Jiang JH, Ramm G, Gabriel K, Dougan G, Lithgow T, Heinz E, Naderer T. Outer membrane vesicles from Neisseria gonorrhoeae target PorB to mitochondria and induce apoptosis. PLoS Pathog 2018; 14:e1006945. [PMID: 29601598 PMCID: PMC5877877 DOI: 10.1371/journal.ppat.1006945] [Citation(s) in RCA: 103] [Impact Index Per Article: 14.7] [Received: 07/27/2017] [Accepted: 02/21/2018] [Indexed: 01/31/2023] Open
Abstract
Neisseria gonorrhoeae causes the sexually transmitted disease gonorrhoea by evading innate immunity. Colonizing the mucosa of the reproductive tract depends on the bacterial outer membrane porin, PorB, which is essential for ion and nutrient uptake. PorB is also targeted to host mitochondria and regulates apoptosis pathways to promote infections. How PorB traffics from the outer membrane of N. gonorrhoeae to mitochondria and whether it modulates innate immune cells, such as macrophages, remains unclear. Here, we show that N. gonorrhoeae secretes PorB via outer membrane vesicles (OMVs). Purified OMVs contained primarily outer membrane proteins including oligomeric PorB. The porin was targeted to mitochondria of macrophages after exposure to purified OMVs and wild-type N. gonorrhoeae. This was associated with loss of mitochondrial membrane potential, release of cytochrome c, activation of apoptotic caspases and cell death in a time-dependent manner. Consistent with this, OMV-induced macrophage death was prevented with the pan-caspase inhibitor, Q-VD-PH. This shows that N. gonorrhoeae utilizes OMVs to target PorB to mitochondria and to induce apoptosis in macrophages, thus affecting innate immunity.
Neisseria gonorrhoeae causes the sexually transmitted disease gonorrhoea in more than 100 million people worldwide every year. The bacteria replicate in the reproductive tract by evading innate and adaptive immunity. In the absence of effective vaccines and with the rise of antibiotic resistance, understanding the molecular interactions between innate immune cells and N. gonorrhoeae may lead to new strategies to combat bacterial growth and the symptoms of gonorrhoea. It has long been known that the N. gonorrhoeae porin, PorB, promotes bacterial survival but also targets host mitochondria in infections. The mechanism by which PorB traffics from the bacterial outer membrane to host mitochondria remains unclear. Here, we utilized proteomics and super-resolution microscopy to show that N. gonorrhoeae secretes PorB via outer membrane vesicles. These vesicles are taken up by macrophages and deliver PorB to mitochondria. Macrophages treated with N. gonorrhoeae vesicles contained damaged mitochondria and active caspase-3. A caspase inhibitor prevented apoptosis of macrophages treated with N. gonorrhoeae vesicles. This suggests that N. gonorrhoeae secretes membrane vesicles, which are readily detectable in gonorrhoea patients, to target macrophages and to promote infections.
Affiliation(s)
- Pankaj Deo
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Clayton, Victoria, Australia
- Seong H Chow
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Clayton, Victoria, Australia
- Iain D Hay
  - Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, Victoria, Australia
- Oded Kleifeld
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Clayton, Victoria, Australia
- Adam Costin
  - Monash Ramaciotti Centre for Cryo Electron Microscopy, Monash University, Clayton, Victoria, Australia
- Kirstin D Elgass
  - Monash Micro Imaging, Monash University, Clayton, Victoria, Australia
- Jhih-Hang Jiang
  - Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, Victoria, Australia
- Georg Ramm
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Clayton, Victoria, Australia
  - Monash Ramaciotti Centre for Cryo Electron Microscopy, Monash University, Clayton, Victoria, Australia
- Kipros Gabriel
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Clayton, Victoria, Australia
- Gordon Dougan
  - Infection Genomics Program, Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Trevor Lithgow
  - Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, Victoria, Australia
- Eva Heinz
  - Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, Victoria, Australia
  - Infection Genomics Program, Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Thomas Naderer
  - Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Clayton, Victoria, Australia
64
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global-property-based and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches are described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Affiliation(s)
- Henrik Nielsen
  - Technical University of Denmark, Kemitorvet, Building 208, DK-2800, Kgs. Lyngby, Denmark
65
An Efficient Approach for Prediction of Nuclear Receptor and Their Subfamilies Based on Fuzzy k-Nearest Neighbor with Maximum Relevance Minimum Redundancy. Proceedings of the National Academy of Sciences India Section A-Physical Sciences 2018. [DOI: 10.1007/s40010-016-0325-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/26/2022]
66
Guo J, Islam MA, Lin H, Ji C, Duan Y, Liu P, Zeng Q, Day B, Kang Z, Guo J. Genome-Wide Identification of Cyclic Nucleotide-Gated Ion Channel Gene Family in Wheat and Functional Analyses of TaCNGC14 and TaCNGC16. Frontiers in Plant Science 2018; 9:18. [PMID: 29403523 PMCID: PMC5786745 DOI: 10.3389/fpls.2018.00018] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 09/29/2017] [Accepted: 01/04/2018] [Indexed: 05/18/2023]
Abstract
Cyclic nucleotide-gated channels (CNGCs) play multifaceted roles in plants, particularly in signaling processes associated with abiotic stress and during host-pathogen interactions. Despite their key roles in plant survival and response to the environment, little is known about the activity and function of the CNGC family in common wheat (Triticum aestivum L.), a key staple food around the globe. In this study, we performed a genome-wide identification of the CNGC family in wheat and identified a total of 47 TaCNGCs, classifying these genes into four major groups (I-IV) with two sub-groups (IVa and IVb). Sequence analysis revealed the presence of several conserved motifs, including a phosphate binding cassette (PBC) and a "hinge" region, both of which have been hypothesized to be critical for the function of wheat CNGCs. During wheat infection with Puccinia striiformis f. sp. tritici (Pst), the transcript levels of TaCNGC14 and TaCNGC16, both members of group IVb, were significantly induced in a compatible interaction, while gene expression was reduced in incompatible interactions. In addition, TaCNGC14 and TaCNGC16 mRNA accumulation was significantly influenced by exogenously applied hormones, including abscisic acid (ABA), methyl jasmonate (MeJA), and salicylic acid (SA), suggesting a role in hormone signaling and/or perception. Silencing of TaCNGC14 and TaCNGC16 limited Pst growth and increased wheat resistance against Pst. The results presented herein contribute to our understanding of the wheat CNGC gene family and the mechanism of TaCNGC signaling during the wheat-Pst interaction.
Affiliation(s)
- Jia Guo
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, China
| | - Md Ashraful Islam
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, China
| | - Haocheng Lin
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, China
| | - Changan Ji
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, China
| | - Yinghui Duan
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, China
| | - Peng Liu
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, China
| | - Qingdong Zeng
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, China
| | - Brad Day
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI, United States
| | - Zhensheng Kang
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, China
| | - Jun Guo
- State Key Laboratory of Crop Stress Biology for Arid Areas, College of Plant Protection, Northwest A&F University, Yangling, China
| |
|
67
|
A Comparison Study of Kernel Functions in the Support Vector Machine and Its Application for Termite Detection. INFORMATION 2018. [DOI: 10.3390/info9010005] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
|
68
|
Khan AA, Khan Z, Kalam MA, Khan AA. Inter-kingdom prediction certainty evaluation of protein subcellular localization tools: microbial pathogenesis approach for deciphering host microbe interaction. Brief Bioinform 2018; 19:12-22. [PMID: 27758808 DOI: 10.1093/bib/bbw093] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2016] [Indexed: 05/14/2025] Open
Abstract
Microbial pathogenesis involves several aspects of host-pathogen interactions, including microbial proteins targeting host subcellular compartments and subsequent effects on host physiology. Such studies are supported by experimental data, but recently the localization of bacterial proteins has also been predicted with computational tools designed for eukaryotic subcellular protein targeting. We evaluated the inter-kingdom prediction certainty of these tools. Bacterial proteins experimentally known to target host subcellular compartments were submitted to eukaryotic subcellular targeting prediction tools, and prediction certainty was assessed. The results indicate that these tools alone are not sufficient for inter-kingdom protein targeting prediction. Correct prediction of a pathogen protein's subcellular targeting depends on several factors, including the presence of a localization signal, transmembrane domains and molecular weight, in addition to the prediction approach used. Detecting protein targeting in the endomembrane system is comparatively difficult, as proteins in this location are channeled to different compartments. In addition, the high specificity of the training data sets lowers inter-kingdom prediction accuracy. The current data can help suggest a strategy for correctly predicting the subcellular localization of bacterial proteins in the host cell.
|
69
|
Accurate prediction of subcellular location of apoptosis proteins combining Chou's PseAAC and PsePSSM based on wavelet denoising. Oncotarget 2017; 8:107640-107665. [PMID: 29296195 PMCID: PMC5746097 DOI: 10.18632/oncotarget.22585] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2017] [Accepted: 10/30/2017] [Indexed: 02/05/2023] Open
Abstract
Information on the subcellular localization of apoptosis proteins is very important for understanding the mechanism of programmed cell death and for drug development. Predicting the subcellular localization of apoptosis proteins remains a challenging task, yet it can help to understand their function and their role in metabolic processes. In this paper, we propose a novel method for protein subcellular localization prediction. First, features of the protein sequence are extracted by combining Chou's pseudo amino acid composition (PseAAC) and pseudo-position specific scoring matrix (PsePSSM); the extracted feature information is then denoised by two-dimensional (2-D) wavelet denoising. Finally, the optimal feature vectors are input to an SVM classifier to predict the subcellular location of apoptosis proteins. Quite promising predictions are obtained using the jackknife test on three widely used datasets and compared with other state-of-the-art methods. The results indicate that the proposed method can remarkably improve the prediction accuracy of apoptosis protein subcellular localization and can serve as a supplementary tool for future proteomics research.
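The jackknife test named in this abstract is leave-one-out cross-validation: each protein is held out once, the classifier is trained on the rest, and the held-out prediction is scored. The sketch below illustrates only that evaluation loop. It swaps in a 1-nearest-neighbour classifier for the SVM, and the feature vectors and the labels "cyto" and "mito" are hypothetical toy data, not taken from the paper.

```python
def nearest_neighbor_predict(train_X, train_y, x):
    """Stand-in classifier (NOT the paper's SVM): label of the closest training vector."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def jackknife_accuracy(X, y, predict):
    """Leave-one-out (jackknife) accuracy: hold out each sample exactly once."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical two-dimensional feature vectors for six proteins.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y = ["cyto", "cyto", "cyto", "mito", "mito", "mito"]
accuracy = jackknife_accuracy(X, y, nearest_neighbor_predict)
```

Any classifier with the same `predict(train_X, train_y, x)` shape could be passed in place of the stand-in.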
|
70
|
ScMED7, a sugarcane mediator subunit gene, acts as a regulator of plant immunity and is responsive to diverse stress and hormone treatments. Mol Genet Genomics 2017; 292:1363-1375. [DOI: 10.1007/s00438-017-1352-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2016] [Accepted: 07/27/2017] [Indexed: 10/19/2022]
|
71
|
Nielsen H. Predicting Subcellular Localization of Proteins by Bioinformatic Algorithms. Curr Top Microbiol Immunol 2017; 404:129-158. [PMID: 26728066 DOI: 10.1007/82_2015_5006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Abstract
When predicting the subcellular localization of proteins from their amino acid sequences, there are basically three approaches: signal-based, global property-based, and homology-based. Each of these has its advantages and drawbacks, and it is important when comparing methods to know which approach was used. Various statistical and machine learning algorithms are used with all three approaches, and various measures and standards are employed when reporting the performances of the developed methods. This chapter presents a number of available methods for prediction of sorting signals and subcellular localization, but rather than providing a checklist of which predictors to use, it aims to function as a guide for critical assessment of prediction methods.
Affiliation(s)
- Henrik Nielsen
- Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Kemitorvet building 208, 2800, Lyngby, Denmark.
| |
|
72
|
In silico analysis to identify vaccine candidates common to multiple serotypes of Shigella and evaluation of their immunogenicity. PLoS One 2017; 12:e0180505. [PMID: 28767653 PMCID: PMC5540609 DOI: 10.1371/journal.pone.0180505] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2017] [Accepted: 06/18/2017] [Indexed: 12/20/2022] Open
Abstract
Shigellosis or bacillary dysentery is an important cause of diarrhea, with the majority of the cases occurring in developing countries. Considering the high disease burden, increasing antibiotic resistance, serotype-specific immunity and the post-infectious sequelae associated with shigellosis, there is a pressing need for an effective vaccine against multiple serotypes of the pathogen. In the present study, we used a bioinformatics approach to identify antigens shared among multiple serotypes of Shigella spp. This approach led to the identification of many immunogenic peptides. The five most promising peptides based on MHC binding efficiency were a putative lipoprotein (EL PGI I), a putative heat shock protein (EL PGI II), Spa32 (EL PGI III), IcsB (EL PGI IV) and a hypothetical protein (EL PGI V). These peptides were synthesized and their immunogenicity was evaluated in BALB/c mice by ELISA and cytokine assays. The putative heat shock protein (HSP) and the hypothetical protein elicited a good humoral response, whereas the putative lipoprotein, Spa32 and IcsB elicited a good T-cell response, as revealed by increased IFN-γ and TNF-α cytokine levels. Sera from patients with confirmed shigellosis were also evaluated for peptide-specific antibodies and showed significant IgG and IgA antibodies against the HSP and the hypothetical protein, marking them as potential future vaccine candidates. The antigens reported in this study are novel and have not been tested as vaccine candidates against Shigella. This study offers a time- and cost-effective way of identifying novel immunogenic antigens to be used as potential vaccine candidates. Moreover, this approach should be easily extendable to finding new potential vaccine candidates for other pathogenic bacteria.
|
73
|
Orfanoudaki G, Markaki M, Chatzi K, Tsamardinos I, Economou A. MatureP: prediction of secreted proteins with exclusive information from their mature regions. Sci Rep 2017; 7:3263. [PMID: 28607462 PMCID: PMC5468347 DOI: 10.1038/s41598-017-03557-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2017] [Accepted: 04/28/2017] [Indexed: 11/09/2022] Open
Abstract
More than a third of the cellular proteome is non-cytoplasmic. Most secretory proteins use the Sec system for export and are targeted to membranes using signal peptides and mature domains. To specifically analyze bacterial mature domain features, we developed MatureP, a classifier that predicts secretory sequences through features exclusively computed from their mature domains. MatureP was trained using Just Add Data Bio, an automated machine learning tool. Mature domains are predicted efficiently with ~92% success, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC). Predictions were validated using experimental datasets of mutated secretory proteins. The features selected by MatureP reveal prominent differences in amino acid content between secreted and cytoplasmic proteins. Amino-terminal mature domain sequences have enhanced disorder, more hydroxyl and polar residues and fewer hydrophobic residues. Cytoplasmic proteins have prominent amino-terminal hydrophobic stretches and charged regions downstream. Presumably, secretory mature domains comprise a distinct protein class. They balance properties that promote the flexibility required to maintain non-folded states during targeting and secretion with the ability to fold after secretion. These findings provide novel insight into protein trafficking, sorting and folding mechanisms and may benefit protein secretion biotechnology.
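The AUC used to report MatureP's performance can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition, with ties counted as one half. The labels and scores are invented for illustration and have no connection to the MatureP data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs: 1 = secretory, 0 = cytoplasmic.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```

The quadratic pair loop is chosen for readability; a rank-based formulation gives the same value more efficiently on large datasets.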
Affiliation(s)
- Georgia Orfanoudaki
- Institute of Molecular Biology and Biotechnology-FORTH and Department of Biology-University of Crete, PO Box 1385, Heraklion, Crete, Greece
| | - Maria Markaki
- Computer Science Department, University of Crete, Heraklion, Greece
| | - Katerina Chatzi
- KU Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, B-3000, Leuven, Belgium
| | - Ioannis Tsamardinos
- Computer Science Department, University of Crete, Heraklion, Greece; Gnosis Data Analysis PC, Heraklion, Greece
| | - Anastassios Economou
- Institute of Molecular Biology and Biotechnology-FORTH and Department of Biology-University of Crete, PO Box 1385, Heraklion, Crete, Greece; KU Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, B-3000, Leuven, Belgium
| |
|
74
|
Mu Z, Hu J, Min J, Yin J. Comparison of different entropies as features for person authentication based on EEG signals. IET BIOMETRICS 2017. [DOI: 10.1049/iet-bmt.2016.0144] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Affiliation(s)
- Zhendong Mu
- The Center of Collaboration and Innovation, Jiangxi University of Technology, Nanchang 330098, People's Republic of China
| | - Jianfeng Hu
- The Center of Collaboration and Innovation, Jiangxi University of Technology, Nanchang 330098, People's Republic of China
| | - Jianliang Min
- The Center of Collaboration and Innovation, Jiangxi University of Technology, Nanchang 330098, People's Republic of China
| | - Jinghai Yin
- The Center of Collaboration and Innovation, Jiangxi University of Technology, Nanchang 330098, People's Republic of China
| |
|
75
|
Li R, Lai Y, Zhang Y, Yao L, Wu X. Classification of Cognitive Level of Patients with Leukoaraiosis on the Basis of Linear and Non-Linear Functional Connectivity. Front Neurol 2017; 8:2. [PMID: 28154549 PMCID: PMC5243822 DOI: 10.3389/fneur.2017.00002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2016] [Accepted: 01/04/2017] [Indexed: 11/18/2022] Open
Abstract
Leukoaraiosis (LA) describes diffuse white matter abnormalities apparent in computed tomography (CT) or magnetic resonance (MR) brain scans. Patients with LA generally show varying degrees of cognitive impairment, which can be classified as cognitively normal (CN), mild cognitive impairment (MCI), and dementia. However, a consistent relationship between the degree of LA and the level of cognitive impairment has not yet been established. We used functional magnetic resonance imaging (fMRI) to explore possible neuroimaging biomarkers for classification of cognitive level in LA. Functional connectivity (FC) between brain regions was calculated using Pearson’s correlation coefficient (PCC), maximal information coefficient (MIC), and extended maximal information coefficient (eMIC). Next, FCs with high discriminative power for different cognitive levels in LA were used as features for classification based on support vector machine. CN and MCI were classified with accuracies of 75.0, 61.9, and 91.1% based on features from PCC, MIC, and eMIC, respectively. MCI and dementia were classified with accuracies of 80.1, 86.2, and 87.4% based on features from PCC, MIC, and eMIC, respectively. CN and dementia were classified with accuracies of 80.1, 89.9, and 94.4% based on features from PCC, MIC, and eMIC, respectively. Our results suggest that features extracted from fMRI were efficient for classification of cognitive impairment level in LA, especially, when features were based on a non-linear method (eMIC).
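The linear functional-connectivity features described in this abstract are Pearson correlations between the time series of pairs of brain regions. As a rough sketch of that step only, the code below computes the correlation for every region pair and flattens the upper triangle into a feature vector that a classifier could consume. The three short time series are made-up values, and the function names are illustrative rather than taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def connectivity_features(series):
    """Upper-triangle correlations for all region pairs (i < j), row by row."""
    feats = []
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            feats.append(pearson(series[i], series[j]))
    return feats


# Hypothetical signals for three regions over four time points.
regions = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
features = connectivity_features(regions)  # pairs (0,1), (0,2), (1,2)
```

Pearson correlation captures only linear dependence, which is the limitation the abstract's non-linear measures (MIC, eMIC) are meant to address.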
Affiliation(s)
- Ranran Li
- College of Information Science and Technology, Beijing Normal University, Beijing, China
| | - Youzhi Lai
- College of Information Science and Technology, Beijing Normal University, Beijing, China
| | - Yumei Zhang
- Neurology Department, Beijing Tiantan Hospital Affiliated with Capital Medical University, Beijing, China
| | - Li Yao
- College of Information Science and Technology, Beijing Normal University, Beijing, China; State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
| | - Xia Wu
- College of Information Science and Technology, Beijing Normal University, Beijing, China; State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
| |
|
76
|
Abstract
Bioinformatic analysis can not only accelerate drug target identification and drug candidate screening and refinement, but also facilitate characterization of side effects and predict drug resistance. High-throughput data such as genomic, epigenetic, genome architecture, cistromic, transcriptomic, proteomic, and ribosome profiling data have all made significant contributions to mechanism-based drug discovery and drug repurposing. Accumulation of protein and RNA structures, as well as development of homology modeling and protein structure simulation, coupled with large structure databases of small molecules and metabolites, paved the way for more realistic protein-ligand docking experiments and more informative virtual screening. I present the conceptual framework that drives the collection of these high-throughput data, summarize the utility and potential of mining these data in drug discovery, outline a few inherent limitations in the data and in the software used to mine them, point out new ways to refine analysis of these diverse types of data, and highlight commonly used software and databases relevant to drug discovery.
Affiliation(s)
- Xuhua Xia
- Department of Biology, Faculty of Science, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada
- Ottawa Institute of Systems Biology, Ottawa K1H 8M5, Canada
| |
|
77
|
Francis A, Dhaka N, Bakshi M, Jung KH, Sharma MK, Sharma R. Comparative phylogenomic analysis provides insights into TCP gene functions in Sorghum. Sci Rep 2016; 6:38488. [PMID: 27917941 PMCID: PMC5137041 DOI: 10.1038/srep38488] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2016] [Accepted: 11/10/2016] [Indexed: 12/30/2022] Open
Abstract
Sorghum is a highly efficient C4 crop with potential to mitigate challenges associated with food, feed and fuel. TCP proteins are of particular interest for crop improvement programs due to their well-demonstrated roles in crop domestication and in shaping plant architecture, thereby affecting agronomic traits. We identified 20 TCP genes from Sorghum. Except SbTCP8, all are either intronless or contain introns in the untranslated regions. Comparative phylogenetic analysis of Arabidopsis, rice, Brachypodium and Sorghum TCP proteins revealed two distinct classes categorized into ten sub-clades. Sub-clade F is dicot-specific, whereas the A2, G1 and I1 groups only contained genes from grasses. Sub-clade B was missing in Sorghum, whereas group A1 was missing in rice, indicating species-specific divergence of TCP proteins. TCP proteins of Sorghum are enriched in disorder-promoting residues, with class I containing a higher percentage of disorder than class II proteins. Seven pairs of paralogous TCP genes were identified from Sorghum, five of which seem to predate the rice-Sorghum divergence. All of them have diverged in their expression. Based on the expression and orthology analysis, five Sorghum genes have been shortlisted for further investigation of their roles in regulating plant morphology, whereas three genes have been identified as candidates for engineering abiotic stress tolerance.
Affiliation(s)
- Aleena Francis
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110067, India
| | - Namrata Dhaka
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110067, India
| | - Mohit Bakshi
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110067, India
| | - Ki-Hong Jung
- Graduate School of Biotechnology & Crop Biotech Institute, Kyung Hee University, Yongin, 17104, Republic of Korea
| | - Manoj K. Sharma
- School of Biotechnology, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110067, India
| | - Rita Sharma
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110067, India
| |
|
78
|
Cloning and expression of SgCYP450-4 from Siraitia grosvenorii. Acta Pharm Sin B 2016; 6:614-622. [PMID: 27818929 PMCID: PMC5071632 DOI: 10.1016/j.apsb.2016.06.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2016] [Revised: 06/22/2016] [Accepted: 06/23/2016] [Indexed: 11/21/2022] Open
Abstract
CYP450 plays an essential role in the development and growth of the fruits of Siraitia grosvenorii. However, little is known about the SgCYP450-4 gene in S. grosvenorii. Here, based on transcriptome data, a full-length cDNA sequence of SgCYP450-4 was cloned by reverse transcriptase-polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE) strategies. SgCYP450-4 is 1677 bp in length (GenBank accession No. AEM42985.1) and contains a complete open reading frame (ORF) of 1422 bp. The deduced protein is composed of 473 amino acids, with a molecular weight of 54.01 kDa and a theoretical isoelectric point (pI) of 8.8, and was predicted to possess cytochrome P450 domains. The SgCYP450-4 gene was highly expressed in root, diploid fruit and fruit treated with hormone and pollination. At 10 days after treatment with pollination and hormones, the expression of SgCYP450-4 reached its highest level and then decreased over time, which was consistent with the development of fruits of S. grosvenorii. Hormonal treatment could significantly induce the expression of SgCYP450-4. These results provide a reference for regulation of fruit development and the use of parthenocarpy to generate seedless fruit, and provide a scientific basis for the production of growth regulator application agents.
|
79
|
Chrysostomou C, Seker H. Prediction of protein allergenicity based on signal-processing bioinformatics approach. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2016; 2014:808-11. [PMID: 25570082 DOI: 10.1109/embc.2014.6943714] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Current bioinformatics tools achieve high accuracy in classifying allergenic protein sequences with high homology but generally perform poorly on low-homology protein sequences. Although some homologous regions explain Immunoglobulin E (IgE) cross-reactivity in groups of allergens, no universal molecular structure could be associated with allergenicity. In addition, studies have shown that cross-reactivity is not directly linked to the homology between protein sequences. Therefore, a new homology-independent method needs to be developed to determine whether a protein is an allergen. The aim of this study is to differentiate sets of allergenic and non-allergenic proteins using a signal-processing based bioinformatics approach. In this paper, a new method is proposed for characterisation and classification of allergenic protein sequences. In this method, a hydrophobicity amino acid index is used to encode proteins as numerical sequences, and the Discrete Fourier Transform is used to extract features for each protein. Finally, a classifier is constructed based on Support Vector Machines. To demonstrate the applicability of the proposed method, 857 allergen and 1000 non-allergen proteins were collected from the UniProt online database. The results obtained from the proposed method yielded: MCC: 0.752 ± 0.007, Specificity: 0.912 ± 0.005, Sensitivity: 0.835 ± 0.008 and Total Accuracy: 87.65% ± 0.004.
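The encoding step in this abstract, mapping residues to a hydrophobicity index and taking a Discrete Fourier Transform, can be sketched as follows. The abstract does not say which hydrophobicity index was used, so the Kyte-Doolittle scale below is an assumption chosen for illustration. The four-residue sequence is a toy input, and the naive O(n²) transform is written out to keep the example dependency-free.

```python
import cmath

# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Turn a protein sequence into a numerical signal, one value per residue."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]


def dft_magnitudes(signal):
    """Magnitude spectrum |X_k| of the DFT, computed directly from its definition."""
    n = len(signal)
    magnitudes = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff))
    return magnitudes


signal = encode("AILV")               # toy sequence, not from the dataset
spectrum = dft_magnitudes(signal)     # spectrum[0] is the sum of the signal
```

Because real proteins differ in length, a fixed-length feature vector would need an extra step such as zero-padding or resampling the spectrum; the abstract does not say how this was handled.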
|
80
|
Explaining Support Vector Machines: A Color Based Nomogram. PLoS One 2016; 11:e0164568. [PMID: 27723811 PMCID: PMC5056733 DOI: 10.1371/journal.pone.0164568] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2016] [Accepted: 09/27/2016] [Indexed: 02/05/2023] Open
Abstract
Problem setting: Support vector machines (SVMs) are very popular tools for classification, regression and other problems. Because they can be applied with a large choice of kernels, a large variety of data can be analysed using these tools. Machine learning owes its popularity to the good performance of the resulting models. However, interpreting the models is far from obvious, especially when non-linear kernels are used. Hence, the methods are used as black boxes. As a consequence, the use of SVMs is less supported in areas where interpretability is important and where people are held responsible for the decisions made by models. Objective: In this work, we investigate whether SVMs using linear, polynomial and RBF kernels can be explained such that interpretations for model-based decisions can be provided. We further indicate when SVMs can be explained and in which situations interpretation of SVMs is (hitherto) not possible. Here, explainability is defined as the ability to produce the final decision based on a sum of contributions which depend on one single or at most two input variables. Results: Our experiments on simulated and real-life data show that explainability of an SVM depends on the chosen parameter values (degree of polynomial kernel, width of RBF kernel and regularization constant). When several combinations of parameter values yield the same cross-validation performance, combinations with a lower polynomial degree or a larger kernel width have a higher chance of being explainable. Conclusions: This work summarizes SVM classifiers obtained with linear, polynomial and RBF kernels in a single plot. Linear and polynomial kernels up to the second degree are represented exactly. For other kernels an indication of the reliability of the approximation is presented. The complete methodology is available as an R package, and two apps and a movie are provided to illustrate the possibilities offered by the method.
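For readers unfamiliar with the three kernels compared in this abstract, the sketch below writes each one out in its common textbook form. The parameter names `degree`, `coef0` and `sigma` are illustrative choices, not taken from the paper or its R package, and `sigma` plays the role of the RBF kernel width mentioned in the results.

```python
import math


def linear_kernel(x, z):
    """k(x, z) = <x, z>"""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """k(x, z) = (<x, z> + coef0) ** degree"""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 * sigma^2)); larger sigma = wider kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


x = [1.0, 2.0]
z = [3.0, 0.0]
linear_value = linear_kernel(x, z)        # 1*3 + 2*0 = 3
poly_value = polynomial_kernel(x, z)      # (3 + 1)^2 = 16
rbf_value = rbf_kernel(x, z)              # exp(-8 / 2) = exp(-4)
wide_rbf_value = rbf_kernel(x, z, sigma=10.0)
```

A second-degree polynomial kernel expands into terms involving at most two input variables at a time, which is consistent with the abstract's statement that such kernels can be represented exactly under its definition of explainability.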
|
81
|
Tiwari AK. Prediction of G-protein coupled receptors and their subfamilies by incorporating various sequence features into Chou's general PseAAC. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2016; 134:197-213. [PMID: 27480744 DOI: 10.1016/j.cmpb.2016.07.004] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/19/2016] [Revised: 05/27/2016] [Accepted: 07/01/2016] [Indexed: 06/06/2023]
Abstract
BACKGROUND AND OBJECTIVE: The G-protein coupled receptors form the largest superfamily of membrane proteins and are important targets for drug design. G-protein coupled receptors are responsible for many physiological processes such as smell, taste, vision, neurotransmission, metabolism, cellular growth and immune response. It is therefore necessary to design a robust and efficient approach for the prediction of G-protein coupled receptors and their subfamilies. METHODS: In this paper, the protein samples are represented by amino acid composition, dipeptide composition, correlation features, composition, transition, distribution, sequence order descriptors and pseudo amino acid composition, giving a total of 1497 sequence-derived features. To classify G-protein coupled receptors and their subfamilies efficiently, we propose a weighted k-nearest neighbor classifier applied to the union of the best 50 features selected by each of five feature selection methods: Fisher score based feature selection, ReliefF, fast correlation based filter, minimum redundancy maximum relevancy, and support vector machine based recursive feature elimination. Taking the union exploits the advantages of each method. RESULTS: The proposed method achieved overall accuracies of 99.9%, 98.3% and 95.4%, MCC values of 1.00, 0.98 and 0.95, ROC area values of 1.00, 0.998 and 0.996, and precisions of 99.9%, 98.3% and 95.5% using 10-fold cross-validation to predict G-protein coupled receptors versus non-G-protein coupled receptors, subfamilies of G-protein coupled receptors, and subfamilies of class A G-protein coupled receptors, respectively. CONCLUSIONS: The high accuracies, MCC, ROC area values, and precision values indicate that the proposed method is better for the prediction of G-protein coupled receptor families and their subfamilies.
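The feature-selection idea in this abstract, ranking features with several criteria and keeping the union of each criterion's top-ranked features, can be illustrated with one of the named criteria. The sketch below computes the Fisher score in its usual form (between-class scatter divided by within-class scatter, per feature) and unions its top picks with those of a second ranking. The data matrix and the second score list are hypothetical; the second list merely stands in for another selector such as ReliefF, which is not implemented here.

```python
def fisher_scores(X, y):
    """Per-feature Fisher score: sum_c n_c (mu_c - mu)^2 / sum_c sum_i (x_i - mu_c)^2."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for c in sorted(set(y)):
            values = [row[j] for row, label in zip(X, y) if label == c]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


# Hypothetical data: four samples, three features, two classes.
X = [[1.0, 5.0, 0.2], [1.2, 3.0, 0.1], [3.0, 4.0, 0.3], [3.2, 6.0, 0.2]]
y = [0, 0, 1, 1]
fisher = fisher_scores(X, y)

# Hypothetical scores from a second selector (placeholder, not computed here).
other_selector_scores = [0.1, 0.9, 0.5]

selected = top_k(fisher, 1) | top_k(other_selector_scores, 1)  # union of top picks
```

With k = 50 and five selectors, as in the abstract, the union would contain between 50 and 250 features depending on how much the rankings overlap.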
|
82
|
Identification and Sorting of PVC Polymer in Recycling Process by Laser-Induced Breakdown Spectroscopy (LIBS) Combined with Support Vector Machine (SVM) Model. IRANIAN JOURNAL OF SCIENCE AND TECHNOLOGY TRANSACTION A-SCIENCE 2016. [DOI: 10.1007/s40995-016-0084-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
83
|
Sun Y, Jiang Y, Wang Y, Li X, Yang R, Yu Z, Qin L. The Toll Signaling Pathway in the Chinese Oak Silkworm, Antheraea pernyi: Innate Immune Responses to Different Microorganisms. PLoS One 2016; 11:e0160200. [PMID: 27483463 PMCID: PMC4970820 DOI: 10.1371/journal.pone.0160200] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2016] [Accepted: 07/16/2016] [Indexed: 11/19/2022] Open
Abstract
The Toll pathway is one of the most important signaling pathways regulating insect innate immunity. Spatzle is a key protein that functions as a Toll receptor ligand to trigger Toll-dependent expression of immunity-related genes. In this study, a novel spatzle gene (ApSPZ) from the Chinese oak silkworm Antheraea pernyi was identified. The ApSPZ cDNA is 1065 nucleotides long with an open reading frame (ORF) of 777 bp encoding a protein of 258 amino acids. The protein has an estimated molecular weight of 29.71 kDa and an isoelectric point (pI) of 8.53. ApSPZ is a nuclear and secretory protein with no conserved domains or membrane helices and shares 40% amino acid identity with SPZ from Manduca sexta. Phylogenetic analysis indicated that ApSPZ might be a new member of the Spatzle type 1 family, which belongs to the Spatzle superfamily. The expression patterns of several genes involved in the Toll pathway were examined at different developmental stages and in various tissues of 5th instar larvae. The examined targets included A. pernyi spatzle, GNBP, MyD88, Tolloid, cactus and dorsalA. The RT-PCR results showed that these genes were predominantly expressed in immune-responsive fat body tissue, indicating that the genes play a crucial role in A. pernyi innate immunity. Moreover, A. pernyi infection with the fungus Nosema pernyi and the gram-positive bacterium Enterococcus pernyi, but not the gram-negative bacterium Escherichia coli, activated the Toll signaling pathway. These results represent the first study of the Toll pathway in A. pernyi and provide insight into the A. pernyi innate immune system.
Affiliation(s)
- Ying Sun
- College of Plant Protection, Shenyang Agricultural University, Shenyang, 110866, China
- College of Bioscience and Biotechnology, Liaoning Engineering & Technology Research Center for Insect Resources, Shenyang Agricultural University, Shenyang, 110866, China
| | - Yiren Jiang
- College of Bioscience and Biotechnology, Liaoning Engineering & Technology Research Center for Insect Resources, Shenyang Agricultural University, Shenyang, 110866, China
| | - Yong Wang
- College of Bioscience and Biotechnology, Liaoning Engineering & Technology Research Center for Insect Resources, Shenyang Agricultural University, Shenyang, 110866, China
| | - Xisheng Li
- College of Plant Protection, Shenyang Agricultural University, Shenyang, 110866, China
- Sericultural Research Institute of Liaoning Province, Fengcheng, 118100, China
| | - Ruisheng Yang
- College of Bioscience and Biotechnology, Liaoning Engineering & Technology Research Center for Insect Resources, Shenyang Agricultural University, Shenyang, 110866, China
| | - Zhiguo Yu
- College of Plant Protection, Shenyang Agricultural University, Shenyang, 110866, China
- * E-mail: (ZY); (LQ)
| | - Li Qin
- College of Plant Protection, Shenyang Agricultural University, Shenyang, 110866, China
- College of Bioscience and Biotechnology, Liaoning Engineering & Technology Research Center for Insect Resources, Shenyang Agricultural University, Shenyang, 110866, China
- * E-mail: (ZY); (LQ)
|
84
|
Kumar R, Kumari B, Kumar M. PredHSP: Sequence Based Proteome-Wide Heat Shock Protein Prediction and Classification Tool to Unlock the Stress Biology. PLoS One 2016; 11:e0155872. [PMID: 27195495 PMCID: PMC4873250 DOI: 10.1371/journal.pone.0155872] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 12/30/2015] [Accepted: 05/05/2016] [Indexed: 01/09/2023] Open
Abstract
Heat shock proteins (HSPs) are chaperone proteins present in every domain of life. They play a crucial role in the folding/unfolding of proteins, their sorting and assembly into multi-protein complexes, and cell cycle control, and they also protect the cell during stress. Because no web-based predictor is available for the simultaneous prediction and classification of HSPs, a method that can predict and classify them efficiently is needed. In this study, we have developed PredHSP, a two-tier method coupling amino acid composition with a support vector machine, which identifies heat shock proteins (1st tier) and classifies them into different families (2nd tier). At the 1st tier, we achieved a maximum accuracy of 76.66% with MCC 0.43, while at the 2nd tier we achieved maximum accuracies of 96.36% with MCC 0.87 for HSP20, 91.91% with MCC 0.83 for HSP40, 95.96% with MCC 0.72 for HSP60, 91.87% with MCC 0.71 for HSP70, 98.43% with MCC 0.70 for HSP90 and 97.48% with MCC 0.71 for HSP100. We have also developed a webserver, as well as a standalone package for use by the scientific community, which can be accessed at http://14.139.227.92/mkumar/predhsp/index.html.
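The amino acid composition feature mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual 20-dimensional definition (the fraction of each standard residue in the sequence); the function name `amino_acid_composition` is hypothetical and is not taken from PredHSP, whose exact preprocessing is not described here.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Hypothetical helper for illustration; non-standard letters (e.g. X)
    are ignored, which is an assumption rather than a documented rule.
    """
    seq = sequence.upper()
    counts = Counter(res for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A vector of this kind could then be passed to any support vector machine implementation as the per-protein input; the two-tier arrangement would simply train one classifier for HSP versus non-HSP and a second one for the family label.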
Affiliation(s)
- Ravindra Kumar
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
| | - Bandana Kumari
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
| | - Manish Kumar
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
|
85
|
Breckels LM, Holden SB, Wojnar D, Mulvey CM, Christoforou A, Groen A, Trotter MWB, Kohlbacher O, Lilley KS, Gatto L. Learning from Heterogeneous Data Sources: An Application in Spatial Proteomics. PLoS Comput Biol 2016; 12:e1004920. [PMID: 27175778 PMCID: PMC4866734 DOI: 10.1371/journal.pcbi.1004920] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 10/07/2015] [Accepted: 04/16/2016] [Indexed: 11/19/2022] Open
Abstract
Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet, there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system to integrate heterogeneous data sources and considerably improve on the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms through evaluation of five experimental datasets, from four different species, in conjunction with four different auxiliary data sources to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis.
Sub-cellular localisation of proteins is critical to their function in all cellular processes; proteins localising to their intended micro-environment, e.g. organelles, vesicles or macro-molecular complexes, will meet the interaction partners and biochemical conditions suitable to pursue their molecular function. Therefore, sound data and methods to reliably and systematically study protein localisation, and hence their mis-localisation and the disruption of protein trafficking, that are relied upon by the cell biology community, are essential. Here we present a method to infer protein localisation relying on the optimal integration of experimental mass spectrometry-based data and auxiliary sources, such as GO annotation, outputs from third-party software, protein-protein interactions or immunocytochemistry data. We found that the application of transfer learning algorithms across these diverse data sources considerably improves on the quantity and reliability of sub-cellular protein assignment, compared to single data classifiers previously applied to infer sub-cellular localisation using experimental data only. We show how our method does not compromise biologically relevant experimental-specific signal after integration with heterogeneous freely available third-party resources. The integration of different data sources is an important challenge in the data intensive world of biology and we anticipate the transfer learning methods presented here will prove useful to many areas of biology, to unify data obtained from different but complementary sources.
Affiliation(s)
- Lisa M. Breckels
- Computational Proteomics Unit, Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Sean B. Holden
- Computer Laboratory, University of Cambridge, Cambridge, United Kingdom
| | - David Wojnar
- Quantitative Biology Center, Universität Tübingen, Tübingen, Germany
| | - Claire M. Mulvey
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Andy Christoforou
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Arnoud Groen
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | | | - Oliver Kohlbacher
- Quantitative Biology Center, Universität Tübingen, Tübingen, Germany
- Center for Bioinformatics, Universität Tübingen, Tübingen, Germany
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Kathryn S. Lilley
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Laurent Gatto
- Computational Proteomics Unit, Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
|
86
|
Wang R, Xu Y, Liu B. Recombination spot identification Based on gapped k-mers. Sci Rep 2016; 6:23934. [PMID: 27030570 PMCID: PMC4814916 DOI: 10.1038/srep23934] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 09/14/2015] [Accepted: 03/16/2016] [Indexed: 12/14/2022] Open
Abstract
Recombination is crucial for biological evolution, as it provides many new combinations of genetic diversity. Accurate identification of recombination spots is useful for the study of DNA function. To improve the prediction accuracy, researchers have proposed several computational methods for recombination spot identification. The k-mer feature is one of the most useful features for modeling the properties and function of DNA sequences. However, it suffers from an inherent limitation. If the word length k is large, the occurrence count of each k-mer is close to a binary variable, with a few k-mers present once and most k-mers absent. This usually causes a sparsity problem and reduces the classification accuracy. To solve this problem, we add gaps into the k-mer and introduce a new feature called gapped k-mer (GKM) for identification of recombination spots. By using this feature, we present a new predictor called SVM-GKM, which combines gapped k-mers and a Support Vector Machine (SVM) for recombination spot identification. Experimental results on a widely used benchmark dataset show that SVM-GKM outperforms other highly related predictors. Therefore, SVM-GKM would be a powerful predictor for computational genomics.
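The idea of adding gaps to k-mers can be sketched as follows. This is an illustration under a stated assumption, namely the common definition in which a gapped k-mer is a window of length l where only k positions are kept and the remaining l - k positions are masked; it is not necessarily the exact featurization used by SVM-GKM, and `gapped_kmer_counts` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(sequence, l, k):
    """Count gapped k-mers: length-l windows with k kept positions.

    Masked positions are written as '-'. Hypothetical helper for
    illustration only.
    """
    if not 0 < k <= l:
        raise ValueError("need 0 < k <= l")
    # Every way of choosing which k of the l window positions are kept.
    position_sets = [set(c) for c in combinations(range(l), k)]
    counts = Counter()
    for start in range(len(sequence) - l + 1):
        window = sequence[start:start + l]
        for kept in position_sets:
            pattern = "".join(
                ch if i in kept else "-" for i, ch in enumerate(window)
            )
            counts[pattern] += 1
    return counts
```

Because each window contributes to several masked patterns, two windows that differ at a single position still share some features, which is why gapped counts tend to be less sparse than plain k-mer counts at the same window length.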
Affiliation(s)
- Rong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
| | - Yong Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
| | - Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
|
87
|
Crystal structure of Rv2258c from Mycobacterium tuberculosis H37Rv, an S-adenosyl-L-methionine-dependent methyltransferase. J Struct Biol 2016; 193:172-180. [DOI: 10.1016/j.jsb.2016.01.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/21/2015] [Accepted: 01/05/2016] [Indexed: 11/21/2022]
|
88
|
Huang Y, Zhang L, Lian G, Zhan R, Xu R, Huang Y, Mitra B, Wu J, Luo G. A novel mathematical model to predict prognosis of burnt patients based on logistic regression and support vector machine. Burns 2016; 42:291-9. [PMID: 26774603 DOI: 10.1016/j.burns.2015.08.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 02/26/2015] [Revised: 07/09/2015] [Accepted: 08/07/2015] [Indexed: 11/25/2022]
Abstract
OBJECTIVE To develop a mathematical model of predicting mortality based on the admission characteristics of 6220 burn cases. METHODS Data on all the burn patients presenting to Institute of Burn Research, Southwest Hospital, Third Military Medical University from January of 1999 to December of 2008 were extracted from the departmental registry. The distributions of burn cases were scattered by principal component analysis. Univariate associations with mortality were identified and independent associations were derived from multivariate logistic regression analysis. Using variables independently and significantly associated with mortality, a mathematical model to predict mortality was developed using the support vector machine (SVM) model. The predicting ability of this model was evaluated and verified. RESULTS The overall mortality in this study was 1.8%. Univariate associations with mortality were identified and independent associations were derived from multivariate logistic regression analysis. Variables at admission independently associated with mortality were gender, age, total burn area, full thickness burn area, inhalation injury, shock, period before admission and others. The sensitivity and specificity of logistic model were 99.75% and 85.84% respectively, with an area under the receiver operating curve of 0.989 (95% CI: 0.979-1.000; p<0.01). The model correctly classified 99.50% of cases. The subsequently developed support vector machine (SVM) model correctly classified nearly 100% of test cases, which could not only predict adult group but also pediatric group, with pretty high robustness (92%-100%). CONCLUSION A mathematical model based on logistic regression and SVM could be used to predict the survival prognosis according to the admission characteristics.
Affiliation(s)
- Yinghui Huang
- Institute of Burn Research, Southwest Hospital, Third Military Medical University, Chongqing, China; Institute of Combined Injury, State Key Laboratory of Trauma, Burns and Combined Injury, Chongqing Engineering Research Center for Nanomedicine, College of Preventive Medicine, Third Military Medical University, Chongqing, China; Department of Biochemistry and Molecular Biology, Third Military Medical University, Chongqing, China.
| | - Lei Zhang
- College of Communication Engineering, Chongqing University, Chongqing 400044, China.
| | - Guan Lian
- Institute of Burn Research, Southwest Hospital, Third Military Medical University, Chongqing, China.
| | - Rixing Zhan
- Institute of Burn Research, Southwest Hospital, Third Military Medical University, Chongqing, China.
| | - Rufu Xu
- The Department of Epidemiology, Third Military Medical University, Chongqing, China.
| | - Yan Huang
- Department of Biochemistry and Molecular Biology, Third Military Medical University, Chongqing, China.
| | - Biswadev Mitra
- Trauma Service Center, Alfred Hospital, 55 Commercial Road, Melbourne, VIC 3004, Australia.
| | - Jun Wu
- Institute of Burn Research, Southwest Hospital, Third Military Medical University, Chongqing, China.
| | - Gaoxing Luo
- Institute of Burn Research, Southwest Hospital, Third Military Medical University, Chongqing, China.
|
89
|
He H, Lin D, Zhang J, Wang Y, Deng HW. Biostatistics, Data Mining and Computational Modeling. TRANSLATIONAL BIOINFORMATICS 2016. [DOI: 10.1007/978-94-017-7543-4_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]
|
90
|
Thakur A, Rajput A, Kumar M. MSLVP: prediction of multiple subcellular localization of viral proteins using a support vector machine. MOLECULAR BIOSYSTEMS 2016; 12:2572-86. [DOI: 10.1039/c6mb00241b] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 01/16/2023]
Abstract
Knowledge of the subcellular location (SCL) of viral proteins in the host cell is important for understanding their function in depth.
Affiliation(s)
- Anamika Thakur
- Bioinformatics Centre
- Institute of Microbial Technology
- Council of Scientific and Industrial Research
- Chandigarh-160036
- India
| | - Akanksha Rajput
- Bioinformatics Centre
- Institute of Microbial Technology
- Council of Scientific and Industrial Research
- Chandigarh-160036
- India
| | - Manoj Kumar
- Bioinformatics Centre
- Institute of Microbial Technology
- Council of Scientific and Industrial Research
- Chandigarh-160036
- India
|
91
|
Tachibana K, Gotoh E, Kawamata N, Ishimoto K, Uchihara Y, Iwanari H, Sugiyama A, Kawamura T, Mochizuki Y, Tanaka T, Sakai J, Hamakubo T, Kodama T, Doi T. Analysis of the subcellular localization of the human histone methyltransferase SETDB1. Biochem Biophys Res Commun 2015; 465:725-31. [PMID: 26296461 DOI: 10.1016/j.bbrc.2015.08.065] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 08/10/2015] [Accepted: 08/14/2015] [Indexed: 01/03/2023]
Abstract
SET domain, bifurcated 1 (SETDB1) is a histone methyltransferase that methylates lysine 9 on histone H3. Although it is important to know the localization of proteins to elucidate their physiological function, little is known of the subcellular localization of human SETDB1. In the present study, to investigate the subcellular localization of hSETDB1, we established a human cell line constitutively expressing enhanced green fluorescent protein fused to hSETDB1. We then generated a monoclonal antibody against the hSETDB1 protein. Expression of both exogenous and endogenous hSETDB1 was observed mainly in the cytoplasm of various human cell lines. Combined treatment with the nuclear export inhibitor leptomycin B and the proteasome inhibitor MG132 led to the accumulation of hSETDB1 in the nucleus. These findings suggest that hSETDB1, localized in the nucleus, might undergo degradation by the proteasome and be exported to the cytosol, resulting in its detection mainly in the cytosol.
Affiliation(s)
- Keisuke Tachibana
- Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka 565-0871, Japan.
| | - Eiko Gotoh
- Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka 565-0871, Japan
| | - Natsuko Kawamata
- Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka 565-0871, Japan
| | - Kenji Ishimoto
- Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka 565-0871, Japan; Laboratory for System Biology and Medicine, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| | - Yoshie Uchihara
- Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka 565-0871, Japan
| | - Hiroko Iwanari
- Department of Quantitative Biology and Medicine, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| | - Akira Sugiyama
- Radioisotope Center, The University of Tokyo, 2-11-16 Yayoi, Bunkyo, Tokyo 113-0032, Japan
| | - Takeshi Kawamura
- Radioisotope Center, The University of Tokyo, 2-11-16 Yayoi, Bunkyo, Tokyo 113-0032, Japan
| | - Yasuhiro Mochizuki
- Department of Quantitative Biology and Medicine, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| | - Toshiya Tanaka
- Laboratory for System Biology and Medicine, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| | - Juro Sakai
- Division of Metabolic Medicine, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| | - Takao Hamakubo
- Department of Quantitative Biology and Medicine, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| | - Tatsuhiko Kodama
- Laboratory for System Biology and Medicine, Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan
| | - Takefumi Doi
- Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka 565-0871, Japan.
|
92
|
Wang X, Zhang J, Li GZ. Multi-location gram-positive and gram-negative bacterial protein subcellular localization using gene ontology and multi-label classifier ensemble. BMC Bioinformatics 2015; 16 Suppl 12:S1. [PMID: 26329681 PMCID: PMC4705491 DOI: 10.1186/1471-2105-16-s12-s1] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 01/06/2023] Open
Abstract
Background: Predicting bacterial protein subcellular locations using computational methods has become an important and challenging task. Although many prediction methods exist for bacterial proteins, the majority of them can only deal with single-location proteins. However, many proteins in bacterial cells occupy multiple locations. Moreover, multi-location proteins have special biological functions that can help the development of new drugs. It is therefore necessary to develop new computational methods for accurately predicting the subcellular locations of multi-location bacterial proteins. Results: In this article, two efficient multi-label predictors, Gpos-ECC-mPLoc and Gneg-ECC-mPLoc, are developed to predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively. The two multi-label predictors construct the GO vectors by using the GO terms of homologous proteins of query proteins and then adopt a powerful multi-label ensemble classifier to make the final multi-label prediction. The two multi-label predictors have the following advantages: (1) they improve the prediction performance of multi-label proteins by taking the correlations among different labels into account; (2) they ensemble multiple CC classifiers and further generate better prediction results by ensemble learning; and (3) they construct the GO vectors by using the frequency of occurrences of GO terms in the typical homologous set instead of using 0/1 values. Experimental results show that Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently predict the subcellular locations of multi-label gram-positive and gram-negative bacterial proteins respectively. Conclusions: Gpos-ECC-mPLoc and Gneg-ECC-mPLoc can efficiently improve prediction accuracy of subcellular localization of multi-location gram-positive and gram-negative bacterial proteins respectively. The online web servers for Gpos-ECC-mPLoc and Gneg-ECC-mPLoc predictors are freely accessible at http://biomed.zzuli.edu.cn/bioinfo/gpos-ecc-mploc/ and http://biomed.zzuli.edu.cn/bioinfo/gneg-ecc-mploc/ respectively.
|
93
|
An efficient approach for the prediction of ion channels and their subfamilies. Comput Biol Chem 2015; 58:205-21. [PMID: 26256801 DOI: 10.1016/j.compbiolchem.2015.07.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 03/16/2015] [Revised: 06/25/2015] [Accepted: 07/08/2015] [Indexed: 01/25/2023]
Abstract
Ion channels are integral membrane proteins that are responsible for controlling the flow of ions across the cell. Different types of ion channels perform various biological functions. Therefore, for new drug discovery, it is necessary to develop a novel approach based on computational intelligence techniques for the reliable prediction of ion channel families and their subfamilies. In this paper, a random forest based approach is proposed to predict ion channel families and their subfamilies by using sequence derived features. Here, seven feature vectors are used to represent the protein sample, including amino acid composition, dipeptide composition, correlation features, composition, transition and distribution and pseudo amino acid composition. The minimum redundancy and maximum relevance feature selection is used to find the optimal number of features for improving the prediction performance. The proposed method achieved an overall accuracy of 100%, 98.01%, 91.5%, 93.0%, 92.2%, 78.6%, 95.5%, 84.9%, MCC values of 1.00, 0.92, 0.88, 0.88, 0.90, 0.79, 0.91, 0.81 and ROC area values of 1.00, 0.99, 0.99, 0.99, 0.99, 0.95, 0.99 and 0.96 using 10-fold cross validation to predict the ion channels and non-ion channels, voltage gated ion channels and ligand gated ion channels, four subfamilies (calcium, potassium, sodium and chloride) of voltage gated ion channels, and four subfamilies of ligand gated ion channels and predict subfamilies of voltage gated calcium, potassium, sodium and chloride ion channels respectively.
|
94
|
Islam SMA, Sajed T, Kearney CM, Baker EJ. PredSTP: a highly accurate SVM based model to predict sequential cystine stabilized peptides. BMC Bioinformatics 2015; 16:210. [PMID: 26142484 PMCID: PMC4491269 DOI: 10.1186/s12859-015-0633-x] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 10/02/2014] [Accepted: 06/01/2015] [Indexed: 02/07/2023] Open
Abstract
Background Numerous organisms have evolved a wide range of toxic peptides for self-defense and predation. Their effective interstitial and macro-environmental use requires energetic and structural stability. One successful group of these peptides includes a tri-disulfide domain arrangement that offers toxicity and high stability. Sequential tri-disulfide connectivity variants create highly compact disulfide folds capable of withstanding a variety of environmental stresses. Their combination of toxicity and stability make these peptides remarkably valuable for their potential as bio-insecticides, antimicrobial peptides and peptide drug candidates. However, the wide sequence variation, sources and modalities of group members impose serious limitations on our ability to rapidly identify potential members. As a result, there is a need for automated high-throughput member classification approaches that leverage their demonstrated tertiary and functional homology. Results We developed an SVM-based model to predict sequential tri-disulfide peptide (STP) toxins from peptide sequences. One optimized model, called PredSTP, predicted STPs from training set with sensitivity, specificity, precision, accuracy and a Matthews correlation coefficient of 94.86 %, 94.11 %, 84.31 %, 94.30 % and 0.86, respectively, using 200 fold cross validation. The same model outperforms existing prediction approaches in three independent out of sample testsets derived from PDB. Conclusion PredSTP can accurately identify a wide range of cystine stabilized peptide toxins directly from sequences in a species-agnostic fashion. The ability to rapidly filter sequences for potential bioactive peptides can greatly compress the time between peptide identification and testing structural and functional properties for possible antimicrobial and insecticidal candidates. A web interface is freely available to predict STP toxins from http://crick.ecs.baylor.edu/. 
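The performance measures quoted in this and several neighbouring abstracts (sensitivity, specificity, precision, accuracy and the Matthews correlation coefficient) all derive from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `classification_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration, not something taken from the cited papers.

```python
import math


def _ratio(numerator, denominator):
    """Safe division; returns 0.0 for an empty denominator (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion counts."""
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

Unlike accuracy, the MCC stays informative on imbalanced datasets, which may be why several of the listed predictors report both.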
Affiliation(s)
| | - Tanvir Sajed
- Department of Computer Science, University of Alberta, Edmonton, AB, Canada.
| | - Christopher Michel Kearney
- Institute of Biomedical Studies, Baylor University, Waco, TX, USA; Department of Biology, Baylor University, Waco, TX, USA.
| | - Erich J Baker
- Institute of Biomedical Studies, Baylor University, Waco, TX, USA; Department of Computer Science, Baylor University, One Bear Place #97356, Waco, TX, USA.
|
95
|
Ofer D, Linial M. ProFET: Feature engineering captures high-level protein functions. Bioinformatics 2015; 31:3429-36. [DOI: 10.1093/bioinformatics/btv345] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 03/04/2015] [Accepted: 05/29/2015] [Indexed: 11/13/2022] Open
|
96
|
Prediction of drug indications based on chemical interactions and chemical similarities. BIOMED RESEARCH INTERNATIONAL 2015; 2015:584546. [PMID: 25821813 PMCID: PMC4363546 DOI: 10.1155/2015/584546] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/09/2014] [Accepted: 09/11/2014] [Indexed: 12/13/2022]
Abstract
Discovering potential indications of novel or approved drugs is a key step in drug development. Previous computational approaches can be categorized as disease-centric or drug-centric according to their starting point, or as small-scale or large-scale applications according to the diversity of the datasets. Here, a classifier has been constructed on a large drug indication dataset to predict the indications of a drug, based on the assumption that interactive/associated drugs or drugs with similar structures are more likely to target the same diseases. To evaluate the classifier, it was run five times on a dataset of 1,573 drugs retrieved from the Comprehensive Medicinal Chemistry database and evaluated by 5-fold cross-validation, yielding five 1st order prediction accuracies that were all approximately 51.48%. Meanwhile, the model yielded an accuracy rate of 50.00% for the 1st order prediction in an independent test on a dataset of 32 other drugs for which drug repositioning has been confirmed. Interestingly, some clinically repurposed drug indications that were not included in the datasets are successfully identified by our method. These results suggest that our method may become a useful tool to associate novel molecules with new indications or alternative indications with existing drugs.
|
97
|
Fukunaga T, Kubota S, Oda S, Iwasaki W. GroupTracker: Video tracking system for multiple animals under severe occlusion. Comput Biol Chem 2015; 57:39-45. [PMID: 25736254 DOI: 10.1016/j.compbiolchem.2015.02.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 12/31/2014] [Accepted: 02/03/2015] [Indexed: 10/24/2022]
Abstract
Quantitative analysis of the behaviors shown by multiple interacting animals can provide a key for revealing high-order functions of their nervous systems. To resolve these complex behaviors, a video tracking system that preserves individual identity even under severe overlap in positions, i.e., occlusion, is needed. We developed GroupTracker, a multiple animal tracking system that accurately tracks individuals even under severe occlusion. As maximum likelihood estimation of a Gaussian mixture model whose components can severely overlap is theoretically an ill-posed problem, we devised an expectation-maximization scheme with additional constraints on the eigenvalues of the covariance matrix of the mixture components. Our system was shown to accurately track multiple medaka (Oryzias latipes) which freely swim around in three dimensions and frequently overlap each other. As an accurate multiple animal tracking system, GroupTracker will contribute to revealing unexplored structures and patterns behind animal interactions. The Java source code of GroupTracker is available at https://sites.google.com/site/fukunagatsu/software/group-tracker.
|
98
|
Arango-Argoty GA, Jaramillo-Garzón JA, Castellanos-Domínguez G. Feature extraction by statistical contact potentials and wavelet transform for predicting subcellular localizations in gram negative bacterial proteins. J Theor Biol 2015; 364:121-30. [PMID: 25219623 DOI: 10.1016/j.jtbi.2014.08.051] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/08/2013] [Revised: 08/27/2014] [Accepted: 08/28/2014] [Indexed: 11/16/2022]
Abstract
Predicting the localization of a protein has become a useful practice for inferring its function. Most of the reported methods to predict subcellular localizations in Gram-negative bacterial proteins make use of standard protein representations that generally do not take into account the distribution of the amino acids and the structural information of the proteins. Here, we propose a protein representation based on the structural information contained in the pairwise statistical contact potentials. The wavelet transform decodes the information contained in the primary structure of the proteins, allowing the identification of patterns along the proteins, which are used to characterize the subcellular localizations. Then, a support vector machine classifier is trained to categorize them. Cellular compartments like periplasm and extracellular medium are difficult to predict, having a high false negative rate. The wavelet-based method achieves high overall performance while maintaining a low false negative rate, particularly on "periplasm" and "extracellular medium". Our results suggest that the proposed protein characterization is a useful alternative to the classical and cutting-edge protein depictions for representing and predicting protein sequences.
Affiliation(s)
- G A Arango-Argoty
- Signal Processing and Recognition Group, Universidad Nacional de Colombia, s. Manizales, Campus La Nubia, km 7 via al Magdalena, Manizales, Colombia; Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, 3501 Fifth Ave, Pittsburgh, PA 15260, USA.
| | - J A Jaramillo-Garzón
- Signal Processing and Recognition Group, Universidad Nacional de Colombia, s. Manizales, Campus La Nubia, km 7 via al Magdalena, Manizales, Colombia; Research Center of the Instituto Tecnologico Metropolitano, Calle 73 No 76A-354, Medellín, Colombia
| | - G Castellanos-Domínguez
- Signal Processing and Recognition Group, Universidad Nacional de Colombia, s. Manizales, Campus La Nubia, km 7 via al Magdalena, Manizales, Colombia
| |
|
99
|
Zhu PP, Li WC, Zhong ZJ, Deng EZ, Ding H, Chen W, Lin H. Predicting the subcellular localization of mycobacterial proteins by incorporating the optimal tripeptides into the general form of pseudo amino acid composition. MOLECULAR BIOSYSTEMS 2015; 11:558-63. [DOI: 10.1039/c4mb00645c] [Citation(s) in RCA: 97] [Impact Index Per Article: 9.7] [Indexed: 11/21/2022]
Abstract
Mycobacterium tuberculosis is a bacterium that causes tuberculosis, one of the most prevalent infectious diseases.
Affiliation(s)
- Pan-Pan Zhu
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054
| | - Wen-Chao Li
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054
| | - Zhe-Jin Zhong
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054
| | - En-Ze Deng
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054
| | - Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054
| |
|
100
|
Wang X, Ma J, Li X, Zhao X, Lin Z, Chen J, Shao Z. Optimization of Chemical Fungicide Combinations Targeting the Maize Fungal Pathogen, Bipolaris maydis: A Systematic Quantitative Approach. IEEE Trans Biomed Eng 2015; 62:80-7. [DOI: 10.1109/tbme.2014.2339295] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/08/2022]
|