1
Feng J, Sun M, Liu C, Zhang W, Xu C, Wang J, Wang G, Wan S. SAMP: Identifying antimicrobial peptides by an ensemble learning model based on proportionalized split amino acid composition. Brief Funct Genomics 2024; 23:879-890. [PMID: 39573886 PMCID: PMC11631067 DOI: 10.1093/bfgp/elae046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Revised: 08/23/2024] [Accepted: 11/01/2024] [Indexed: 11/24/2024]
Abstract
It is projected that 10 million deaths could be attributed to drug-resistant bacterial infections in 2050. One effective way to address this concern is to identify new-generation antibiotics. Antimicrobial peptides (AMPs), a class of innate immune effectors, have received significant attention for their capacity to eliminate drug-resistant pathogens, including viruses, bacteria, and fungi. Recent years have witnessed widespread application of computational methods, especially machine learning (ML) and deep learning (DL), for discovering AMPs. However, existing methods only use features including compositional, physicochemical, and structural properties of peptides, which cannot fully capture sequence information from AMPs. Here, we present SAMP, an ensemble random projection (RP) based computational model that leverages a new type of feature called proportionalized split amino acid composition (PSAAC) in addition to conventional sequence-based features for AMP prediction. With this new feature set, SAMP captures residue patterns such as sorting signals at both the N-terminus and the C-terminus, while also retaining the sequence order information from the middle peptide fragments. Benchmarking tests on different balanced and imbalanced datasets demonstrate that SAMP consistently outperforms existing state-of-the-art methods, such as iAMPpred and AMPScanner V2, in terms of accuracy, Matthews correlation coefficient (MCC), G-measure, and F1-score. In addition, by leveraging an ensemble RP architecture, SAMP scales to large-scale AMP identification with further performance improvement compared to models without RP. To facilitate the use of SAMP, we have developed a Python package that is freely available at https://github.com/wan-mlab/SAMP.
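The split-composition idea in this abstract can be illustrated with a short sketch. It is not the SAMP implementation: the function names, the 25% / 50% / 25% split, and the plain per-segment frequency vector are all assumptions made here for illustration, since the abstract does not state how the proportions are chosen.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Fraction of each standard amino acid in one segment (zeros if empty)."""
    counts = Counter(segment)
    n = len(segment)
    # Non-standard letters count toward the length but get no feature slot.
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def split_composition(seq, n_frac=0.25, c_frac=0.25):
    """Hypothetical split composition: N-terminal, middle, C-terminal.

    The segment lengths are proportions of the sequence length (an assumed
    reading of "proportionalized"), so peptides of any length map to the
    same 60-dimensional vector.
    """
    seq = seq.upper()
    n_len = max(1, round(len(seq) * n_frac))
    c_len = max(1, round(len(seq) * c_frac))
    n_part = seq[:n_len]
    c_part = seq[len(seq) - c_len:]
    middle = seq[n_len:len(seq) - c_len]
    return composition(n_part) + composition(middle) + composition(c_part)


if __name__ == "__main__":
    features = split_composition("KKLLAAGGKKLL")
    print(len(features))  # 60 features: 3 segments x 20 amino acids
```

Keeping the three segments separate is what lets a downstream classifier see terminal residue patterns that a whole-sequence composition would average away.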
Affiliation(s)
- Junxi Feng
  - Department of Biostatistics, School of Public Health, Harvard University, Boston, MA 02115, United States
- Mengtao Sun
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, United States
- Cong Liu
  - Department of Mathematics, Data Science, University of Waterloo, Waterloo, ON N2L3G1, Canada
- Weiwei Zhang
  - Department of Pathology, Microbiology, and Immunology, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, United States
- Changmou Xu
  - Department of Food Science and Human Nutrition, College of Agricultural, Consumer and Environmental Sciences, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Jieqiong Wang
  - Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, United States
- Guangshun Wang
  - Department of Pathology, Microbiology, and Immunology, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, United States
- Shibiao Wan
  - Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, United States

2
Feng J, Sun M, Liu C, Zhang W, Xu C, Wang J, Wang G, Wan S. SAMP: Identifying Antimicrobial Peptides by an Ensemble Learning Model Based on Proportionalized Split Amino Acid Composition. bioRxiv 2024:2024.04.25.590553. [PMID: 38712184 PMCID: PMC11071531 DOI: 10.1101/2024.04.25.590553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
It is projected that 10 million deaths could be attributed to drug-resistant bacterial infections in 2050. One effective way to address this concern is to identify new-generation antibiotics. Antimicrobial peptides (AMPs), a class of innate immune effectors, have received significant attention for their capacity to eliminate drug-resistant pathogens, including viruses, bacteria, and fungi. Recent years have witnessed widespread application of computational methods, especially machine learning (ML) and deep learning (DL), for discovering AMPs. However, existing methods only use features including compositional, physicochemical, and structural properties of peptides, which cannot fully capture sequence information from AMPs. Here, we present SAMP, an ensemble random projection (RP) based computational model that leverages a new type of feature called Proportionalized Split Amino Acid Composition (PSAAC) in addition to conventional sequence-based features for AMP prediction. With this new feature set, SAMP captures residue patterns such as sorting signals around both the N-terminus and the C-terminus, while also retaining the sequence order information from the middle peptide fragments. Benchmarking tests on different balanced and imbalanced datasets demonstrate that SAMP consistently outperforms existing state-of-the-art methods, such as iAMPpred and AMPScanner V2, in terms of accuracy, MCC, G-measure, and F1-score. In addition, by leveraging an ensemble RP architecture, SAMP scales to large-scale AMP identification with further performance improvement compared to models without RP. To facilitate the use of SAMP, we have developed a Python package freely available at https://github.com/wan-mlab/SAMP.
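The four evaluation measures named in this abstract can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions and is not taken from the paper. The function name is hypothetical, and G-measure is assumed here to mean the geometric mean of precision and recall, which may differ from the definition the authors used.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, MCC, G-measure and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Assumed definition: geometric mean of precision and recall.
    g_measure = math.sqrt(precision * recall)
    # MCC stays informative on imbalanced data because it uses all four cells.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "mcc": mcc,
            "g_measure": g_measure, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

Reporting MCC alongside accuracy matters for the imbalanced benchmarks the abstract mentions, where accuracy alone can look high for a classifier that favors the majority class.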
Affiliation(s)
- Junxi Feng
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, United States
- Mengtao Sun
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, United States
- Cong Liu
  - Department of Mathematics, Data Science, University of Waterloo, Waterloo, ON N2L3G1, Canada
- Weiwei Zhang
  - Department of Pathology, Microbiology, and Immunology, University of Nebraska Medical Center, Omaha, NE 68198, United States
- Changmou Xu
  - Department of Food Science and Human Nutrition, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Jieqiong Wang
  - Department of Neurological Sciences, University of Nebraska Medical Center, Omaha, NE 68198, United States
- Guangshun Wang
  - Department of Pathology, Microbiology, and Immunology, University of Nebraska Medical Center, Omaha, NE 68198, United States
- Shibiao Wan
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, United States

3
Köseoğlu AE, Can H, Güvendi M, Erkunt Alak S, Değirmenci Döşkaya A, Karakavuk M, Döşkaya M, Ün C. Molecular characterization of Anaplasma ovis Msp4 protein in strains isolated from ticks in Turkey: A multi-epitope synthetic vaccine antigen design against Anaplasma ovis using immunoinformatic tools. Biologicals 2024; 85:101749. [PMID: 38325003 DOI: 10.1016/j.biologicals.2024.101749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 01/07/2024] [Accepted: 01/29/2024] [Indexed: 02/09/2024]
Abstract
Tick-borne pathogens increasingly threaten animal and human health and cause great economic loss in the livestock industry. Among these pathogens, Anaplasma ovis, which decreases meat and milk yield, is frequently detected in sheep in many countries including Turkey. This study aimed to reveal potential vaccine candidate epitopes in Msp4 protein using sequence data from Anaplasma ovis isolates and then to design a multi-epitope protein to be used in vaccine formulations against Anaplasma ovis. For this purpose, the Msp4 gene was sequenced from Anaplasma ovis isolates (n = 6) detected in ticks collected from sheep in Turkey, and the sequence data were compared with previous sequences from different countries in order to detect variations in the Msp4 gene/protein. Potential vaccine candidate and diagnostic epitopes were predicted using various immunoinformatics tools. Among the discovered vaccine candidate epitopes, antigenic and conserved ones were selected, and then a multi-epitope protein was designed. The designed vaccine protein was assessed for TLR-2, IgG, and IFN-γ responses by molecular docking and immune simulation analyses. Among the discovered epitopes, the EVASEGSGVM and YQFTPEISLV epitopes, with high antigenicity, non-allergenicity, and non-toxicity, were proposed for use in further serodiagnostic and vaccine studies of Anaplasma ovis.
Affiliation(s)
- Ahmet Efe Köseoğlu
  - Duisburg-Essen University, Faculty of Chemistry, Department of Environmental Microbiology and Biotechnology, Essen, Germany
- Hüseyin Can
  - Ege University, Faculty of Science, Department of Biology, Molecular Biology Section, İzmir, Turkiye; Ege University, Vaccine Development Application and Research Center, İzmir, Turkiye; Ege University, Institute of Health Sciences, Department of Vaccine Studies, İzmir, Turkiye
- Mervenur Güvendi
  - Ege University, Faculty of Science, Department of Biology, Molecular Biology Section, İzmir, Turkiye
- Sedef Erkunt Alak
  - Ege University, Faculty of Science, Department of Biology, Molecular Biology Section, İzmir, Turkiye; Ege University, Vaccine Development Application and Research Center, İzmir, Turkiye
- Aysu Değirmenci Döşkaya
  - Ege University, Vaccine Development Application and Research Center, İzmir, Turkiye; Ege University, Institute of Health Sciences, Department of Vaccine Studies, İzmir, Turkiye; Ege University, Faculty of Medicine, Department of Parasitology, İzmir, Turkiye
- Muhammet Karakavuk
  - Ege University, Vaccine Development Application and Research Center, İzmir, Turkiye; Ege University, Institute of Health Sciences, Department of Vaccine Studies, İzmir, Turkiye; Ege University, Odemis Vocational School, İzmir, Turkiye
- Mert Döşkaya
  - Ege University, Vaccine Development Application and Research Center, İzmir, Turkiye; Ege University, Institute of Health Sciences, Department of Vaccine Studies, İzmir, Turkiye; Ege University, Faculty of Medicine, Department of Parasitology, İzmir, Turkiye
- Cemal Ün
  - Ege University, Faculty of Science, Department of Biology, Molecular Biology Section, İzmir, Turkiye; Ege University, Vaccine Development Application and Research Center, İzmir, Turkiye; Ege University, Institute of Health Sciences, Department of Vaccine Studies, İzmir, Turkiye

4
Russo L, Capra E, Franceschi V, Cavazzini D, Sala R, Lazzari B, Cavirani S, Donofrio G. Characterization of BoHV-4 ORF45. Front Microbiol 2023; 14:1171770. [PMID: 37234529 PMCID: PMC10206056 DOI: 10.3389/fmicb.2023.1171770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 04/12/2023] [Indexed: 05/28/2023]
Abstract
Bovine herpesvirus 4 (BoHV-4) is a Gammaherpesvirus belonging to the Rhadinovirus genus. The bovine is BoHV-4's natural host, and the African buffalo is BoHV-4's natural reservoir. In any case, BoHV-4 infection is not associated with a specific disease. Genome structure and genes are well-conserved in Gammaherpesvirus, and the orf 45 gene and its product, ORF45, are among those. BoHV-4 ORF45 has been suggested to be a tegument protein; however, its structure and function have not yet been experimentally characterized. The present study shows that BoHV-4 ORF45, despite its poor homology with other characterized Rhadinovirus ORF45s, is structurally related to Kaposi's sarcoma-associated herpesvirus (KSHV) ORF45, is a phosphoprotein, and localizes in the host cell nuclei. Through the generation of an ORF45-null mutant BoHV-4 and its pararevertant, it was possible to demonstrate that ORF45 is essential for BoHV-4 lytic replication and is associated with the viral particles, as for the other characterized Rhadinovirus ORF45s. Finally, the impact of BoHV-4 ORF45 on the cellular transcriptome was investigated, an aspect poorly explored or not explored at all for other Gammaherpesvirus. Many cellular transcriptional pathways were found to be altered, mainly those involving the p90 ribosomal S6 kinase (RSK) and extracellular signal-regulated kinase (ERK) complex (RSK/ERK). It was concluded that BoHV-4 ORF45 has similar characteristics to those of KSHV ORF45, and its unique and incisive impact on the cell transcriptome paves the way for further investigations.
Affiliation(s)
- Luca Russo
  - Dipartimento di Scienze Medico Veterinarie, Università di Parma, Parma, Italy
- Emanuele Capra
  - Istituto di Biologia e Biotecnologia Agraria, Consiglio Nazionale delle Ricerche IBBA CNR, Lodi, Italy
- Davide Cavazzini
  - Dipartimento di Scienze Chimiche, della Vita e della Sostenibilità Ambientale, Università di Parma, Parma, Italy
- Roberto Sala
  - Dipartimento di Medicina e Chirurgia, Università di Parma, Parma, Italy
- Barbara Lazzari
  - Istituto di Biologia e Biotecnologia Agraria, Consiglio Nazionale delle Ricerche IBBA CNR, Lodi, Italy
- Sandro Cavirani
  - Dipartimento di Scienze Medico Veterinarie, Università di Parma, Parma, Italy
- Gaetano Donofrio
  - Dipartimento di Scienze Medico Veterinarie, Università di Parma, Parma, Italy

5
Biswal MR, Padmanabhan S, Manjithaya R, Prakash MK. Early Bioinformatic Implication of Triacidic Amino Acid Motifs in Autophagy-Dependent Unconventional Secretion of Mammalian Proteins. Front Cell Dev Biol 2022; 10:863825. [PMID: 35646924 PMCID: PMC9136135 DOI: 10.3389/fcell.2022.863825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Accepted: 04/11/2022] [Indexed: 11/22/2022]
Abstract
Several proteins are secreted outside the cell, and in many cases, they may be identified by a characteristic signal peptide. However, more and more studies point to evidence for an "unconventional" secretion, where proteins without a hitherto known signal are secreted, possibly in conditions of starvation. In this work, we analyse a set of 202 RNA binding mammalian proteins, whose unconventional secretion has recently been established. Analysis of these proteins secreted by LC3 mediation, the largest unconventionally secreted dataset to our knowledge, identifies the role of the KKX motif as well as the triacidic amino acid motif in unconventional secretion, the latter being an extension of the recently implicated diacidic amino acid motif. Further data analysis leads to a hypothesis on the sequence or structural proximity of the triacidic or KKX motifs to the LC3 interacting region, and a phosphorylatable amino acid such as serine, as a statistically significant feature among these unconventionally secreted proteins. This hypothesis, although it needs to be validated in experiments that challenge the specific details of each of these aspects, appears to be one of the early steps in defining what may be a plausible signal for unconventional protein secretion.
Affiliation(s)
- Malay Ranjan Biswal
  - Computational Biology, Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR), Bangalore, India
- Sreedevi Padmanabhan
  - Autophagy Laboratory, Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR), Bangalore, India
- Ravi Manjithaya
  - Autophagy Laboratory, Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR), Bangalore, India
  - *Correspondence: Ravi Manjithaya; Meher K. Prakash
- Meher K. Prakash
  - Computational Biology, Theoretical Sciences Unit, Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR), Bangalore, India
  - *Correspondence: Ravi Manjithaya; Meher K. Prakash

6
Jiang Y, Wang D, Wang W, Xu D. Computational methods for protein localization prediction. Comput Struct Biotechnol J 2021; 19:5834-5844. [PMID: 34765098 PMCID: PMC8564054 DOI: 10.1016/j.csbj.2021.10.023] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/31/2021] [Revised: 10/12/2021] [Accepted: 10/13/2021] [Indexed: 12/16/2022]
Abstract
The accurate annotation of protein localization is crucial in understanding protein function in tandem with a broad range of applications such as pathological analysis and drug design. Since most proteins do not have experimentally-determined localization information, the computational prediction of protein localization has been an active research area for more than two decades. In particular, recent machine-learning advancements have fueled the development of new methods in protein localization prediction. In this review paper, we first categorize the main features and algorithms used for protein localization prediction. Then, we summarize a list of protein localization prediction tools in terms of their coverage, characteristics, and accessibility to help users find suitable tools based on their needs. Next, we evaluate some of these tools on a benchmark dataset. Finally, we provide an outlook on the future exploration of protein localization methods.
Affiliation(s)
- Yuexu Jiang
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Duolin Wang
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Weiwei Wang
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA

7
Gürsoy S, Hazan F, Öztürk T, Çolak R, Çalkavur Ş. Evaluation of Sporadic and Familial Cases with Craniofrontonasal Syndrome: A Wide Clinical Spectrum and Identification of a Novel EFNB1 Gene Mutation. Mol Syndromol 2021; 12:269-278. [PMID: 34602953 DOI: 10.1159/000515697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2021] [Accepted: 03/05/2021] [Indexed: 11/19/2022]
Abstract
Craniofrontonasal syndrome (CFNS) is a rare X-linked genetic disorder which is characterized by coronal synostosis, widely spaced eyes, a central nasal groove, and various skeletal anomalies. Mutations in the EFNB1 gene in Xq13.1 are responsible for familial and sporadic cases. In the present study, we aimed to evaluate the clinical characteristics and molecular results of 4 patients with CFNS. Genomic DNA was extracted from the peripheral blood lymphocytes of all patients and their parents, and Sanger sequencing of the EFNB1 gene was performed. A novel EFNB1 gene mutation (c.65delG; p.Cys22SerfsTer24) was detected in a newborn who had only dysmorphic facial features and bicornuate uterus. The other 3 patients (2 familial cases and 1 sporadic case) shared the same mutation (c.196C>T; p.R66X). However, the clinical features of these patients were highly variable. Additionally, central (meso-axial) polydactyly and deep palmar creases were detected, which have not been previously reported. CFNS has a wide clinical spectrum, but there is no clear genotype-phenotype correlation. However, central (meso-axial) polydactyly and deep palmar creases may be part of the clinical spectrum seen in CFNS. In addition, our findings expand the mutational spectrum in patients with CFNS.
Affiliation(s)
- Semra Gürsoy
  - Department of Pediatric Genetics, Dr. Behcet Uz Children's Hospital, Izmir, Turkey
- Filiz Hazan
  - Department of Medical Genetics, Dr. Behcet Uz Children's Hospital, Izmir, Turkey
- Tülay Öztürk
  - Department of Pediatric Radiology, Dr. Behcet Uz Children's Hospital, Izmir, Turkey
- Rüya Çolak
  - Department of Neonatology, Dr. Behcet Uz Children's Hospital, Izmir, Turkey
- Şebnem Çalkavur
  - Department of Neonatology, Dr. Behcet Uz Children's Hospital, Izmir, Turkey

8
Celik S, Demirag AD, Ozel AE, Akyuz S. Molecular Structure, Molecular Docking and Absorption, Distribution, Metabolism, Excretion and Toxicity study of cellulose II. J Chin Chem Soc-Taip 2021. [DOI: 10.1002/jccs.202000515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Affiliation(s)
- Sefa Celik
  - Faculty of Science, Department of Physics, Istanbul University, Istanbul, Turkey
- Aliye Demet Demirag
  - Department of Physics, Institute of Graduate Studies in Sciences, Istanbul University, Istanbul, Turkey
- Aysen E. Ozel
  - Faculty of Science, Department of Physics, Istanbul University, Istanbul, Turkey
- Sevim Akyuz
  - Faculty of Science and Letters, Department of Physics, Istanbul Kultur University, Istanbul, Turkey

9
Can H, Köseoğlu AE, Erkunt Alak S, Güvendi M, Döşkaya M, Karakavuk M, Gürüz AY, Ün C. In silico discovery of antigenic proteins and epitopes of SARS-CoV-2 for the development of a vaccine or a diagnostic approach for COVID-19. Sci Rep 2020; 10:22387. [PMID: 33372181 PMCID: PMC7769971 DOI: 10.1038/s41598-020-79645-9] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 04/27/2020] [Accepted: 12/10/2020] [Indexed: 12/11/2022]
Abstract
In the genome of SARS-CoV-2, the 5′-terminus encodes a polyprotein, which is further cleaved into 15 non-structural proteins whereas the 3′ terminus encodes four structural proteins and eight accessory proteins. Among these 27 proteins, the present study aimed to discover likely antigenic proteins and epitopes to be used for the development of a vaccine or serodiagnostic assay using an in silico approach. For this purpose, after the full genome analysis of SARS-CoV-2 Wuhan isolate and variant proteins that are detected frequently, surface proteins including spike, envelope, and membrane proteins as well as proteins with signal peptide were determined as probable vaccine candidates whereas the remaining were considered as possible antigens to be used during the development of serodiagnostic assays. According to results obtained, among 27 proteins, 26 of them were predicted as probable antigen. In 26 proteins, spike protein was selected as the best vaccine candidate because of having a signal peptide, negative GRAVY value, one transmembrane helix, moderate aliphatic index, a big molecular weight, a long-estimated half-life, beta wrap motifs as well as having stable, soluble and non-allergic features. In addition, orf7a, orf8, and nsp-10 proteins with signal peptide were considered as potential vaccine candidates. Nucleocapsid protein and a highly antigenic GGDGKMKD epitope were identified as ideal antigens to be used in the development of serodiagnostic assays. Moreover, considering MHC-I alleles, highly antigenic KLNDLCFTNV and ITLCFTLKRK epitopes can be used to develop an epitope-based peptide vaccine.
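The GRAVY value used as a selection criterion in this abstract is the mean Kyte-Doolittle hydropathy of a sequence, where a negative value indicates an overall hydrophilic protein. The sketch below applies it to the GGDGKMKD epitope named in the abstract. The function name is hypothetical, and the abstract does not say which tool the authors used to compute GRAVY.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean hydropathy over all residues.

    Raises KeyError on non-standard residues so that ambiguous letters
    are not silently ignored.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    # Epitope from the abstract; the result is negative (hydrophilic).
    print(round(gravy("GGDGKMKD"), 4))
```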
Affiliation(s)
- Hüseyin Can
  - Department of Biology Molecular Biology Section, Faculty of Science, Ege University, Bornova, İzmir, Turkey
- Ahmet Efe Köseoğlu
  - Department of Biology Molecular Biology Section, Faculty of Science, Ege University, Bornova, İzmir, Turkey
- Sedef Erkunt Alak
  - Department of Biology Molecular Biology Section, Faculty of Science, Ege University, Bornova, İzmir, Turkey
- Mervenur Güvendi
  - Department of Biology Molecular Biology Section, Faculty of Science, Ege University, Bornova, İzmir, Turkey
- Mert Döşkaya
  - Department of Parasitology, Faculty of Medicine, Ege University, Bornova, İzmir, Turkey
- Adnan Yüksel Gürüz
  - Department of Parasitology, Faculty of Medicine, Ege University, Bornova, İzmir, Turkey
- Cemal Ün
  - Department of Biology Molecular Biology Section, Faculty of Science, Ege University, Bornova, İzmir, Turkey

10
Ding Y, Tang J, Guo F. Human protein subcellular localization identification via fuzzy model on Kernelized Neighborhood Representation. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106596] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/29/2022]

11
Shakweer WMES, Abd EL-Rahman HH. Cloning, nucleotide sequencing, and bioinformatics analyses of growth hormone mRNA of Assaf sheep and Boer goats reared in Egypt. J Genet Eng Biotechnol 2020; 18:30. [PMID: 32661950 PMCID: PMC7359211 DOI: 10.1186/s43141-020-00046-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/05/2019] [Accepted: 06/26/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Molecular characterization of the genes underlying livestock productive traits may allow advanced biotechnology techniques to be applied to improve animal productivity. Growth hormone (GH) controls body growth rate, milk production, and reproduction, as well as carbohydrate, lipid, and protein metabolism. Therefore, the present study aims to investigate the genetic variations of growth hormone cDNA sequences between Assaf sheep (As_GH) and Boer goat (Bo_GH), which are mainly used for genetic improvement in Egypt, using bioinformatics analysis. Growth hormone cDNA was isolated from the pituitary gland tissue of Assaf sheep and Boer goat and subcloned into the pTZ57R/T cloning vector for sequencing. RESULTS: The As_GH cDNA was 665 bp long and the Bo_GH cDNA was 774 bp long. The complete coding sequences (CDS) of As_GH and Bo_GH were registered in the GenBank database under accession numbers MH128986 and MG744290, respectively. High homology (99.5%) was observed between the As_GH and Bo_GH protein sequences, with one different amino acid in the As_GH protein sequence (Arg194). The protein sequence of As_GH has only one motif signature, Somatotropin_1 from 79 to 112 aa, whereas the Bo_GH protein sequence and the GenBank database sequences have two motif signatures. The growth hormone cDNA sequence of Assaf sheep has three unique single nucleotide polymorphisms (SNPs) (A637A638G639) that encode arginine (Arg194); this insertion mutation (AAG) was not found in the growth hormone cDNA sequences of Boer goat in the present study or of the breeds in the GenBank database. This mutation can be used to develop SNP markers for Assaf sheep. CONCLUSIONS: The GH sequences of Assaf sheep and Boer goat are highly conserved, with 99.5% homology in the coding region. The Assaf sheep GH sequence has three unique SNPs that may be used to develop SNP markers for this breed. Further studies are needed to investigate the genetic variations of the growth hormone gene in different sheep and goat breeds in Egypt and to document the relationship between these variations and the productive performance of animals.
Affiliation(s)
- Waleid Mohamed El-Sayed Shakweer
  - Animal Production Department, Agricultural and Biological Research Division, National Research Centre, 33 El Bohouth St. (Former El-Tahrir St.), Dokki, Giza, P.O. 12622 Egypt
- Hashem Hamed Abd EL-Rahman
  - Animal Production Department, Agricultural and Biological Research Division, National Research Centre, 33 El Bohouth St. (Former El-Tahrir St.), Dokki, Giza, P.O. 12622 Egypt

12
Shahzadi Z, Abbas G, Azam SS. Relational dynamics obtained through simulation studies of thioredoxin reductase: From a multi-drug resistant Entamoeba histolytica. J Mol Liq 2020. [DOI: 10.1016/j.molliq.2020.112939] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/24/2022]

13
Some illuminating remarks on molecular genetics and genomics as well as drug development. Mol Genet Genomics 2020; 295:261-274. [PMID: 31894399 DOI: 10.1007/s00438-019-01634-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/21/2019] [Accepted: 12/05/2019] [Indexed: 02/07/2023]
Abstract
Facing the explosive growth of biological sequences unearthed in the post-genomic age, one of the most important but also most difficult problems in computational biology is how to express a biological sequence with a discrete model or a vector, but still keep it with considerable sequence-order information or its special pattern. To deal with such a challenging problem, the ideas of "pseudo amino acid components" and "pseudo K-tuple nucleotide composition" have been proposed. The ideas and their approaches have further stimulated the birth for "distorted key theory", "wenxing diagram", and substantially strengthening the power in treating the multi-label systems, as well as the establishment of the famous "5-steps rule". All these logic developments are quite natural that are very useful not only for theoretical scientists but also for experimental scientists in conducting genetics/genomics analysis and drug development. Presented in this review paper are also their future perspectives; i.e., their impacts will become even more significant and propounding.
14
pLoc_bal-mHum: Predict subcellular localization of human proteins by PseAAC and quasi-balancing training dataset. Genomics 2019; 111:1274-1282. [DOI: 10.1016/j.ygeno.2018.08.007] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 05/03/2018] [Revised: 08/14/2018] [Accepted: 08/16/2018] [Indexed: 12/17/2022]

15
Chou KC. Advances in Predicting Subcellular Localization of Multi-label Proteins and its Implication for Developing Multi-target Drugs. Curr Med Chem 2019; 26:4918-4943. [PMID: 31060481 DOI: 10.2174/0929867326666190507082559] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 09/28/2018] [Revised: 01/29/2019] [Accepted: 01/31/2019] [Indexed: 12/16/2022]
Abstract
The smallest unit of life is a cell, which contains numerous protein molecules. Most of the functions critical to the cell's survival are performed by these proteins located in its different organelles, usually called "subcellular locations". Information on the subcellular localization of a protein can provide useful clues about its function. To reveal the intricate pathways at the cellular level, knowledge of the subcellular localization of proteins in a cell is a prerequisite. Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine the subcellular locations of proteins in an entire cell. It is also indispensable for prioritizing and selecting the right targets for drug development. Unfortunately, it is both time-consuming and costly to determine the subcellular locations of proteins purely based on experiments. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop computational methods for rapidly and effectively identifying the subcellular locations of uncharacterized proteins based on their sequence information alone. Considerable progress has been achieved in this regard. This review focuses on those methods that have the capacity to deal with multi-label proteins, which may simultaneously exist in two or more subcellular location sites. Protein molecules with this kind of characteristic are vitally important for finding multi-target drugs, a current hot trend in drug development. This review also focuses on those methods that have user-friendly web-servers established so that the majority of experimental scientists can use them to get the desired results without the need to go through the detailed mathematics involved.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
|
16
|
Su ZD, Huang Y, Zhang ZY, Zhao YW, Wang D, Chen W, Chou KC, Lin H. iLoc-lncRNA: predict the subcellular location of lncRNAs by incorporating octamer composition into general PseKNC. Bioinformatics 2019; 34:4196-4204. [PMID: 29931187 DOI: 10.1093/bioinformatics/bty508] [Citation(s) in RCA: 150] [Impact Index Per Article: 25.0] [Received: 05/02/2018] [Accepted: 06/19/2018] [Indexed: 12/20/2022] Open
Abstract
Motivation: Long non-coding RNAs (lncRNAs) are a class of RNA molecules with more than 200 nucleotides. They have important functions in cell development and metabolism, such as genetic markers, genome rearrangements, chromatin modifications, cell cycle regulation, transcription and translation. Their functions are generally closely related to their localization in the cell. Therefore, knowledge about their subcellular locations can provide very useful clues or preliminary insight into their biological functions. Although biochemical experiments could determine the localization of lncRNAs in a cell, they are both time-consuming and expensive. Therefore, it is highly desirable to develop bioinformatics tools for fast and effective identification of their subcellular locations. Results: We developed a sequence-based bioinformatics tool called 'iLoc-lncRNA' to predict the subcellular locations of lncRNAs by incorporating the 8-tuple nucleotide features into the general PseKNC (Pseudo K-tuple Nucleotide Composition) via the binomial distribution approach. Rigorous jackknife tests have shown that the overall accuracy achieved by the new predictor on a stringent benchmark dataset is 86.72%, which is over 20% higher than that by the existing state-of-the-art predictor evaluated on the same tests. Availability and implementation: A user-friendly webserver has been established at http://lin-group.cn/server/iLoc-LncRNA, by which users can easily obtain their desired results. Supplementary information: Supplementary data are available at Bioinformatics online.
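The 8-tuple nucleotide features mentioned in this abstract can be illustrated with a minimal sketch of k-tuple composition. This is not the iLoc-lncRNA implementation: the function name `ktuple_composition`, the U-to-T normalization, and the choice to skip windows with ambiguous bases are assumptions made for illustration, and the binomial-distribution feature selection step is not shown.

```python
from collections import Counter


def ktuple_composition(seq, k=8):
    """Return normalized frequencies of the k-tuples observed in a nucleotide sequence.

    Hypothetical helper for illustration only; a sparse dict is used because
    there are 4**8 = 65536 possible octamers.
    """
    seq = seq.upper().replace("U", "T")  # assumption: treat RNA and DNA alike
    # Slide a window of width k along the sequence, one position at a time.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Assumption: drop windows containing ambiguous bases such as N.
    windows = [w for w in windows if set(w) <= set("ACGT")]
    total = len(windows)
    if total == 0:
        return {}
    counts = Counter(windows)
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    # Small k so the output stays readable.
    print(ktuple_composition("ACGUACGU", k=2))
```

The resulting frequencies sum to 1 for any sequence at least k nucleotides long, which makes sequences of different lengths comparable.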
Affiliation(s)
- Zhen-Dong Su
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yan Huang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zhao-Yue Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Ya-Wei Zhao
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Dong Wang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Wei Chen
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China; Gordon Life Science Institute, Boston, MA, USA
- Kuo-Chen Chou
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Gordon Life Science Institute, Boston, MA, USA
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Gordon Life Science Institute, Boston, MA, USA
|
17
|
|
18
|
Xiao X, Cheng X, Chen G, Mao Q, Chou KC. pLoc_bal-mVirus: Predict Subcellular Localization of Multi-Label Virus Proteins by Chou's General PseAAC and IHTS Treatment to Balance Training Dataset. Med Chem 2019; 15:496-509. [DOI: 10.2174/1573406415666181217114710] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 09/20/2018] [Revised: 10/23/2018] [Accepted: 12/12/2018] [Indexed: 12/17/2022]
Abstract
Background/Objective: Knowledge of protein subcellular localization is vitally important for both basic research and drug development. Facing the avalanche of protein sequences emerging in the post-genomic age, it is urgent to develop computational tools for timely and effectively identifying their subcellular localization based on the sequence information alone. Recently, a predictor called “pLoc-mVirus” was developed for identifying the subcellular localization of virus proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, known as “multiplex proteins”, may simultaneously occur in, or move between two or more subcellular location sites. Despite the fact that it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mVirus was trained by an extremely skewed dataset in which some subset was over 10 times the size of the other subsets. Accordingly, it cannot avoid the biased consequence caused by such an uneven training dataset. Methods: Using the Chou's general PseAAC (Pseudo Amino Acid Composition) approach and the IHTS (Inserting Hypothetical Training Samples) treatment to balance out the training dataset, we have developed a new predictor called “pLoc_bal-mVirus” for predicting the subcellular localization of multi-label virus proteins. Results: Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mVirus, the existing state-of-the-art predictor for the same purpose. Conclusion: Its user-friendly web-server is available at http://www.jci-bioinfo.cn/pLoc_balmVirus/, by which the majority of experimental scientists can easily get their desired results without the need to go through the detailed complicated mathematics. Accordingly, pLoc_bal-mVirus will become a very useful tool for designing multi-target drugs and in-depth understanding of the biological process in a cell.
Affiliation(s)
- Xuan Xiao
- Gordon Life Science Institute, Boston, MA 02478, United States
- Xiang Cheng
- Gordon Life Science Institute, Boston, MA 02478, United States
- Genqiang Chen
- College of Chemistry, Chemical Engineering and Biotechnology, Donghua University, Shanghai 201620, China
- Qi Mao
- College of Information Science and Technology, Donghua University, Shanghai, China
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
|
19
|
Characterization of human proteins with different subcellular localizations by topological and biological properties. Genomics 2018; 111:1831-1838. [PMID: 30543849 DOI: 10.1016/j.ygeno.2018.12.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/23/2018] [Revised: 12/02/2018] [Accepted: 12/07/2018] [Indexed: 11/20/2022]
Abstract
Knowing a protein's localization can provide a valuable information resource for elucidating its function. In recent years, with the advances of human genomics and proteomics, it has become possible to characterize human proteins that are located in different subcellular localizations. In this study, we used topological properties and biological properties to characterize human proteins with six subcellular localizations. Almost all of these properties were found to be significantly different among the six protein categories. Network topology analysis indicated that several significant topological properties, including the degree and k-core, were higher for the mitochondrial proteins. Biological property analysis showed that the nuclear proteins appeared to be correlated with important biological function. We hope these findings may provide some important help for comprehensively understanding the biological function of proteins and for predicting protein subcellular localizations in human.
|
20
|
Fagerquist CK, Zaragoza WJ. Proteolytic Surface-Shaving and Serotype-Dependent Expression of SPI-1 Invasion Proteins in Salmonella enterica Subspecies enterica. Front Nutr 2018; 5:124. [PMID: 30619870 PMCID: PMC6295468 DOI: 10.3389/fnut.2018.00124] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/10/2018] [Accepted: 11/23/2018] [Indexed: 12/15/2022] Open
Abstract
We performed proteolytic surface-shaving with trypsin on three strains/serovars of Salmonella enterica enterica (SEE): Newport, Kentucky, and Thompson. Surface-exposed proteins of live bacterial cells were digested for 15 min. A separate 20 h re-digestion was also performed on the supernatant of each shaving experiment to more completely digest protein fragments into detectable peptides for proteomic analysis by nano-liquid chromatography-electrospray ionization-Orbitrap mass spectrometry. Control samples (i.e., no trypsin during surface-shaving step) were also performed in parallel. We detected peptides of flagella proteins: FliC (filament), FliD (cap), and FlgL (hook-filament junction) as well as peptides of FlgM (anti-σ28 factor), i.e., the negative regulator of flagella synthesis. For SEE Newport and Thompson, we detected Salmonella pathogenicity island 1 (SPI-1) secreted effector/invasion proteins: SipA, SipB, SipC, and SipD, whereas no Sip proteins were detected in control samples. No Sip proteins were detected for SEE Kentucky (or its control) although sip genes were confirmed to be present. Our results may suggest a biological response (<15 min) to proteolysis of live cells for these SEE strains and, in the case of Newport and Thompson, a possible invasion response.
Affiliation(s)
- Clifton K Fagerquist
- Produce Safety & Microbiology Research Unit, Western Regional Research Center, Agricultural Research Service, U.S. Department of Agriculture, Albany, CA, United States
- William J Zaragoza
- Produce Safety & Microbiology Research Unit, Western Regional Research Center, Agricultural Research Service, U.S. Department of Agriculture, Albany, CA, United States
|
21
|
Cheng X, Xiao X, Chou KC. pLoc_bal-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by quasi-balancing training dataset and general PseAAC. J Theor Biol 2018; 458:92-102. [DOI: 10.1016/j.jtbi.2018.09.005] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Received: 08/16/2018] [Revised: 09/05/2018] [Accepted: 09/07/2018] [Indexed: 01/03/2023]
|
22
|
Padmanabhan S, Biswal MR, Manjithaya R, Prakash MK. Exploring the context of diacidic motif DE as a signal for unconventional protein secretion in eukaryotic proteins. Wellcome Open Res 2018; 3:148. [PMID: 30607372 PMCID: PMC6305234 DOI: 10.12688/wellcomeopenres.14914.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 11/09/2018] [Indexed: 12/18/2022] Open
Abstract
Unconventional protein secretion (UPS) is an important phenomenon with fundamental implications to cargo export. How eukaryotic proteins transported by UPS are recognized without a conventional signal peptide has been an open question. It was recently observed that a diacidic amino acid motif (ASP-GLU or DE) is necessary for the secretion of superoxide dismutase 1 (SOD1) from yeast under nutrient starvation. Taking cue from this discovery, we explore the hypothesis of whether the diacidic motif DE, which can occur fairly ubiquitously, along with its context, can be a generic signal for unconventional secretion of proteins. Four different contexts were evaluated: a physical context encompassing the structural order and charge signature in the neighbourhood of DE, two signalling contexts reflecting the presence of either a phosphorylatable amino acid ('X' in XDE, DXE, DEX) or an LC3 interacting region (LIR) which can trigger autophagy and a co-evolutionary constraint relative to other amino acids in the protein interpreted by examining sequences across different species. Among the 100 proteins we curated from different physiological or pathological conditions, we observe a pattern in the unconventional secretion of heat shock proteins in the cancer secretome, where DE in an ordered structural region has higher odds of being a UPS signal.
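One of the signalling contexts described in this abstract, a phosphorylatable residue at position 'X' in XDE, DXE, or DEX, can be sketched as a simple sequence scan. This is an illustrative sketch and not the authors' pipeline: the function name `find_de_contexts` is hypothetical, and treating serine, threonine, and tyrosine (S, T, Y) as the phosphorylatable residues is an assumption, since the abstract does not list them.

```python
import re

PHOSPHO = "STY"  # assumption: Ser, Thr and Tyr count as phosphorylatable

# Lookahead patterns so that overlapping occurrences are all reported.
PATTERNS = {
    "XDE": re.compile(rf"(?=([{PHOSPHO}]DE))"),
    "DXE": re.compile(rf"(?=(D[{PHOSPHO}]E))"),
    "DEX": re.compile(rf"(?=(DE[{PHOSPHO}]))"),
}


def find_de_contexts(seq):
    """Return (context, 0-based start, matched tripeptide) tuples, sorted by position."""
    seq = seq.upper()
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    for hit in find_de_contexts("MSDEAADTEKDEY"):
        print(hit)
```

The structural, LIR, and co-evolutionary contexts named in the abstract need structural annotations or multi-species alignments and are outside the scope of this sketch.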
Affiliation(s)
- Sreedevi Padmanabhan
- Autophagy Laboratory, Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre For Advanced Scientific Research, Bangalore, Karnataka, 560064, India
- Malay Ranjan Biswal
- Computational Biophysics Group, Theoretical Sciences Unit, Jawaharlal Nehru Centre For Advanced Scientific Research, Bangalore, Karnataka, 560064, India
- Ravi Manjithaya
- Autophagy Laboratory, Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre For Advanced Scientific Research, Bangalore, Karnataka, 560064, India
- Meher K Prakash
- Computational Biophysics Group, Theoretical Sciences Unit, Jawaharlal Nehru Centre For Advanced Scientific Research, Bangalore, Karnataka, 560064, India
|
23
|
Shen Y, Tang J, Guo F. Identification of protein subcellular localization via integrating evolutionary and physicochemical information into Chou's general PseAAC. J Theor Biol 2018; 462:230-239. [PMID: 30452958 DOI: 10.1016/j.jtbi.2018.11.012] [Citation(s) in RCA: 106] [Impact Index Per Article: 15.1] [Received: 09/17/2018] [Revised: 11/07/2018] [Accepted: 11/15/2018] [Indexed: 01/07/2023]
Abstract
Identifying the location of proteins in a cell plays an important role in understanding their functions, such as drug design, therapeutic target discovery and biological research. However, the traditional subcellular localization experiments are time-consuming, laborious and small scale. With the development of next-generation sequencing technology, the number of proteins has grown exponentially, which lays the foundation of the computational method for identifying protein subcellular localization. Although many methods for predicting subcellular localization of proteins have been proposed, most of them are limited to single-location. In this paper, we propose a multi-kernel SVM to predict subcellular localization of both multi-location and single-location proteins. First, we make use of the evolutionary information extracted from position specific scoring matrix (PSSM) and physicochemical properties of proteins, by Chou's general PseAAC and other efficient functions. Then, we propose a multi-kernel support vector machine (SVM) model to identify multi-label protein subcellular localization. As a result, our method has a good performance on predicting subcellular localization of proteins. It achieves an average precision of 0.7065 and 0.6889 on two human datasets, respectively. All results are higher than those achieved by other existing methods. Therefore, we provide an efficient system via a novel perspective to study the protein subcellular localization.
Affiliation(s)
- Yinan Shen
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China.
- Jijun Tang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China; School of Computational Science and Engineering, University of South Carolina, Columbia, USA.
- Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China.
|
24
|
Zhang Q, Wang S, Pan Y, Su D, Lu Q, Zuo Y, Yang L. Characterization of proteins in different subcellular localizations for Escherichia coli K12. Genomics 2018; 111:1134-1141. [PMID: 30026105 DOI: 10.1016/j.ygeno.2018.07.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/02/2018] [Revised: 07/07/2018] [Accepted: 07/11/2018] [Indexed: 10/28/2022]
Abstract
Comprehensive knowledge of protein subcellular localization is an important step toward understanding the function of proteins. Recent advances in systems biology have allowed us to develop more accurate methods for characterizing proteins at the subcellular localization level. In this study, an analysis method was developed to characterize the topological properties and biological properties of the cytoplasmic proteins, inner membrane proteins, outer membrane proteins and periplasmic proteins in Escherichia coli (E. coli). Statistically significant differences were found in all topological properties and biological properties among proteins in different subcellular localizations. In addition, an investigation was carried out to analyze the differences in the 20 amino acid compositions for the four protein categories. We also found that there were significant differences in all of the 20 amino acid compositions. These findings may be helpful for understanding the comprehensive relationship between protein subcellular localization and biological function.
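The "20 amino acid compositions" compared in this abstract are the per-residue frequencies of the standard amino acids in a sequence. A minimal sketch follows; the function name `amino_acid_composition` and the choice to ignore non-standard characters are assumptions for illustration, and the statistical testing across protein categories is not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(seq):
    """Return a 20-element list of residue frequencies, ordered as AMINO_ACIDS."""
    # Assumption: characters outside the 20 standard residues (e.g. X, B, Z) are ignored.
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    composition = amino_acid_composition("MKKLLPTAAAGLLLLAAQPAMA")
    for aa, freq in zip(AMINO_ACIDS, composition):
        if freq > 0:
            print(f"{aa}: {freq:.3f}")
```

Each protein becomes a fixed-length vector regardless of its sequence length, which is what allows per-residue comparisons between groups of proteins.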
Affiliation(s)
- Qi Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yi Pan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qianzi Lu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yongchun Zuo
- The State key Laboratory of Reproductive Regulation, Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China.
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
|
25
|
Evans BA, Smith OL, Pickerill ES, York MK, Buenconsejo KJP, Chambers AE, Bernstein DA. Restriction digest screening facilitates efficient detection of site-directed mutations introduced by CRISPR in C. albicans UME6. PeerJ 2018; 6:e4920. [PMID: 29892505 PMCID: PMC5994162 DOI: 10.7717/peerj.4920] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/13/2018] [Accepted: 05/17/2018] [Indexed: 01/14/2023] Open
Abstract
Introduction of point mutations to a gene of interest is a powerful tool when determining protein function. CRISPR-mediated genome editing allows for more efficient transfer of a desired mutation into a wide range of model organisms. Traditionally, PCR amplification and DNA sequencing is used to determine if isolates contain the intended mutation. However, mutation efficiency is highly variable, potentially making sequencing costly and time consuming. To more efficiently screen for correct transformants, we have identified restriction enzymes sites that encode for two identical amino acids or one or two stop codons. We used CRISPR to introduce these restriction sites directly upstream of the Candida albicans UME6 Zn2+-binding domain, a known regulator of C. albicans filamentation. While repair templates coding for different restriction sites were not equally successful at introducing mutations, restriction digest screening enabled us to rapidly identify isolates with the intended mutation in a cost-efficient manner. In addition, mutated isolates have clear defects in filamentation and virulence compared to wild type C. albicans. Our data suggest restriction digestion screening efficiently identifies point mutations introduced by CRISPR and streamlines the process of identifying residues important for a phenotype of interest.
Affiliation(s)
- Ben A Evans
- Department of Biology, Ball State University, Muncie, IN, United States of America
- Olivia L Smith
- Department of Biology, Ball State University, Muncie, IN, United States of America
- Ethan S Pickerill
- Department of Biology, Ball State University, Muncie, IN, United States of America
- Mary K York
- Department of Biology, Ball State University, Muncie, IN, United States of America
- Kristen J P Buenconsejo
- Department of Microbiology and Immunology, Drexel University, Philadelphia, PA, United States of America
- Antonio E Chambers
- Department of Biology, Ball State University, Muncie, IN, United States of America
- Douglas A Bernstein
- Department of Biology, Ball State University, Muncie, IN, United States of America
|
26
|
pLoc_bal-mGpos: Predict subcellular localization of Gram-positive bacterial proteins by quasi-balancing training dataset and PseAAC. Genomics 2018; 111:886-892. [PMID: 29842950 DOI: 10.1016/j.ygeno.2018.05.017] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Received: 04/24/2018] [Revised: 05/14/2018] [Accepted: 05/18/2018] [Indexed: 12/12/2022]
Abstract
Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desired to develop computational tools for timely and effectively identifying their subcellular localization purely based on the sequence information alone. Recently, a predictor called "pLoc-mGpos" was developed for identifying the subcellular localization of Gram-positive bacterial proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called "multiplex proteins", may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mGpos was trained by an extremely skewed dataset in which some subset (subcellular location) was over 11 times the size of the other subsets. Accordingly, it cannot avoid the bias consequence caused by such an uneven training dataset. To alleviate such bias consequence, we have developed a new and bias-reducing predictor called pLoc_bal-mGpos by quasi-balancing the training dataset. Rigorous target jackknife tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mGpos, the existing state-of-the-art predictor in identifying the subcellular localization of Gram-positive bacterial proteins. To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mGpos/, by which users can easily get their desired results without the need to go through the detailed mathematics.
|
27
|
Characterizing Cancer Drug Response and Biological Correlates: A Geometric Network Approach. Sci Rep 2018; 8:6402. [PMID: 29686393 PMCID: PMC5913269 DOI: 10.1038/s41598-018-24679-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 12/11/2017] [Accepted: 04/09/2018] [Indexed: 12/13/2022] Open
Abstract
In the present work, we apply a geometric network approach to study common biological features of anticancer drug response. We use for this purpose the panel of 60 human cell lines (NCI-60) provided by the National Cancer Institute. Our study suggests that mathematical tools for network-based analysis can provide novel insights into drug response and cancer biology. We adopted a discrete notion of Ricci curvature to measure, via a link between Ricci curvature and network robustness established by the theory of optimal mass transport, the robustness of biological networks constructed with a pre-treatment gene expression dataset and coupled the results with the GI50 response of the cell lines to the drugs. Based on the resulting drug response ranking, we assessed the impact of genes that are likely associated with individual drug response. For genes identified as important, we performed a gene ontology enrichment analysis using a curated bioinformatics database which resulted in biological processes associated with drug response across cell lines and tissue types which are plausible from the point of view of the biological literature. These results demonstrate the potential of using the mathematical network analysis in assessing drug response and in identifying relevant genomic biomarkers and biological processes for precision medicine.
|
28
|
Prediction of protein subcellular localization with oversampling approach and Chou's general PseAAC. J Theor Biol 2018; 437:239-250. [DOI: 10.1016/j.jtbi.2017.10.030] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Received: 08/05/2017] [Revised: 09/29/2017] [Accepted: 10/27/2017] [Indexed: 12/27/2022]
|
29
|
Kameshwar AKS, Barber R, Qin W. Comparative modeling and molecular docking analysis of white, brown and soft rot fungal laccases using lignin model compounds for understanding the structural and functional properties of laccases. J Mol Graph Model 2017; 79:15-26. [PMID: 29127854 DOI: 10.1016/j.jmgm.2017.10.019] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 07/14/2017] [Revised: 10/25/2017] [Accepted: 10/25/2017] [Indexed: 11/19/2022]
Abstract
Extrinsic catalytic properties of laccase enable it to oxidize a wide range of aromatic (phenolic and non-phenolic) compounds, which makes it a commercially important enzyme. In this study, we have extensively compared and analyzed the physico-chemical, structural and functional properties of white, brown and soft rot fungal laccases using standard protein analysis software. We have computationally predicted the three-dimensional comparative models of these laccases and later performed the molecular docking studies using the lignin model compounds. We also report a customizable rapid and reliable protein modelling and docking pipeline for developing structurally and functionally stable protein structures. We have observed that soft rot fungal laccases exhibited comparatively higher structural variation (higher random coil) when compared to brown and white rot fungal laccases. White and brown rot fungal laccase sequences exhibited higher similarity for conserved domains of Trametes versicolor laccase, whereas soft rot fungal laccases shared higher similarity towards conserved domains of Melanocarpus albomyces laccase. Results obtained from molecular docking studies showed that amino acids PRO, PHE, LEU, LYS and GLN were commonly found to interact with the ligands. We have also observed that white and brown rot fungal laccases showed similar docking patterns (topologically monomer, dimer and trimer bind at same pocket location and tetramer binds at another pocket location) when compared to soft rot fungal laccases. Finally, the binding efficiencies of white and brown rot fungal laccases with lignin model compounds were higher compared to the soft rot fungi. These findings can be further applied in developing genetically efficient laccases which can be applied in growing biofuel and bioremediation industries.
Affiliation(s)
- Richard Barber
- Department of Biology, Lakehead University, 955 Oliver Road, Thunder Bay, Ontario, P7B 5E1, Canada
- Wensheng Qin
- Department of Biology, Lakehead University, 955 Oliver Road, Thunder Bay, Ontario, P7B 5E1, Canada.
|
30
|
Cheng X, Xiao X, Chou KC. pLoc-mHum: predict subcellular localization of multi-location human proteins via general PseAAC to winnow out the crucial GO information. Bioinformatics 2017; 34:1448-1456. [DOI: 10.1093/bioinformatics/btx711] [Citation(s) in RCA: 127] [Impact Index Per Article: 15.9] [Received: 07/28/2017] [Accepted: 10/31/2017] [Indexed: 01/19/2023] Open
Affiliation(s)
- Xiang Cheng
- Computer Science, Jingdezhen Ceramic Institute, Jingdezhen, China
- Computational Biology, Gordon Life Science Institute, Boston, MA, USA
- Xuan Xiao
- Computer Science, Jingdezhen Ceramic Institute, Jingdezhen, China
- Computational Biology, Gordon Life Science Institute, Boston, MA, USA
- Kuo-Chen Chou
- Computer Science, Jingdezhen Ceramic Institute, Jingdezhen, China
- Computational Biology, Gordon Life Science Institute, Boston, MA, USA
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
31
|
Cheng X, Xiao X, Chou KC. pLoc-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by deep gene ontology learning via general PseAAC. Genomics 2017; 110:S0888-7543(17)30102-7. [PMID: 28989035 DOI: 10.1016/j.ygeno.2017.10.002] [Citation(s) in RCA: 92] [Impact Index Per Article: 11.5] [Received: 09/12/2017] [Revised: 09/28/2017] [Accepted: 10/04/2017] [Indexed: 01/21/2023]
Abstract
Information on the subcellular localization of proteins is crucially important for revealing their biological functions in a cell, the basic unit of life. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop computational tools for timely identification of their subcellular locations based on the sequence information alone. The current study is focused on Gram-negative bacterial proteins. Although considerable efforts have been made in protein subcellular prediction, the problem is far from being solved. This is because mounting evidence has indicated that many Gram-negative bacterial proteins exist in two or more location sites. Unfortunately, most existing methods can deal with single-location proteins only. In fact, proteins with multiple locations may have special biological functions important for both basic research and drug design. In this study, by using the multi-label theory, we developed a new predictor called "pLoc-mGneg" for predicting the subcellular localization of Gram-negative bacterial proteins with both single and multiple locations. Rigorous cross-validation on a high quality benchmark dataset indicated that the proposed predictor is remarkably superior to "iLoc-Gneg", the state-of-the-art predictor for the same purpose. For the convenience of most experimental scientists, a user-friendly web-server for the novel predictor has been established at http://www.jci-bioinfo.cn/pLoc-mGneg/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
Affiliation(s)
- Xiang Cheng
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; The Gordon Life Science Institute, Boston, MA 02478, USA.
- Xuan Xiao
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China; The Gordon Life Science Institute, Boston, MA 02478, USA.
- Kuo-Chen Chou
- The Gordon Life Science Institute, Boston, MA 02478, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia.
|
32
|
Cheng X, Zhao SG, Lin WZ, Xiao X, Chou KC. pLoc-mAnimal: predict subcellular localization of animal proteins with both single and multiple sites. Bioinformatics 2017; 33:3524-3531. [DOI: 10.1093/bioinformatics/btx476] [Citation(s) in RCA: 167] [Impact Index Per Article: 20.9] [Received: 04/18/2017] [Accepted: 07/22/2017] [Indexed: 12/24/2022] Open
Affiliation(s)
- Xiang Cheng
- College of Information Science and Technology, Donghua University, Shanghai, China
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- Shu-Guang Zhao
- College of Information Science and Technology, Donghua University, Shanghai, China
- Wei-Zhong Lin
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- The Gordon Life Science Institute, Boston, MA, USA
- Kuo-Chen Chou
- The Gordon Life Science Institute, Boston, MA, USA
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
33
|
Bezabih G, Cheng H, Han B, Feng M, Xue Y, Hu H, Li J. Phosphoproteome Analysis Reveals Phosphorylation Underpinnings in the Brains of Nurse and Forager Honeybees (Apis mellifera). Sci Rep 2017; 7:1973. [PMID: 28512345 PMCID: PMC5434016 DOI: 10.1038/s41598-017-02192-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/15/2016] [Accepted: 04/24/2017] [Indexed: 11/09/2022] Open
Abstract
The honeybee brain is a central organ in regulating a wide range of honeybee biology, including the life transition from nurse to forager bees. Knowledge is still lacking on how protein phosphorylation governs neural activity to drive the age-specific division of labor. The cerebral phosphoproteome of nurse and forager honeybees was characterized using Ti4+-IMAC phosphopeptide enrichment and mass-spectrometry-based proteomics, and protein kinases (PKs) were predicted. There were 3,077 phosphosites residing on 3,234 phosphopeptides from 1,004 phosphoproteins in the nurse bees. For foragers the numbers were 3,056, 3,110, and 958, respectively. Notably, among the total 231 PKs in the honeybee proteome, 179 novel PKs were predicted in the honeybee brain, of which 88 were experimentally identified. Proteins involved in a wide range of pathways were phosphorylated depending on age: glycolysis/gluconeogenesis, AGE/RAGE and phosphorylation in nurse bees, and metal ion transport, ATP metabolic process and phototransduction in forager bees. These observations suggest that phosphorylation is vital to the tuning of protein activity to regulate cerebral function according to the biological duties of nursing and foraging bees. The data provide valuable information on phosphorylation signaling in the honeybee brain and a potentially useful resource for understanding the signaling mechanism in honeybee neurobiology and in other social insects as well.
Affiliation(s)
- Gebreamlak Bezabih
- Institute of Apicultural Research/Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture, Chinese Academy of Agricultural Science, Beijing, 100093, China
- Han Cheng
- School of Life Sciences, Zhengzhou University, Zhengzhou, Henan, 450001, China
- Bin Han
- Institute of Apicultural Research/Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture, Chinese Academy of Agricultural Science, Beijing, 100093, China
- Mao Feng
- Institute of Apicultural Research/Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture, Chinese Academy of Agricultural Science, Beijing, 100093, China
- Yu Xue
- Department of Bioinformatics & Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China
- Han Hu
- Institute of Apicultural Research/Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture, Chinese Academy of Agricultural Science, Beijing, 100093, China
- Jianke Li
- Institute of Apicultural Research/Key Laboratory of Pollinating Insect Biology, Ministry of Agriculture, Chinese Academy of Agricultural Science, Beijing, 100093, China.
|
34
|
Salvatore M, Warholm P, Shu N, Basile W, Elofsson A. SubCons: a new ensemble method for improved human subcellular localization predictions. Bioinformatics 2017; 33:2464-2470. [DOI: 10.1093/bioinformatics/btx219] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 12/06/2016] [Accepted: 04/11/2017] [Indexed: 12/24/2022] Open
Affiliation(s)
- M Salvatore
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
- P Warholm
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
- N Shu
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
- Sweden Bioinformatics Infrastructure for Life Sciences (BILS), Stockholm University, Solna, Stockholm, Sweden
- W Basile
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
- A Elofsson
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden
|
35
|
López M, Gómez E, Faye C, Gerentes D, Paul W, Royo J, Hueros G, Muñiz LM. zmsbt1 and zmsbt2, two new subtilisin-like serine proteases genes expressed in early maize kernel development. Planta 2017; 245:409-424. [PMID: 27830397 DOI: 10.1007/s00425-016-2615-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/11/2016] [Accepted: 10/27/2016] [Indexed: 06/06/2023]
Abstract
Two subtilisin-like proteases show highly specific and complementary expression patterns in developing grains. These genes label the complete surface of the filial-maternal interface, suggesting a role in filial epithelial differentiation. The cereal endosperm is the most important source of nutrition and raw materials for mankind, as well as the storage compartment enabling initial growth of the germinating plantlets. The development of the different cell types in this tissue is regulated environmentally, genetically and epigenetically, resulting in the formation of top-bottom, adaxial-abaxial and surface-central axes. However, the mechanisms governing the interactions among the different inputs are mostly unknown. We have screened a kernel cDNA library for tissue-specific transcripts as initial step to identify genes relevant in cell differentiation. We report here on the isolation of two maize subtilisin-related genes that show grain-specific, surficial expression. zmsbt1 (Zea mays Subtilisin1) is expressed at the developing aleurone in a time-regulated manner, while zmsbt2 concentrates at the pedicel in front of the endosperm basal transfer layer. We have shown that their presence, early in the maize caryopsis development, is dependent on proper initial tissue determination, and have isolated their promoters to produce transgenic reporter lines that assist in the study of their regulation.
Affiliation(s)
- Maribel López
- Departamento Biomedicina and Biotecnología (Genética), Universidad de Alcalá, Alcalá de Henares, Spain
- Elisa Gómez
- Departamento Biomedicina and Biotecnología (Genética), Universidad de Alcalá, Alcalá de Henares, Spain
- Christian Faye
- GM Trait Discovery, Biogemma, Centre de Recherche de Chappes, Chappes, France
- Denise Gerentes
- GM Trait Discovery, Biogemma, Centre de Recherche de Chappes, Chappes, France
- Wyatt Paul
- GM Trait Discovery, Biogemma, Centre de Recherche de Chappes, Chappes, France
- Joaquín Royo
- Departamento Biomedicina and Biotecnología (Genética), Universidad de Alcalá, Alcalá de Henares, Spain
- Gregorio Hueros
- Departamento Biomedicina and Biotecnología (Genética), Universidad de Alcalá, Alcalá de Henares, Spain.
- Luis M Muñiz
- Departamento Biomedicina and Biotecnología (Genética), Universidad de Alcalá, Alcalá de Henares, Spain
|
36
|
Xiao X, Cheng X, Su S, Mao Q, Chou KC. pLoc-mGpos: Incorporate Key Gene Ontology Information into General PseAAC for Predicting Subcellular Localization of Gram-Positive Bacterial Proteins. Natural Science 2017. [DOI: 10.4236/ns.2017.99032] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Indexed: 11/20/2022]
|
37
|
Cheng X, Xiao X, Chou KC. pLoc-mPlant: predict subcellular localization of multi-location plant proteins by incorporating the optimal GO information into general PseAAC. Mol Biosyst 2017; 13:1722-1727. [DOI: 10.1039/c7mb00267j] [Citation(s) in RCA: 172] [Impact Index Per Article: 21.5] [Indexed: 01/27/2023]
Abstract
One of the fundamental goals in cellular biochemistry is to identify the functions of proteins in the context of compartments that organize them in the cellular environment.
Affiliation(s)
- Xiang Cheng
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- The Gordon Life Science Institute
- Kuo-Chen Chou
- The Gordon Life Science Institute, Boston, USA
- Center for Informational Biology, University of Electronic Science and Technology of China
|
38
|
Wan S, Mak MW, Kung SY. Ensemble Linear Neighborhood Propagation for Predicting Subchloroplast Localization of Multi-Location Proteins. J Proteome Res 2016; 15:4755-4762. [DOI: 10.1021/acs.jproteome.6b00686] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Indexed: 11/28/2022]
Affiliation(s)
- Shibiao Wan
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Man-Wai Mak
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Sun-Yuan Kung
- Department of Electrical Engineering, Princeton University, New Jersey 08540, United States
|
39
|
Bakhtyukov AA, Galkina OV, Eshchenko ND. The activities of key antioxidant enzymes in the early postnatal development of rats. Neurochem J 2016. [DOI: 10.1134/s1819712416030041] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022]
|
40
|
Wan S, Mak MW, Kung SY. Mem-mEN: Predicting Multi-Functional Types of Membrane Proteins by Interpretable Elastic Nets. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:706-718. [PMID: 26336143 DOI: 10.1109/tcbb.2015.2474407] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 06/05/2023]
Abstract
Membrane proteins play important roles in various biological processes within organisms. Predicting the functional types of membrane proteins is indispensable to their characterization. Recent studies have extended to predicting single- and multi-type membrane proteins. However, existing predictors perform poorly and, more importantly, they often lack interpretability. To address these problems, this paper proposes an efficient predictor, namely Mem-mEN, which can produce sparse and interpretable solutions for predicting membrane proteins with single- and multi-label functional types. Given a query membrane protein, its associated gene ontology (GO) information is retrieved by searching a compact GO-term database with its homologous accession number, which is subsequently classified by a multi-label elastic net (EN) classifier. Experimental results show that Mem-mEN significantly outperforms existing state-of-the-art membrane-protein predictors. Moreover, by using Mem-mEN, 338 out of more than 7,900 GO terms are found to play more essential roles in determining the functional types. Based on these 338 essential GO terms, Mem-mEN can not only predict the functional type of a membrane protein, but also explain why it belongs to that type. For the reader's convenience, the Mem-mEN server is available online at http://bioinfo.eie.polyu.edu.hk/MemmENServer/.
|
41
|
Wan S, Mak MW, Kung SY. Mem-ADSVM: A two-layer multi-label predictor for identifying multi-functional types of membrane proteins. J Theor Biol 2016; 398:32-42. [DOI: 10.1016/j.jtbi.2016.03.013] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 11/20/2015] [Revised: 03/07/2016] [Accepted: 03/07/2016] [Indexed: 02/06/2023]
|
42
|
Wan S, Mak MW, Kung SY. Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins. BMC Bioinformatics 2016; 17:97. [PMID: 26911432 PMCID: PMC4765148 DOI: 10.1186/s12859-016-0940-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 04/17/2015] [Accepted: 01/27/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Predicting protein subcellular localization is indispensable for inferring protein functions. Recent studies have been focusing on predicting not only single-location proteins, but also multi-location proteins. Almost all of the high performing predictors proposed recently use gene ontology (GO) terms to construct feature vectors for classification. Despite their high performance, their prediction decisions are difficult to interpret because of the large number of GO terms involved. Results: This paper proposes using sparse regressions to exploit GO information for both predicting and interpreting subcellular localization of single- and multi-location proteins. Specifically, we compared two multi-label sparse regression algorithms, namely multi-label LASSO (mLASSO) and multi-label elastic net (mEN), for large-scale predictions of protein subcellular localization. Both algorithms can yield sparse and interpretable solutions. By using the one-vs-rest strategy, mLASSO and mEN identified 87 and 429 out of more than 8,000 GO terms, respectively, which play essential roles in determining subcellular localization. More interestingly, many of the GO terms selected by mEN are from the biological process and molecular function categories, suggesting that the GO terms of these categories also play vital roles in the prediction. With these essential GO terms, not only can it be decided where a protein is located, but it can also be revealed why it resides there. Conclusions: Experimental results show that the outputs of both mEN and mLASSO are interpretable and that they perform significantly better than existing state-of-the-art predictors. Moreover, mEN selects more features and performs better than mLASSO on a stringent human benchmark dataset. For readers’ convenience, an online server called SpaPredictor for both mLASSO and mEN is available at http://bioinfo.eie.polyu.edu.hk/SpaPredictorServer/.
Affiliation(s)
- Shibiao Wan
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, SAR, China.
- Man-Wai Mak
- Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong, SAR, China.
- Sun-Yuan Kung
- Department of Electrical Engineering, Princeton University, New Jersey, USA.
|
43
|
Qu X, Wang D, Chen Y, Qiao S, Zhao Q. Predicting the Subcellular Localization of Proteins with Multiple Sites Based on Multiple Features Fusion. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:36-42. [PMID: 26452288 DOI: 10.1109/tcbb.2015.2485207] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
Protein sub-cellular localization prediction has attracted much attention in recent years because of its importance for studying protein function and for targeted drug discovery, which makes it an important research field in bioinformatics. Traditional experimental methods for ascertaining protein sub-cellular locations are costly and time consuming. In the last two decades, machine learning methods have developed rapidly, and a large number of machine learning based protein sub-cellular location predictors have been developed. However, most such predictors can only handle proteins with a single sub-cellular location. With the development of biological techniques, more and more proteins that have two or more sub-cellular locations have been found. It is even more significant to study such proteins because they have extremely useful implications for both basic biology and bioinformatics research. In order to improve the accuracy of prediction, more feature information that can represent the protein sequence should be extracted. In this paper, several feature extraction methods were fused together to extract the feature information, and then the multi-label k nearest neighbors (ML-KNN) algorithm was used to predict protein sub-cellular locations. The best overall accuracies obtained were 66.7304 percent for dataset s1, used in constructing Gpos-mPLoc, and 59.9206 percent for dataset s2, used in constructing Virus-mPLoc.
|
44
|
Liu W, Cheng C, Lai G, Lin Y, Lai Z. Molecular cloning and expression analysis of KIN10 and cold-acclimation related genes in wild banana 'Huanxi' (Musa itinerans). SpringerPlus 2015; 4:829. [PMID: 26753116 PMCID: PMC4695468 DOI: 10.1186/s40064-015-1617-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/02/2015] [Accepted: 12/13/2015] [Indexed: 12/20/2022]
Abstract
Banana cultivars may experience chilling or freezing injury in some of their cultivated regions, where wild banana can still grow very well. The clarification of the cold-resistance mechanism of wild banana is vital for cold-resistant banana breeding. In this study, the central stress integrator gene KIN10 and some cold-acclimation related genes (HOS1 and ICE1s) from the cold-resistant wild banana ‘Huanxi’ (Musa itinerans) were cloned and their expression patterns under different temperature treatments were analyzed. Thirteen full-length cDNA transcripts including 6 KIN10s, 1 HOS1 and 6 ICE1s were successfully cloned. Quantitative real-time PCR (qRT-PCR) results showed that all these genes had the highest expression levels at the critical temperature of banana (13 °C). Under chilling temperature (4 °C), the expression level of KIN10 was reduced significantly, but the expression of HOS1 was still higher than that at the optimal temperature (28 °C, control). Both KIN10 and HOS1 showed the lowest expression levels at 0 °C; the expression level of ICE1, however, was higher than in the control. As sucrose plays a role in plant cold acclimation and in the regulation of KIN10 and HOS1 bioactivities, the sucrose contents of wild banana under different temperatures were measured. Results showed that the sucrose content increased as temperature decreased. Our results suggest that KIN10 may participate in the cold stress response via regulating sucrose biosynthesis, which is helpful in regulating the cold acclimation pathway in wild banana.
Affiliation(s)
- Weihua Liu
- Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002 Fujian China
- Chunzhen Cheng
- Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002 Fujian China
- Gongti Lai
- Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002 Fujian China
- Yuling Lin
- Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002 Fujian China
- Zhongxiong Lai
- Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002 Fujian China
|
45
|
Abstract
SIGNIFICANCE: Selenoproteins employ selenium to supplement the chemistry available through the common 20 amino acids. These powerful enzymes are affiliated with redox biology, often in connection with the detection, management, and signaling of oxidative stress. Among them, membrane-bound selenoproteins play prominent roles in signaling pathways, Ca(2+) regulation, membrane complex integrity, and biosynthesis of lipophilic molecules. RECENT ADVANCES: The number of selenoproteins whose physiological roles, protein partners, expression, evolution, and biosynthesis are characterized is steadily increasing, thus offering a more nuanced view of this specialized family. This review focuses on human membrane selenoproteins, particularly the five least characterized ones: selenoproteins I, K, N, S, and T. CRITICAL ISSUES: Membrane-bound selenoproteins are the least understood, as it is challenging to provide the membrane-like environment required for their biochemical and biophysical characterization. Hence, their studies rely mostly on biological rather than structural and biochemical assays. Another aspect that has not received much attention is the particular role that their membrane association plays in their physiological function. FUTURE DIRECTIONS: Findings cited in this review show that it is possible to infer the structure and the membrane-binding mode of these lesser-studied selenoproteins and design experiments to examine the role of the rare amino acid selenocysteine.
Affiliation(s)
- Jun Liu
- Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware
- Sharon Rozovsky
- Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware
|
46
|
Wan S, Mak MW, Kung SY. mLASSO-Hum: A LASSO-based interpretable human-protein subcellular localization predictor. J Theor Biol 2015; 382:223-34. [DOI: 10.1016/j.jtbi.2015.06.042] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 04/24/2015] [Revised: 06/25/2015] [Accepted: 06/26/2015] [Indexed: 02/03/2023]
|
47
|
Mendieta-Serrano MA, Schnabel D, Lomelí H, Salas-Vidal E. Spatial and temporal expression of zebrafish glutathione peroxidase 4 a and b genes during early embryo development. Gene Expr Patterns 2015; 19:98-107. [PMID: 26315538 DOI: 10.1016/j.gep.2015.08.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/04/2014] [Revised: 07/09/2015] [Accepted: 08/18/2015] [Indexed: 10/23/2022]
Abstract
Antioxidant cellular mechanisms are essential for cell redox homeostasis during animal development and in adult life. Previous in situ hybridization analyses of antioxidant enzymes in zebrafish have indicated that they are ubiquitously expressed. However, spatial information about the protein distribution of these enzymes is not available. Zebrafish embryos are particularly suitable for this type of analysis due to their small size, transparency and fast development. The main objective of the present work was to analyze the spatial and temporal gene expression pattern of the two reported zebrafish glutathione peroxidase 4 (GPx4) genes during the first day of zebrafish embryo development. We found that the gpx4b gene shows maternal and zygotic gene expression in the embryo proper compared to gpx4a that showed zygotic gene expression in the periderm covering the yolk cell only. Following, we performed a GPx4 protein immunolocalization analysis during the first 24-h of development. The detection of this protein suggests that the antibody recognizes GPx4b in the embryo proper during the first 24 h of development and GPx4a at the periderm covering the yolk cell after 14-somite stage. Throughout early cleavages, GPx4 was located in blastomeres and was less abundant at the cleavage furrow. Later, from the 128-cell to 512-cell stages, GPx4 remained in the cytoplasm but gradually increased in the nuclei, beginning in marginal blastomeres and extending the nuclear localization to all blastomeres. During epiboly progression, GPx4b was found in blastoderm cells and was excluded from the yolk cell. After 24 h of development, GPx4b was present in the myotomes particularly in the slow muscle fibers, and was excluded from the myosepta. These results highlight the dynamics of the GPx4 localization pattern and suggest its potential participation in fundamental developmental processes.
Affiliation(s)
- Mario A Mendieta-Serrano
- Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Avenida Universidad #2001, Colonia Chamilpa, Cuernavaca, Morelos C.P. 62210, Mexico
- Denhí Schnabel
- Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Avenida Universidad #2001, Colonia Chamilpa, Cuernavaca, Morelos C.P. 62210, Mexico
- Hilda Lomelí
- Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Avenida Universidad #2001, Colonia Chamilpa, Cuernavaca, Morelos C.P. 62210, Mexico
- Enrique Salas-Vidal
- Departamento de Genética del Desarrollo y Fisiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Avenida Universidad #2001, Colonia Chamilpa, Cuernavaca, Morelos C.P. 62210, Mexico.
|
48
|
Surfaceome and exoproteome of a clinical sequence type 398 methicillin resistant Staphylococcus aureus strain. Biochem Biophys Rep 2015; 3:7-13. [PMID: 29124163 PMCID: PMC5668672 DOI: 10.1016/j.bbrep.2015.07.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 04/28/2015] [Revised: 07/01/2015] [Accepted: 07/07/2015] [Indexed: 11/23/2022] Open
Abstract
For many years Staphylococcus aureus has been recognized as an important human pathogen. In this study, the surfaceome and exoproteome of a clinical sample of MRSA were analyzed. The C2355 strain, previously typed as ST398 and spa-t011 and showing a phenotype of multiresistance to antibiotics, has several resistance genes. Using shotgun proteomics and bioinformatics tools, 236 proteins were identified in the surfaceome and 99 proteins in the exoproteome. Although many of these proteins are related to basic cell functions, some are related to virulence and pathogenicity, like catalase and isdA, main actors in S. aureus infection, and others are related to antibiotic action or eventually resistance, like penicillin binding protein, a cell-wall protein. Studying the proteomes of different subcellular compartments should improve our understanding of this pathogen, a microorganism with several mechanisms of resistance and pathogenicity, and provide valuable data for bioinformatics databases. We examine the surface proteome and exoproteome of multiresistant strains. We identify bacterial infection proteins in the extracellular proteome. Confirmation that moonlighting proteins will extend the localization data.
|
49
|
Yang L, Hao D, Wang J, Xing X, Lv Y, Zuo Y, Jiang W. Characterization of proteins in S. cerevisiae with subcellular localizations. Mol Biosyst 2015; 11:1360-1369. [PMID: 25797515 DOI: 10.1039/c5mb00124b] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2025]
Abstract
Acquiring comprehensive knowledge of proteins in various subcellular localizations is one of the fundamental goals in cell biology and proteomics. Although recent large-scale experimental and proteomics studies of S. cerevisiae protein subcellular localizations are archived in various databases, only a few studies use a systems biology approach to characterize S. cerevisiae proteins at the subcellular localization level. Based on the topological and biological properties of S. cerevisiae proteins, we have compared and analyzed their statistical properties across eight different subcellular localizations. Significant differences are found in all topological and biological properties among the eight protein categories. Network topology analysis indicates that nuclear proteins differ from the other seven protein categories and tend to play the most important role in the network, with the highest degree, core number, and betweenness centrality. We hope the findings presented in this study may help protein subcellular localization prediction in S. cerevisiae and provide new insights for understanding proteins directly from their subcellular localizations.
Affiliation(s)
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, PR China.
|
50
|
mPLR-Loc: An adaptive decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction. Anal Biochem 2015; 473:14-27. [DOI: 10.1016/j.ab.2014.10.014] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 03/06/2014] [Revised: 09/29/2014] [Accepted: 10/21/2014] [Indexed: 01/16/2023]
|