1. Oztug M, Durer ZAO, Yetke Hİ, Asicioglu M, Akgoz M, Karaguler NG. Cloning, Expression, and Characterization of Serine Protease AprX from Geobacillus thermoleovorans ARTRW1. Ind Biotechnol (New Rochelle N Y) 2022. DOI: 10.1089/ind.2022.0016
Affiliation(s)
- Merve Oztug
- Department of Molecular Biology and Genetics, Faculty of Science and Letters, Istanbul Technical University, Istanbul, Turkey
- Dr. Orhan Öcalgiray Molecular Biology-Biotechnology and Genetics Research Center, Istanbul Technical University, Istanbul, Turkey
- TUBITAK National Metrology Institute (TUBITAK UME), Kocaeli, Turkey
- Zeynep A. Oztug Durer
- Department of Biophysics, School of Medicine, Acıbadem Mehmet Ali Aydinlar University, Istanbul, Turkey
- Department of Biochemistry, School of Pharmacy, Acıbadem Mehmet Ali Aydinlar University, Istanbul, Turkey
- Hande İpek Yetke
- Department of Biophysics, Faculty of Medicine, Marmara University, Istanbul, Turkey
- Meltem Asicioglu
- Department of Molecular Biology and Genetics, Faculty of Science and Letters, Istanbul Technical University, Istanbul, Turkey
- Dr. Orhan Öcalgiray Molecular Biology-Biotechnology and Genetics Research Center, Istanbul Technical University, Istanbul, Turkey
- TUBITAK National Metrology Institute (TUBITAK UME), Kocaeli, Turkey
- Muslum Akgoz
- TUBITAK National Metrology Institute (TUBITAK UME), Kocaeli, Turkey
- Nevin Gul Karaguler
- Department of Molecular Biology and Genetics, Faculty of Science and Letters, Istanbul Technical University, Istanbul, Turkey
- Dr. Orhan Öcalgiray Molecular Biology-Biotechnology and Genetics Research Center, Istanbul Technical University, Istanbul, Turkey
2. Lopez-Fernandez H, Duque P, Vazquez N, Fdez-Riverola F, Reboiro-Jato M, Vieira CP, Vieira J. SEDA: A Desktop Tool Suite for FASTA Files Processing. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1850-1860. PMID: 33237866. DOI: 10.1109/tcbb.2020.3040383
Abstract
SEDA (SEquence DAtaset builder) is a multiplatform desktop application for manipulating FASTA files containing DNA or protein sequences. Its graphical user interface gives access to a collection of simple utilities (filtering, sorting, and file reformatting, among others) and advanced utilities (BLAST searching, protein domain annotation, gene annotation, and sequence alignment) not present in similar applications. This eases the work of life science researchers who handle DNA and/or protein sequences, especially those without programming skills. This paper presents general guidelines on how to build efficient data-handling protocols with SEDA, as well as practical examples of how to prepare high-quality datasets for single-gene phylogenetic studies, the characterization of protein families, or phylogenomic studies. The user-friendliness of SEDA also rests on two important features: (i) easy-to-install distributable versions and installers, including a Docker image for Linux, and (ii) the ease with which users can manage large datasets. SEDA is open source, released under the GNU General Public License v3.0, and publicly available on GitHub (https://github.com/sing-group/seda). SEDA installers and documentation are available at https://www.sing-group.org/seda/.
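SEDA exposes its simple utilities through a GUI, and the abstract does not describe how they are implemented. As a rough illustration of what filtering and sorting do to a FASTA file, the following sketch parses FASTA text, keeps records above a minimum length, sorts them by length, and writes them back out. The function names (`parse_fasta`, `filter_by_length`, `sort_by_length`, `to_fasta`) and the 60-character line width are illustrative assumptions, not part of SEDA.

```python
from typing import List, Tuple

# A FASTA record as a (header, sequence) pair; the header excludes the leading ">".
Record = Tuple[str, str]


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records: List[Record], min_len: int) -> List[Record]:
    """Keep only records whose sequence has at least min_len residues."""
    return [rec for rec in records if len(rec[1]) >= min_len]


def sort_by_length(records: List[Record], descending: bool = True) -> List[Record]:
    """Sort records by sequence length; equal lengths keep their input order."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=descending)


def to_fasta(records: List[Record], width: int = 60) -> str:
    """Serialize records back to FASTA, wrapping sequences at `width` characters."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    example = ">seqA\nATGC\nATG\n>seqB\nAT\n>seqC\nATGCATGCAT\n"
    kept = sort_by_length(filter_by_length(parse_fasta(example), min_len=5))
    print(to_fasta(kept), end="")
```

In the example, `seqB` (2 residues) is dropped and the two remaining records are written longest first, `seqC` then `seqA`. Keeping each step a small pure function loosely mirrors the idea of chaining operations into a reusable data-handling protocol.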
3. Zahoránszky-Kőhalmi G, Siramshetty VB, Kumar P, Gurumurthy M, Grillo B, Mathew B, Metaxatos D, Backus M, Mierzwa T, Simon R, Grishagin I, Brovold L, Mathé EA, Hall MD, Michael SG, Godfrey AG, Mestres J, Jensen LJ, Oprea TI. A Workflow of Integrated Resources to Catalyze Network Pharmacology Driven COVID-19 Research. J Chem Inf Model 2022; 62:718-729. PMID: 35057621. PMCID: PMC10790216. DOI: 10.1021/acs.jcim.1c00431
Abstract
In the event of an outbreak due to an emerging pathogen, time is of the essence to contain or to mitigate the spread of the disease. Drug repositioning is one of the strategies that has the potential to deliver therapeutics relatively quickly. The SARS-CoV-2 pandemic has shown that integrating critical data resources to drive drug-repositioning studies, involving host-host, host-pathogen, and drug-target interactions, remains a time-consuming effort that translates to a delay in the development and delivery of a life-saving therapy. Here, we describe a workflow we designed for a semiautomated integration of rapidly emerging data sets that can be generally adopted in a broad network pharmacology research setting. The workflow was used to construct a COVID-19 focused multimodal network that integrates 487 host-pathogen, 63 278 host-host protein, and 1221 drug-target interactions. The resultant Neo4j graph database named "Neo4COVID19" is made publicly accessible via a web interface and via API calls based on the Bolt protocol. Details for accessing the database are provided on a landing page (https://neo4covid19.ncats.io/). We believe that our Neo4COVID19 database will be a valuable asset to the research community and will catalyze the discovery of therapeutics to fight COVID-19.
Affiliation(s)
- Vishal B. Siramshetty
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Praveen Kumar
- Department of Internal Medicine, University of New Mexico School of Medicine, 1 University of New Mexico, Albuquerque, NM 87131, USA
- Department of Computer Science, University of New Mexico, 1 University of New Mexico, Albuquerque, NM 87131, USA
- Manideep Gurumurthy
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Busola Grillo
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Biju Mathew
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Dimitrios Metaxatos
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Mark Backus
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Tim Mierzwa
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Reid Simon
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Ivan Grishagin
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Rancho BioSciences LLC., 16955 Via Del Campo Suite 200, San Diego, CA 92127, USA
- Laura Brovold
- Rancho BioSciences LLC., 16955 Via Del Campo Suite 200, San Diego, CA 92127, USA
- Ewy A. Mathé
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Matthew D. Hall
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Samuel G. Michael
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Alexander G. Godfrey
- National Center for Advancing Translational Sciences, Rockville, 9800 Medical Center Dr., MD 20850, USA
- Jordi Mestres
- Research Group on Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain
- Lars J. Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 Copenhagen N, Denmark
- Tudor I. Oprea
- Department of Internal Medicine, University of New Mexico School of Medicine, 1 University of New Mexico, Albuquerque, NM 87131, USA
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 Copenhagen N, Denmark
- UNM Comprehensive Cancer Center, 1201 Camino de Salud NE, Albuquerque, NM 87102, USA
- Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy at University of Gothenburg, Box 480, 40530 Gothenburg, Sweden
4. Zhu Z, Meng K, Liu G, Meng G. A database resource and online analysis tools for coronaviruses on a historical and global scale. Database (Oxford) 2021; 2020:5909701. PMID: 33009914. PMCID: PMC7665380. DOI: 10.1093/database/baaa070. Received 03/10/2020; revised 07/26/2020; accepted 07/30/2020.
Abstract
The recent outbreak of COVID-19, caused by a novel coronavirus of zoonotic origin (SARS-CoV-2 or 2019-nCoV), has sounded the alarm about the potential cross-species spread of epidemic coronaviruses. To meet the urgent need to support disease control and to provide valuable scientific information, we developed the coronavirus database (CoVdb), an online platform for genomic, proteomic, and evolutionary analysis. CoVdb brings together the genomes of more than 5000 coronavirus strains collected from 1941 to 2020 in more than 60 countries, from hosts belonging to more than 30 species ranging from fish to humans. CoVdb presents comprehensive genomic information, such as gene function, subcellular localization, topology, and protein structure. To facilitate coronavirus research, CoVdb also provides flexible search approaches and online tools to view and analyze protein structure, perform multiple alignments, automatically build phylogenetic trees, and carry out evolutionary analyses. CoVdb can be accessed freely at http://covdb.popgenetics.net. We hope it will accelerate the development of medicines or vaccines to control the COVID-19 pandemic.
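The abstract says only that CoVdb builds phylogenetic trees automatically; it does not name a tree-building method. Purely to illustrate automated tree building from aligned sequences, the sketch below computes uncorrected p-distances and clusters them with UPGMA, a simple distance method swapped in here for brevity. It assumes the sequences are already aligned to equal length, ignores gaps and substitution models, and returns a Newick string without branch lengths; `p_distance` and `upgma_newick` are hypothetical names.

```python
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma_newick(names: List[str], seqs: List[str]) -> str:
    """Cluster aligned sequences with UPGMA and return the topology as Newick."""
    n = len(names)
    if n == 0 or n != len(seqs):
        raise ValueError("need one name per sequence and at least one sequence")
    # cluster id -> (Newick fragment, number of leaves in the cluster)
    clusters: Dict[int, Tuple[str, int]] = {i: (names[i], 1) for i in range(n)}
    # pairwise distances keyed by (smaller id, larger id)
    dist: Dict[Tuple[int, int], float] = {
        (i, j): p_distance(seqs[i], seqs[j]) for i in range(n) for j in range(i + 1, n)
    }
    next_id = n
    while len(clusters) > 1:
        i, j = min(dist, key=lambda pair: dist[pair])  # closest pair of clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # leaf-count-weighted mean distance from the merged cluster to each remaining one
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            dist[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        # forget distances involving the two clusters that were just merged
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        clusters[next_id] = (f"({tree_i},{tree_j})", size_i + size_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree + ";"


if __name__ == "__main__":
    print(upgma_newick(["A", "B", "C"], ["AAAA", "AAAT", "TTTT"]))
```

UPGMA assumes a constant rate of evolution (a molecular clock), so a real pipeline would more likely use neighbor joining or maximum likelihood; it is used here only because it fits in a few lines.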
Affiliation(s)
- Zhenglin Zhu
- School of Life Sciences, Chongqing University, No. 55 Daxuecheng South Rd., Shapingba, Chongqing, 401331, China
- Kaiwen Meng
- College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing, 100094, China
- Gexin Liu
- School of Life Sciences, Chongqing University, No. 55 Daxuecheng South Rd., Shapingba, Chongqing, 401331, China
- Geng Meng
- College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing, 100094, China
5. CODON-Software to manual curation of prokaryotic genomes. PLoS Comput Biol 2021; 17:e1008797. PMID: 33788829. PMCID: PMC8011737. DOI: 10.1371/journal.pcbi.1008797. Received 06/16/2020; accepted 02/14/2021.
Abstract
Genome annotation conceptually consists of inferring biological information and assigning it to gene products. Over the years, numerous pipelines and computational tools have been developed to automate this task and help researchers gain knowledge about the genes they study. Even with these technological advances, however, manual annotation (manual curation), in which the information attributed to gene products is verified and enriched, is still necessary. Although it is considered the gold standard for depositing data in a biological database, manual curation requires significant time and effort from researchers, who sometimes have to parse numerous products in various public databases. To address this problem, we present CODON, a tool for the manual curation of genomic data that can also perform gene prediction and annotation. The software uses a finite state machine in the prediction step and automatically annotates products based on information obtained from the UniProt database. CODON has a simple, intuitive graphical interface that assists manual curation, letting the user decide on each analysis based on identity, alignment length, and the name of the organism in which the product obtained a match. Visual analysis of all matches found in the database is also possible, which significantly aids curation because the user has all the available information for a given product at hand. The efficiency of the tool was tested on eleven organisms by comparing the prediction and annotation results obtained with CODON to those from the NCBI and RAST platforms.
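The abstract states that CODON uses a finite state machine during prediction but does not describe its states or transitions. The sketch below is a hypothetical, deliberately minimal two-state machine that scans each forward reading frame for open reading frames (ORFs): it waits for a start codon, then for an in-frame stop codon. It assumes the standard genetic code (ATG start; TAA, TAG, TGA stops) and ignores the reverse strand, alternative start codons such as GTG and TTG, and ribosome-binding sites, so it should not be read as CODON's actual algorithm.

```python
from enum import Enum
from typing import List, Tuple

START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


class State(Enum):
    SEARCHING = 0  # waiting for a start codon
    IN_ORF = 1     # inside an ORF, waiting for an in-frame stop codon


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int]]:
    """Return half-open (start, end) coordinates of forward-strand ORFs.

    `end` includes the stop codon; `min_codons` counts the start and stop codons.
    """
    seq = seq.upper()
    orfs: List[Tuple[int, int]] = []
    for frame in range(3):
        state, orf_start = State.SEARCHING, 0
        # step through complete codons only, so seq[i:i + 3] always has 3 bases
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if state is State.SEARCHING and codon in START_CODONS:
                state, orf_start = State.IN_ORF, i
            elif state is State.IN_ORF and codon in STOP_CODONS:
                end = i + 3
                if (end - orf_start) // 3 >= min_codons:
                    orfs.append((orf_start, end))
                state = State.SEARCHING
        # an ORF still open at the end of the sequence has no stop codon and is dropped
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC"))
```

Because an ATG encountered in the `IN_ORF` state causes no transition, each reported ORF starts at the first in-frame start codon, which yields the longest ORF per stop codon.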
6. Zahoránszky-Kőhalmi G, Siramshetty VB, Kumar P, Gurumurthy M, Grillo B, Mathew B, Metaxatos D, Backus M, Mierzwa T, Simon R, Grishagin I, Brovold L, Mathé EA, Hall MD, Michael SG, Godfrey AG, Mestres J, Jensen LJ, Oprea TI. A Workflow of Integrated Resources to Catalyze Network Pharmacology Driven COVID-19 Research. bioRxiv 2020:2020.11.04.369041. PMID: 33173863. PMCID: PMC7654851. DOI: 10.1101/2020.11.04.369041
Abstract
MOTIVATION In the event of an outbreak due to an emerging pathogen, time is of the essence to contain or to mitigate the spread of the disease. Drug repositioning is one of the strategies that has the potential to deliver therapeutics relatively quickly. The SARS-CoV-2 pandemic has shown that integrating critical data resources to drive drug-repositioning studies, involving host-host, host-pathogen and drug-target interactions, remains a time-consuming effort that translates to a delay in the development and delivery of a life-saving therapy. RESULTS Here, we describe a workflow we designed for a semi-automated integration of rapidly emerging datasets that can be generally adopted in a broad network pharmacology research setting. The workflow was used to construct a COVID-19 focused multimodal network that integrates 487 host-pathogen, 74,805 host-host protein and 1,265 drug-target interactions. The resultant Neo4j graph database named "Neo4COVID19" is accessible via a web interface and via API calls based on the Bolt protocol. We believe that our Neo4COVID19 database will be a valuable asset to the research community and will catalyze the discovery of therapeutics to fight COVID-19. AVAILABILITY https://neo4covid19.ncats.io.
Affiliation(s)
- Praveen Kumar
- Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Department of Computer Science, University of New Mexico, Albuquerque, New Mexico, USA
- Busola Grillo
- National Center for Advancing Translational Sciences, Rockville, MD, USA
- Biju Mathew
- National Center for Advancing Translational Sciences, Rockville, MD, USA
- Mark Backus
- National Center for Advancing Translational Sciences, Rockville, MD, USA
- Tim Mierzwa
- National Center for Advancing Translational Sciences, Rockville, MD, USA
- Reid Simon
- National Center for Advancing Translational Sciences, Rockville, MD, USA
- Ivan Grishagin
- National Center for Advancing Translational Sciences, Rockville, MD, USA
- Rancho BioSciences LLC., San Diego, CA, USA
- Ewy A. Mathé
- National Center for Advancing Translational Sciences, Rockville, MD, USA
- Matthew D. Hall
- National Center for Advancing Translational Sciences, Rockville, MD, USA
- Samuel G. Michael
- National Center for Advancing Translational Sciences, Rockville, MD, USA
- Jordi Mestres
- Research Group on Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Medical Research Institute and University Pompeu Fabra, Barcelona, Catalonia, Spain
- Lars J. Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Tudor I. Oprea
- Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- UNM Comprehensive Cancer Center, Albuquerque, NM, USA
- Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
7. Zhu Z, Meng G. ASFVdb: an integrative resource for genomic and proteomic analyses of African swine fever virus. Database (Oxford) 2020; 2020:baaa023. PMID: 32294195. PMCID: PMC7159030. DOI: 10.1093/database/baaa023. Received 01/20/2020; revised 02/23/2020; accepted 03/03/2020.
Abstract
The recent outbreaks of African swine fever (ASF) in China and Europe have threatened the swine industry globally. To help control the transmission of ASF virus (ASFV), we developed the African swine fever virus database (ASFVdb), an online data visualization and analysis platform for comparative genomics and proteomics. On the basis of known ASFV genes, ASFVdb reannotates the genome of every strain and newly annotates 5352 possible open reading frames (ORFs) from 45 strains. Moreover, ASFVdb performs a thorough population-genetic analysis of all published ASFV genomes, as well as functional and structural predictions for all genes. For each gene, users can obtain not only basic information but also its distribution across strains, its conserved or highly mutated regions, and its possible subcellular location and topology. In the genome browser, ASFVdb displays the results of population-genetic analysis in a sliding window, which facilitates genetic and evolutionary analyses at the genomic level. The web interface was built on SWAV 1.0. ASFVdb is freely accessible at http://asfvdb.popgenetics.net.
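ASFVdb reports population-genetic results in a sliding window, but the abstract does not list which statistics are computed. As one plausible example, the sketch below computes nucleotide diversity (π, the mean number of pairwise differences per site) in windows along an alignment. It assumes equal-length, gap-free aligned sequences; the window and step sizes and the function names are illustrative, not ASFVdb parameters.

```python
from itertools import combinations
from typing import List, Tuple


def nucleotide_diversity(seqs: List[str]) -> float:
    """Mean pairwise differences per site (pi) for equal-length aligned sequences."""
    if len(seqs) < 2 or not seqs[0]:
        return 0.0
    length = len(seqs[0])
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    diffs = sum(
        sum(1 for x, y in zip(a, b) if x != y) for a, b in combinations(seqs, 2)
    )
    return diffs / (n_pairs * length)


def sliding_window_pi(seqs: List[str], window: int, step: int) -> List[Tuple[int, float]]:
    """Return (window start, pi) for each full window along the alignment."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


if __name__ == "__main__":
    alignment = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT"]
    for start, pi in sliding_window_pi(alignment, window=4, step=4):
        print(start, round(pi, 3))
```

In the toy alignment the first window is invariant (π = 0), while the second window has 4 differences over 3 pairs and 4 sites, so π = 4/12, about 0.333; a genome browser would plot such values against window position.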
Affiliation(s)
- Zhenglin Zhu
- School of Life Sciences, Chongqing University, Chongqing, China
- Geng Meng
- Laboratory of Biomedical Research and College of Veterinary Medicine, China Agricultural University, Beijing, China
8. UniprotR: Retrieving and visualizing protein sequence and functional information from Universal Protein Resource (UniProt knowledgebase). J Proteomics 2019; 213:103613. PMID: 31843688. DOI: 10.1016/j.jprot.2019.103613. Received 10/24/2019; revised 11/26/2019; accepted 12/13/2019.
Abstract
UniprotR is a software package designed to easily retrieve, cluster, and visualize protein data from the UniProt knowledgebase (UniProtKB) using the R language. The package is implemented mainly to process, parse, and illustrate proteomics data in a convenient, time-saving way, allowing researchers to summarize all required protein information available at UniProtKB in a readable data frame, Excel CSV file, and/or graphical output. UniprotR generates a set of graphics including gene ontology, chromosomal location, protein scoring and status, protein networking, sequence phylogenetic tree, and physicochemical properties. In addition, the package supports clustering of proteins based on primary gene name or chromosomal location, facilitating additional downstream analysis. SIGNIFICANCE: In this work, we implemented a robust package for retrieving and visualizing information from multiple sources such as UniProtKB, SWISS-MODEL, and STRING. UniprotR contains functions for conveniently retrieving and clustering data and for visualizing it in publishable graphs, to facilitate researchers' work and fulfill their needs. UniprotR can save time in downstream data analysis that would otherwise require time-consuming manual work. AVAILABILITY AND IMPLEMENTATION: UniprotR is released as free, open-source code under the GPLv3 license and is available on CRAN (The Comprehensive R Archive Network) and GitHub (https://cran.r-project.org/web/packages/UniprotR/index.html; https://github.com/Proteomicslab57357/UniprotR).
9. Heyer R, Schallert K, Büdel A, Zoun R, Dorl S, Behne A, Kohrs F, Püttker S, Siewert C, Muth T, Saake G, Reichl U, Benndorf D. A Robust and Universal Metaproteomics Workflow for Research Studies and Routine Diagnostics Within 24 h Using Phenol Extraction, FASP Digest, and the MetaProteomeAnalyzer. Front Microbiol 2019; 10:1883. PMID: 31474963. PMCID: PMC6707425. DOI: 10.3389/fmicb.2019.01883. Received 02/20/2019; accepted 07/30/2019.
Abstract
The investigation of microbial proteins by mass spectrometry (metaproteomics) is a key technology for simultaneously assessing the taxonomic composition and the functionality of microbial communities in medical, environmental, and biotechnological applications. We present an improved metaproteomics workflow using an updated sample preparation and a new version of the MetaProteomeAnalyzer software for data analysis. The high resolution offered by multidimensional separation (GeLC, MudPIT) was sacrificed in favor of fast analysis of a broad range of samples in less than 24 h. The improved workflow yielded at least twice as many protein identifications as our previous workflow, and a drastic increase in taxonomic and functional annotations. Improvements to all aspects of the workflow, particularly speed, are first steps toward routine clinical diagnostics (e.g., of fecal samples) and the analysis of technical and environmental samples. The MetaProteomeAnalyzer is provided to the scientific community as a central remote server solution at www.mpa.ovgu.de.
Affiliation(s)
- Robert Heyer
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Kay Schallert
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Anja Büdel
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Roman Zoun
- Database Research Group, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Sebastian Dorl
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg, Austria
- Fabian Kohrs
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Sebastian Püttker
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Corina Siewert
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems Magdeburg, Magdeburg, Germany
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Gunter Saake
- Database Research Group, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Udo Reichl
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems Magdeburg, Magdeburg, Germany
- Dirk Benndorf
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems Magdeburg, Magdeburg, Germany
10. Heyer R, Schallert K, Siewert C, Kohrs F, Greve J, Maus I, Klang J, Klocke M, Heiermann M, Hoffmann M, Püttker S, Calusinska M, Zoun R, Saake G, Benndorf D, Reichl U. Metaproteome analysis reveals that syntrophy, competition, and phage-host interaction shape microbial communities in biogas plants. Microbiome 2019; 7:69. PMID: 31029164. PMCID: PMC6486700. DOI: 10.1186/s40168-019-0673-y. Received 06/14/2018; accepted 03/26/2019.
Abstract
BACKGROUND In biogas plants, complex microbial communities produce methane and carbon dioxide by anaerobic digestion of biomass. To characterize the microbial functional networks, samples from 11 reactors were analyzed using a high-resolution metaproteomics pipeline. RESULTS The methanogenic archaeal communities examined were either mixotrophic or strictly hydrogenotrophic, in syntrophy with bacterial acetate oxidizers. Mapping the identified metaproteins onto the process steps described by the Anaerobic Digestion Model 1 confirmed its main assumptions and also suggested some extensions, such as syntrophic acetate oxidation or fermentation of alcohols. The results indicate that the microbial communities were shaped by syntrophy as well as by competition and by phage-host interactions causing cell lysis. For the families Bacillaceae, Enterobacteriaceae, and Clostridiaceae, the number of phages exceeded the number of host cells by up to 20-fold. CONCLUSION Phage-induced cell lysis might slow the conversion of substrates to biogas; however, it could support the growth of auxotrophic microbes by cycling nutrients.
Affiliation(s)
- R. Heyer
- Bioprocess Engineering, Otto von Guericke University, Universitätsplatz 2, 39106 Magdeburg, Germany
- K. Schallert
- Bioprocess Engineering, Otto von Guericke University, Universitätsplatz 2, 39106 Magdeburg, Germany
- C. Siewert
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, Sandtorstraße 1, 39106 Magdeburg, Germany
- F. Kohrs
- Bioprocess Engineering, Otto von Guericke University, Universitätsplatz 2, 39106 Magdeburg, Germany
- J. Greve
- Bioprocess Engineering, Otto von Guericke University, Universitätsplatz 2, 39106 Magdeburg, Germany
- I. Maus
- Center for Biotechnology (CeBiTec), University Bielefeld, Universitätsstraße 27, 33615 Bielefeld, Germany
- J. Klang
- Department Bioengineering, Leibniz Institute for Agricultural Engineering and Bioeconomy (ATB), Max-Eyth-Allee 100, 14469 Potsdam, Germany
- M. Klocke
- Department Bioengineering, Leibniz Institute for Agricultural Engineering and Bioeconomy (ATB), Max-Eyth-Allee 100, 14469 Potsdam, Germany
- M. Heiermann
- Department Technology Assessment and Substance Cycles, Leibniz Institute for Agricultural Engineering and Bioeconomy (ATB), Max-Eyth-Allee 100, 14469 Potsdam, Germany
- M. Hoffmann
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, Sandtorstraße 1, 39106 Magdeburg, Germany
- S. Püttker
- Bioprocess Engineering, Otto von Guericke University, Universitätsplatz 2, 39106 Magdeburg, Germany
- M. Calusinska
- Environmental Research and Innovation (ERIN), Luxembourg Institute of Science and Technology, 41 rue du Brill, L-4422 Belvaux, Luxembourg
- R. Zoun
- Otto von Guericke University, Institute for Databases and Software Engineering, Universitätsplatz 2, 39106 Magdeburg, Germany
- G. Saake
- Otto von Guericke University, Institute for Databases and Software Engineering, Universitätsplatz 2, 39106 Magdeburg, Germany
- D. Benndorf
- Bioprocess Engineering, Otto von Guericke University, Universitätsplatz 2, 39106 Magdeburg, Germany
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, Sandtorstraße 1, 39106 Magdeburg, Germany
- U. Reichl
- Bioprocess Engineering, Otto von Guericke University, Universitätsplatz 2, 39106 Magdeburg, Germany
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, Sandtorstraße 1, 39106 Magdeburg, Germany
11. Metaproteomics of fecal samples of Crohn's disease and Ulcerative Colitis. J Proteomics 2019; 201:93-103. PMID: 31009805. DOI: 10.1016/j.jprot.2019.04.009. Received 01/13/2019; revised 03/19/2019; accepted 04/05/2019.
Abstract
Crohn's Disease (CD) and Ulcerative Colitis (UC) are chronic inflammatory bowel diseases (IBD) of the gastrointestinal tract. This study used non-invasive LC-MS/MS to find disease-specific microbial and human proteins that might later be used for easier diagnosis. To this end, 17 healthy controls, 11 CD patients, and 14 UC patients, as well as 13 Irritable Bowel Syndrome (IBS) patients, 8 Colon Adenoma (CA) patients, and 8 Gastric Carcinoma (GCA) patients, were investigated. The proteins were extracted from the fecal samples with liquid phenol in a ball mill. Subsequently, the proteins were digested into peptides with trypsin and analyzed by Orbitrap LC-MS/MS. The MetaProteomeAnalyzer software was used for protein identification and for the interpretation of taxonomic and functional results. Cluster analysis and a non-parametric test (analysis of similarities) separated healthy controls from patients with CD and UC as well as from patients with GCA. Among other findings, CD and UC correlated with an increase in neutrophil extracellular traps and immunoglobulin G (IgG). In addition, a decrease in human IgA and in the transcriptional regulatory protein RprY from Bacteroides fragilis was found for CD and UC. A specific fecal marker for CD was an increased amount of the human enzyme sucrase-isomaltase. SIGNIFICANCE: Crohn's Disease and Ulcerative Colitis are chronic inflammatory diseases of the gastrointestinal tract whose diagnosis requires comprehensive medical examinations, including colonoscopy. The impact of gut microbial communities on the pathogenesis of these diseases is poorly understood. This study therefore investigated the impact of the gut microbiome on these diseases with a metaproteome approach, revealing several disease-specific marker proteins. Overall, the results indicate that fecal metaproteomics has the potential to be a useful non-invasive tool for better and easier diagnosis of both diseases.
12. Heyer R, Schallert K, Zoun R, Becher B, Saake G, Benndorf D. Challenges and perspectives of metaproteomic data analysis. J Biotechnol 2017; 261:24-36. PMID: 28663049. DOI: 10.1016/j.jbiotec.2017.06.1201. Received 02/15/2017; revised 06/20/2017; accepted 06/23/2017.
Abstract
In nature, microorganisms live in complex microbial communities. Comprehensive taxonomic and functional knowledge about microbial communities supports medical and technical applications such as fecal diagnostics and the operation of biogas plants or wastewater treatment plants. Furthermore, microbial communities are crucial for the global carbon and nitrogen cycles in soil and in the ocean. Among the methods available for investigating microbial communities, metaproteomics can approximate the activity of microorganisms by investigating the protein content of a sample. Although metaproteomics is a very powerful method, issues in the bioinformatic evaluation impede its success. In particular, the construction of databases for protein identification, the grouping of redundant proteins, and taxonomic and functional annotation pose major challenges. Furthermore, the growing amounts of data within a metaproteomics study require dedicated algorithms and software. This review summarizes recent metaproteomics software and addresses these issues in detail.
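One challenge the review names is grouping redundant proteins, that is, database entries that the identified peptides cannot tell apart. Two simple rules are sketched below as an illustration under stated assumptions: grouping proteins with identical peptide sets, and a looser rule that merges any proteins sharing at least one peptide (computed as connected components with a union-find structure). The abstract does not say which rule a given tool applies or with what refinements, and the input format (a mapping from protein accession to its set of peptide sequences) is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def group_by_identical_peptides(protein_peptides: Dict[str, Set[str]]) -> List[List[str]]:
    """Group proteins whose identified peptide sets are exactly equal."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return sorted(sorted(members) for members in groups.values())


def group_by_shared_peptides(protein_peptides: Dict[str, Set[str]]) -> List[List[str]]:
    """Merge proteins that share at least one peptide (transitively), via union-find."""
    parent = {protein: protein for protein in protein_peptides}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner: Dict[str, str] = {}  # peptide -> first protein seen with it
    for protein, peptides in protein_peptides.items():
        for peptide in peptides:
            if peptide in first_owner:
                union(first_owner[peptide], protein)
            else:
                first_owner[peptide] = protein

    components: Dict[str, List[str]] = defaultdict(list)
    for protein in protein_peptides:
        components[find(protein)].append(protein)
    return sorted(sorted(members) for members in components.values())


if __name__ == "__main__":
    example = {
        "P1": {"PEPA", "PEPB"},
        "P2": {"PEPA", "PEPB"},
        "P3": {"PEPB", "PEPC"},
        "P4": {"PEPD"},
    }
    print(group_by_identical_peptides(example))
    print(group_by_shared_peptides(example))
```

With the toy input, the strict rule keeps P3 separate because it has a peptide of its own, whereas the shared-peptide rule pulls P3 into the P1/P2 group through PEPB. The looser rule reduces redundancy more aggressively but can chain loosely related proteins together through a single shared peptide.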
Affiliation(s)
- Robert Heyer
- Otto von Guericke University, Bioprocess Engineering, Universitätsplatz 2, 39106 Magdeburg, Germany.
- Kay Schallert
- Otto von Guericke University, Bioprocess Engineering, Universitätsplatz 2, 39106 Magdeburg, Germany.
- Roman Zoun
- Otto von Guericke University, Institute for Technical and Business Information Systems, Universitätsplatz 2, 39106 Magdeburg, Germany.
- Beatrice Becher
- Otto von Guericke University, Bioprocess Engineering, Universitätsplatz 2, 39106 Magdeburg, Germany.
- Gunter Saake
- Otto von Guericke University, Institute for Technical and Business Information Systems, Universitätsplatz 2, 39106 Magdeburg, Germany.
- Dirk Benndorf
- Otto von Guericke University, Bioprocess Engineering, Universitätsplatz 2, 39106 Magdeburg, Germany; Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, Sandtorstraße 1, 39106 Magdeburg, Germany.
13
Dias O, Gomes D, Vilaca P, Cardoso J, Rocha M, Ferreira EC, Rocha I. Genome-Wide Semi-Automated Annotation of Transporter Systems. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:443-456. [PMID: 26887005 DOI: 10.1109/tcbb.2016.2527647] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
Usually, transport reactions are added to genome-scale metabolic models (GSMMs) based on experimental data and the literature. This approach does not associate specific genes with transport reactions, which impairs the model's ability to predict the effects of gene deletions. Novel methods for systematic genome-wide functional annotation of transporters, and for their integration into GSMMs, are therefore necessary. In this work, an automatic system is proposed that detects and classifies all potential membrane transport proteins of a given genome and integrates the related reactions into GSMMs, based on the identification and classification of genes that encode transmembrane proteins. The Transport Reactions Annotation and Generation (TRIAGE) tool identifies the metabolites transported by each transmembrane protein and its transporter family. The localization of the carriers is also predicted, so that their action is confined to a given membrane. Integrating the data provided by TRIAGE with highly curated models allowed the identification of new transport reactions. TRIAGE is included in the new release of merlin, a software tool previously developed by the authors that expedites the GSMM reconstruction process.
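The abstract does not say which predictor TRIAGE uses to identify transmembrane proteins. As a swapped-in and much simpler illustration of that first step (not TRIAGE's method), the classic Kyte-Doolittle hydropathy scan flags 19-residue windows whose mean hydropathy exceeds 1.6 as candidate membrane-spanning segments. A minimal sketch; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy value for each standard amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """Return 0-based start positions of windows whose mean hydropathy exceeds threshold."""
    seq = seq.upper()
    scores = [KD.get(aa, 0.0) for aa in seq]  # unknown residues count as neutral
    hits = []
    for start in range(len(seq) - window + 1):
        if sum(scores[start:start + window]) / window > threshold:
            hits.append(start)
    return hits


def looks_transmembrane(seq: str) -> bool:
    """Crude flag: the sequence has at least one strongly hydrophobic window."""
    return bool(hydrophobic_windows(seq))
```

Dedicated transmembrane predictors are generally more reliable than a fixed-window average; the scan only conveys why hydrophobic stretches are the signal such tools look for.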
14
Campbell MP, Peterson RA, Gasteiger E, Mariethoz J, Lisacek F, Packer NH. Navigating the Glycome Space and Connecting the Glycoproteome. Methods Mol Biol 2017; 1558:139-158. [PMID: 28150237 DOI: 10.1007/978-1-4939-6783-4_7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
UniCarbKB (http://unicarbkb.org) is a comprehensive resource for mammalian glycoprotein and annotation data. In particular, the database provides information on the oligosaccharides characterized from a glycoprotein at either the global or the site-specific level. This evidence is accumulated from a peer-reviewed and manually curated collection of information on oligosaccharides derived from membrane and secreted glycoproteins purified from biological fluids and/or tissues. This information is further supplemented with descriptions of experimental methods that summarize important sample preparation and analytical strategies. A new release of UniCarbKB is published every three months, and each release includes a collection of curated data and improvements to database functionality. In this chapter, we outline the objectives of UniCarbKB and describe a selection of step-by-step workflows for navigating the available information. We also provide a short description of the available web services and of future plans for improving data access. The information presented in this chapter supplements the content available in our knowledgebase, including regular updates on interface improvements, new features, and revisions to the database content (http://confluence.unicarbkb.org).
Affiliation(s)
- Matthew P Campbell
- Department of Chemistry and Biomolecular Sciences, Research Drive, Building E8C, Macquarie University, North Ryde, Sydney, 2109, NSW, Australia
- Robyn A Peterson
- Department of Chemistry and Biomolecular Sciences, Research Drive, Building E8C, Macquarie University, North Ryde, Sydney, 2109, NSW, Australia
- Elisabeth Gasteiger
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Battelle - Building A7, Route de Drize, 1227 Carouge, Switzerland
- Julien Mariethoz
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Battelle - Building A7, Route de Drize, 1227 Carouge, Switzerland
- Frederique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Battelle - Building A7, Route de Drize, 1227 Carouge, Switzerland
- Computer Science Department, University of Geneva, Battelle - Building A7, Route de Drize, 1227 Carouge, Switzerland
- Nicolle H Packer
- Department of Chemistry and Biomolecular Sciences, Research Drive, Building E8C, Macquarie University, North Ryde, Sydney, 2109, NSW, Australia.
15
Gilpin W. PyPDB: a Python API for the Protein Data Bank. Bioinformatics 2015; 32:159-60. [PMID: 26369703 DOI: 10.1093/bioinformatics/btv543] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 07/14/2015] [Accepted: 09/07/2015] [Indexed: 11/12/2022]
Abstract
SUMMARY We have created a Python programming interface for the RCSB Protein Data Bank (PDB) that allows search and data retrieval for a wide range of result types, including BLAST and sequence motif queries. The API relies on the existing XML-based API and operates by creating custom XML requests from native Python types, allowing extensibility and straightforward modification. The package has the ability to perform many types of advanced search of the PDB that are otherwise only available through the PDB website. AVAILABILITY AND IMPLEMENTATION PyPDB is implemented exclusively in Python 3 using standard libraries for maximal compatibility. The most up-to-date version, including iPython notebooks containing usage tutorials, is available free-of-charge under an open-source MIT license via GitHub at https://github.com/williamgilpin/pypdb, and the full API reference is at http://williamgilpin.github.io/pypdb_docs/html/. The latest stable release is also available on PyPI. CONTACT wgilpin@stanford.edu.
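The abstract explains that PyPDB works by turning native Python types into custom XML requests. That idea can be sketched with the standard library's `xml.etree.ElementTree`. This is not PyPDB's code, and the tag names used here (`query`, `queryType`, `keywords`, `item`) are hypothetical placeholders rather than the real RCSB request schema.

```python
import xml.etree.ElementTree as ET
from typing import Any


def to_xml(tag: str, value: Any) -> ET.Element:
    """Recursively convert dicts, lists and scalars into an XML element tree."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))  # dict keys become child tags
    elif isinstance(value, list):
        for item in value:
            elem.append(to_xml("item", item))  # "item" is an arbitrary wrapper tag
    else:
        elem.text = str(value)
    return elem


# Hypothetical keyword query; the structure is illustrative only.
query = {"queryType": "KeywordQuery", "keywords": "actin"}
print(ET.tostring(to_xml("query", query), encoding="unicode"))
# <query><queryType>KeywordQuery</queryType><keywords>actin</keywords></query>
```

Building the request from plain dictionaries is what makes such an interface easy to extend: a new search type only needs a new dictionary layout, not new serialization code.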
Affiliation(s)
- William Gilpin
- Department of Applied Physics, Stanford University, Stanford, CA 94305, USA
16
Dias O, Rocha M, Ferreira EC, Rocha I. Reconstructing genome-scale metabolic models with merlin. Nucleic Acids Res 2015; 43:3899-910. [PMID: 25845595 PMCID: PMC4417185 DOI: 10.1093/nar/gkv294] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 03/14/2015] [Accepted: 03/18/2015] [Indexed: 01/13/2023]
Abstract
The Metabolic Models Reconstruction Using Genome-Scale Information (merlin) tool is a user-friendly Java application that aids the reconstruction of genome-scale metabolic models for any organism whose genome has been sequenced. It performs the major steps of the reconstruction process, including functional annotation of the whole genome and subsequent construction of the portfolio of reactions. Moreover, merlin includes tools for identifying and annotating genes that encode transport proteins, generating the transport reactions for those carriers. It also compartmentalises the model, predicting the organelle localisation of the proteins encoded in the genome and thus the localisation of the metabolites involved in the reactions catalysed by those enzymes. Gene-protein-reaction (GPR) associations are generated automatically and included in the model. Finally, merlin expedites the transition from genomic data to draft metabolic model reconstructions exported in the standard SBML format, giving the user a preliminary view of the biochemical network, which can then be manually curated within the environment provided by merlin.
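Gene-protein-reaction (GPR) associations are commonly written as Boolean rules over gene identifiers, where `and` joins subunits of an enzyme complex and `or` joins isozymes; evaluating such a rule is what lets a model predict whether a reaction survives a gene deletion. A minimal evaluator, assuming this common `and`/`or` text syntax (merlin's internal representation is not described in the abstract) and hypothetical gene IDs:

```python
import re


def gpr_active(rule: str, deleted: set[str]) -> bool:
    """Return True if a reaction with this GPR rule can still be catalysed.

    Genes in `deleted` are treated as knocked out; all others as present.
    'and' joins subunits of a complex, 'or' joins isozymes. An empty rule
    (no gene association) is treated as active.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    if not tokens:
        return True
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            rhs = parse_and()  # always parse the operand so pos advances
            value = value or rhs
        return value

    def parse_and() -> bool:
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom() -> bool:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("unbalanced parentheses in GPR rule")
            pos += 1
            return value
        return token not in deleted  # a gene is present unless deleted

    return parse_or()


# Deleting g1 breaks the g1/g2 complex, but isozyme g3 keeps the reaction active.
print(gpr_active("(g1 and g2) or g3", {"g1"}))
```

`and` binds more tightly than `or` here, matching the usual reading of these rules.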
Affiliation(s)
- Oscar Dias
- Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
- Miguel Rocha
- Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
- Eugénio C Ferreira
- Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
- Isabel Rocha
- Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
17
Muth T, Behne A, Heyer R, Kohrs F, Benndorf D, Hoffmann M, Lehtevä M, Reichl U, Martens L, Rapp E. The MetaProteomeAnalyzer: A Powerful Open-Source Software Suite for Metaproteomics Data Analysis and Interpretation. J Proteome Res 2015; 14:1557-65. [DOI: 10.1021/pr501246w] [Citation(s) in RCA: 124] [Impact Index Per Article: 13.8] [Indexed: 02/08/2023]
Affiliation(s)
- Thilo Muth
- Max Planck Institute for Dynamics of Complex Technical Systems, 39106 Magdeburg, Germany
- Alexander Behne
- Chair of Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany
- Robert Heyer
- Chair of Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany
- Fabian Kohrs
- Chair of Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany
- Dirk Benndorf
- Chair of Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany
- Marcus Hoffmann
- Max Planck Institute for Dynamics of Complex Technical Systems, 39106 Magdeburg, Germany
- Miro Lehtevä
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Udo Reichl
- Max Planck Institute for Dynamics of Complex Technical Systems, 39106 Magdeburg, Germany
- Chair of Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany
- Lennart Martens
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
- Erdmann Rapp
- Max Planck Institute for Dynamics of Complex Technical Systems, 39106 Magdeburg, Germany
18
Analysis of the protein domain and domain architecture content in fungi and its application in the search of new antifungal targets. PLoS Comput Biol 2014; 10:e1003733. [PMID: 25033262 PMCID: PMC4102429 DOI: 10.1371/journal.pcbi.1003733] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 12/11/2013] [Accepted: 06/04/2014] [Indexed: 01/25/2023]
Abstract
Over the past several years, fungal infections have shown increasing incidence in susceptible populations and have caused high mortality rates. In parallel, multi-resistant fungi are emerging in human infections. The identification of new potential antifungal targets is therefore a priority. The first task of this study was to analyse the protein domain and domain architecture content of the 137 fungal proteomes (corresponding to 111 species) available in UniProtKB (UniProt KnowledgeBase) as of January 2013. The resulting list of core and exclusive domains and domain architectures is provided in this paper. It delineates the different levels of fungal taxonomic classification: phylum, subphylum, order, genus and species. The analysis highlighted Aspergillus as the most diverse genus in terms of exclusive domain content. We also investigated which domains could be considered promiscuous in the different organisms. As an application of this analysis, we explored three different ways to detect potential targets for antifungal drugs. First, we compared the domain and domain architecture content of the human and fungal proteomes and identified the domains and domain architectures present only in fungi. Second, we searched public repositories for fungal pathways in which proteins containing promiscuous domains could be involved; three pathways were identified: lovastatin biosynthesis, xylan degradation and siroheme biosynthesis. Finally, we classified a subset of the studied fungi into five groups according to their occurrence in clinical samples. We then looked for exclusive domains in the most clinically relevant groups and determined which of them had the potential to bind small molecules. Overall, this study provides a comprehensive analysis of the available fungal proteomes and presents three approaches that can serve as a first step in detecting new antifungal targets.
Some fungi have become pathogenic to plants and, to a lesser extent, to animals. Under certain conditions, their presence in the human body can threaten human health, especially in immunocompromised patients, and some fungi can also infect healthy individuals. The low sensitivity of the available antifungal drugs, together with the clinically observed resistance of some fungi, raises the demand for new alternative treatments. Proteins are biological molecules that perform essential functions within living organisms. Many of these functions depend on the particular folded structure of each protein. These structures are composed of functional units, also called domains, each independently responsible for a fraction of the overall biological function. Understanding how the different domain combinations are distributed across members of the same or similar families of organisms is important: for instance, exclusive domain combinations can hold particular acquired functions, and domains displaying high mobility can play major roles in the organism's survival. The biological goal of this study was to analyse the functional implications of protein domains and domain combinations in the available fungal proteomes. This information can be used to highlight proteins and pathways that could potentially serve as drug targets.
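The human-versus-fungal comparison described above reduces to set differences over domains and domain architectures, and domain promiscuity can be approximated by counting co-occurring partner domains. A toy Python sketch, assuming each protein's domain architecture is an ordered tuple of domain identifiers; the proteomes and domain names are invented placeholders, and the partner count is only one simple promiscuity proxy, not necessarily the measure used in the study.

```python
from collections import defaultdict

# Toy proteomes: protein ID -> domain architecture (ordered tuple of domain IDs).
# All identifiers are invented placeholders, not real proteins or Pfam entries.
HUMAN = {"h1": ("D1", "D2"), "h2": ("D3",)}
FUNGI = {"f1": ("D1", "D2"), "f2": ("D4", "D1"), "f3": ("D5",), "f4": ("D4", "D6")}


def domains(proteome: dict[str, tuple[str, ...]]) -> set[str]:
    """All individual domains occurring anywhere in the proteome."""
    return {d for arch in proteome.values() for d in arch}


def architectures(proteome: dict[str, tuple[str, ...]]) -> set[tuple[str, ...]]:
    """All distinct domain architectures in the proteome."""
    return set(proteome.values())


def fungal_exclusive(human, fungi):
    """Domains and architectures found in fungal proteins but in no human protein."""
    return domains(fungi) - domains(human), architectures(fungi) - architectures(human)


def partner_counts(proteome: dict[str, tuple[str, ...]]) -> dict[str, int]:
    """Number of distinct co-occurring domains per domain (a simple promiscuity proxy)."""
    partners: dict[str, set[str]] = defaultdict(set)
    for arch in proteome.values():
        for d in arch:
            partners[d].update(x for x in arch if x != d)
    return {d: len(p) for d, p in partners.items()}


exclusive_domains, exclusive_archs = fungal_exclusive(HUMAN, FUNGI)
print(sorted(exclusive_domains))
print(partner_counts(FUNGI))
```

Note that an architecture such as `("D4", "D1")` can be fungus-specific even though neither of its domains is, which is why domains and architectures are compared separately.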
19
Abstract
The mission of the Universal Protein Resource (UniProt) (http://www.uniprot.org) is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequences and functional annotation. It integrates, interprets and standardizes data from the literature and numerous resources to achieve the most comprehensive possible catalog of protein information. The central activities are the biocuration of the UniProt Knowledgebase and the dissemination of these data through our website and web services. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is updated and distributed every 4 weeks and can be accessed online for searches or downloads.
Affiliation(s)
- The UniProt Consortium
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland, Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven Street North West, Suite 1200, Washington, DC 20007, USA and Protein Information Resource, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE 19711, USA
20
Campbell MP, Peterson R, Mariethoz J, Gasteiger E, Akune Y, Aoki-Kinoshita KF, Lisacek F, Packer NH. UniCarbKB: building a knowledge platform for glycoproteomics. Nucleic Acids Res 2013; 42:D215-21. [PMID: 24234447 PMCID: PMC3964942 DOI: 10.1093/nar/gkt1128] [Citation(s) in RCA: 129] [Impact Index Per Article: 11.7] [Indexed: 11/16/2022]
Abstract
The UniCarb KnowledgeBase (UniCarbKB; http://unicarbkb.org) offers public access to a growing, curated database of information on the glycan structures of glycoproteins. UniCarbKB is an international effort that aims to further our understanding of the structures, pathways and networks involved in glycosylation and glyco-mediated processes by integrating structural, experimental and functional glycoscience information. This initiative builds upon the success of the glycan structure database GlycoSuiteDB, together with the informatics standards introduced by EUROCarbDB, to provide a high-quality, up-to-date resource supporting glycomics and glycoproteomics research. UniCarbKB provides comprehensive information on glycan structures and published glycoprotein information, including global and site-specific attachment information. For the first release, over 890 references, 3740 glycan structure entries and 400 glycoproteins have been curated. Further, 598 protein glycosylation sites have been annotated with experimentally confirmed glycan structures from the literature. Among these are 35 glycoproteins, 502 structures and 60 publications not previously included in GlycoSuiteDB. This article provides an update on the transformation of GlycoSuiteDB (featured in previous NAR Database issues and hosted by ExPASy since 2009) into UniCarbKB and its integration with UniProtKB and GlycoMod. Here, we introduce a refactored database, supported by substantial new curated data collections and intuitive user interfaces that improve database searching.
Affiliation(s)
- Matthew P Campbell
- Biomolecular Frontiers Research Centre, Macquarie University, North Ryde, NSW 2109, Australia, Proteome Informatics Group, Swiss Institute of Bioinformatics, Geneva, Switzerland, Swiss-Prot Group, Swiss Institute of Bioinformatics, Geneva, Switzerland, Department of Bioinformatics, Faculty of Engineering, Soka University, 1-236 Tangi-machi, Hachioji, Tokyo, Japan and Section of Biology, Faculty of Sciences, University of Geneva, Switzerland
21
Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res 2013; 41:D43-7. [PMID: 23161681 PMCID: PMC3531094 DOI: 10.1093/nar/gks1068] [Citation(s) in RCA: 543] [Impact Index Per Article: 49.4] [Received: 09/26/2012] [Revised: 10/11/2012] [Accepted: 10/11/2012] [Indexed: 12/22/2022]
Abstract
The mission of the Universal Protein Resource (UniProt) (http://www.uniprot.org) is to support biological research by providing a freely accessible, stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase. It integrates, interprets and standardizes data from numerous resources to achieve the most comprehensive catalogue of protein sequences and functional annotation. UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is updated and distributed every 4 weeks and can be accessed online for searches or downloads.
Affiliation(s)
- The UniProt Consortium
- The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland, Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven Street North West, Suite 1200, Washington, DC 20007 and Protein Information Resource, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE 19711, USA
22
Wang Z, Sagotsky J, Taylor T, Shironoshita P, Deisboeck TS. Accelerating cancer systems biology research through Semantic Web technology. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2012. [PMID: 23188758 DOI: 10.1002/wsbm.1200] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022]
Abstract
Cancer systems biology is an interdisciplinary, rapidly expanding research field in which collaboration is a critical means of advancing the field. Yet prevalent database technologies often isolate data rather than make them easily accessible. The Semantic Web has the potential to facilitate web-based collaborative cancer research by presenting data in a form that is self-descriptive, human- and machine-readable, and easily shareable. We have created a semantically linked online Digital Model Repository (DMR) for storing, managing, executing, annotating and sharing computational cancer models. Within the DMR, distributed, multidisciplinary and inter-organizational teams can collaborate on projects without forfeiting intellectual property. This is achieved by introducing a new stakeholder into the collaboration workflow: the institutional licensing officer, part of the Technology Transfer Office. Furthermore, the DMR has achieved silver-level compatibility with the National Cancer Institute's caBIG, so users can interact with the DMR not only through a web browser but also through a semantically annotated, secure web service. We also discuss the technology behind the DMR, which leverages the Semantic Web, ontologies and grid computing to provide secure inter-institutional collaboration on cancer modeling projects, online grid-based execution of shared models, and a collaboration workflow that protects researchers' intellectual property.
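The Semantic Web notion of self-descriptive, machine-readable data rests on expressing facts as subject-predicate-object triples that can be queried by pattern. A minimal Python illustration of that data model follows; it is not the DMR's implementation. The `model:`, `lab:`, `office:` and `dmr:` identifiers are hypothetical, while `rdf:type` and `dc:creator` follow common RDF and Dublin Core naming. A real system would use an RDF store and SPARQL rather than a Python list.

```python
from typing import Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical statements about shared cancer models.
TRIPLES: list[Triple] = [
    ("model:42", "rdf:type", "dmr:CancerModel"),
    ("model:42", "dc:creator", "lab:A"),
    ("model:42", "dmr:approvedBy", "office:TTO"),
    ("model:7", "rdf:type", "dmr:CancerModel"),
]


def match(pattern: Pattern, triples: list[Triple] = TRIPLES) -> list[Triple]:
    """Return triples matching (subject, predicate, object); None acts as a wildcard."""
    return [t for t in triples if all(p is None or p == v for p, v in zip(pattern, t))]


# Every resource typed as a cancer model, regardless of who created it.
print([s for s, _, _ in match((None, "rdf:type", "dmr:CancerModel"))])
```

Because every statement carries its own predicate, a new kind of fact (such as a licensing approval) can be added without changing any schema, which is the property the abstract relies on for sharing across institutions.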
Affiliation(s)
- Zhihui Wang
- Department of Pathology, University of New Mexico, Albuquerque, NM, USA
23
Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res 2012; 40:D71-5. [PMID: 22102590 PMCID: PMC3245120 DOI: 10.1093/nar/gkr981] [Citation(s) in RCA: 1041] [Impact Index Per Article: 86.8] [Received: 09/29/2011] [Accepted: 10/14/2011] [Indexed: 12/14/2022]
Abstract
The mission of UniProt is to support biological research by providing a freely accessible, stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces. UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. A key development at UniProt is the provision of complete, reference and representative proteomes. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.
Affiliation(s)
- The UniProt Consortium
- The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland, Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven St NW, Suite 1200, Washington, DC 20007 and University of Delaware, 15 Innovation Way, Suite 205, Newark, DE 19711, USA
24
Magrane M. UniProt Knowledgebase: a hub of integrated protein data. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2011; 2011:bar009. [PMID: 21447597 PMCID: PMC3070428 DOI: 10.1093/database/bar009] [Citation(s) in RCA: 1057] [Impact Index Per Article: 81.3] [Indexed: 12/21/2022]
Abstract
The UniProt Knowledgebase (UniProtKB) acts as a central hub of protein knowledge by providing a unified view of protein sequence and functional information. Manual and automatic annotation procedures are used to add data directly to the database, while extensive cross-referencing to more than 120 external databases provides access to additional relevant information in more specialized data collections. UniProtKB also integrates a range of data from other resources. All information is attributed to its original source, allowing users to trace the provenance of all data. The UniProt Consortium is committed to using and promoting common data exchange formats and technologies, and UniProtKB data is made freely available in a range of formats to facilitate integration with other databases. Database URL: http://www.uniprot.org/
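The FASTA files that UniProtKB distributes illustrate both points above: the header's `sp`/`tr` prefix separates reviewed, manually annotated Swiss-Prot entries from unreviewed, automatically annotated TrEMBL entries, and the remaining fields follow a fixed layout (accession, entry name, protein name, then `KEY=value` tags). A minimal parser, assuming the `>db|Accession|EntryName ProteinName OS=... OX=... GN=... PE=... SV=...` layout documented by UniProt; older files may lack some tags such as `OX`, and the function name is hypothetical.

```python
import re

# Tags that follow the protein name in a UniProtKB FASTA header.
KEY_RE = re.compile(r"\s(OS|OX|GN|PE|SV)=")


def parse_uniprot_header(header: str) -> dict[str, str]:
    """Split a UniProtKB FASTA header into database, accession, names and tags."""
    db, accession, rest = header.lstrip(">").split("|", 2)
    entry_name, _, description = rest.partition(" ")
    record = {
        "db": db,  # "sp" = reviewed (Swiss-Prot), "tr" = unreviewed (TrEMBL)
        "accession": accession,
        "entry_name": entry_name,
    }
    # re.split with a capturing group yields [name, key1, value1, key2, value2, ...].
    parts = KEY_RE.split(" " + description)
    record["protein_name"] = parts[0].strip()
    for key, value in zip(parts[1::2], parts[2::2]):
        record[key] = value.strip()
    return record


example = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
print(parse_uniprot_header(example)["OS"])  # Homo sapiens
```

Splitting on the known tags, rather than on spaces, keeps multi-word values such as organism names intact.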
Affiliation(s)
- Michele Magrane
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
25
Abstract
In recent decades, a variety of publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation and biological knowledge discovery. However, researchers trying to quickly find the appropriate resources for their problems face increasing confusion. In this chapter, we present a comprehensive review (with categorization and description) of the major protein bioinformatics databases and resources relevant to comparative proteomics research. We conclude the chapter by discussing the challenges and opportunities for developing new protein bioinformatics databases.
26
Abstract
The primary mission of the Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.
Affiliation(s)
- The UniProt Consortium
- The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland, Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven St. NW, Suite 1200, Washington, DC 20007 and University of Delaware, 15 Innovation Way, Suite 205, Newark, DE 19711, USA
- *To whom correspondence should be addressed. Tel: +44 1223 494435; Fax: +44 1223 494468;
27
Haslam NJ, Gibson TJ. EpiC: An Open Resource for Exploring Epitopes To Aid Antibody-Based Experiments. J Proteome Res 2010; 9:3759-63. [DOI: 10.1021/pr100029f] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Affiliation(s)
- Niall J. Haslam
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
- Toby J. Gibson
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
28
Abstract
The primary mission of UniProt is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 3 weeks and can be accessed online for searches or download at http://www.uniprot.org.
Affiliation(s)
-
- The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
29
Plake C, Royer L, Winnenburg R, Hakenberg J, Schroeder M. GoGene: gene annotation in the fast lane. Nucleic Acids Res 2009; 37:W300-4. [PMID: 19465383 PMCID: PMC2703922 DOI: 10.1093/nar/gkp429] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
High-throughput screens such as microarrays and RNAi screens produce huge amounts of data. They typically yield hundreds of genes, which are often further explored and clustered via enriched Gene Ontology terms. The strength of such analyses is that they build on the high-quality manual annotations provided with the Gene Ontology. Their weakness, however, is that the annotations are restricted to process, function and location, and that they do not cover all known genes in model organisms. GoGene addresses this weakness by complementing high-quality manual annotation with high-throughput text mining that extracts co-occurrences of genes and ontology terms from the literature. GoGene contains over 4,000,000 associations between genes and gene-related terms for 10 model organisms, extracted from more than 18,000,000 PubMed entries. It covers not only the process, function and location of genes but also biomedical categories such as diseases, compounds, techniques and mutations. By bringing all of this together, GoGene provides the most recent and most complete facts about genes and can rank them according to novelty and importance. GoGene accepts keywords, gene lists, gene sequences and protein sequences as input and supports searching for genes in PubMed, EntrezGene and via BLAST. Since all associations of genes to terms are supported by evidence in the literature, the results are transparent and can be verified by the user. GoGene is available at http://gopubmed.org/gogene.
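GoGene's associations rest on co-occurrences of genes and ontology terms in the literature. A deliberately naive Python sketch of sentence-level co-occurrence counting follows. GoGene's actual entity recognition, ontology mapping and novelty/importance ranking are more sophisticated and are not reproduced here; the abstracts, gene names and terms in the example are toy inputs, and the function name is hypothetical.

```python
import re
from collections import Counter
from itertools import product
from typing import Iterable


def cooccurrences(abstracts: Iterable[str], genes: Iterable[str],
                  terms: Iterable[str]) -> Counter:
    """Count (gene, term) pairs that appear together in the same sentence."""
    genes, terms = list(genes), list(terms)
    counts: Counter = Counter()
    for text in abstracts:
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            lowered = sentence.lower()
            found_genes = {g for g in genes
                           if re.search(rf"\b{re.escape(g.lower())}\b", lowered)}
            found_terms = {t for t in terms if t.lower() in lowered}  # plain substring match
            counts.update(product(sorted(found_genes), sorted(found_terms)))
    return counts


toy_abstracts = [
    "TP53 is mutated in tumors. Apoptosis depends on TP53.",
    "BRCA1 repairs DNA. TP53 and BRCA1 regulate apoptosis.",
]
print(cooccurrences(toy_abstracts, ["TP53", "BRCA1"], ["apoptosis", "tumor"]).most_common())
```

Keeping the source sentences alongside each counted pair would give the kind of literature evidence the abstract says makes GoGene's results verifiable; ranking by raw count, as `most_common()` does, is only the crudest possible score.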
Affiliation(s)
- Conrad Plake
- Biotechnology Center, Technische Universität Dresden, 01307 Dresden, Germany.