1
Deutsch EW, Bandeira N, Perez-Riverol Y, Sharma V, Carver J, Mendoza L, Kundu DJ, Wang S, Bandla C, Kamatchinathan S, Hewapathirana S, Pullman B, Wertz J, Sun Z, Kawano S, Okuda S, Watanabe Y, MacLean B, MacCoss M, Zhu Y, Ishihama Y, Vizcaíno J. The ProteomeXchange consortium at 10 years: 2023 update. Nucleic Acids Res 2023; 51:D1539-D1548. [PMID: 36370099 PMCID: PMC9825490 DOI: 10.1093/nar/gkac1040] [Citation(s) in RCA: 343] [Impact Index Per Article: 171.5] [Received: 09/15/2022] [Revised: 10/20/2022] [Accepted: 10/23/2022] [Indexed: 11/13/2022]
Abstract
Mass spectrometry (MS) is by far the most used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript in Nucleic Acids Research was published in 2020. The six members of the Consortium are PRIDE, PeptideAtlas (including PASSEL), MassIVE, jPOST, iProX and Panorama Public. We report the current data submission statistics, showcasing that the number of datasets submitted to PX resources has continued to increase every year. As of June 2022, more than 34 233 datasets had been submitted to PX resources, and from those, 20 062 (58.6%) just in the last three years. We also report the development of the Universal Spectrum Identifiers and the improvements in capturing the experimental metadata annotations. In parallel, we highlight that data re-use activities of public datasets continue to increase, enabling connections between PX resources and other popular bioinformatics resources, novel research and also new data resources. Finally, we summarise the current state-of-the-art in data management practices for sensitive human (clinical) proteomics data.
Affiliation(s)
- Nuno Bandeira
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jeremy J Carver
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Luis Mendoza
- Institute for Systems Biology, Seattle WA 98109, USA
- Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Selvakumar Kamatchinathan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Benjamin S Pullman
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Julie Wertz
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Zhi Sun
- Institute for Systems Biology, Seattle WA 98109, USA
- Shin Kawano
- Faculty of Contemporary Society, Toyama University of International Studies, Toyama 930-1292, Japan
- Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
- Shujiro Okuda
- Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
- Yu Watanabe
- Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
- Yunping Zhu
- Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing 102206, China
- Yasushi Ishihama
- Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
2
Niarakis A, Waltemath D, Glazier J, Schreiber F, Keating SM, Nickerson D, Chaouiya C, Siegel A, Noël V, Hermjakob H, Helikar T, Soliman S, Calzone L. Addressing barriers in comprehensiveness, accessibility, reusability, interoperability and reproducibility of computational models in systems biology. Brief Bioinform 2022; 23:bbac212. [PMID: 35671510 PMCID: PMC9294410 DOI: 10.1093/bib/bbac212] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/17/2022] [Revised: 04/20/2022] [Accepted: 05/06/2022] [Indexed: 11/14/2022]
Abstract
Computational models are often employed in systems biology to study the dynamic behaviours of complex systems. With the rise in the number of computational models, finding ways to improve the reusability of these models and their ability to reproduce virtual experiments becomes critical. Correct and effective model annotation in community-supported and standardised formats is necessary for this improvement. Here, we present recent efforts toward a common framework for annotated, accessible, reproducible and interoperable computational models in biology, and discuss key challenges of the field.
Affiliation(s)
- Anna Niarakis
- Université Paris-Saclay, Laboratoire Européen de Recherche pour la Polyarthrite rhumatoïde - Genhotel, Univ Evry, Evry, France
- Lifeware Group, Inria, Saclay-île de France, 91120 Palaiseau, France
- Dagmar Waltemath
- Department of Medical Informatics, University Medicine Greifswald, Greifswald, Germany
- James Glazier
- Biocomplexity Institute and Department of Intelligent Systems Engineering, Indiana University, Bloomington, IN, USA
- Falk Schreiber
- Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
- Faculty of Information Technology, Monash University, Clayton, Australia
- David Nickerson
- Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Anne Siegel
- Univ Rennes, CNRS, Inria - IRISA lab. Rennes
- Vincent Noël
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Henning Hermjakob
- EMBL-European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Tomáš Helikar
- Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Sylvain Soliman
- Lifeware Group, Inria, Saclay-île de France, 91120 Palaiseau, France
- Laurence Calzone
- Institut Curie, PSL Research University, Paris, France
- INSERM, U900, Paris, France
- MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
3
Hanspers K, Kutmon M, Coort SL, Digles D, Dupuis LJ, Ehrhart F, Hu F, Lopes EN, Martens M, Pham N, Shin W, Slenter DN, Waagmeester A, Willighagen EL, Winckers LA, Evelo CT, Pico AR. Ten simple rules for creating reusable pathway models for computational analysis and visualization. PLoS Comput Biol 2021; 17:e1009226. [PMID: 34411100 PMCID: PMC8375987 DOI: 10.1371/journal.pcbi.1009226] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/23/2022]
Affiliation(s)
- Kristina Hanspers
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, California, United States of America
- Martina Kutmon
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Susan L. Coort
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Daniela Digles
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
- Lauren J. Dupuis
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Friederike Ehrhart
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Finterly Hu
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Elisson N. Lopes
- Instituto de Ciencias Biologicas, Departamento de Bioquimica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Marvin Martens
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Nhung Pham
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Woosub Shin
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Denise N. Slenter
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Egon L. Willighagen
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Laurent A. Winckers
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Chris T. Evelo
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Alexander R. Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, California, United States of America
4
Galgonek J, Vondrášek J. IDSM ChemWebRDF: SPARQLing small-molecule datasets. J Cheminform 2021; 13:38. [PMID: 33980298 PMCID: PMC8117646 DOI: 10.1186/s13321-021-00515-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/06/2021] [Accepted: 04/23/2021] [Indexed: 11/12/2022]
Abstract
The Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF, and support SPARQL querying. In our project, we primarily focused on PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house developed SPARQL engine based on the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluations of SPARQL queries. The endpoint provides not only querying in the dataset, but also the compound substructure and similarity search supported by our Sachem project.
Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/.
Affiliation(s)
- Jakub Galgonek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic.
- Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic