1
|
Bittremieux W, Bouyssié D, Dorfer V, Locard-Paulet M, Perez-Riverol Y, Schwämmle V, Uszkoreit J, Van Den Bossche T. The European Bioinformatics Community for Mass Spectrometry (EuBIC-MS): an open community for bioinformatics training and research. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2025; 39 Suppl 1:e9087. [PMID: 33861485 DOI: 10.1002/rcm.9087] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/18/2020] [Revised: 02/13/2021] [Accepted: 03/18/2021] [Indexed: 06/12/2023]
Abstract
The European Bioinformatics Community for Mass Spectrometry (EuBIC-MS; eubic-ms.org) was founded in 2014 to unite European computational mass spectrometry researchers and proteomics bioinformaticians working in academia and industry. EuBIC-MS maintains educational resources (proteomics-academy.org) and organises workshops at national and international conferences on proteomics and mass spectrometry. Furthermore, EuBIC-MS is actively involved in several community initiatives such as the Human Proteome Organization's Proteomics Standards Initiative (HUPO-PSI). Apart from these collaborations, EuBIC-MS has organised two Winter Schools and two Developers' Meetings that have contributed to the strengthening of the European mass spectrometry network and fostered international collaboration in this field, even beyond Europe. Moreover, EuBIC-MS is currently actively developing a community-driven standard dedicated to mass spectrometry data annotation (SDRF-Proteomics) that will facilitate data reuse and collaboration. This manuscript highlights what EuBIC-MS is, what it does, and what it already has achieved. A warm invitation is extended to new researchers at all career stages to join the EuBIC-MS community on its Slack channel (eubic.slack.com).
Collapse
Affiliation(s)
- Wout Bittremieux
- European Bioinformatics Community for Mass Spectrometry, Belgium
- University of California San Diego, La Jolla, CA, USA
- University of Antwerp, Antwerp, Belgium
| | - David Bouyssié
- European Bioinformatics Community for Mass Spectrometry, Belgium
- IPBS, University of Toulouse, CNRS, UPS, Toulouse, France
| | - Viktoria Dorfer
- European Bioinformatics Community for Mass Spectrometry, Belgium
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg, Austria
| | - Marie Locard-Paulet
- European Bioinformatics Community for Mass Spectrometry, Belgium
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark
| | - Yasset Perez-Riverol
- European Bioinformatics Community for Mass Spectrometry, Belgium
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Veit Schwämmle
- European Bioinformatics Community for Mass Spectrometry, Belgium
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
| | - Julian Uszkoreit
- European Bioinformatics Community for Mass Spectrometry, Belgium
- Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, Ruhr University Bochum, Bochum, Germany
- Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, Bochum, Germany
| | - Tim Van Den Bossche
- European Bioinformatics Community for Mass Spectrometry, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
| |
Collapse
|
2
|
Klein J, Lam H, Mak TD, Bittremieux W, Perez-Riverol Y, Gabriels R, Shofstahl J, Hecht H, Binz PA, Kawano S, Van Den Bossche T, Carver J, Neely BA, Mendoza L, Suomi T, Claeys T, Payne T, Schulte D, Sun Z, Hoffmann N, Zhu Y, Neumann S, Jones AR, Bandeira N, Vizcaíno JA, Deutsch EW. The Proteomics Standards Initiative Standardized Formats for Spectral Libraries and Fragment Ion Peak Annotations: mzSpecLib and mzPAF. Anal Chem 2024; 96:18491-18501. [PMID: 39514576 PMCID: PMC11579979 DOI: 10.1021/acs.analchem.4c04091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2024] [Revised: 10/16/2024] [Accepted: 11/01/2024] [Indexed: 11/16/2024]
Abstract
Mass spectral libraries are collections of reference spectra, usually associated with specific analytes from which the spectra were generated, that are used for further downstream analysis of new spectra. There are many different formats used for encoding spectral libraries, but none have undergone a standardization process to ensure broad applicability to many applications. As part of the Human Proteome Organization Proteomics Standards Initiative (PSI), we have developed a standardized format for encoding spectral libraries, called mzSpecLib (https://psidev.info/mzSpecLib). It is primarily a data model that flexibly encodes metadata about the library entries using the extensible PSI-MS controlled vocabulary and can be encoded in and converted between different serialization formats. We have also developed a standardized data model and serialization for fragment ion peak annotations, called mzPAF (https://psidev.info/mzPAF). It is defined as a separate standard, since it may be used for other applications besides spectral libraries. The mzSpecLib and mzPAF standards are compatible with existing PSI standards such as ProForma 2.0 and the Universal Spectrum Identifier. The mzSpecLib and mzPAF standards have been primarily defined for peptides in proteomics applications with basic small molecule support. They could be extended in the future to other fields that need to encode spectral libraries for nonpeptidic analytes.
Collapse
Affiliation(s)
- Joshua Klein
- Program
for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
| | - Henry Lam
- Department
of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, 999077 Hong Kong, P. R. China
| | - Tytus D. Mak
- Mass
Spectrometry Data Center, National Institute
of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Wout Bittremieux
- Department
of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
| | - Yasset Perez-Riverol
- European
Molecular Biology Laboratory, European Bioinformatics
Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Ralf Gabriels
- VIB-UGent
Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department
of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
| | - Jim Shofstahl
- Thermo
Fisher
Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
| | - Helge Hecht
- RECETOX,
Faculty of Science, Masaryk University, Kotlářská 2, 60200 Brno, Czech Republic
| | | | - Shin Kawano
- Database
Center for Life Science, Joint Support Center
for Data Science Research, Research Organization of Information and
Systems, Chiba 277-0871, Japan
- School
of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
| | - Tim Van Den Bossche
- VIB-UGent
Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department
of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
| | - Jeremy Carver
- Center
for Computational Mass Spectrometry, Department of Computer Science
and Engineering, University of California, San Diego, California 92093-0404, United
States
| | - Benjamin A. Neely
- National
Institute of Standards and Technology (NIST) Charleston, Charleston, South Carolina 29412, United States
| | - Luis Mendoza
- Institute
for Systems Biology, Seattle, Washington 98109, United States
| | - Tomi Suomi
- Turku Bioscience
Centre, University of Turku and Åbo
Akademi University, FI-20520 Turku, Finland
| | - Tine Claeys
- VIB-UGent
Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department
of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
| | - Thomas Payne
- European
Molecular Biology Laboratory, European Bioinformatics
Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Douwe Schulte
- Biomolecular
Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular
Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584,
CH, Utrecht, The
Netherlands
| | - Zhi Sun
- Institute
for Systems Biology, Seattle, Washington 98109, United States
| | - Nils Hoffmann
- Institute
for Bio- and Geosciences (IBG-5), Forschungszentrum
Jülich GmbH, 52428 Jülich, Germany
| | - Yunping Zhu
- National
Center for Protein Sciences (Beijing), Beijing
Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
| | - Steffen Neumann
- Computational
Plant Biochemistry, Leibniz Institute of
Plant Biochemistry, 06120 Halle, Germany
- German
Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
| | - Andrew R. Jones
- Institute
of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, United Kingdom
| | - Nuno Bandeira
- Center
for Computational Mass Spectrometry, Department of Computer Science
and Engineering, University of California, San Diego, California 92093-0404, United
States
- Skaggs
School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - Juan Antonio Vizcaíno
- European
Molecular Biology Laboratory, European Bioinformatics
Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Eric W. Deutsch
- Institute
for Systems Biology, Seattle, Washington 98109, United States
| |
Collapse
|
3
|
Ren L, Shi L, Zheng Y. Reference Materials for Improving Reliability of Multiomics Profiling. PHENOMICS (CHAM, SWITZERLAND) 2024; 4:487-521. [PMID: 39723231 PMCID: PMC11666855 DOI: 10.1007/s43657-023-00153-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Revised: 12/18/2023] [Accepted: 12/22/2023] [Indexed: 12/28/2024]
Abstract
High-throughput technologies for multiomics or molecular phenomics profiling have been extensively adopted in biomedical research and clinical applications, offering a more comprehensive understanding of biological processes and diseases. Omics reference materials play a pivotal role in ensuring the accuracy, reliability, and comparability of laboratory measurements and analyses. However, the current application of omics reference materials has revealed several issues, including inappropriate selection and underutilization, leading to inconsistencies across laboratories. This review aims to address these concerns by emphasizing the importance of well-characterized reference materials at each level of omics, encompassing (epi-)genomics, transcriptomics, proteomics, and metabolomics. By summarizing their characteristics, advantages, and limitations along with appropriate performance metrics pertinent to study purposes, we provide an overview of how omics reference materials can enhance data quality and data integration, thus fostering robust scientific investigations with omics technologies.
Collapse
Affiliation(s)
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438 China
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438 China
- Shanghai Cancer Center, Fudan University, Shanghai, 200032 China
- International Human Phenome Institutes, Shanghai, 200438 China
| | - Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, 200438 China
| |
Collapse
|
4
|
Combe CW, Kolbowski L, Fischer L, Koskinen V, Klein J, Leitner A, Jones AR, Vizcaíno JA, Rappsilber J. mzIdentML 1.3.0 - Essential progress on the support of crosslinking and other identifications based on multiple spectra. Proteomics 2024; 24:e2300385. [PMID: 39001627 DOI: 10.1002/pmic.202300385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2023] [Revised: 02/07/2024] [Accepted: 02/09/2024] [Indexed: 10/10/2024]
Abstract
The mzIdentML data format, originally developed by the Proteomics Standards Initiative in 2011, is the open XML data standard for peptide and protein identification results coming from mass spectrometry. We present mzIdentML version 1.3.0, which introduces new functionality and support for additional use cases. First of all, a new mechanism for encoding identifications based on multiple spectra has been introduced. Furthermore, the main mzIdentML specification document can now be supplemented by extension documents which provide further guidance for encoding specific use cases for different proteomics subfields. One extension document has been added, covering additional use cases for the encoding of crosslinked peptide identifications. The ability to add extension documents facilitates keeping the mzIdentML standard up to date with advances in the proteomics field, without having to change the main specification document. The crosslinking extension document provides further explanation of the crosslinking use cases already supported in mzIdentML version 1.2.0, and provides support for encoding additional scenarios that are critical to reflect developments in the crosslinking field and facilitate its integration in structural biology. These are: (i) support for cleavable crosslinkers, (ii) support for internally linked peptides, (iii) support for noncovalently associated peptides, and (iv) improved support for encoding scores and the corresponding thresholds.
Collapse
Affiliation(s)
- Colin W Combe
- Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
| | - Lars Kolbowski
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
| | - Lutz Fischer
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
| | | | - Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts, USA
| | - Alexander Leitner
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zurich, Switzerland
| | - Andrew R Jones
- Department of Biochemistry & Systems Biology, University of Liverpool, Liverpool, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute, (EMBL-EBI), Hinxton, Cambridge, UK
| | - Juri Rappsilber
- Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
| |
Collapse
|
5
|
Combe CW, Graham M, Kolbowski L, Fischer L, Rappsilber J. xiVIEW: Visualisation of Crosslinking Mass Spectrometry Data. J Mol Biol 2024; 436:168656. [PMID: 39237202 DOI: 10.1016/j.jmb.2024.168656] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Revised: 05/17/2024] [Accepted: 06/07/2024] [Indexed: 09/07/2024]
Abstract
Crosslinking mass spectrometry (MS) has emerged as an important technique for elucidating the in-solution structures of protein complexes and the topology of protein-protein interaction networks. However, the expanding user community lacked an integrated visualisation tool that helped them make use of the crosslinking data for investigating biological mechanisms. We addressed this need by developing xiVIEW, a web-based application designed to streamline crosslinking MS data analysis, which we present here. xiVIEW provides a user-friendly interface for accessing coordinated views of mass spectrometric data, network visualisation, annotations extracted from trusted repositories like UniProtKB, and available 3D structures. In accordance with recent recommendations from the crosslinking MS community, xiVIEW (i) provides a standards compliant parser to improve data integration and (ii) offers accessible visualisation tools. By promoting the adoption of standard file formats and providing a comprehensive visualisation platform, xiVIEW empowers both experimentalists and modellers alike to pursue their respective research interests. We anticipate that xiVIEW will advance crosslinking MS-inspired research, and facilitate broader and more effective investigations into complex biological systems.
Collapse
Affiliation(s)
- Colin W Combe
- University of Edinburgh, School of Biological Sciences, Edinburgh EH9 3JR, UK.
| | - Martin Graham
- University of Edinburgh, School of Biological Sciences, Edinburgh EH9 3JR, UK
| | - Lars Kolbowski
- University of Edinburgh, School of Biological Sciences, Edinburgh EH9 3JR, UK; Technische Universität Berlin, 10623 Berlin, Germany
| | - Lutz Fischer
- Technische Universität Berlin, 10623 Berlin, Germany.
| | - Juri Rappsilber
- University of Edinburgh, School of Biological Sciences, Edinburgh EH9 3JR, UK; Technische Universität Berlin, 10623 Berlin, Germany.
| |
Collapse
|
6
|
Jiang Y, Rex DA, Schuster D, Neely BA, Rosano GL, Volkmar N, Momenzadeh A, Peters-Clarke TM, Egbert SB, Kreimer S, Doud EH, Crook OM, Yadav AK, Vanuopadath M, Hegeman AD, Mayta M, Duboff AG, Riley NM, Moritz RL, Meyer JG. Comprehensive Overview of Bottom-Up Proteomics Using Mass Spectrometry. ACS MEASUREMENT SCIENCE AU 2024; 4:338-417. [PMID: 39193565 PMCID: PMC11348894 DOI: 10.1021/acsmeasuresciau.3c00068] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 05/03/2024] [Accepted: 05/03/2024] [Indexed: 08/29/2024]
Abstract
Proteomics is the large scale study of protein structure and function from biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics studies can be applied to diverse studies ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability. To enable this range of different experiments, there are diverse strategies for proteome analysis. The nuances of how proteomic workflows differ may be challenging to understand for new practitioners. Here, we provide a comprehensive overview of different proteomics methods. We cover from biochemistry basics and protein extraction to biological interpretation and orthogonal validation. We expect this Review will serve as a handbook for researchers who are new to the field of bottom-up proteomics.
Collapse
Affiliation(s)
- Yuming Jiang
- Department
of Computational Biomedicine, Cedars Sinai
Medical Center, Los Angeles, California 90048, United States
- Smidt Heart
Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced
Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los
Angeles, California 90048, United States
| | - Devasahayam Arokia
Balaya Rex
- Center for
Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
| | - Dina Schuster
- Department
of Biology, Institute of Molecular Systems
Biology, ETH Zurich, Zurich 8093, Switzerland
- Department
of Biology, Institute of Molecular Biology
and Biophysics, ETH Zurich, Zurich 8093, Switzerland
- Laboratory
of Biomolecular Research, Division of Biology and Chemistry, Paul Scherrer Institute, Villigen 5232, Switzerland
| | - Benjamin A. Neely
- Chemical
Sciences Division, National Institute of
Standards and Technology, NIST, Charleston, South Carolina 29412, United States
| | - Germán L. Rosano
- Mass
Spectrometry
Unit, Institute of Molecular and Cellular
Biology of Rosario, Rosario, 2000 Argentina
| | - Norbert Volkmar
- Department
of Biology, Institute of Molecular Systems
Biology, ETH Zurich, Zurich 8093, Switzerland
| | - Amanda Momenzadeh
- Department
of Computational Biomedicine, Cedars Sinai
Medical Center, Los Angeles, California 90048, United States
- Smidt Heart
Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced
Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los
Angeles, California 90048, United States
| | - Trenton M. Peters-Clarke
- Department
of Pharmaceutical Chemistry, University
of California—San Francisco, San Francisco, California, 94158, United States
| | - Susan B. Egbert
- Department
of Chemistry, University of Manitoba, Winnipeg, Manitoba, R3T 2N2 Canada
| | - Simion Kreimer
- Smidt Heart
Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced
Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los
Angeles, California 90048, United States
| | - Emma H. Doud
- Center
for Proteome Analysis, Indiana University
School of Medicine, Indianapolis, Indiana, 46202-3082, United States
| | - Oliver M. Crook
- Oxford
Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United
Kingdom
| | - Amit Kumar Yadav
- Translational
Health Science and Technology Institute, NCR Biotech Science Cluster 3rd Milestone Faridabad-Gurgaon
Expressway, Faridabad, Haryana 121001, India
| | | | - Adrian D. Hegeman
- Departments
of Horticultural Science and Plant and Microbial Biology, University of Minnesota, Twin Cities, Minnesota 55108, United States
| | - Martín
L. Mayta
- School
of Medicine and Health Sciences, Center for Health Sciences Research, Universidad Adventista del Plata, Libertador San Martin 3103, Argentina
- Molecular
Biology Department, School of Pharmacy and Biochemistry, Universidad Nacional de Rosario, Rosario 2000, Argentina
| | - Anna G. Duboff
- Department
of Chemistry, University of Washington, Seattle, Washington 98195, United States
| | - Nicholas M. Riley
- Department
of Chemistry, University of Washington, Seattle, Washington 98195, United States
| | - Robert L. Moritz
- Institute
for Systems biology, Seattle, Washington 98109, United States
| | - Jesse G. Meyer
- Department
of Computational Biomedicine, Cedars Sinai
Medical Center, Los Angeles, California 90048, United States
- Smidt Heart
Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced
Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los
Angeles, California 90048, United States
| |
Collapse
|
7
|
Shome M, MacKenzie TMG, Subbareddy SR, Snyder MP. The Importance, Challenges, and Possible Solutions for Sharing Proteomics Data While Safeguarding Individuals' Privacy. Mol Cell Proteomics 2024; 23:100731. [PMID: 38331191 PMCID: PMC10915627 DOI: 10.1016/j.mcpro.2024.100731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2023] [Revised: 01/28/2024] [Accepted: 02/05/2024] [Indexed: 02/10/2024] Open
Abstract
Proteomics data sharing has profound benefits at the individual level as well as at the community level. While data sharing has increased over the years, mostly due to journal and funding agency requirements, the reluctance of researchers with regard to data sharing is evident as many shares only the bare minimum dataset required to publish an article. In many cases, proper metadata is missing, essentially making the dataset useless. This behavior can be explained by a lack of incentives, insufficient awareness, or a lack of clarity surrounding ethical issues. Through adequate training at research institutes, researchers can realize the benefits associated with data sharing and can accelerate the norm of data sharing for the field of proteomics, as has been the standard in genomics for decades. In this article, we have put together various repository options available for proteomics data. We have also added pros and cons of those repositories to facilitate researchers in selecting the repository most suitable for their data submission. It is also important to note that a few types of proteomics data have the potential to re-identify an individual in certain scenarios. In such cases, extra caution should be taken to remove any personal identifiers before sharing on public repositories. Data sets that will be useless without personal identifiers need to be shared in a controlled access repository so that only authorized researchers can access the data and personal identifiers are kept safe.
Collapse
Affiliation(s)
- Mahasish Shome
- Department of Genetics, Stanford University, Palo Alto, California, USA
| | - Tim M G MacKenzie
- Department of Genetics, Stanford University, Palo Alto, California, USA
| | | | - Michael P Snyder
- Department of Genetics, Stanford University, Palo Alto, California, USA.
| |
Collapse
|
8
|
Walzer M, Jeong K, Tabb DL, Vizcaíno JA. TopDownApp: An open and modular platform for analysis and visualisation of top-down proteomics data. Proteomics 2024; 24:e2200403. [PMID: 37787899 DOI: 10.1002/pmic.202200403] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2023] [Revised: 09/13/2023] [Accepted: 09/13/2023] [Indexed: 10/04/2023]
Abstract
Although Top-down (TD) proteomics techniques, aimed at the analysis of intact proteins and proteoforms, are becoming increasingly popular, efforts are needed at different levels to generalise their adoption. In this context, there are numerous improvements that are possible in the area of open science practices, including a greater application of the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. These include, for example, increased data sharing practices and readily available open data standards. Additionally, the field would benefit from the development of open data analysis workflows that can enable data reuse of public datasets, something that is increasingly common in other proteomics fields.
Collapse
Affiliation(s)
- Mathias Walzer
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - Kyowon Jeong
- Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen, Germany
| | - David L Tabb
- Institut Pasteur, Université Paris Cité, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris, France
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| |
Collapse
|
9
|
Jiang Y, Rex DAB, Schuster D, Neely BA, Rosano GL, Volkmar N, Momenzadeh A, Peters-Clarke TM, Egbert SB, Kreimer S, Doud EH, Crook OM, Yadav AK, Vanuopadath M, Mayta ML, Duboff AG, Riley NM, Moritz RL, Meyer JG. Comprehensive Overview of Bottom-Up Proteomics using Mass Spectrometry. ARXIV 2023:arXiv:2311.07791v1. [PMID: 38013887 PMCID: PMC10680866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/29/2023]
Abstract
Proteomics is the large scale study of protein structure and function from biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics studies can be applied to diverse studies ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability. To enable this range of different experiments, there are diverse strategies for proteome analysis. The nuances of how proteomic workflows differ may be challenging to understand for new practitioners. Here, we provide a comprehensive overview of different proteomics methods to aid the novice and experienced researcher. We cover from biochemistry basics and protein extraction to biological interpretation and orthogonal validation. We expect this work to serve as a basic resource for new practitioners in the field of shotgun or bottom-up proteomics.
Collapse
Affiliation(s)
- Yuming Jiang
- Department of Computational Biomedicine, Cedars Sinai Medical Center
| | - Devasahayam Arokia Balaya Rex
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
| | - Dina Schuster
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland; Department of Biology, Institute of Molecular Biology and Biophysics, ETH Zurich, Zurich 8093, Switzerland; Laboratory of Biomolecular Research, Division of Biology and Chemistry, Paul Scherrer Institute, Villigen 5232, Switzerland
| | - Benjamin A. Neely
- Chemical Sciences Division, National Institute of Standards and Technology, NIST Charleston · Funded by NIST
| | - Germán L. Rosano
- Mass Spectrometry Unit, Institute of Molecular and Cellular Biology of Rosario, Rosario, Argentina · Funded by Grant PICT 2019-02971 (Agencia I+D+i)
| | - Norbert Volkmar
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
| | - Amanda Momenzadeh
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California, USA
| | | | - Susan B. Egbert
- Department of Chemistry, University of Manitoba, Winnipeg, Cananda
| | - Simion Kreimer
- Smidt Heart Institute, Cedars Sinai Medical Center; Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center
| | - Emma H. Doud
- Center for Proteome Analysis, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Oliver M. Crook
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| | - Amit Kumar Yadav
- Translational Health Science and Technology Institute · Funded by Grant BT/PR16456/BID/7/624/2016 (Department of Biotechnology, India); Grant Translational Research Program (TRP) at THSTI funded by DBT
| | - Muralidharan Vanuopadath
- School of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam-690 525, Kerala, India · Funded by Department of Health Research, Indian Council of Medical Research, Government of India (File No.R.12014/31/2022-HR)
| | - Martín L. Mayta
- School of Medicine and Health Sciences, Center for Health Sciences Research, Universidad Adventista del Plata, Libertador San Martín 3103, Argentina; Molecular Biology Department, School of Pharmacy and Biochemistry, Universidad Nacional de Rosario, Rosario 2000, Argentina
| | - Anna G. Duboff
- Department of Chemistry, University of Washington · Funded by Summer Research Acceleration Fellowship, Department of Chemistry, University of Washington
| | - Nicholas M. Riley
- Department of Chemistry, University of Washington · Funded by National Institutes of Health Grant R00 GM147304
| | - Robert L. Moritz
- Institute for Systems biology, Seattle, WA, USA, 98109 · Funded by National Institutes of Health Grants R01GM087221, R24GM127667, U19AG023122, S10OD026936; National Science Foundation Award 1920268
| | - Jesse G. Meyer
- Department of Computational Biomedicine, Cedars Sinai Medical Center · Funded by National Institutes of Health Grant R21 AG074234; National Institutes of Health Grant R35 GM142502
| |
Collapse
|
10
|
Omenn GS, Lane L, Overall CM, Pineau C, Packer NH, Cristea IM, Lindskog C, Weintraub ST, Orchard S, Roehrl MH, Nice E, Liu S, Bandeira N, Chen YJ, Guo T, Aebersold R, Moritz RL, Deutsch EW. The 2022 Report on the Human Proteome from the HUPO Human Proteome Project. J Proteome Res 2023; 22:1024-1042. [PMID: 36318223 PMCID: PMC10081950 DOI: 10.1021/acs.jproteome.2c00498] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
The 2022 Metrics of the Human Proteome from the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 407 (93.2%) of the 19 750 predicted proteins coded in the human genome, a net gain of 50 since 2021 from data sets generated around the world and reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 78 from 1421 to 1343. This represents continuing experimental progress on the human proteome parts list across all the chromosomes, as well as significant reclassifications. Meanwhile, applying proteomics in a vast array of biological and clinical studies continues to yield significant findings and growing integration with other omics platforms. We present highlights from the Chromosome-Centric HPP, Biology and Disease-driven HPP, and HPP Resource Pillars, compare features of mass spectrometry and Olink and Somalogic platforms, note the emergence of translation products from ribosome profiling of small open reading frames, and discuss the launch of the initial HPP Grand Challenge Project, "A Function for Each Protein".
Collapse
Affiliation(s)
- Gilbert S. Omenn
- University of Michigan, Ann Arbor, Michigan 48109, United States
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Lydie Lane
- CALIPHO Group, SIB Swiss Institute of Bioinformatics and University of Geneva, 1015 Lausanne, Switzerland
| | | | - Charles Pineau
- French Institute of Health and Medical Research, 35042 RENNES Cedex, France
| | - Nicolle H. Packer
- Macquarie University, Sydney, NSW 2109, Australia
- Griffith University’s Institute for Glycomics, Sydney, NSW 2109, Australia
| | | | | | - Susan T. Weintraub
- University of Texas Health Science Center-San Antonio, San Antonio, Texas 78229-3900, United States
| | - Sandra Orchard
- EMBL-EBI, Hinxton, Cambridgeshire, CB10 1SD, United Kingdom
| | - Michael H.A. Roehrl
- Memorial Sloan Kettering Cancer Center, New York, New York, 10065, United States
| | | | - Siqi Liu
- BGI Group, Shenzhen 518083, China
| | - Nuno Bandeira
- University of California, San Diego, La Jolla, California 92093, United States
| | - Yu-Ju Chen
- National Taiwan University, Academia Sinica, Nankang, Taipei 11529, Taiwan
| | - Tiannan Guo
- Westlake University Guomics Laboratory of Big Proteomic Data, Hangzhou 310024, Zhejiang Province, China
| | - Ruedi Aebersold
- Institute of Molecular Systems Biology in ETH Zurich, 8092 Zurich, Switzerland
| | - Robert L. Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| |
Collapse
|
11
|
Liao W, Zhang X. Patpat: a public proteomics dataset search framework. Bioinformatics 2023; 39:7028483. [PMID: 36744907 PMCID: PMC9933831 DOI: 10.1093/bioinformatics/btad076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2022] [Revised: 12/23/2022] [Accepted: 02/03/2023] [Indexed: 02/07/2023] Open
Abstract
SUMMARY As the FAIR (Findable, Accessible, Interoperable, Reusable) principles have become widely accepted in the proteomics field, under the guidance of ProteomeXchange and The Human Proteome Organization Proteomics Standards Initiative, proteomics public databases have been providing Application Programming Interfaces for programmatic access. Based on generating logic from proteomics data, we present Patpat, an extensible framework for searching public datasets, merging results from multiple databases to help researchers find their proteins of interest in the vast mass spectrometry. Patpat's 2D strategy of combining results from multiple databases allows users to provide only protein identifiers to obtain metadata for relevant datasets, improving the 'Findable' of proteomics data. AVAILABILITY AND IMPLEMENTATION The Patpat framework is released under the Apache 2.0 license open source, and the source code is stored on GitHub (https://github.com/henry-leo/Patpat) and is freely available. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Weiheng Liao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai 200438, China
| | - Xuelian Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai 200438, China
| |
Collapse
|
12
|
Deutsch EW, Mendoza L, Shteynberg DD, Hoopmann MR, Sun Z, Eng JK, Moritz RL. Trans-Proteomic Pipeline: Robust Mass Spectrometry-Based Proteomics Data Analysis Suite. J Proteome Res 2023; 22:615-624. [PMID: 36648445 PMCID: PMC10166710 DOI: 10.1021/acs.jproteome.2c00624] [Citation(s) in RCA: 55] [Impact Index Per Article: 27.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
Abstract
The Trans-Proteomic Pipeline (TPP) mass spectrometry data analysis suite has been in continual development and refinement since its first tools, PeptideProphet and ProteinProphet, were published 20 years ago. The current release provides a large complement of tools for spectrum processing, spectrum searching, search validation, abundance computation, protein inference, and more. Many of the tools include machine-learning modeling to extract the most information from data sets and build robust statistical models to compute the probabilities that derived information is correct. Here we present the latest information on the many TPP tools, and how TPP can be deployed on various platforms from personal Windows laptops to Linux clusters and expansive cloud computing environments. We describe tutorials on how to use TPP in a variety of ways and describe synergistic projects that leverage TPP. We conclude with plans for continued development of TPP.
Collapse
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | | | | | - Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Jimmy K Eng
- Proteomics Resource, University of Washington, Seattle, Washington 98195, United States
| | - Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| |
Collapse
|
13
|
Bittremieux W, Levitsky L, Pilz M, Sachsenberg T, Huber F, Wang M, Dorrestein PC. Unified and Standardized Mass Spectrometry Data Processing in Python Using spectrum_utils. J Proteome Res 2023; 22:625-631. [PMID: 36688502 DOI: 10.1021/acs.jproteome.2c00632] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
Abstract
spectrum_utils is a Python package for mass spectrometry data processing and visualization. Since its introduction, spectrum_utils has grown into a fundamental software solution that powers various applications in proteomics and metabolomics, ranging from spectrum preprocessing prior to spectrum identification and machine learning applications to spectrum plotting from online data repositories and assisting data analysis tasks for dozens of other projects. Here, we present updates to spectrum_utils, which include new functionality to integrate mass spectrometry community data standards, enhanced mass spectral data processing, and unified mass spectral data visualization in Python. spectrum_utils is freely available as open source at https://github.com/bittremieux/spectrum_utils.
Collapse
Affiliation(s)
- Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium.,Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
| | - Lev Levitsky
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
| | - Matteo Pilz
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
| | - Timo Sachsenberg
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
| | - Florian Huber
- Centre for Digitalisation and Digitality, University of Applied Sciences Düsseldorf, 40476 Düsseldorf, Germany
| | - Mingxun Wang
- Department of Computer Science, University of California─Riverside, Riverside, California 92507, United States
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, University of California─San Diego, La Jolla, California 92093, United States.,Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California─San Diego, La Jolla California 92093, United States
| |
Collapse
|
14
|
Deutsch EW, Vizcaíno JA, Jones AR, Binz PA, Lam H, Klein J, Bittremieux W, Perez-Riverol Y, Tabb DL, Walzer M, Ricard-Blum S, Hermjakob H, Neumann S, Mak TD, Kawano S, Mendoza L, Van Den Bossche T, Gabriels R, Bandeira N, Carver J, Pullman B, Sun Z, Hoffmann N, Shofstahl J, Zhu Y, Licata L, Quaglia F, Tosatto SCE, Orchard SE. Proteomics Standards Initiative at Twenty Years: Current Activities and Future Work. J Proteome Res 2023; 22:287-301. [PMID: 36626722 PMCID: PMC9903322 DOI: 10.1021/acs.jproteome.2c00637] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2022] [Indexed: 01/11/2023]
Abstract
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed are described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.
Collapse
Affiliation(s)
- Eric W. Deutsch
- Institute
for Systems Biology, Seattle, Washington 98109, United States
| | - Juan Antonio Vizcaíno
- European
Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Andrew R. Jones
- Institute
of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
| | - Pierre-Alain Binz
- Clinical
Chemistry Service, Lausanne University Hospital, 1011 976 Lausanne, Switzerland
| | - Henry Lam
- Department
of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, P. R. China.
| | - Joshua Klein
- Program for
Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
| | - Wout Bittremieux
- Skaggs
School
of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Department
of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
| | - Yasset Perez-Riverol
- European
Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - David L. Tabb
- SA MRC
Centre for TB Research, DST/NRF Centre of Excellence for Biomedical
TB Research, Division of Molecular Biology and Human Genetics, Faculty
of Medicine and Health Sciences, Stellenbosch
University, Cape Town 7602, South Africa
| | - Mathias Walzer
- European
Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Sylvie Ricard-Blum
- Univ.
Lyon, Université Lyon 1, ICBMS, UMR 5246, 69622 Villeurbanne, France
| | - Henning Hermjakob
- European
Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Steffen Neumann
- Bioinformatics
and Scientific Data, Leibniz Institute of
Plant Biochemistry, 06120 Halle, Germany
- German
Centre for Integrative Biodiversity Research (iDiv), 04103 Halle-Jena-Leipzig, Germany
| | - Tytus D. Mak
- Mass Spectrometry
Data Center, National Institute of Standards
and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United
States
| | - Shin Kawano
- Database
Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- Faculty
of Contemporary Society, Toyama University
of International Studies, Toyama 930-1292, Japan
- School
of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
| | - Luis Mendoza
- Institute
for Systems Biology, Seattle, Washington 98109, United States
| | - Tim Van Den Bossche
- VIB-UGent
Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department
of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
| | - Ralf Gabriels
- VIB-UGent
Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department
of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
| | - Nuno Bandeira
- Skaggs
School
of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Center
for Computational Mass Spectrometry, Department of Computer Science
and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
| | - Jeremy Carver
- Center
for Computational Mass Spectrometry, Department of Computer Science
and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
| | - Benjamin Pullman
- Center
for Computational Mass Spectrometry, Department of Computer Science
and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
| | - Zhi Sun
- Institute
for Systems Biology, Seattle, Washington 98109, United States
| | - Nils Hoffmann
- Institute
for Bio- and Geosciences (IBG-5), Forschungszentrum
Jülich GmbH, 52428 Jülich, Germany
| | - Jim Shofstahl
- Thermo
Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
| | - Yunping Zhu
- National
Center for Protein Sciences (Beijing), Beijing
Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
| | - Luana Licata
- Fondazione
Human Technopole, 20157 Milan, Italy
- Department
of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
| | - Federica Quaglia
- Institute
of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), 70126 Bari, Italy
- Department
of Biomedical Sciences, University of Padova, 35131 Padova, Italy
| | | | - Sandra E. Orchard
- European
Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| |
Collapse
|
15
|
Heil BJ, Greene CS. The Field-Dependent Nature of PageRank Values in Citation Networks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.05.522943. [PMID: 36711900 PMCID: PMC9881996 DOI: 10.1101/2023.01.05.522943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 10/21/2023]
Abstract
The value of scientific research can be easier to assess at the collective level than at the level of individual contributions. Several journal-level and article-level metrics aim to measure the importance of journals or individual manuscripts. However, many are citation-based and citation practices vary between fields. To account for these differences, scientists have devised normalization schemes to make metrics more comparable across fields. We use PageRank as an example metric and examine the extent to which field-specific citation norms drive estimated importance differences. In doing so, we recapitulate differences in journal and article PageRanks between fields. We also find that manuscripts shared between fields have different PageRanks depending on which field's citation network the metric is calculated in. We implement a degree-preserving graph shuffling algorithm to generate a null distribution of similar networks and find differences more likely attributed to field-specific preferences than citation norms. Our results suggest that while differences exist between fields' metric distributions, applying metrics in a field-aware manner rather than using normalized global metrics avoids losing important information about article preferences. They also imply that assigning a single importance value to a manuscript may not be a useful construct, as the importance of each manuscript varies by the reader's field.
Collapse
Affiliation(s)
- Benjamin J. Heil
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania
| | - Casey S. Greene
- Department of Pharmacology, University of Colorado School of Medicine; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine
| |
Collapse
|
16
|
Deutsch EW, Bandeira N, Perez-Riverol Y, Sharma V, Carver J, Mendoza L, Kundu DJ, Wang S, Bandla C, Kamatchinathan S, Hewapathirana S, Pullman B, Wertz J, Sun Z, Kawano S, Okuda S, Watanabe Y, MacLean B, MacCoss M, Zhu Y, Ishihama Y, Vizcaíno J. The ProteomeXchange consortium at 10 years: 2023 update. Nucleic Acids Res 2023; 51:D1539-D1548. [PMID: 36370099 PMCID: PMC9825490 DOI: 10.1093/nar/gkac1040] [Citation(s) in RCA: 361] [Impact Index Per Article: 180.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Revised: 10/20/2022] [Accepted: 10/23/2022] [Indexed: 11/13/2022] Open
Abstract
Mass spectrometry (MS) is by far the most used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript in Nucleic Acids Research was published in 2020. The six members of the Consortium are PRIDE, PeptideAtlas (including PASSEL), MassIVE, jPOST, iProX and Panorama Public. We report the current data submission statistics, showcasing that the number of datasets submitted to PX resources has continued to increase every year. As of June 2022, more than 34 233 datasets had been submitted to PX resources, and from those, 20 062 (58.6%) just in the last three years. We also report the development of the Universal Spectrum Identifiers and the improvements in capturing the experimental metadata annotations. In parallel, we highlight that data re-use activities of public datasets continue to increase, enabling connections between PX resources and other popular bioinformatics resources, novel research and also new data resources. Finally, we summarise the current state-of-the-art in data management practices for sensitive human (clinical) proteomics data.
Collapse
Affiliation(s)
| | - Nuno Bandeira
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | | | - Jeremy J Carver
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
| | - Luis Mendoza
- Institute for Systems Biology, Seattle WA 98109, USA
| | - Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Selvakumar Kamatchinathan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Benjamin S Pullman
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
| | - Julie Wertz
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
| | - Zhi Sun
- Institute for Systems Biology, Seattle WA 98109, USA
| | - Shin Kawano
- Faculty of Contemporary Society, Toyama University of International Studies, Toyama 930-1292, Japan
- Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
| | - Shujiro Okuda
- Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
| | - Yu Watanabe
- Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
| | | | | | - Yunping Zhu
- Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing 102206, China
| | - Yasushi Ishihama
- Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| |
Collapse
|
17
|
Thakur M, Bateman A, Brooksbank C, Freeberg M, Harrison M, Hartley M, Keane T, Kleywegt G, Leach A, Levchenko M, Morgan S, McDonagh E, Orchard S, Papatheodorou I, Velankar S, Vizcaino J, Witham R, Zdrazil B, McEntyre J. EMBL's European Bioinformatics Institute (EMBL-EBI) in 2022. Nucleic Acids Res 2023; 51:D9-D17. [PMID: 36477213 PMCID: PMC9825486 DOI: 10.1093/nar/gkac1098] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2022] [Revised: 10/21/2022] [Accepted: 10/31/2022] [Indexed: 12/13/2022] Open
Abstract
The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the status of services that EMBL-EBI data resources provide to scientific communities globally. The scale, openness, rich metadata and extensive curation of EMBL-EBI added-value databases makes them particularly well-suited as training sets for deep learning, machine learning and artificial intelligence applications, a selection of which are described here. The data resources at EMBL-EBI can catalyse such developments because they offer sustainable, high-quality data, collected in some cases over decades and made openly availability to any researcher, globally. Our aim is for EMBL-EBI data resources to keep providing the foundations for tools and research insights that transform fields across the life sciences.
Collapse
Affiliation(s)
| | - Alex Bateman
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Cath Brooksbank
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Mallory Freeberg
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Melissa Harrison
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Matthew Hartley
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Thomas Keane
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Gerard Kleywegt
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Andrew Leach
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Mariia Levchenko
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sarah Morgan
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Ellen M McDonagh
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- OpenTargets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sandra Orchard
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Irene Papatheodorou
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sameer Velankar
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Juan Antonio Vizcaino
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Rick Witham
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Barbara Zdrazil
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | | |
Collapse
|
18
|
de Souza LP, Fernie AR. Databases and Tools to Investigate Protein-Metabolite Interactions. Methods Mol Biol 2023; 2554:231-249. [PMID: 36178629 DOI: 10.1007/978-1-0716-2624-5_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Protein-metabolite interactions (PMIs) are directly responsible for the regulation of numerous processes. From the direct regulation of enzymes to complex developmental processes intermediated by hormones, PMIs are central to understanding the molecular mechanisms of important physiological phenomena. Still, proving such interactions experimentally has proven an arduous task. We discuss here some of the current technologies contributing to expand our knowledge on PMIs, with particular emphasis on platforms and databases to explore the highly heterogenous nature of characterized PMIs, which is likely to be an essential resource on the development of new computational approaches to predict and validate interactions based on large-scale PMI screenings.
Collapse
Affiliation(s)
| | - Alisdair R Fernie
- Max-Planck-Institute of Molecular Plant Physiology, Potsdam-Golm, Germany.
| |
Collapse
|
19
|
Bittremieux W, Wang M, Dorrestein PC. The critical role that spectral libraries play in capturing the metabolomics community knowledge. Metabolomics 2022; 18:94. [PMID: 36409434 PMCID: PMC10284100 DOI: 10.1007/s11306-022-01947-y] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Accepted: 10/19/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments. AIM OF REVIEW We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries. KEY SCIENTIFIC CONCEPTS OF REVIEW This review focuses on the current state of spectral libraries for untargeted LC-MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future.
Collapse
Affiliation(s)
- Wout Bittremieux
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
| | - Mingxun Wang
- Department of Computer Science, University of California Riverside, Riverside, CA, 92507, USA
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA.
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA.
| |
Collapse
|
20
|
Bioinformatics in bioscience and bioengineering: Recent advances, applications, and perspectives. J Biosci Bioeng 2022; 134:363-373. [PMID: 36127250 DOI: 10.1016/j.jbiosc.2022.08.004] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2022] [Revised: 07/27/2022] [Accepted: 08/14/2022] [Indexed: 11/24/2022]
Abstract
Recent advances have led to the emergence of highly comprehensive and analytical approaches, such as omics analysis and high-resolution, time-resolved bioimaging analysis. These technologies have made it possible to obtain vast data from a single measurement. Subsequently, large datasets have pioneered the data-driven approach, an alternative to the traditional hypothesis-testing system, for researchers. However, processing, interpreting, and elucidating enormous datasets is no longer possible without computation. Bioinformatics is a field that has developed over long periods, intending to understand biological phenomena using methods collected from information science and statistics, thus solving this proposed research challenge. This review presents the latest methodologies and applications in sequencing, imaging, and mass spectrometry that were developed using bioinformatics. We presented the features of individual techniques and outlines in each part, avoiding the use of complex algorithms and formulas to allow beginning researchers to understand an overview. In the section on sequencing, we focused on comparative genomic, transcriptomic, and bacterial microbiome analyses, which are frequently used as applications of next-generation sequencing. Bioinformatic methods for handling sequence data and case studies were described. In the section on imaging, we introduced the analytical methods and microscopy imaging informatics techniques used in animal cell biology and plant physiology. We introduce informatics technologies for maximizing the value of measured data, including predicting the structure of unknown molecules and untargeted analysis in the section on mass spectrometry. Finally, we discuss the future outlook of this field. We anticipate that this review will assist biologists in using bioinformatics more effectively.
Collapse
|
21
|
Perez-Riverol Y. Proteomic repository data submission, dissemination, and reuse: key messages. Expert Rev Proteomics 2022; 19:297-310. [PMID: 36529941 PMCID: PMC7614296 DOI: 10.1080/14789450.2022.2160324] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2022] [Accepted: 12/07/2022] [Indexed: 12/23/2022]
Abstract
INTRODUCTION The creation of ProteomeXchange data workflows in 2012 transformed the field of proteomics, consisting of the standardization of data submission and dissemination and enabling the widespread reanalysis of public MS proteomics data worldwide. ProteomeXchange has triggered a growing trend toward public dissemination of proteomics data, facilitating the assessment, reuse, comparative analyses, and extraction of new findings from public datasets. By 2022, the consortium is integrated by PRIDE, PeptideAtlas, MassIVE, jPOST, iProX, and Panorama Public. AREAS COVERED Here, we review and discuss the current ecosystem of resources, guidelines, and file formats for proteomics data dissemination and reanalysis. Special attention is drawn to new exciting quantitative and post-translational modification-oriented resources. The challenges and future directions on data depositions including the lack of metadata and cloud-based and high-performance software solutions for fast and reproducible reanalysis of the available data are discussed. EXPERT OPINION The success of ProteomeXchange and the amount of proteomics data available in the public domain have triggered the creation and/or growth of other protein knowledgebase resources. Data reuse is a leading, active, and evolving field; supporting the creation of new formats, tools, and workflows to rediscover and reshape the public proteomics data.
Collapse
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| |
Collapse
|
22
|
Hoffmann N, Mayer G, Has C, Kopczynski D, Al Machot F, Schwudke D, Ahrends R, Marcus K, Eisenacher M, Turewicz M. A Current Encyclopedia of Bioinformatics Tools, Data Formats and Resources for Mass Spectrometry Lipidomics. Metabolites 2022; 12:584. [PMID: 35888710 PMCID: PMC9319858 DOI: 10.3390/metabo12070584] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2022] [Revised: 06/17/2022] [Accepted: 06/19/2022] [Indexed: 12/13/2022] Open
Abstract
Mass spectrometry is a widely used technology to identify and quantify biomolecules such as lipids, metabolites and proteins necessary for biomedical research. In this study, we catalogued freely available software tools, libraries, databases, repositories and resources that support lipidomics data analysis and determined the scope of currently used analytical technologies. Because of the tremendous importance of data interoperability, we assessed the support of standardized data formats in mass spectrometric (MS)-based lipidomics workflows. We included tools in our comparison that support targeted as well as untargeted analysis using direct infusion/shotgun (DI-MS), liquid chromatography-mass spectrometry, ion mobility or MS imaging approaches on MS1 and potentially higher MS levels. As a result, we determined that the Human Proteome Organization-Proteomics Standards Initiative standard data formats, mzML and mzTab-M, are already supported by a substantial number of recent software tools. We further discuss how mzTab-M can serve as a bridge between data acquisition and lipid bioinformatics tools for interpretation, capturing their output and transmitting rich annotated data for downstream processing. However, we identified several challenges of currently available tools and standards. Potential areas for improvement were: adaptation of common nomenclature and standardized reporting to enable high throughput lipidomics and improve its data handling. Finally, we suggest specific areas where tools and repositories need to improve to become FAIRer.
Collapse
Affiliation(s)
- Nils Hoffmann
- Forschungszentrum Jülich GmbH, Institute for Bio- and Geosciences (IBG-5), 52425 Jülich, Germany
| | - Gerhard Mayer
- Institute of Medical Systems Biology, Ulm University, 89081 Ulm, Germany;
| | - Canan Has
- Biological Mass Spectrometry, Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany;
- University Hospital Carl Gustav Carus, 01307 Dresden, Germany
- CENTOGENE GmbH, 18055 Rostock, Germany
| | - Dominik Kopczynski
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria; (D.K.); (R.A.)
| | - Fadi Al Machot
- Faculty of Science and Technology, Norwegian University for Life Science (NMBU), 1433 Ås, Norway;
| | - Dominik Schwudke
- Bioanalytical Chemistry, Forschungszentrum Borstel, Leibniz Lung Center, 23845 Borstel, Germany;
- Airway Research Center North, German Center for Lung Research (DZL), 23845 Borstel, Germany
- German Center for Infection Research (DZIF), TTU Tuberculosis, 23845 Borstel, Germany
| | - Robert Ahrends
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria; (D.K.); (R.A.)
| | - Katrin Marcus
- Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany; (K.M.); (M.E.)
| | - Martin Eisenacher
- Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany; (K.M.); (M.E.)
- Faculty of Medicine, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
| | - Michael Turewicz
- Institute for Clinical Biochemistry and Pathobiochemistry, German Diabetes Center (DDZ), Leibniz Center for Diabetes Research at Heinrich-Heine-University Düsseldorf, 40225 Düsseldorf, Germany
- German Center for Diabetes Research (DZD), Partner Düsseldorf, 85764 Neuherberg, Germany
| |
Collapse
|
23
|
Saha D, Iannuccelli M, Brun C, Zanzoni A, Licata L. The Intricacy of the Viral-Human Protein Interaction Networks: Resources, Data, and Analyses. Front Microbiol 2022; 13:849781. [PMID: 35531299 PMCID: PMC9069133 DOI: 10.3389/fmicb.2022.849781] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2022] [Accepted: 03/11/2022] [Indexed: 11/18/2022] Open
Abstract
Viral infections are one of the major causes of human diseases that cause yearly millions of deaths and seriously threaten global health, as we have experienced with the COVID-19 pandemic. Numerous approaches have been adopted to understand viral diseases and develop pharmacological treatments. Among them, the study of virus-host protein-protein interactions is a powerful strategy to comprehend the molecular mechanisms employed by the virus to infect the host cells and to interact with their components. Experimental protein-protein interactions described in the scientific literature have been systematically captured into several molecular interaction databases. These data are organized in structured formats and can be easily downloaded by users to perform further bioinformatic and network studies. Network analysis of available virus-host interactomes allow us to understand how the host interactome is perturbed upon viral infection and what are the key host proteins targeted by the virus and the main cellular pathways that are subverted. In this review, we give an overview of publicly available viral-human protein-protein interactions resources and the community standards, curation rules and adopted ontologies. A description of the main virus-human interactome available is provided, together with the main network analyses that have been performed. We finally discuss the main limitations and future challenges to assess the quality and reliability of protein-protein interaction datasets and resources.
Collapse
Affiliation(s)
- Deeya Saha
- Aix-Marseille Univ., Inserm, TAGC, UMR_S1090, Marseille, France
| | | | - Christine Brun
- Aix-Marseille Univ., Inserm, TAGC, UMR_S1090, Marseille, France
- CNRS, Marseille, France
| | - Andreas Zanzoni
- Aix-Marseille Univ., Inserm, TAGC, UMR_S1090, Marseille, France
- *Correspondence: Andreas Zanzoni,
| | - Luana Licata
- Department of Biology, University of Rome Tor Vergata, Rome, Italy
- Luana Licata,
| |
Collapse
|
24
|
Elhabashy H, Merino F, Alva V, Kohlbacher O, Lupas AN. Exploring protein-protein interactions at the proteome level. Structure 2022; 30:462-475. [DOI: 10.1016/j.str.2022.02.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2021] [Revised: 10/26/2021] [Accepted: 02/02/2022] [Indexed: 02/08/2023]
|
25
|
Porras P, Orchard S, Licata L. IMEx Databases: Displaying Molecular Interactions into a Single, Standards-Compliant Dataset. Methods Mol Biol 2022; 2449:27-42. [PMID: 35507258 DOI: 10.1007/978-1-0716-2095-3_2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Molecular interaction databases aim to systematically capture and organize the experimental interaction information described in the scientific literature. These data can then be used to perform network analysis, to assign putative roles to uncharacterized proteins and to investigate their involvement in cellular pathways.This chapter gives a brief overview of publicly available molecular interaction databases and focuses on the members of the IMEx Consortium, on their curation policies and standard data formats. All of the goals achieved by IMEx databases over the last 15 years, the data types provided and the many different ways in which such data can be utilized by the research community, are described in detail. The IMEx databases curate molecular interaction data to the highest caliber, following a detailed curation model and supplying rich metadata by employing common curation rules and harmonized standards. The IMEx Consortium provides comprehensively annotated molecular interaction data integrated into a single, non-redundant, open access dataset.
Collapse
Affiliation(s)
- Pablo Porras
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
| | - Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
| | - Luana Licata
- Department of Biology, University of Rome Tor Vergata, Rome, Italy.
| |
Collapse
|
26
|
Omenn GS, Lane L, Overall CM, Paik YK, Cristea IM, Corrales FJ, Lindskog C, Weintraub S, Roehrl MHA, Liu S, Bandeira N, Srivastava S, Chen YJ, Aebersold R, Moritz RL, Deutsch EW. Progress Identifying and Analyzing the Human Proteome: 2021 Metrics from the HUPO Human Proteome Project. J Proteome Res 2021; 20:5227-5240. [PMID: 34670092 PMCID: PMC9340669 DOI: 10.1021/acs.jproteome.1c00590] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
The 2021 Metrics of the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 357 (92.8%) of the 19 778 predicted proteins coded in the human genome, a gain of 483 since 2020 from reports throughout the world reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 478 to 1421. This represents remarkable progress on the proteome parts list. The utilization of proteomics in a broad array of biological and clinical studies likewise continues to expand with many important findings and effective integration with other omics platforms. We present highlights from the Immunopeptidomics, Glycoproteomics, Infectious Disease, Cardiovascular, Musculo-Skeletal, Liver, and Cancers B/D-HPP teams and from the Knowledgebase, Mass Spectrometry, Antibody Profiling, and Pathology resource pillars, as well as ethical considerations important to the clinical utilization of proteomics and protein biomarkers.
Collapse
Affiliation(s)
- Gilbert S Omenn
- University of Michigan, Ann Arbor, Michigan 48109, United States
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Lydie Lane
- CALIPHO Group, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | | | - Young-Ki Paik
- Yonsei Proteome Research Center and Yonsei University, Seoul 03722, Korea
| | - Ileana M Cristea
- Princeton University, Princeton, New Jersey 08544, United States
| | | | | | - Susan Weintraub
- University of Texas Health, San Antonio, San Antonio, Texas 78229-3900, United States
| | - Michael H A Roehrl
- Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
| | - Siqi Liu
- BGI Group, Shenzhen 518083, China
| | - Nuno Bandeira
- University of California, San Diego, La Jolla, California 92093, United States
| | | | - Yu-Ju Chen
- National Taiwan University, Academia Sinica, Nankang, Taipei 11529, Taiwan
| | - Ruedi Aebersold
- ETH-Zurich and University of Zurich, 8092 Zurich, Switzerland
| | - Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| |
Collapse
|
27
|
van Wijk KJ, Leppert T, Sun Q, Boguraev SS, Sun Z, Mendoza L, Deutsch EW. The Arabidopsis PeptideAtlas: Harnessing worldwide proteomics data to create a comprehensive community proteomics resource. THE PLANT CELL 2021; 33:3421-3453. [PMID: 34411258 PMCID: PMC8566204 DOI: 10.1093/plcell/koab211] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/03/2021] [Accepted: 08/13/2021] [Indexed: 05/02/2023]
Abstract
We developed a resource, the Arabidopsis PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/), to solve central questions about the Arabidopsis thaliana proteome, such as the significance of protein splice forms and post-translational modifications (PTMs), or simply to obtain reliable information about specific proteins. PeptideAtlas is based on published mass spectrometry (MS) data collected through ProteomeXchange and reanalyzed through a uniform processing and metadata annotation pipeline. All matched MS-derived peptide data are linked to spectral, technical, and biological metadata. Nearly 40 million out of ∼143 million MS/MS (tandem MS) spectra were matched to the reference genome Araport11, identifying ∼0.5 million unique peptides and 17,858 uniquely identified proteins (only isoform per gene) at the highest confidence level (false discovery rate 0.0004; 2 non-nested peptides ≥9 amino acid each), assigned canonical proteins, and 3,543 lower-confidence proteins. Physicochemical protein properties were evaluated for targeted identification of unobserved proteins. Additional proteins and isoforms currently not in Araport11 were identified that were generated from pseudogenes, alternative start, stops, and/or splice variants, and small Open Reading Frames; these features should be considered when updating the Arabidopsis genome. Phosphorylation can be inspected through a sophisticated PTM viewer. PeptideAtlas is integrated with community resources including TAIR, tracks in JBrowse, PPDB, and UniProtKB. Subsequent PeptideAtlas builds will incorporate millions more MS/MS data.
Collapse
Affiliation(s)
- Klaas J van Wijk
- Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, USA
- Authors for correspondence: (K.J.V.W.), (E.W.D.)
| | - Tami Leppert
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
| | - Qi Sun
- Computational Biology Service Unit, Cornell University, Ithaca, New York 14853, USA
| | - Sascha S Boguraev
- Section of Plant Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, New York 14853, USA
| | - Zhi Sun
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
| | - Luis Mendoza
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
| | - Eric W Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington 98109, USA
- Authors for correspondence: (K.J.V.W.), (E.W.D.)
| |
Collapse
|
28
|
Deutsch EW, Omenn GS, Sun Z, Maes M, Pernemalm M, Palaniappan KK, Letunica N, Vandenbrouck Y, Brun V, Tao SC, Yu X, Geyer PE, Ignjatovic V, Moritz RL, Schwenk JM. Advances and Utility of the Human Plasma Proteome. J Proteome Res 2021; 20:5241-5263. [PMID: 34672606 PMCID: PMC9469506 DOI: 10.1021/acs.jproteome.1c00657] [Citation(s) in RCA: 119] [Impact Index Per Article: 29.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Abstract
The study of proteins circulating in blood offers tremendous opportunities to diagnose, stratify, or possibly prevent diseases. With recent technological advances and the urgent need to understand the effects of COVID-19, the proteomic analysis of blood-derived serum and plasma has become even more important for studying human biology and pathophysiology. Here we provide views and perspectives about technological developments and possible clinical applications that use mass-spectrometry(MS)- or affinity-based methods. We discuss examples where plasma proteomics contributed valuable insights into SARS-CoV-2 infections, aging, and hemostasis and the opportunities offered by combining proteomics with genetic data. As a contribution to the Human Proteome Organization (HUPO) Human Plasma Proteome Project (HPPP), we present the Human Plasma PeptideAtlas build 2021-07 that comprises 4395 canonical and 1482 additional nonredundant human proteins detected in 240 MS-based experiments. In addition, we report the new Human Extracellular Vesicle PeptideAtlas 2021-06, which comprises five studies and 2757 canonical proteins detected in extracellular vesicles circulating in blood, of which 74% (2047) are in common with the plasma PeptideAtlas. Our overview summarizes the recent advances, impactful applications, and ongoing challenges for translating plasma proteomics into utility for precision medicine.
Collapse
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Gilbert S Omenn
- Institute for Systems Biology, Seattle, Washington 98109, United States.,Departments of Computational Medicine & Bioinformatics, Internal Medicine, and Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan 48109-2218, United States
| | - Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Michal Maes
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Maria Pernemalm
- Department of Oncology and Pathology/Science for Life Laboratory, Karolinska Institutet, 171 65 Stockholm, Sweden
| | | | - Natasha Letunica
- Murdoch Children's Research Institute, 50 Flemington Road, Parkville 3052, Victoria, Australia
| | - Yves Vandenbrouck
- Université Grenoble Alpes, CEA, Inserm U1292, Grenoble 38000, France
| | - Virginie Brun
- Université Grenoble Alpes, CEA, Inserm U1292, Grenoble 38000, France
| | - Sheng-Ce Tao
- Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, B207 SCSB Building, 800 Dongchuan Road, Shanghai 200240, China
| | - Xiaobo Yu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of Lifeomics, Beijing 102206, China
| | - Philipp E Geyer
- OmicEra Diagnostics GmbH, Behringstr. 6, 82152 Planegg, Germany
| | - Vera Ignjatovic
- Murdoch Children's Research Institute, 50 Flemington Road, Parkville 3052, Victoria, Australia.,Department of Paediatrics, The University of Melbourne, 50 Flemington Road, Parkville 3052, Victoria, Australia
| | - Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Jochen M Schwenk
- Affinity Proteomics, Science for Life Laboratory, Department of Protein Science, KTH Royal Institute of Technology, Tomtebodavägen 23, SE-171 65 Solna, Sweden
| |
Collapse
|
29
|
A proteomics sample metadata representation for multiomics integration and big data analysis. Nat Commun 2021; 12:5854. [PMID: 34615866 PMCID: PMC8494749 DOI: 10.1038/s41467-021-26111-3] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2021] [Accepted: 09/16/2021] [Indexed: 11/08/2022] Open
Abstract
The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
Collapse
|
30
|
Touré V, Flobak Å, Niarakis A, Vercruysse S, Kuiper M. The status of causality in biological databases: data resources and data retrieval possibilities to support logical modeling. Brief Bioinform 2021; 22:bbaa390. [PMID: 33378765 PMCID: PMC8294520 DOI: 10.1093/bib/bbaa390] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2020] [Revised: 11/26/2020] [Accepted: 11/27/2020] [Indexed: 12/16/2022] Open
Abstract
Causal molecular interactions represent key building blocks used in computational modeling, where they facilitate the assembly of regulatory networks. Logical regulatory networks can be used to predict biological and cellular behaviors by system perturbations and in silico simulations. Today, broad sets of causal interactions are available in a variety of biological knowledge resources. However, different visions, based on distinct biological interests, have led to the development of multiple ways to describe and annotate causal molecular interactions. It can therefore be challenging to efficiently explore various resources of causal interaction and maintain an overview of recorded contextual information that ensures valid use of the data. This review lists the different types of public resources with causal interactions, the different views on biological processes that they represent, the various data formats they use for data representation and storage, and the data exchange and conversion procedures that are available to extract and download these interactions. This may further raise awareness among the targeted audience, i.e. logical modelers and other scientists interested in molecular causal interactions, but also database managers and curators, about the abundance and variety of causal molecular interaction data, and the variety of tools and approaches to convert them into one interoperable resource.
Collapse
Affiliation(s)
- Vasundra Touré
- Department of Biology of the Norwegian University of Science and Technology
| | | | - Anna Niarakis
- Department of Biology, Univ Evry, University of Paris-Saclay, affiliated with the laboratory GenHotel in Genopole campus, and a delegate at the Lifeware Group, INRIA Saclay
| | - Steven Vercruysse
- Researcher in computer science and computational biology and focuses on building a bridge between human and computer understanding
| | - Martin Kuiper
- systems biology at the Department of Biology of the Norwegian University of Science and Technology
| |
Collapse
|
31
|
Dorfer V, Strobl M, Winkler S, Mechtler K. MS Amanda 2.0: Advancements in the standalone implementation. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2021; 35:e9088. [PMID: 33759252 PMCID: PMC8244010 DOI: 10.1002/rcm.9088] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/30/2020] [Revised: 02/27/2021] [Accepted: 03/18/2021] [Indexed: 06/12/2023]
Abstract
RATIONALE Database search engines are the preferred method to identify peptides in mass spectrometry data. However, valuable software is in this context not only defined by a powerful algorithm to separate correct from false identifications, but also by constant maintenance and continuous improvements. METHODS In 2014, we presented our peptide identification algorithm MS Amanda, showing its suitability for identifying peptides in high-resolution tandem mass spectrometry data and its ability to outperform widely used tools to identify peptides. Since then, we have continuously worked on improvements to enhance its usability and to support new trends and developments in this fast-growing field, while keeping the original scoring algorithm to assess the quality of a peptide spectrum match unchanged. RESULTS We present the outcome of these efforts, MS Amanda 2.0, a faster and more flexible standalone version with the original scoring algorithm. The new implementation has led to a 3-5× speedup, is able to handle new ion types and supports standard data formats. We also show that MS Amanda 2.0 works best when using only the most common ion types in a particular search instead of all possible ion types. CONCLUSIONS MS Amanda is available free of charge from https://ms.imp.ac.at/index.php?action=msamanda.
Collapse
Affiliation(s)
- Viktoria Dorfer
- Bioinformatics Research GroupUniversity of Applied Sciences Upper AustriaSoftwarepark 11, 4232 HagenbergAustria
| | - Marina Strobl
- Bioinformatics Research GroupUniversity of Applied Sciences Upper AustriaSoftwarepark 11, 4232 HagenbergAustria
| | - Stephan Winkler
- Bioinformatics Research GroupUniversity of Applied Sciences Upper AustriaSoftwarepark 11, 4232 HagenbergAustria
| | - Karl Mechtler
- Institute of Molecular Pathology (IMP)Vienna BioCenter (VBC)Campus‐Vienna‐Biocenter 1Vienna1030Austria
- Institute of Molecular Biotechnology (IMBA)Austrian Academy of Sciences, Vienna BioCenter (VBC)Dr. Bohr‐Gasse 3Vienna1030Austria
- Gregor Mendel Institute (GMI)Austrian Academy of Sciences, Vienna BioCenter (VBC)Dr. Bohr‐ Gasse 3Vienna1030Austria
| |
Collapse
|
32
|
Touré V, Vercruysse S, Acencio ML, Lovering RC, Orchard S, Bradley G, Casals-Casas C, Chaouiya C, Del-Toro N, Flobak Å, Gaudet P, Hermjakob H, Hoyt CT, Licata L, Lægreid A, Mungall CJ, Niknejad A, Panni S, Perfetto L, Porras P, Pratt D, Saez-Rodriguez J, Thieffry D, Thomas PD, Türei D, Kuiper M. The Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). Bioinformatics 2021; 36:5712-5718. [PMID: 32637990 PMCID: PMC8023674 DOI: 10.1093/bioinformatics/btaa622] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2020] [Revised: 06/06/2020] [Accepted: 06/30/2020] [Indexed: 12/30/2022] Open
Abstract
Motivation A large variety of molecular interactions occurs between biomolecular components in cells. When a molecular interaction results in a regulatory effect, exerted by one component onto a downstream component, a so-called ‘causal interaction’ takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells. These causal interactions and the biological processes they enable (e.g. gene regulation) need to be described with a careful appreciation of the underlying molecular reactions. A proper description of this information enables archiving, sharing and reuse by humans and for automated computational processing. Various representations of causal relationships between biological components are currently used in a variety of resources. Results Here, we propose a checklist that accommodates current representations, called the Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). This checklist defines both the required core information, as well as a comprehensive set of other contextual details valuable to the end user and relevant for reusing and reproducing causal molecular interaction information. The MI2CAST checklist can be used as reporting guidelines when annotating and curating causal statements, while fostering uniformity and interoperability of the data across resources. Availability and implementation The checklist together with examples is accessible at https://github.com/MI2CAST/MI2CAST Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Vasundra Touré
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway
| | - Steven Vercruysse
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway
| | - Marcio Luis Acencio
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway
| | - Ruth C Lovering
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, UCL, University College London, London WC1E 6JF, UK
| | - Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Glyn Bradley
- Computational Biology, Functional Genomics, GSK, Stevenage SG1 2NY, UK
| | | | - Claudine Chaouiya
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M Marseille 13331, France
| | - Noemi Del-Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Åsmund Flobak
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway.,The Cancer Clinic, St. Olav's Hospital, Trondheim University Hospital, Trondheim 7030, Norway
| | - Pascale Gaudet
- SIB Swiss Institute of Bioinformatics, Geneva 1211, Switzerland
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | | | - Luana Licata
- Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 00133 Rome, Italy
| | - Astrid Lægreid
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway
| | - Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Anne Niknejad
- Vital-IT Group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, Amphipole Building, 1015 Lausanne, Switzerland
| | - Simona Panni
- Department of Biology, Ecology and Earth Sciences, University of Calabria, Ecology and Earth Science, Via Pietro Bucci Cubo 6/C, Rende 87036, CS, Italy
| | - Livia Perfetto
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Pablo Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Dexter Pratt
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Julio Saez-Rodriguez
- Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, 69120 Heidelberg, Germany.,Joint Research Centre for Computational Biomedicine (JRC-COMBINE), Faculty of Medicine, RWTH Aachen University, Aachen 52062, Germany
| | - Denis Thieffry
- Institut de Biologie de l'ENS (IBENS), Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90007, USA
| | - Dénes Türei
- Joint Research Centre for Computational Biomedicine (JRC-COMBINE), Faculty of Medicine, RWTH Aachen University, Aachen 52062, Germany
| | - Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway
| |
Collapse
|
33
|
Jagannadham MV, Gayatri P, Binny TM, Raman B, Kameshwari DB, Nagaraj R. Mass Spectral Analysis of Synthetic Peptides: Implications in Proteomics. J Biomol Tech 2021; 32:30-35. [PMID: 33953644 PMCID: PMC7704033 DOI: 10.7171/jbt.21-3201-001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
Abstract
Sequence determination of peptides is a crucial step in mass spectrometry-based proteomics. Peptide sequences are determined either by database search or by de novo sequencing using tandem mass spectrometry. Determination of all the theoretical expected peptide fragments and eliminating false discoveries remains a challenge in proteomics. Developing standards for evaluating the performance of mass spectrometers and algorithms used for identification of proteins is important for proteomics studies. The current study is focused on these aspects by using synthetic peptides. A total of 599 peptides were designed from in silico tryptic digest with 1 or 2 missed cleavages from 199 human proteins, and synthetic peptides corresponding to these sequences were obtained. The peptides were mixed together, and analysis was carried out using liquid chromatography-electrospray ionization tandem mass spectrometry on a Q-Exactive HF mass spectrometer. The peptides and proteins were identified with SEQUEST program. The analysis was carried out using the proteomics workflows. A total of 573 peptides representing 196 proteins could be identified, and a spectral library was created for these peptides. Analysis parameters such as "no enzyme selection" gave the maximum number of detected peptides as compared with trypsin in the selection. False discoveries could be identified. This study highlights the limitations of peptide detection and the need for developing powerful algorithms along with tools to evaluate mass spectrometers and algorithms. It also shows the limitations of peptide detection even with high-end mass spectrometers. The mass spectral data are available in ProteomeXchange with accession no. PXD017992.
Collapse
Affiliation(s)
| | - Pratap Gayatri
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
| | - Taniya Mary Binny
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
| | - Bathisaran Raman
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
| | | | | |
Collapse
|
34
|
Caufield JH, Fu J, Wang D, Guevara-Gonzalez V, Wang W, Ping P. A Second Look at FAIR in Proteomic Investigations. J Proteome Res 2021; 20:2182-2186. [PMID: 33719446 DOI: 10.1021/acs.jproteome.1c00177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Proteomics is, by definition, comprehensive and large-scale, seeking to unravel ome-level protein features with phenotypic information on an entire system, an organ, cells, or organisms. This scope consistently involves and extends beyond single experiments. Multitudinous resources now exist to assist in making the results of proteomics experiments more findable, accessible, interoperable, and reusable (FAIR), yet many tools are awaiting to be adopted by our community. Here we highlight strategies for expanding the impact of proteomics data beyond single studies. We show how linking specific terminologies, identifiers, and text (words) can unify individual data points across a wide spectrum of studies and, more importantly, how this approach may potentially reveal novel relationships. In this effort, we explain how data sets and methods can be rendered more linkable and how this maximizes their value. We also include a discussion on how data linking strategies benefit stakeholders across the proteomics community and beyond.
Collapse
|
35
|
Watanabe Y, Aoki-Kinoshita KF, Ishihama Y, Okuda S. GlycoPOST realizes FAIR principles for glycomics mass spectrometry data. Nucleic Acids Res 2021; 49:D1523-D1528. [PMID: 33174597 PMCID: PMC7778884 DOI: 10.1093/nar/gkaa1012] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2020] [Revised: 10/13/2020] [Accepted: 10/14/2020] [Indexed: 02/06/2023] Open
Abstract
For the reproducibility and sustainability of scientific research, FAIRness (Findable, Accessible, Interoperable and Re-usable), with respect to the release of raw data obtained by researchers, is one of the most important principles underpinning the future of open science. In genomics and transcriptomics, the sharing of raw data from next-generation sequencers is made possible through public repositories. In addition, in proteomics, the deposition of raw data from mass spectrometry (MS) experiments into repositories is becoming standardized. However, a standard repository for such MS data had not yet been established in glycomics. With the increasing number of glycomics MS data, therefore, we have developed GlycoPOST (https://glycopost.glycosmos.org/), a repository for raw MS data generated from glycomics experiments. In just the first year since the release of GlycoPOST, 73 projects have already been registered by researchers around the world, and the number of registered projects is continuously growing, making a significant contribution to the future FAIRness of the glycomics field. GlycoPOST is a free resource to the community and accepts (and will continue to accept in the future) raw data regardless of vendor-specific formats.
Collapse
Affiliation(s)
- Yu Watanabe
- Division of Bioinformatics, Niigata University Graduate School of Medical and Dental Sciences, 1-757 Asahimachi-dori, Chuo-ku, Niigata 951-8510, Japan
| | - Kiyoko F Aoki-Kinoshita
- Faculty of Science and Engineering, Soka University, 1-236 Tangi-machi, Hachioji, Tokyo 192-8577, Japan
| | - Yasushi Ishihama
- Department of Molecular and Cellular BioAnalysis, Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan.,Department of Proteomics and Drug Discovery, Graduate School of Pharmaceutical Sciences, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
| | - Shujiro Okuda
- Division of Bioinformatics, Niigata University Graduate School of Medical and Dental Sciences, 1-757 Asahimachi-dori, Chuo-ku, Niigata 951-8510, Japan
| |
Collapse
|
36
|
Jagannadham MV, Gayatri P, Binny TM, Raman B, Kameshwari DB, Nagaraj R. Mass Spectral Analysis of Synthetic Peptides: Implications in Proteomics. J Biomol Tech 2020:jbt.2020-3201-001. [PMID: 33304200 PMCID: PMC7704033 DOI: 10.7171/jbt.2020-3201-001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/24/2024]
Abstract
Sequence determination of peptides is a crucial step in mass spectrometry-based proteomics. Peptide sequences are determined either by database search or by de novo sequencing using tandem mass spectrometry. Determination of all the theoretical expected peptide fragments and eliminating false discoveries remains a challenge in proteomics. Developing standards for evaluating the performance of mass spectrometers and algorithms used for identification of proteins is important for proteomics studies. The current study is focused on these aspects by using synthetic peptides. A total of 599 peptides were designed from in silico tryptic digest with 1 or 2 missed cleavages from 199 human proteins, and synthetic peptides corresponding to these sequences were obtained. The peptides were mixed together, and analysis was carried out using liquid chromatography-electrospray ionization tandem mass spectrometry on a Q-Exactive HF mass spectrometer. The peptides and proteins were identified with SEQUEST program. The analysis was carried out using the proteomics workflows. A total of 573 peptides representing 196 proteins could be identified, and a spectral library was created for these peptides. Analysis parameters such as "no enzyme selection" gave the maximum number of detected peptides as compared with trypsin in the selection. False discoveries could be identified. This study highlights the limitations of peptide detection and the need for developing powerful algorithms along with tools to evaluate mass spectrometers and algorithms. It also shows the limitations of peptide detection even with high-end mass spectrometers. The mass spectral data are available in ProteomeXchange with accession no. PXD017992.
Collapse
Affiliation(s)
| | - Pratap Gayatri
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007,
India
| | - Taniya Mary Binny
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007,
India
| | - Bathisaran Raman
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007,
India
| | | | | |
Collapse
|
37
|
Toward Increased Reliability, Transparency, and Accessibility in Cross-linking Mass Spectrometry. Structure 2020; 28:1259-1268. [PMID: 33065067 DOI: 10.1016/j.str.2020.09.011] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2020] [Revised: 09/02/2020] [Accepted: 09/24/2020] [Indexed: 01/09/2023]
Abstract
Cross-linking mass spectrometry (MS) has substantially matured as a method over the past 2 decades through parallel development in multiple labs, demonstrating its applicability to protein structure determination, conformation analysis, and mapping protein interactions in complex mixtures. Cross-linking MS has become a much-appreciated and routinely applied tool, especially in structural biology. Therefore, it is timely that the community commits to the development of methodological and reporting standards. This white paper builds on an open process comprising a number of events at community conferences since 2015 and identifies aspects of Cross-linking MS for which guidelines should be developed as part of a Cross-linking MS standards initiative.
Collapse
|
38
|
Abstract
Metadata is essential in proteomics data repositories and is crucial to interpret and reanalyze the deposited data sets. For every proteomics data set, we should capture at least three levels of metadata: (i) data set description, (ii) the sample to data files related information, and (iii) standard data file formats (e.g., mzIdentML, mzML, or mzTab). While the data set description and standard data file formats are supported by all ProteomeXchange partners, the information regarding the sample to data files is mostly missing. Recently, members of the European Bioinformatics Community for Mass Spectrometry (EuBIC) have created an open-source project called Sample to Data file format for Proteomics (https://github.com/bigbio/proteomics-metadata-standard/) to enable the standardization of sample metadata of public proteomics data sets. Here, the project is presented to the proteomics community, and we call for contributors, including researchers, journals, and consortiums to provide feedback about the format. We believe this work will improve reproducibility and facilitate the development of new tools dedicated to proteomics data analysis.
Collapse
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
| | | |
Collapse
|
39
|
Choong WK, Wang JH, Sung TY. MinProtMaxVP: Generating a minimized number of protein variant sequences containing all possible variant peptides for proteogenomic analysis. J Proteomics 2020; 223:103819. [PMID: 32407886 DOI: 10.1016/j.jprot.2020.103819] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2019] [Revised: 05/04/2020] [Accepted: 05/09/2020] [Indexed: 12/12/2022]
Abstract
Identifying single-amino-acid variants (SAVs) from mass spectrometry-based experiments is critical for validating single-nucleotide variants (SNVs) at the protein level to facilitate biomedical research. Currently, two approaches are usually applied to convert SNV annotations into SAV-harboring protein sequences. One approach generates one sequence containing exactly one SAV, and the other all SAVs. However, they may neglect the possibility of SAV combinations, e.g., haplotypes, existing in bio-samples. Therefore, it is necessary to consider all SAV combinations of a protein when generating SAV-harboring protein sequences. In this paper, we propose MinProtMaxVP, a novel approach which selects a minimized number of SAV-harboring protein sequences generated from the exhaustive approach, while still accommodating all possible variant peptides, by solving a classic set covering problem. Our study on known haplotype variations of TAS2R38 justifies the necessity for MinProtMaxVP to consider all combinations of SAVs. The performance of MinProtMaxVP is demonstrated by an in silico study on OR2T27 with five SAVs and real experimental data of the HEK293 cell line. Furthermore, assuming simulated somatic and germline variants of OR2T27 in tumor and normal tissues demonstrates that when adopting the appropriate somatic and germline SAV integration strategy, MinProtMaxVP is adaptable to labeling and label-free mass spectrometry-based experiments. SIGNIFICANCE: We present MinProtMaxVP, a novel approach to generate SAV-harboring protein sequences for constructing a customized protein sequence database, which is used in database searching for variant peptide identification. This approach outperforms the existing approaches in generating all possible variant peptides to be included in protein sequences and possibly leading to identification of more variant peptides in proteogenomic analysis.
Collapse
Affiliation(s)
- Wai-Kok Choong
- Institute of Information Science, Academia Sinica, Nankang, Taipei 11529, Taiwan
| | - Jen-Hung Wang
- Institute of Information Science, Academia Sinica, Nankang, Taipei 11529, Taiwan
| | - Ting-Yi Sung
- Institute of Information Science, Academia Sinica, Nankang, Taipei 11529, Taiwan.
| |
Collapse
|
40
|
Perfetto L, Acencio ML, Bradley G, Cesareni G, Del Toro N, Fazekas D, Hermjakob H, Korcsmaros T, Kuiper M, Lægreid A, Lo Surdo P, Lovering RC, Orchard S, Porras P, Thomas PD, Touré V, Zobolas J, Licata L. CausalTAB: the PSI-MITAB 2.8 updated format for signalling data representation and dissemination. Bioinformatics 2020; 35:3779-3785. [PMID: 30793173 DOI: 10.1093/bioinformatics/btz132] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2018] [Revised: 02/01/2019] [Accepted: 02/19/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Combining multiple layers of information underlying biological complexity into a structured framework represent a challenge in systems biology. A key task is the formalization of such information in models describing how biological entities interact to mediate the response to external and internal signals. Several databases with signalling information, focus on capturing, organizing and displaying signalling interactions by representing them as binary, causal relationships between biological entities. The curation efforts that build these individual databases demand a concerted effort to ensure interoperability among resources. RESULTS Aware of the enormous benefits of standardization efforts in the molecular interaction research field, representatives of the signalling network community agreed to extend the PSI-MI controlled vocabulary to include additional terms representing aspects of causal interactions. Here, we present a common standard for the representation and dissemination of signalling information: the PSI Causal Interaction tabular format (CausalTAB) which is an extension of the existing PSI-MI tab-delimited format, now designated PSI-MITAB 2.8. We define the new term 'causal interaction', and related child terms, which are children of the PSI-MI 'molecular interaction' term. The new vocabulary terms in this extended PSI-MI format will enable systems biologists to model large-scale signalling networks more precisely and with higher coverage than before. AVAILABILITY AND IMPLEMENTATION PSI-MITAB 2.8 format and the new reference implementation of PSICQUIC are available online (https://psicquic.github.io/ and https://psicquic.github.io/MITAB28Format.html). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- L Perfetto
- Department of Biology, University of Rome Tor Vergata, Rome, Italy.,European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - M L Acencio
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - G Bradley
- Computational Biology and Statistics, Target Sciences, GSK, UK
| | - G Cesareni
- Department of Biology, University of Rome Tor Vergata, Rome, Italy.,IRCCS, Fondazione Santa Lucia, Rome, Italy
| | - N Del Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - D Fazekas
- Department of Genetics, Eötvös Loránd University, Budapest, Hungary.,Earlham Institute, Norwich, UK
| | - H Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK.,State Key Laboratory of Proteomics, Beijing Institute of Life Omics, Beijing Proteome Research Center, National Center for Protein Sciences, Beijing, China
| | - T Korcsmaros
- Earlham Institute, Norwich, UK.,Quadram Institute, Norwich, UK
| | - M Kuiper
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - A Lægreid
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - P Lo Surdo
- Department of Biology, University of Rome Tor Vergata, Rome, Italy
| | - R C Lovering
- Department of Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London, UK
| | - S Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - P Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| | - P D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA, USA
| | - V Touré
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - J Zobolas
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - L Licata
- Department of Biology, University of Rome Tor Vergata, Rome, Italy
| |
Collapse
|
41
|
Aron AT, Gentry EC, McPhail KL, Nothias LF, Nothias-Esposito M, Bouslimani A, Petras D, Gauglitz JM, Sikora N, Vargas F, van der Hooft JJJ, Ernst M, Kang KB, Aceves CM, Caraballo-Rodríguez AM, Koester I, Weldon KC, Bertrand S, Roullier C, Sun K, Tehan RM, Boya P CA, Christian MH, Gutiérrez M, Ulloa AM, Tejeda Mora JA, Mojica-Flores R, Lakey-Beitia J, Vásquez-Chaves V, Zhang Y, Calderón AI, Tayler N, Keyzers RA, Tugizimana F, Ndlovu N, Aksenov AA, Jarmusch AK, Schmid R, Truman AW, Bandeira N, Wang M, Dorrestein PC. Reproducible molecular networking of untargeted mass spectrometry data using GNPS. Nat Protoc 2020; 15:1954-1991. [PMID: 32405051 DOI: 10.1038/s41596-020-0317-5] [Citation(s) in RCA: 376] [Impact Index Per Article: 75.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2019] [Accepted: 03/03/2020] [Indexed: 02/06/2023]
Abstract
Global Natural Product Social Molecular Networking (GNPS) is an interactive online small molecule-focused tandem mass spectrometry (MS2) data curation and analysis infrastructure. It is intended to provide as much chemical insight as possible into an untargeted MS2 dataset and to connect this chemical insight to the user's underlying biological questions. This can be performed within one liquid chromatography (LC)-MS2 experiment or at the repository scale. GNPS-MassIVE is a public data repository for untargeted MS2 data with sample information (metadata) and annotated MS2 spectra. These publicly accessible data can be annotated and updated with the GNPS infrastructure keeping a continuous record of all changes. This knowledge is disseminated across all public data; it is a living dataset. Molecular networking-one of the main analysis tools used within the GNPS platform-creates a structured data table that reflects the molecular diversity captured in tandem mass spectrometry experiments by computing the relationships of the MS2 spectra as spectral similarity. This protocol provides step-by-step instructions for creating reproducible, high-quality molecular networks. For training purposes, the reader is led through a 90- to 120-min procedure that starts by recalling an example public dataset and its sample information and proceeds to creating and interpreting a molecular network. Each data analysis job can be shared or cloned to disseminate the knowledge gained, thus propagating information that can lead to the discovery of molecules, metabolic pathways, and ecosystem/community interactions.
Collapse
Affiliation(s)
- Allegra T Aron
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Emily C Gentry
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Kerry L McPhail
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, USA
| | - Louis-Félix Nothias
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Mélissa Nothias-Esposito
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Amina Bouslimani
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Daniel Petras
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Julia M Gauglitz
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Nicole Sikora
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Fernando Vargas
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
| | | | - Madeleine Ernst
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Kyo Bin Kang
- College of Pharmacy, Sookmyung Women's University, Seoul, Korea
| | - Christine M Aceves
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | | | - Irina Koester
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
| | - Kelly C Weldon
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Center of Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
| | - Samuel Bertrand
- Groupe Mer, Molécules, Santé-EA 2160, UFR des Sciences Pharmaceutiques et Biologiques, Université de Nantes, Nantes, France
- ThalassOMICS Metabolomics Facility, Plateforme Corsaire, Biogenouest, Nantes, France
| | - Catherine Roullier
- College of Pharmacy, Sookmyung Women's University, Seoul, Korea
- ThalassOMICS Metabolomics Facility, Plateforme Corsaire, Biogenouest, Nantes, France
| | - Kunyang Sun
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Richard M Tehan
- Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, USA
| | - Cristopher A Boya P
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City, Panama
- Department of Biotechnology, Acharya Nagarjuna University, Guntur, Nagarjuna Nagar, India
| | - Martin H Christian
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City, Panama
| | - Marcelino Gutiérrez
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City, Panama
| | | | | | - Randy Mojica-Flores
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City, Panama
- Departamento de Química, Universidad Autónoma de Chiriquí (UNACHI), David, Chiriquí, Panama
| | - Johant Lakey-Beitia
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City, Panama
| | - Victor Vásquez-Chaves
- Centro de Investigaciones en Productos Naturales (CIPRONA), Universidad de Costa Rica, San José, Costa Rica
| | - Yilue Zhang
- Department of Drug Discovery and Development, Harrison School of Pharmacy, Auburn University, Auburn, AL, USA
| | - Angela I Calderón
- Department of Drug Discovery and Development, Harrison School of Pharmacy, Auburn University, Auburn, AL, USA
| | - Nicole Tayler
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Panama City, Panama
- Department of Biotechnology, Acharya Nagarjuna University, Guntur, Nagarjuna Nagar, India
| | - Robert A Keyzers
- School of Chemical & Physical Sciences, Victoria University of Wellington, Wellington, New Zealand
| | - Fidele Tugizimana
- Centre for Plant Metabolomics Research, Department of Biochemistry, University of Johannesburg, Auckland Park, South Africa
- International R&D Division, Omnia Group (Pty) Ltd., Johannesburg, South Africa
| | - Nombuso Ndlovu
- Centre for Plant Metabolomics Research, Department of Biochemistry, University of Johannesburg, Auckland Park, South Africa
| | - Alexander A Aksenov
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Alan K Jarmusch
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
| | - Robin Schmid
- Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
| | - Andrew W Truman
- Department of Molecular Microbiology, John Innes Centre, Norwich, UK
| | - Nuno Bandeira
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.
| | - Mingxun Wang
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA.
| | - Pieter C Dorrestein
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA.
- Center for Computational Mass Spectrometry, University of California, San Diego, La Jolla, CA, USA.
- Department of Pharmacology, University of California, San Diego, La Jolla, CA, USA.
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA.
| |
Collapse
|
42
|
Perez-Riverol Y, Csordas A, Bai J, Bernal-Llinares M, Hewapathirana S, Kundu DJ, Inuganti A, Griss J, Mayer G, Eisenacher M, Pérez E, Uszkoreit J, Pfeuffer J, Sachsenberg T, Yilmaz S, Tiwary S, Cox J, Audain E, Walzer M, Jarnuczak AF, Ternent T, Brazma A, Vizcaíno JA. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res 2020; 47:D442-D450. [PMID: 30395289 PMCID: PMC6323896 DOI: 10.1093/nar/gky1106] [Citation(s) in RCA: 5314] [Impact Index Per Article: 1062.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2018] [Accepted: 10/22/2018] [Indexed: 02/06/2023] Open
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world’s largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, data re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault tolerant storage backend, Application Programming Interface and web interface have been implemented, as a part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. At last, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.
Collapse
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Attila Csordas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jingwen Bai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Manuel Bernal-Llinares
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Avinash Inuganti
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.,Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Vienna, 1090, Austria
| | - Gerhard Mayer
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
| | - Martin Eisenacher
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
| | - Enrique Pérez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Julian Uszkoreit
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
| | - Julianus Pfeuffer
- Applied Bioinformatics, Department for Computer Science, University of Tuebingen, Sand 14, 72076 Tuebingen, Germany
| | - Timo Sachsenberg
- Applied Bioinformatics, Department for Computer Science, University of Tuebingen, Sand 14, 72076 Tuebingen, Germany
| | - Sule Yilmaz
- Computational Systems Biochemistry, Max Planck Institute for Biochemistry, Martinsried, 82152, Germany
| | - Shivani Tiwary
- Computational Systems Biochemistry, Max Planck Institute for Biochemistry, Martinsried, 82152, Germany
| | - Jürgen Cox
- Computational Systems Biochemistry, Max Planck Institute for Biochemistry, Martinsried, 82152, Germany
| | - Enrique Audain
- Department of Congenital Heart Disease and Pediatric Cardiology, Universitätsklinikum Schleswig-Holstein Kiel, Kiel, 24105, Germany
| | - Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew F Jarnuczak
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tobias Ternent
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
Collapse
|
43
|
Ma J, Chen T, Wu S, Yang C, Bai M, Shu K, Li K, Zhang G, Jin Z, He F, Hermjakob H, Zhu Y. iProX: an integrated proteome resource. Nucleic Acids Res 2020; 47:D1211-D1217. [PMID: 30252093 PMCID: PMC6323926 DOI: 10.1093/nar/gky869] [Citation(s) in RCA: 1156] [Impact Index Per Article: 231.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2018] [Accepted: 09/14/2018] [Indexed: 11/13/2022] Open
Abstract
Sharing of research data in public repositories has become best practice in academia. With the accumulation of massive data, network bandwidth and storage requirements are rapidly increasing. The ProteomeXchange (PX) consortium implements a mode of centralized metadata and distributed raw data management, which promotes effective data sharing. To facilitate open access of proteome data worldwide, we have developed the integrated proteome resource iProX (http://www.iprox.org) as a public platform for collecting and sharing raw data, analysis results and metadata obtained from proteomics experiments. The iProX repository employs a web-based proteome data submission process and open sharing of mass spectrometry-based proteomics datasets. Also, it deploys extensive controlled vocabularies and ontologies to annotate proteomics datasets. Users can use a GUI to provide and access data through a fast Aspera-based transfer tool. iProX is a full member of the PX consortium; all released datasets are freely accessible to the public. iProX is based on a high availability architecture and has been deployed as part of the proteomics infrastructure of China, ensuring long-term and stable resource support. iProX will facilitate worldwide data analysis and sharing of proteomics experiments.
Collapse
Affiliation(s)
- Jie Ma
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
| | - Tao Chen
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
| | - Songfeng Wu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
| | - Chunyuan Yang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
| | - Mingze Bai
- Chongqing University of Posts and Telecommunications, Chongqing 400065, China
| | - Kunxian Shu
- Chongqing University of Posts and Telecommunications, Chongqing 400065, China
| | - Kenli Li
- National Supercomputing Center in Changsha, Hunan University, Changsha 410082, China
| | - Guoqing Zhang
- Shanghai Center for Bioinformation Technology, Shanghai Institutes of Biomedicine, Shanghai Academy of Science and Technology, Shanghai 200235, China
| | - Zhong Jin
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
| | - Fuchu He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
| | - Henning Hermjakob
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China.,European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Yunping Zhu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
| |
Collapse
|
44
|
Perez-Riverol Y, Moreno P. Scalable Data Analysis in Proteomics and Metabolomics Using BioContainers and Workflows Engines. Proteomics 2020; 20:e1900147. [PMID: 31657527 PMCID: PMC7613303 DOI: 10.1002/pmic.201900147] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2019] [Revised: 09/30/2019] [Indexed: 12/29/2022]
Abstract
The recent improvements in mass spectrometry instruments and new analytical methods are increasing the intersection between proteomics and big data science. In addition, bioinformatics analysis is becoming increasingly complex and convoluted, involving multiple algorithms and tools. A wide variety of methods and software tools have been developed for computational proteomics and metabolomics during recent years, and this trend is likely to continue. However, most of the computational proteomics and metabolomics tools are designed as single-tiered software application where the analytics tasks cannot be distributed, limiting the scalability and reproducibility of the data analysis. In this paper the key steps of metabolomics and proteomics data processing, including the main tools and software used to perform the data analysis, are summarized. The combination of software containers with workflows environments for large-scale metabolomics and proteomics analysis is discussed. Finally, a new approach for reproducible and large-scale data analysis based on BioContainers and two of the most popular workflow environments, Galaxy and Nextflow, is introduced to the proteomics and metabolomics communities.
Collapse
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| |
Collapse
|
45
|
Creydt M, Fischer M. Food authentication in real life: How to link nontargeted approaches with routine analytics? Electrophoresis 2020; 41:1665-1679. [PMID: 32249434 DOI: 10.1002/elps.202000030] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2020] [Revised: 03/19/2020] [Accepted: 03/23/2020] [Indexed: 12/20/2022]
Abstract
In times of increasing globalization and the resulting complexity of trade flows, securing food quality is an increasing challenge. The development of analytical methods for checking the integrity and, thus, the safety of food is one of the central questions for actors from science, politics, and industry. Targeted methods, for the detection of a few selected analytes, still play the most important role in routine analysis. In the past 5 years, nontargeted methods that do not aim at individual analytes but on analyte profiles that are as comprehensive as possible have increasingly come into focus. Instead of investigating individual chemical structures, data patterns are collected, evaluated and, depending on the problem, fed into databases that can be used for further nontargeted approaches. Alternatively, individual markers can be extracted and transferred to targeted methods. Such an approach requires (i) the availability of authentic reference material, (ii) the corresponding high-resolution laboratory infrastructure, and (iii) extensive expertise in processing and storing very large amounts of data. Probably due to the requirements mentioned above, only a few methods have really established themselves in routine analysis. This review article focuses on the establishment of nontargeted methods in routine laboratories. Challenges are summarized and possible solutions are presented.
Collapse
Affiliation(s)
- Marina Creydt
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Hamburg, Germany
| | - Markus Fischer
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Hamburg, Germany
| |
Collapse
|
46
|
Soh WT, Demir F, Dall E, Perrar A, Dahms SO, Kuppusamy M, Brandstetter H, Huesgen PF. ExteNDing Proteome Coverage with Legumain as a Highly Specific Digestion Protease. Anal Chem 2020; 92:2961-2971. [PMID: 31951383 PMCID: PMC7075662 DOI: 10.1021/acs.analchem.9b03604] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Abstract
![]()
Bottom-up
mass spectrometry-based proteomics utilizes proteolytic
enzymes with well characterized specificities to generate peptides
amenable for identification by high-throughput tandem mass spectrometry.
Trypsin, which cuts specifically after the basic residues lysine and
arginine, is the predominant enzyme used for proteome digestion, although
proteases with alternative specificities are required to detect sequences
that are not accessible after tryptic digest. Here, we show that the
human cysteine protease legumain exhibits a strict substrate specificity
for cleavage after asparagine and aspartic acid residues during in-solution
digestions of proteomes extracted from Escherichia
coli, mouse embryonic fibroblast cell cultures, and Arabidopsis thaliana leaves. Generating peptides
highly complementary in sequence, yet similar in their biophysical
properties, legumain (as compared to trypsin or GluC) enabled complementary
proteome and protein sequence coverage. Importantly, legumain further
enabled the identification and enrichment of protein N-termini not
accessible in GluC- or trypsin-digested samples. Legumain cannot cleave
after glycosylated Asn residues, which enabled the robust identification
and orthogonal validation of N-glycosylation sites based on alternating
sequential sample treatments with legumain and PNGaseF and vice versa.
Taken together, we demonstrate that legumain is a practical, efficient
protease for extending the proteome and sequence coverage achieved
with trypsin, with unique possibilities for the characterization of
post-translational modification sites.
Collapse
Affiliation(s)
- Wai Tuck Soh
- Department of Biosciences , University of Salzburg , 5020 Salzburg , Austria
| | - Fatih Demir
- Central Institute for Engineering, Electronics and Analytics, ZEA-3 , Forschungszentrum Jülich , 52428 Jülich , Germany
| | - Elfriede Dall
- Department of Biosciences , University of Salzburg , 5020 Salzburg , Austria
| | - Andreas Perrar
- Central Institute for Engineering, Electronics and Analytics, ZEA-3 , Forschungszentrum Jülich , 52428 Jülich , Germany
| | - Sven O Dahms
- Department of Biosciences , University of Salzburg , 5020 Salzburg , Austria
| | - Maithreyan Kuppusamy
- Central Institute for Engineering, Electronics and Analytics, ZEA-3 , Forschungszentrum Jülich , 52428 Jülich , Germany
| | - Hans Brandstetter
- Department of Biosciences , University of Salzburg , 5020 Salzburg , Austria
| | - Pitter F Huesgen
- Central Institute for Engineering, Electronics and Analytics, ZEA-3 , Forschungszentrum Jülich , 52428 Jülich , Germany.,Cologne Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases, Medical Faculty and University Hospital , University of Cologne , 50931 Cologne , Germany.,Institute for Biochemistry, Faculty of Mathematics and Natural Sciences , University of Cologne , 50674 Cologne , Germany
| |
Collapse
|
47
|
Deutsch EW, Bandeira N, Sharma V, Perez-Riverol Y, Carver JJ, Kundu DJ, García-Seisdedos D, Jarnuczak AF, Hewapathirana S, Pullman BS, Wertz J, Sun Z, Kawano S, Okuda S, Watanabe Y, Hermjakob H, MacLean B, MacCoss MJ, Zhu Y, Ishihama Y, Vizcaíno JA. The ProteomeXchange consortium in 2020: enabling 'big data' approaches in proteomics. Nucleic Acids Res 2020; 48:D1145-D1152. [PMID: 31686107 PMCID: PMC7145525 DOI: 10.1093/nar/gkz984] [Citation(s) in RCA: 352] [Impact Index Per Article: 70.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2019] [Revised: 10/11/2019] [Accepted: 10/14/2019] [Indexed: 11/24/2022] Open
Abstract
The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) has standardized data submission and dissemination of mass spectrometry proteomics data worldwide since 2012. In this paper, we describe the main developments since the previous update manuscript was published in Nucleic Acids Research in 2017. Since then, in addition to the four PX existing members at the time (PRIDE, PeptideAtlas including the PASSEL resource, MassIVE and jPOST), two new resources have joined PX: iProX (China) and Panorama Public (USA). We first describe the updated submission guidelines, now expanded to include six members. Next, with current data submission statistics, we demonstrate that the proteomics field is now actively embracing public open data policies. At the end of June 2019, more than 14 100 datasets had been submitted to PX resources since 2012, and from those, more than 9 500 in just the last three years. In parallel, an unprecedented increase of data re-use activities in the field, including 'big data' approaches, is enabling novel research and new data resources. At last, we also outline some of our future plans for the coming years.
Collapse
Affiliation(s)
| | - Nuno Bandeira
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Department Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
| | | | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Jeremy J Carver
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Department Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
| | - Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - David García-Seisdedos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Andrew F Jarnuczak
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Benjamin S Pullman
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Department Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
| | - Julie Wertz
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Department Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
| | - Zhi Sun
- Institute for Systems Biology, Seattle, WA 98109, USA
| | - Shin Kawano
- Faculty of Contemporary Society, Toyama University of International Studies, Toyama 930–1292, Japan
- Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Chiba 277–0871, Japan
| | - Shujiro Okuda
- Niigata University Graduate School of Medical and Dental Sciences, Niigata 951–8510, Japan
| | - Yu Watanabe
- Niigata University Graduate School of Medical and Dental Sciences, Niigata 951–8510, Japan
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
| | | | | | - Yunping Zhu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
| | - Yasushi Ishihama
- Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606–8501, Japan
| | - Juan A Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| |
Collapse
|
48
|
Bittremieux W. spectrum_utils: A Python Package for Mass Spectrometry Data Processing and Visualization. Anal Chem 2019; 92:659-661. [PMID: 31809021 DOI: 10.1021/acs.analchem.9b04884] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Abstract
Given the wide diversity in applications of biological mass spectrometry, custom data analyses are often needed to fully interpret the results of an experiment. Such bioinformatics scripts necessarily include similar basic functionality to read mass spectral data from standard file formats, process it, and visualize it. Rather than having to reimplement this functionality, to facilitate this task, spectrum_utils is a Python package for mass spectrometry data processing and visualization. Its high-level functionality enables developers to quickly prototype ideas for computational mass spectrometry projects in only a few lines of code. Notably, the data processing functionality is highly optimized for computational efficiency to be able to deal with the large volumes of data that are generated during mass spectrometry experiments. The visualization functionality makes it possible to easily produce publication-quality figures as well as interactive spectrum plots for inclusion on web pages. spectrum_utils is available for Python 3.6+, includes extensive online documentation and examples, and can be easily installed using conda. It is freely available as open source under the Apache 2.0 license at https://github.com/bittremieux/spectrum_utils .
Collapse
Affiliation(s)
- Wout Bittremieux
- Skaggs School of Pharmacy and Pharmaceutical Sciences , University of California San Diego , La Jolla , California 92093 , United States.,Department of Mathematics and Computer Science , University of Antwerp , 2020 Antwerp , Belgium.,Biomedical Informatics Network Antwerpen (biomina) , 2020 Antwerp , Belgium
| |
Collapse
|
49
|
Lee SY, Hwang H, Kang YM, Kim H, Kim DG, Jeong JE, Kim JY, Yoo JS. SAAVpedia: Identification, Functional Annotation, and Retrieval of Single Amino Acid Variants for Proteogenomic Interpretation. J Proteome Res 2019; 18:4133-4142. [PMID: 31612721 DOI: 10.1021/acs.jproteome.9b00366] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Next-generation genome sequencing has enabled the discovery of numerous disease- or drug-response-associated nonsynonymous single nucleotide variants (nsSNVs) that alter the amino acid sequences of a protein. Although several studies have attempted to characterize pathogenic nsSNVs, few have been confirmed as single amino acid variants (SAAVs) at the protein level. Here we developed the SAAVpedia platform to identify, annotate, and retrieve pathogenic SAAV candidates from proteomic and genomic data. The platform consists of four modules: SAAVidentifier, SAAVannotator, SNV/SAAVretriever, and SAAVvisualizer. The SAAVidentifier provides a reference database containing 18 206 090 SAAVs and performs the identification and quality assessment of SAAVs. The SAAVannotator provides functional annotation with biological, clinical, and pharmacological information for the interpretation of condition-specific SAAVs. The SNV/SAAVretriever module enables bidirectional navigation between relevant SAAVs and nsSNVs with diverse genomic and proteomic data. SAAVvisualizer provides various statistical plots based on functional annotations of detected SAAVs. To demonstrate the utility of SAAVpedia, the proteogenomic pipeline with protein-protein interaction network analysis was applied to proteomic data from breast cancer and glioblastoma patients. We identified 1326 and 12 breast-cancer- and glioblastoma-related genes that contained one or more SAAVs, including BRCA2 and FAM49B, respectively. SAAVpedia is a suitable platform for confirming whether a genomic variant is maintained in an amino acid sequence. Furthermore, as a result of the SAAV discovery of these positive controls, the SAAVpedia could play a key role in the protein functional study for the Human Proteome Project (HPP).
Collapse
Affiliation(s)
- Soo Youn Lee
- Research Center for Bioconvergence Analysis , Korea Basic Science Institute , 162 Yeongudaji-ro , Cheongju 28119 , Korea
| | - Heeyoun Hwang
- Research Center for Bioconvergence Analysis , Korea Basic Science Institute , 162 Yeongudaji-ro , Cheongju 28119 , Korea
| | - Young-Mook Kang
- Drug Information Platform Center , Korea Research Institute of Chemical Technology , 141 Gajeong-ro , Daejeon 34114 , Korea
| | - Hyejin Kim
- Research Center for Bioconvergence Analysis , Korea Basic Science Institute , 162 Yeongudaji-ro , Cheongju 28119 , Korea
- Graduate School of Analytical Science and Technology , Chungnam National University , 99 Daehak-ro , Daejeon 34134 , Korea
| | - Dong Geun Kim
- Research Center for Bioconvergence Analysis , Korea Basic Science Institute , 162 Yeongudaji-ro , Cheongju 28119 , Korea
- Graduate School of Analytical Science and Technology , Chungnam National University , 99 Daehak-ro , Daejeon 34134 , Korea
| | - Ji Eun Jeong
- Research Center for Bioconvergence Analysis , Korea Basic Science Institute , 162 Yeongudaji-ro , Cheongju 28119 , Korea
- Graduate School of Analytical Science and Technology , Chungnam National University , 99 Daehak-ro , Daejeon 34134 , Korea
| | - Jin Young Kim
- Research Center for Bioconvergence Analysis , Korea Basic Science Institute , 162 Yeongudaji-ro , Cheongju 28119 , Korea
| | - Jong Shin Yoo
- Research Center for Bioconvergence Analysis , Korea Basic Science Institute , 162 Yeongudaji-ro , Cheongju 28119 , Korea
- Graduate School of Analytical Science and Technology , Chungnam National University , 99 Daehak-ro , Daejeon 34134 , Korea
| |
Collapse
|
50
|
Deutsch EW, Lane L, Overall CM, Bandeira N, Baker MS, Pineau C, Moritz RL, Corrales F, Orchard S, Van Eyk JE, Paik YK, Weintraub ST, Vandenbrouck Y, Omenn GS. Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 3.0. J Proteome Res 2019; 18:4108-4116. [PMID: 31599596 PMCID: PMC6986310 DOI: 10.1021/acs.jproteome.9b00542] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
The Human Proteome Organization's (HUPO) Human Proteome Project (HPP) developed Mass Spectrometry (MS) Data Interpretation Guidelines that have been applied since 2016. These guidelines have helped ensure that the emerging draft of the complete human proteome is highly accurate and with low numbers of false-positive protein identifications. Here, we describe an update to these guidelines based on consensus-reaching discussions with the wider HPP community over the past year. The revised 3.0 guidelines address several major and minor identified gaps. We have added guidelines for emerging data independent acquisition (DIA) MS workflows and for use of the new Universal Spectrum Identifier (USI) system being developed by the HUPO Proteomics Standards Initiative (PSI). In addition, we discuss updates to the standard HPP pipeline for collecting MS evidence for all proteins in the HPP, including refinements to minimum evidence. We present a new plan for incorporating MassIVE-KB into the HPP pipeline for the next (HPP 2020) cycle in order to obtain more comprehensive coverage of public MS data sets. The main checklist has been reorganized under headings and subitems, and related guidelines have been grouped. In sum, Version 2.1 of the HPP MS Data Interpretation Guidelines has served well, and this timely update to version 3.0 will aid the HPP as it approaches its goal of collecting and curating MS evidence of translation and expression for all predicted ∼20 000 human proteins encoded by the human genome.
Collapse
Affiliation(s)
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Lydie Lane
- SIB Swiss Institute of Bioinformatics and Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CMU, Michel Servet 1, 1211 Geneva 4, Switzerland
| | - Christopher M. Overall
- Centre for Blood Research, Departments of Oral Biological & Medical Sciences, and Biochemistry & Molecular Biology, Faculty of Dentistry, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada
| | - Nuno Bandeira
- Center for Computational Mass Spectrometry and Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - Mark S. Baker
- Department of Biomedical Sciences, Faculty of Medicine and Health Science, Macquarie University, Macquarie Park, NSW 2109, Australia
| | - Charles Pineau
- Univ. Rennes, Inserm, EHESP, Irset (Institut de Recherche en Santé, Environnement et Travail) − UMR_S 1085, F-35042 Rennes cedex, France
| | - Robert L. Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Fernando Corrales
- Functional Proteomics Laboratory, Centro Nacional de Biotecnología, Spanish Research Council, ProteoRed-.ISCIII, Madrid 117, Spain
| | - Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K
| | - Jennifer E. Van Eyk
- Advanced Clinical Biosystems Research Institute, The Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
| | - Young-Ki Paik
- Yonsei Proteome Research Center, Yonsei University, 50 Yonsei-ro, Sudaemoon-ku, Seoul 03720, Korea
| | - Susan T. Weintraub
- The University of Texas Health Science Center at San Antonio, San Antonio, Texas 78229, United States
| | - Yves Vandenbrouck
- Univ. Grenoble Alpes, CEA, INSERM, IRIG-BGE, U1038, F-38000 Grenoble, France
| | - Gilbert S. Omenn
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Departments of Computational Medicine & Bioinformatics, Internal Medicine, and Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan 48109-2218, United States
| |
Collapse
|