1
Perez-Riverol Y, Bittremieux W, Noble WS, Martens L, Bilbao A, Lazear MR, Grüning B, Katz DS, MacCoss MJ, Dai C, Eng JK, Bouwmeester R, Shortreed MR, Audain E, Sachsenberg T, Van Goey J, Wallmann G, Wen B, Käll L, Fondrie WE. Open-Source and FAIR Research Software for Proteomics. J Proteome Res 2025; 24:2222-2234. [PMID: 40267229 PMCID: PMC12053954 DOI: 10.1021/acs.jproteome.4c01079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2024] [Revised: 03/14/2025] [Accepted: 04/11/2025] [Indexed: 04/25/2025]
Abstract
Scientific discovery relies on innovative software as much as experimental methods, especially in proteomics, where computational tools are essential for mass spectrometer setup, data analysis, and interpretation. Since the introduction of SEQUEST, proteomics software has grown into a complex ecosystem of algorithms, predictive models, and workflows, but the field faces challenges, including the increasing complexity of mass spectrometry data, limited reproducibility due to proprietary software, and difficulties integrating with other omics disciplines. Closed-source, platform-specific tools exacerbate these issues by restricting innovation, creating inefficiencies, and imposing hidden costs on the community. Open-source software (OSS), aligned with the FAIR Principles (Findable, Accessible, Interoperable, Reusable), offers a solution by promoting transparency, reproducibility, and community-driven development, which fosters collaboration and continuous improvement. In this manuscript, we explore the role of OSS in computational proteomics, its alignment with FAIR principles, and its potential to address challenges related to licensing, distribution, and standardization. Drawing on lessons from other omics fields, we present a vision for a future where OSS and FAIR principles underpin a transparent, accessible, and innovative proteomics community.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SD, U.K.
- Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
- William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Aivett Bilbao
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- US Department of Energy Agile BioFoundry, Emeryville, California 94608, United States
- Michael R. Lazear
- Belharra Therapeutics, 3985 Sorrento Valley Boulevard Suite C, San Diego, California 92121, United States
- Björn Grüning
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs University Freiburg, Freiburg 79110, Germany
- Daniel S. Katz
- National Center for Supercomputing Applications & Siebel School of Computing and Data Science & School of Information Sciences, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
- Michael J. MacCoss
- Department of Genome Sciences, University of Washington, 3720 15th St. NE, Seattle, Washington 98195, United States
- Chengxin Dai
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Jimmy K. Eng
- Proteomics Resource, University of Washington, Seattle, Washington 98195, United States
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Michael R. Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Enrique Audain
- Institute of Medical Genetics, University Medicine Oldenburg, Carl von Ossietzky University, Oldenburg 26129, Germany
- Timo Sachsenberg
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany
- Georg Wallmann
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried 82152, Germany
- Bo Wen
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm 17165, Sweden
2
Perez-Riverol Y, Bandla C, Kundu D, Kamatchinathan S, Bai J, Hewapathirana S, John N, Prakash A, Walzer M, Wang S, Vizcaíno J. The PRIDE database at 20 years: 2025 update. Nucleic Acids Res 2025; 53:D543-D553. [PMID: 39494541 PMCID: PMC11701690 DOI: 10.1093/nar/gkae1011] [Citation(s) in RCA: 115] [Impact Index Per Article: 115.0] [Received: 09/15/2024] [Revised: 10/11/2024] [Accepted: 10/16/2024] [Indexed: 11/05/2024]
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's leading mass spectrometry (MS)-based proteomics data repository and one of the founding members of the ProteomeXchange consortium. This manuscript summarizes the developments in PRIDE resources and related tools over the last three years. The number of datasets submitted to PRIDE Archive (the archival component of PRIDE) has reached an average of around 534 per month. This has been possible thanks to continuous improvements in infrastructure, such as a new file transfer protocol for very large datasets (Globus), a new data resubmission pipeline and an automatic dataset validation process. Additionally, we highlight novel activities such as the availability of the PRIDE chatbot (based on open-source Large Language Models) and our work to improve support for MS crosslinking datasets. Furthermore, we describe how we have increased our efforts to reuse, reanalyze and disseminate high-quality proteomics data into added-value resources such as UniProt, Ensembl and Expression Atlas.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Selvakumar Kamatchinathan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jingwen Bai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Nithu Sara John
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Ananth Prakash
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
3
Klein J, Lam H, Mak TD, Bittremieux W, Perez-Riverol Y, Gabriels R, Shofstahl J, Hecht H, Binz PA, Kawano S, Van Den Bossche T, Carver J, Neely BA, Mendoza L, Suomi T, Claeys T, Payne T, Schulte D, Sun Z, Hoffmann N, Zhu Y, Neumann S, Jones AR, Bandeira N, Vizcaíno JA, Deutsch EW. The Proteomics Standards Initiative Standardized Formats for Spectral Libraries and Fragment Ion Peak Annotations: mzSpecLib and mzPAF. Anal Chem 2024; 96:18491-18501. [PMID: 39514576 PMCID: PMC11579979 DOI: 10.1021/acs.analchem.4c04091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2024] [Revised: 10/16/2024] [Accepted: 11/01/2024] [Indexed: 11/16/2024]
Abstract
Mass spectral libraries are collections of reference spectra, usually associated with specific analytes from which the spectra were generated, that are used for further downstream analysis of new spectra. There are many different formats used for encoding spectral libraries, but none have undergone a standardization process to ensure broad applicability to many applications. As part of the Human Proteome Organization Proteomics Standards Initiative (PSI), we have developed a standardized format for encoding spectral libraries, called mzSpecLib (https://psidev.info/mzSpecLib). It is primarily a data model that flexibly encodes metadata about the library entries using the extensible PSI-MS controlled vocabulary and can be encoded in and converted between different serialization formats. We have also developed a standardized data model and serialization for fragment ion peak annotations, called mzPAF (https://psidev.info/mzPAF). It is defined as a separate standard, since it may be used for other applications besides spectral libraries. The mzSpecLib and mzPAF standards are compatible with existing PSI standards such as ProForma 2.0 and the Universal Spectrum Identifier. The mzSpecLib and mzPAF standards have been primarily defined for peptides in proteomics applications with basic small molecule support. They could be extended in the future to other fields that need to encode spectral libraries for nonpeptidic analytes.
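To make the idea of a standardized peak annotation string concrete, the Python sketch below parses a deliberately small subset of mzPAF-style annotations such as `b2-H2O^2/0.0013`: an optional analyte reference (`2@`), a primary ion series letter with its ordinal, optional neutral losses, an optional isotope marker (`+i`), an optional charge (`^2`), an optional mass error in Da or ppm, and an optional confidence (`*0.75`). It is not a conformant mzPAF parser; the subset, the field order, the comma-separated alternatives, and the names `PeakAnnotation` and `parse_peak_annotation` are assumptions made for illustration, and the specification at https://psidev.info/mzPAF remains authoritative.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical, simplified subset of mzPAF; NOT a conformant implementation.
_ANNOTATION = re.compile(
    r"^(?:(?P<analyte>\d+)@)?"                          # optional analyte reference, e.g. "2@"
    r"(?P<series>[abcxyz])(?P<ordinal>\d+)"             # primary ion series and position
    r"(?P<losses>(?:[-+](?:[A-Z][a-z]?\d*)+)*)"         # neutral losses/gains, e.g. "-H2O"
    r"(?:\+(?P<isotope>\d*)i)?"                         # isotope peak, e.g. "+i" or "+2i"
    r"(?:\^(?P<charge>\d+))?"                           # charge state, e.g. "^2"
    r"(?:/(?P<error>-?\d+(?:\.\d+)?)(?P<ppm>ppm)?)?"    # mass error in Da, or in ppm
    r"(?:\*(?P<confidence>\d+(?:\.\d+)?))?$"            # confidence, e.g. "*0.75"
)
_LOSS = re.compile(r"[-+](?:[A-Z][a-z]?\d*)+")


@dataclass
class PeakAnnotation:
    series: str
    ordinal: int
    neutral_losses: list[str] = field(default_factory=list)
    isotope: int = 0
    charge: int = 1
    mass_error: float | None = None
    error_in_ppm: bool = False
    analyte: int = 1
    confidence: float | None = None


def parse_peak_annotation(text: str) -> PeakAnnotation:
    """Parse one annotation from the simplified subset; raise ValueError otherwise."""
    m = _ANNOTATION.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported or malformed annotation: {text!r}")
    iso = m["isotope"]  # None if absent, "" for "+i", "2" for "+2i"
    return PeakAnnotation(
        series=m["series"],
        ordinal=int(m["ordinal"]),
        neutral_losses=_LOSS.findall(m["losses"]),
        isotope=0 if iso is None else int(iso or 1),
        charge=int(m["charge"] or 1),
        mass_error=None if m["error"] is None else float(m["error"]),
        error_in_ppm=m["ppm"] is not None,
        analyte=int(m["analyte"] or 1),
        confidence=None if m["confidence"] is None else float(m["confidence"]),
    )


def parse_peak_annotations(text: str) -> list[PeakAnnotation]:
    # Assumption: alternative interpretations of one peak are comma-separated.
    return [parse_peak_annotation(part) for part in text.split(",")]
```

For example, `parse_peak_annotation("y4+i/3.2ppm")` describes a singly charged y4 ion on its first isotope peak with a 3.2 ppm error. Keeping the annotation as a small structured record, rather than a free-text string, is what lets downstream tools compare annotations across libraries.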
Affiliation(s)
- Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, 999077 Hong Kong, P. R. China
- Tytus D. Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Helge Hecht
- RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 60200 Brno, Czech Republic
- Shin Kawano
- Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Jeremy Carver
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, United States
- Benjamin A. Neely
- National Institute of Standards and Technology (NIST) Charleston, Charleston, South Carolina 29412, United States
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Tomi Suomi
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
- Tine Claeys
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Thomas Payne
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Douwe Schulte
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Nils Hoffmann
- Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Yunping Zhu
- National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
- Steffen Neumann
- Computational Plant Biochemistry, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
- Andrew R. Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, United Kingdom
- Nuno Bandeira
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
4
Stroggilos R, Tserga A, Zoidakis J, Vlahou A, Makridakis M. Tissue proteomics repositories for data reanalysis. Mass Spectrom Rev 2024; 43:1270-1284. [PMID: 37534389 DOI: 10.1002/mas.21860] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/14/2023] [Revised: 07/17/2023] [Accepted: 07/18/2023] [Indexed: 08/04/2023]
Abstract
We are approaching the third decade since the establishment of the very first proteomics repositories in the mid-2000s. New experimental approaches and technologies continuously enrich the field while producing vast amounts of mass spectrometry data. Together with initiatives to establish standard terminology and file formats, proteomics is rapidly transforming into a mature component of systems biology. Here we describe the ProteomeXchange consortium repositories. We specifically search, collect and evaluate public human tissue datasets (categorized as "complete" by the repository) submitted in 2015-2022, both to map the existing information and to assess dataset reusability. Human tissue data are variably represented in the repositories reviewed, ranging between 10% and 25% of the total data submitted, with cancers being the most represented, followed by neuronal and cardiovascular diseases. About half of the retrieved datasets were found to lack the annotations or metadata necessary to directly replicate the analysis. This poses a serious challenge to data reusability and highlights the need to increase awareness in the community of the MAGE-TAB file format for metadata. Overall, proteomics repositories have evolved greatly over the past 7 years: they have grown in size and become equipped with various powerful applications and tools that enable data searching and analytical tasks. However, to make the most of this potential, priority must be given to finding ways to secure detailed metadata for each submission, which is likely the next major milestone for proteomics repositories.
Affiliation(s)
- Rafael Stroggilos
- Biomedical Research Foundation, Academy of Athens, Department of Biotechnology, Athens, Greece
- Aggeliki Tserga
- Biomedical Research Foundation, Academy of Athens, Department of Biotechnology, Athens, Greece
- Jerome Zoidakis
- Biomedical Research Foundation, Academy of Athens, Department of Biotechnology, Athens, Greece
- Antonia Vlahou
- Biomedical Research Foundation, Academy of Athens, Department of Biotechnology, Athens, Greece
- Manousos Makridakis
- Biomedical Research Foundation, Academy of Athens, Department of Biotechnology, Athens, Greece
5
Combe CW, Kolbowski L, Fischer L, Koskinen V, Klein J, Leitner A, Jones AR, Vizcaíno JA, Rappsilber J. mzIdentML 1.3.0 - Essential progress on the support of crosslinking and other identifications based on multiple spectra. Proteomics 2024; 24:e2300385. [PMID: 39001627 DOI: 10.1002/pmic.202300385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 02/07/2024] [Accepted: 02/09/2024] [Indexed: 10/10/2024]
Abstract
The mzIdentML data format, originally developed by the Proteomics Standards Initiative in 2011, is the open XML data standard for peptide and protein identification results coming from mass spectrometry. We present mzIdentML version 1.3.0, which introduces new functionality and support for additional use cases. First, a new mechanism for encoding identifications based on multiple spectra has been introduced. Furthermore, the main mzIdentML specification document can now be supplemented by extension documents, which provide further guidance for encoding specific use cases in different proteomics subfields. One extension document has been added, covering additional use cases for the encoding of crosslinked peptide identifications. The ability to add extension documents makes it easier to keep the mzIdentML standard up to date with advances in the proteomics field without changing the main specification document. The crosslinking extension document provides further explanation of the crosslinking use cases already supported in mzIdentML version 1.2.0, and provides support for encoding additional scenarios that are critical to reflect developments in the crosslinking field and facilitate its integration in structural biology. These are: (i) support for cleavable crosslinkers, (ii) support for internally linked peptides, (iii) support for noncovalently associated peptides, and (iv) improved support for encoding scores and the corresponding thresholds.
Affiliation(s)
- Colin W Combe
- Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Lars Kolbowski
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Lutz Fischer
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts, USA
- Alexander Leitner
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zurich, Switzerland
- Andrew R Jones
- Department of Biochemistry & Systems Biology, University of Liverpool, Liverpool, UK
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Juri Rappsilber
- Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
6
Combe CW, Graham M, Kolbowski L, Fischer L, Rappsilber J. xiVIEW: Visualisation of Crosslinking Mass Spectrometry Data. J Mol Biol 2024; 436:168656. [PMID: 39237202 DOI: 10.1016/j.jmb.2024.168656] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/31/2024] [Revised: 05/17/2024] [Accepted: 06/07/2024] [Indexed: 09/07/2024]
Abstract
Crosslinking mass spectrometry (MS) has emerged as an important technique for elucidating the in-solution structures of protein complexes and the topology of protein-protein interaction networks. However, the expanding user community lacked an integrated visualisation tool to help them use crosslinking data to investigate biological mechanisms. We addressed this need by developing xiVIEW, a web-based application designed to streamline crosslinking MS data analysis, which we present here. xiVIEW provides a user-friendly interface for accessing coordinated views of mass spectrometric data, network visualisation, annotations extracted from trusted repositories like UniProtKB, and available 3D structures. In accordance with recent recommendations from the crosslinking MS community, xiVIEW (i) provides a standards-compliant parser to improve data integration and (ii) offers accessible visualisation tools. By promoting the adoption of standard file formats and providing a comprehensive visualisation platform, xiVIEW empowers experimentalists and modellers alike to pursue their respective research interests. We anticipate that xiVIEW will advance crosslinking MS-inspired research and facilitate broader and more effective investigations into complex biological systems.
Affiliation(s)
- Colin W Combe
- University of Edinburgh, School of Biological Sciences, Edinburgh EH9 3JR, UK
- Martin Graham
- University of Edinburgh, School of Biological Sciences, Edinburgh EH9 3JR, UK
- Lars Kolbowski
- University of Edinburgh, School of Biological Sciences, Edinburgh EH9 3JR, UK
- Technische Universität Berlin, 10623 Berlin, Germany
- Lutz Fischer
- Technische Universität Berlin, 10623 Berlin, Germany
- Juri Rappsilber
- University of Edinburgh, School of Biological Sciences, Edinburgh EH9 3JR, UK
- Technische Universität Berlin, 10623 Berlin, Germany
7
Klein J, Carvalho L, Zaia J. Expanding N-glycopeptide identifications by modeling fragmentation, elution, and glycome connectivity. Nat Commun 2024; 15:6168. [PMID: 39039063 PMCID: PMC11263600 DOI: 10.1038/s41467-024-50338-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2021] [Accepted: 07/08/2024] [Indexed: 07/24/2024]
Abstract
Accurate glycopeptide identification in mass spectrometry-based glycoproteomics is a challenging problem at scale. Recent innovations have increased the scope and accuracy of glycopeptide identifications, with more precise uncertainty estimates for each part of the structure. We present a dynamically adapting relative retention time model for detecting and correcting ambiguous glycan assignments that are difficult to detect from fragmentation alone, a layered approach to glycopeptide fragmentation modeling that improves N-glycopeptide identification in samples without compromising identification quality, and a site-specific method to further increase the depth of the confidently identifiable glycoproteome. We demonstrate our techniques on a set of previously published datasets, showing the performance gains at each stage of optimization. These techniques are provided in the open-source glycomics and glycoproteomics platform GlycReSoft, available at https://github.com/mobiusklein/glycresoft .
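The retention-time idea can be illustrated with a toy model. The Python sketch below assumes a simple additive model in which each monosaccharide shifts a glycopeptide's retention time by a fixed amount relative to its peptide backbone, and it picks whichever candidate glycan composition best explains the observed retention time. The shift coefficients, the backbone retention time, and the function names are hypothetical; GlycReSoft's own model adapts to the data and is more elaborate. The candidates in the usage example, Hex5HexNAc4NeuAc1 and Hex5HexNAc4Fuc2, differ by only about 1.02 Da, so they are the kind of pair that can plausibly be confused from the precursor mass alone, for instance if the wrong isotopic peak is picked.

```python
from __future__ import annotations

# Monoisotopic residue masses (Da) of common N-glycan monosaccharides.
MONOSACCHARIDE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "Fuc": 146.05791,
    "NeuAc": 291.09542,
}

# HYPOTHETICAL per-residue retention-time shifts (minutes) for one peptide backbone.
HYPOTHETICAL_RT_SHIFT = {"Hex": 0.2, "HexNAc": 0.3, "Fuc": 0.5, "NeuAc": 1.5}


def glycan_mass(composition: dict[str, int]) -> float:
    """Summed residue mass of a composition such as {"Hex": 5, "HexNAc": 4}."""
    return sum(MONOSACCHARIDE_MASS[sugar] * n for sugar, n in composition.items())


def predict_rt(backbone_rt: float, composition: dict[str, int],
               shifts: dict[str, float] = HYPOTHETICAL_RT_SHIFT) -> float:
    """Additive toy model: backbone retention time plus a fixed shift per residue."""
    return backbone_rt + sum(shifts[sugar] * n for sugar, n in composition.items())


def resolve_by_rt(observed_rt: float, backbone_rt: float,
                  candidates: list[dict[str, int]],
                  shifts: dict[str, float] = HYPOTHETICAL_RT_SHIFT):
    """Return (best composition, residual) for the candidate closest in predicted RT."""
    best = min(candidates,
               key=lambda c: abs(observed_rt - predict_rt(backbone_rt, c, shifts)))
    return best, observed_rt - predict_rt(backbone_rt, best, shifts)


if __name__ == "__main__":
    sialylated = {"Hex": 5, "HexNAc": 4, "NeuAc": 1}
    difucosylated = {"Hex": 5, "HexNAc": 4, "Fuc": 2}
    print(glycan_mass(difucosylated) - glycan_mass(sialylated))
    print(resolve_by_rt(23.65, 20.0, [difucosylated, sialylated]))
```

With these made-up coefficients the two compositions are predicted to elute about half a minute apart, so an observed retention time can favor one of two near-isobaric assignments that fragmentation alone might not separate.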
Affiliation(s)
- Joshua Klein
- Program for Bioinformatics, Boston University, Boston, MA, US
- Luis Carvalho
- Program for Bioinformatics, Boston University, Boston, MA, US
- Department of Math and Statistics, Boston University, Boston, MA, US
- Joseph Zaia
- Program for Bioinformatics, Boston University, Boston, MA, US
- Department of Biochemistry and Cell Biology, Boston University, Boston, MA, US
8
Gatto L, Aebersold R, Cox J, Demichev V, Derks J, Emmott E, Franks AM, Ivanov AR, Kelly RT, Khoury L, Leduc A, MacCoss MJ, Nemes P, Perlman DH, Petelski AA, Rose CM, Schoof EM, Van Eyk J, Vanderaa C, Yates JR, Slavov N. Initial recommendations for performing, benchmarking and reporting single-cell proteomics experiments. Nat Methods 2023; 20:375-386. [PMID: 36864200 PMCID: PMC10130941 DOI: 10.1038/s41592-023-01785-3] [Citation(s) in RCA: 82] [Impact Index Per Article: 41.0] [Received: 06/06/2022] [Accepted: 01/24/2023] [Indexed: 03/04/2023]
Abstract
Analyzing proteins from single cells by tandem mass spectrometry (MS) has recently become technically feasible. While such analysis has the potential to accurately quantify thousands of proteins across thousands of single cells, the accuracy and reproducibility of the results may be undermined by numerous factors affecting experimental design, sample preparation, data acquisition and data analysis. We expect that broadly accepted community guidelines and standardized metrics will enhance rigor, data quality and alignment between laboratories. Here we propose best practices, quality controls and data-reporting recommendations to assist in the broad adoption of reliable quantitative workflows for single-cell proteomics. Resources and discussion forums are available at https://single-cell.net/guidelines .
Affiliation(s)
- Laurent Gatto
- Computational Biology and Bioinformatics Unit, de Duve Institute, Université Catholique de Louvain, Brussels, Belgium
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Juergen Cox
- Max Planck Institute of Biochemistry, Martinsried, Germany
- Jason Derks
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
- Edward Emmott
- Centre for Proteome Research, Department of Biochemistry and Systems Biology, University of Liverpool, Liverpool, UK
- Alexander M Franks
- Department of Statistics and Applied Probability, University of California Santa Barbara, Santa Barbara, CA, USA
- Alexander R Ivanov
- Department of Chemistry and Chemical Biology, Barnett Institute of Chemical and Biological Analysis, Northeastern University, Boston, MA, USA
- Ryan T Kelly
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT, USA
- Luke Khoury
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
- Andrew Leduc
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
- Peter Nemes
- Department of Chemistry and Biochemistry, University of Maryland, College Park, MD, USA
- David H Perlman
- Merck Exploratory Science Center, Merck Sharp & Dohme Corp., Cambridge, MA, USA
- Aleksandra A Petelski
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
- Parallel Squared Technology Institute, Watertown, MA, USA
- Christopher M Rose
- Department of Microchemistry, Proteomics and Lipidomics, Genentech Inc., South San Francisco, CA, USA
- Erwin M Schoof
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark
- Christophe Vanderaa
- Computational Biology and Bioinformatics Unit, de Duve Institute, Université Catholique de Louvain, Brussels, Belgium
- John R Yates
- Departments of Molecular Medicine and Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
- Nikolai Slavov
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
- Parallel Squared Technology Institute, Watertown, MA, USA
9
Deutsch EW, Mendoza L, Shteynberg DD, Hoopmann MR, Sun Z, Eng JK, Moritz RL. Trans-Proteomic Pipeline: Robust Mass Spectrometry-Based Proteomics Data Analysis Suite. J Proteome Res 2023; 22:615-624. [PMID: 36648445 PMCID: PMC10166710 DOI: 10.1021/acs.jproteome.2c00624] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Indexed: 01/18/2023]
Abstract
The Trans-Proteomic Pipeline (TPP) mass spectrometry data analysis suite has been in continual development and refinement since its first tools, PeptideProphet and ProteinProphet, were published 20 years ago. The current release provides a large complement of tools for spectrum processing, spectrum searching, search validation, abundance computation, protein inference, and more. Many of the tools include machine-learning modeling to extract the most information from data sets and build robust statistical models to compute the probabilities that derived information is correct. Here we present the latest information on the many TPP tools, and how TPP can be deployed on various platforms from personal Windows laptops to Linux clusters and expansive cloud computing environments. We describe tutorials on how to use TPP in a variety of ways and describe synergistic projects that leverage TPP. We conclude with plans for continued development of TPP.
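As a rough illustration of turning search scores into probabilities that an identification is correct, the Python sketch below fits a two-component mixture to a list of discriminant scores with expectation-maximization, then reports, for any score, the posterior probability of belonging to the higher-scoring ("correct") component. This is a simplification under stated assumptions: both components are modeled as Gaussians here, whereas PeptideProphet's model is richer (it need not treat both distributions as Gaussian and can combine several kinds of evidence), and the names `Mixture` and `fit_mixture` are hypothetical rather than part of TPP.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from statistics import fmean, pstdev


def _normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


@dataclass
class Mixture:
    weight_correct: float
    mu_incorrect: float
    sd_incorrect: float
    mu_correct: float
    sd_correct: float

    def prob_correct(self, x: float) -> float:
        """Posterior probability that score x comes from the 'correct' component."""
        p1 = self.weight_correct * _normal_pdf(x, self.mu_correct, self.sd_correct)
        p0 = (1 - self.weight_correct) * _normal_pdf(x, self.mu_incorrect, self.sd_incorrect)
        if p0 + p1 == 0.0:  # both densities underflowed: fall back to the nearer mean
            return 1.0 if abs(x - self.mu_correct) < abs(x - self.mu_incorrect) else 0.0
        return p1 / (p0 + p1)


def fit_mixture(scores: list[float], n_iter: int = 200, min_sd: float = 1e-3) -> Mixture:
    """Fit a two-Gaussian mixture by expectation-maximization (needs >= 2 scores)."""
    s = sorted(scores)
    half = len(s) // 2  # initialise components from the lower and upper halves
    m = Mixture(0.5, fmean(s[:half]), max(pstdev(s[:half]), min_sd),
                fmean(s[half:]), max(pstdev(s[half:]), min_sd))
    n = len(scores)
    for _ in range(n_iter):
        # E-step: responsibility of the 'correct' component for each score.
        r = [m.prob_correct(x) for x in scores]
        # M-step: re-estimate mixing weight, means and standard deviations.
        w1 = min(max(sum(r), 1e-9), n - 1e-9)  # keep both components non-empty
        w0 = n - w1
        mu1 = sum(ri * x for ri, x in zip(r, scores)) / w1
        mu0 = sum((1 - ri) * x for ri, x in zip(r, scores)) / w0
        sd1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, scores)) / w1)
        sd0 = math.sqrt(sum((1 - ri) * (x - mu0) ** 2 for ri, x in zip(r, scores)) / w0)
        m = Mixture(w1 / n, mu0, max(sd0, min_sd), mu1, max(sd1, min_sd))
    if m.mu_correct < m.mu_incorrect:  # label the higher-scoring component 'correct'
        m = Mixture(1 - m.weight_correct, m.mu_correct, m.sd_correct,
                    m.mu_incorrect, m.sd_incorrect)
    return m
```

Given scores drawn from a low cluster and a well-separated high cluster, `fit_mixture(scores).prob_correct(x)` approaches 1 for scores in the high cluster and 0 for scores in the low one; the fitted weight estimates the fraction of correct identifications without needing any labeled data.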
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Jimmy K Eng
- Proteomics Resource, University of Washington, Seattle, Washington 98195, United States
- Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
10
Gabriels R, Declercq A, Bouwmeester R, Degroeve S, Martens L. psm_utils: A High-Level Python API for Parsing and Handling Peptide-Spectrum Matches and Proteomics Search Results. J Proteome Res 2023; 22:557-560. [PMID: 36508242 DOI: 10.1021/acs.jproteome.2c00609] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
A plethora of proteomics search engine output file formats are in circulation. This lack of standardized output files greatly complicates generic downstream processing of peptide-spectrum matches (PSMs) and PSM files. While standards exist to solve this problem, these are far from universally supported by search engines. Moreover, software libraries are available to read a selection of PSM file formats, but a package to parse PSM files into a unified data structure has been missing. Here, we present psm_utils, a Python package to read and write various PSM file formats and to handle peptidoforms, PSMs, and PSM lists in a unified and user-friendly Python-, command line-, and web-interface. psm_utils was developed with pragmatism and maintainability in mind, adhering to community standards and relying on existing packages where possible. The Python API and command line interface greatly facilitate handling various PSM file formats. Moreover, a user-friendly web application was built using psm_utils that allows anyone to interconvert PSM files and retrieve basic PSM statistics. psm_utils is freely available under the permissive Apache2 license at https://github.com/compomics/psm_utils.
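The core problem described here, many incompatible PSM output layouts, can be sketched as mapping each layout onto one shared record type. The Python example below does this for two invented tab-separated formats ("engine_a" and "engine_b"); the column names, the decoy encodings, and the `UnifiedPSM` and `read_psms` names are hypothetical and do not reproduce the psm_utils API, which provides its own readers for real search-engine formats.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedPSM:
    peptidoform: str  # ProForma-style "SEQUENCE/charge"
    spectrum_id: str
    score: float
    is_decoy: bool


# HYPOTHETICAL column layouts of two invented search-engine outputs.
COLUMN_MAPS = {
    "engine_a": {"sequence": "Peptide", "charge": "Charge", "spectrum_id": "ScanID",
                 "score": "Score", "decoy": "IsDecoy"},
    "engine_b": {"sequence": "pep_seq", "charge": "z", "spectrum_id": "spectrum",
                 "score": "hyperscore", "decoy": "label"},
}

# Each invented format encodes target/decoy status differently.
DECOY_PARSERS = {
    "engine_a": lambda value: value.strip().lower() == "true",
    "engine_b": lambda value: int(value) == -1,  # -1 = decoy, 1 = target
}


def read_psms(text: str, fmt: str) -> list[UnifiedPSM]:
    """Parse tab-separated PSM rows in layout `fmt` into UnifiedPSM records."""
    cols = COLUMN_MAPS[fmt]
    is_decoy = DECOY_PARSERS[fmt]
    return [
        UnifiedPSM(
            peptidoform=f"{row[cols['sequence']]}/{int(row[cols['charge']])}",
            spectrum_id=row[cols["spectrum_id"]],
            score=float(row[cols["score"]]),
            is_decoy=is_decoy(row[cols["decoy"]]),
        )
        for row in csv.DictReader(io.StringIO(text), delimiter="\t")
    ]
```

Once both layouts yield the same record type, downstream steps such as rescoring or FDR estimation only need to be written once; supporting a new format means adding a column map and a decoy parser rather than a new pipeline.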
Affiliation(s)
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Arthur Declercq
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
11
Hoopmann MR, Shteynberg DD, Zelter A, Riffle M, Lyon AS, Agard DA, Luan Q, Nolen BJ, MacCoss MJ, Davis TN, Moritz RL. Improved Analysis of Cross-Linking Mass Spectrometry Data with Kojak 2.0, Advanced by Integration into the Trans-Proteomic Pipeline. J Proteome Res 2023; 22:647-655. [PMID: 36629399 PMCID: PMC10234491 DOI: 10.1021/acs.jproteome.2c00670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]
Abstract
Fragmentation ion spectral analysis of chemically cross-linked proteins is an established technology in the proteomics research repertoire for determining protein interactions, spatial orientation, and structure. Here we present Kojak version 2.0, a major update to the original Kojak algorithm, which was developed to identify cross-linked peptides from fragment ion spectra using a database search approach. A substantially improved algorithm with updated scoring metrics, support for cleavable cross-linkers, and identification of cross-links between ¹⁵N-labeled homomultimers are among the newest features of Kojak 2.0 presented here. Kojak 2.0 is now integrated into the Trans-Proteomic Pipeline, enabling access to dozens of additional tools within that suite. In particular, the PeptideProphet and iProphet tools for validation of cross-links improve the sensitivity and accuracy of correct cross-link identifications at user-defined thresholds. These new features improve the versatility of the algorithm, enabling its use in a wider range of experimental designs and analysis pipelines. Kojak 2.0 remains open-source and multiplatform.
Affiliation(s)
- Alex Zelter
- Department of Biochemistry, University of Washington, Seattle, WA, USA 98195
- Michael Riffle
- Department of Biochemistry, University of Washington, Seattle, WA, USA 98195
- Andrew S. Lyon
- Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA 94143
- David A. Agard
- Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA 94143
- Qing Luan
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, USA 97403
- Brad J. Nolen
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR, USA 97403
- Michael J. MacCoss
- Department of Genome Sciences, University of Washington, Seattle, WA, USA 98195
- Trisha N. Davis
- Department of Biochemistry, University of Washington, Seattle, WA, USA 98195
12
Deutsch EW, Vizcaíno JA, Jones AR, Binz PA, Lam H, Klein J, Bittremieux W, Perez-Riverol Y, Tabb DL, Walzer M, Ricard-Blum S, Hermjakob H, Neumann S, Mak TD, Kawano S, Mendoza L, Van Den Bossche T, Gabriels R, Bandeira N, Carver J, Pullman B, Sun Z, Hoffmann N, Shofstahl J, Zhu Y, Licata L, Quaglia F, Tosatto SCE, Orchard SE. Proteomics Standards Initiative at Twenty Years: Current Activities and Future Work. J Proteome Res 2023; 22:287-301. [PMID: 36626722 PMCID: PMC9903322 DOI: 10.1021/acs.jproteome.2c00637] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 10/07/2022] [Indexed: 01/11/2023]
Abstract
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed are described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.
Affiliation(s)
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Andrew R. Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Pierre-Alain Binz
- Clinical Chemistry Service, Lausanne University Hospital, 1011 976 Lausanne, Switzerland
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, P. R. China.
- Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
- Wout Bittremieux
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- David L. Tabb
- SA MRC Centre for TB Research, DST/NRF Centre of Excellence for Biomedical TB Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town 7602, South Africa
- Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Sylvie Ricard-Blum
- Univ. Lyon, Université Lyon 1, ICBMS, UMR 5246, 69622 Villeurbanne, France
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Steffen Neumann
- Bioinformatics and Scientific Data, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), 04103 Halle-Jena-Leipzig, Germany
- Tytus D. Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Shin Kawano
- Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- Faculty of Contemporary Society, Toyama University of International Studies, Toyama 930-1292, Japan
- School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Nuno Bandeira
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
- Jeremy Carver
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
- Benjamin Pullman
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego 92093-0404, United States
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Nils Hoffmann
- Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Yunping Zhu
- National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
- Luana Licata
- Fondazione Human Technopole, 20157 Milan, Italy
- Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Federica Quaglia
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), 70126 Bari, Italy
- Department of Biomedical Sciences, University of Padova, 35131 Padova, Italy
- Sandra E. Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
13
Deutsch EW, Bandeira N, Perez-Riverol Y, Sharma V, Carver J, Mendoza L, Kundu DJ, Wang S, Bandla C, Kamatchinathan S, Hewapathirana S, Pullman B, Wertz J, Sun Z, Kawano S, Okuda S, Watanabe Y, MacLean B, MacCoss M, Zhu Y, Ishihama Y, Vizcaíno J. The ProteomeXchange consortium at 10 years: 2023 update. Nucleic Acids Res 2023; 51:D1539-D1548. [PMID: 36370099 PMCID: PMC9825490 DOI: 10.1093/nar/gkac1040] [Citation(s) in RCA: 361] [Impact Index Per Article: 180.5] [Received: 09/15/2022] [Revised: 10/20/2022] [Accepted: 10/23/2022] [Indexed: 11/13/2022]
Abstract
Mass spectrometry (MS) is by far the most used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript in Nucleic Acids Research was published in 2020. The six members of the Consortium are PRIDE, PeptideAtlas (including PASSEL), MassIVE, jPOST, iProX and Panorama Public. We report the current data submission statistics, showcasing that the number of datasets submitted to PX resources has continued to increase every year. As of June 2022, more than 34 233 datasets had been submitted to PX resources, and from those, 20 062 (58.6%) just in the last three years. We also report the development of the Universal Spectrum Identifiers and the improvements in capturing the experimental metadata annotations. In parallel, we highlight that data re-use activities of public datasets continue to increase, enabling connections between PX resources and other popular bioinformatics resources, novel research and also new data resources. Finally, we summarise the current state-of-the-art in data management practices for sensitive human (clinical) proteomics data.
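The Universal Spectrum Identifiers mentioned in this abstract are colon-delimited strings that point to one spectrum in a public dataset: `mzspec:<collection>:<msRun>:<indexType>:<index>`, optionally followed by an interpretation written as `<peptidoform>/<charge>`. The Python sketch below splits such a string into its fields. It is illustrative only: `parse_usi` and `UniversalSpectrumIdentifier` are hypothetical names, not part of any ProteomeXchange tool, and the sketch assumes the msRun name contains no colons and accepts only the index types `scan`, `index` and `nativeId` as we understand the PSI USI specification. It does not validate the peptidoform itself.

```python
from dataclasses import dataclass
from typing import Optional

# Index types accepted by this sketch (assumed from the PSI USI specification).
INDEX_TYPES = {"scan", "index", "nativeId"}


@dataclass(frozen=True)
class UniversalSpectrumIdentifier:
    collection: str                    # dataset accession, e.g. a PXD identifier
    ms_run: str                        # run (raw file) name
    index_type: str                    # one of INDEX_TYPES
    index: str                         # kept as text: nativeId values are not integers
    peptidoform: Optional[str] = None  # optional interpretation, left unparsed
    charge: Optional[int] = None


def parse_usi(usi: str) -> UniversalSpectrumIdentifier:
    """Split a USI string into its components.

    At most five splits are made so that an interpretation containing colons
    (for example a CV-term modification) stays in one piece.
    """
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    _, collection, ms_run, index_type, index, *rest = parts
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index type {index_type!r}")
    peptidoform, charge = None, None
    if rest:
        # The charge follows the last '/' of the interpretation.
        peptidoform, sep, z = rest[0].rpartition("/")
        if not sep or not peptidoform:
            raise ValueError("interpretation must look like <peptidoform>/<charge>")
        charge = int(z)
    return UniversalSpectrumIdentifier(collection, ms_run, index_type, index,
                                       peptidoform, charge)


if __name__ == "__main__":
    usi = parse_usi("mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09"
                    ":scan:17555:VLHPLEGAVVIIFK/2")
    print(usi.collection, usi.index, usi.peptidoform, usi.charge)
```

Because the identifier is plain text, a repository can resolve the same string to the same spectrum regardless of which tool produced it; that portability is what the consortium relies on when it links spectra across resources.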
Affiliation(s)
- Nuno Bandeira
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jeremy J Carver
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Luis Mendoza
- Institute for Systems Biology, Seattle WA 98109, USA
- Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Selvakumar Kamatchinathan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Benjamin S Pullman
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Julie Wertz
- Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Dept. Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Zhi Sun
- Institute for Systems Biology, Seattle WA 98109, USA
- Shin Kawano
- Faculty of Contemporary Society, Toyama University of International Studies, Toyama 930-1292, Japan
- Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
- Shujiro Okuda
- Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
- Yu Watanabe
- Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
- Yunping Zhu
- Beijing Proteome Research Center, National Center for Protein Sciences, Beijing Institute of Lifeomics, Beijing 102206, China
- Yasushi Ishihama
- Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
14
Jones AR, Deutsch EW, Vizcaíno JA. Is DIA proteomics data FAIR? Current data sharing practices, available bioinformatics infrastructure and recommendations for the future. Proteomics 2022; 23:e2200014. [PMID: 36074795 PMCID: PMC10155627 DOI: 10.1002/pmic.202200014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/30/2022] [Revised: 08/27/2022] [Accepted: 08/29/2022] [Indexed: 11/06/2022]
Abstract
Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in e.g. instrumentation and data analysis approaches. However, there are many improvements that are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards, since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. For improving the current situation for DIA data, we make the following recommendations for the future: (i) development of an open data standard for spectral libraries; (ii) make mandatory the availability of the spectral libraries used in DIA experiments in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.
Affiliation(s)
- Andrew R Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 3BX, UK
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington, 98109, USA
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK
15
Perez-Riverol Y. Proteomic repository data submission, dissemination, and reuse: key messages. Expert Rev Proteomics 2022; 19:297-310. [PMID: 36529941 PMCID: PMC7614296 DOI: 10.1080/14789450.2022.2160324] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/28/2022] [Accepted: 12/07/2022] [Indexed: 12/23/2022]
Abstract
INTRODUCTION: The creation of ProteomeXchange data workflows in 2012 transformed the field of proteomics, consisting of the standardization of data submission and dissemination and enabling the widespread reanalysis of public MS proteomics data worldwide. ProteomeXchange has triggered a growing trend toward public dissemination of proteomics data, facilitating the assessment, reuse, comparative analyses, and extraction of new findings from public datasets. By 2022, the consortium is integrated by PRIDE, PeptideAtlas, MassIVE, jPOST, iProX, and Panorama Public. AREAS COVERED: Here, we review and discuss the current ecosystem of resources, guidelines, and file formats for proteomics data dissemination and reanalysis. Special attention is drawn to new exciting quantitative and post-translational modification-oriented resources. The challenges and future directions on data depositions including the lack of metadata and cloud-based and high-performance software solutions for fast and reproducible reanalysis of the available data are discussed. EXPERT OPINION: The success of ProteomeXchange and the amount of proteomics data available in the public domain have triggered the creation and/or growth of other protein knowledgebase resources. Data reuse is a leading, active, and evolving field; supporting the creation of new formats, tools, and workflows to rediscover and reshape the public proteomics data.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
16
Hoffmann N, Mayer G, Has C, Kopczynski D, Al Machot F, Schwudke D, Ahrends R, Marcus K, Eisenacher M, Turewicz M. A Current Encyclopedia of Bioinformatics Tools, Data Formats and Resources for Mass Spectrometry Lipidomics. Metabolites 2022; 12:584. [PMID: 35888710 PMCID: PMC9319858 DOI: 10.3390/metabo12070584] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/25/2022] [Revised: 06/17/2022] [Accepted: 06/19/2022] [Indexed: 12/13/2022]
Abstract
Mass spectrometry is a widely used technology to identify and quantify biomolecules such as lipids, metabolites and proteins necessary for biomedical research. In this study, we catalogued freely available software tools, libraries, databases, repositories and resources that support lipidomics data analysis and determined the scope of currently used analytical technologies. Because of the tremendous importance of data interoperability, we assessed the support of standardized data formats in mass spectrometric (MS)-based lipidomics workflows. We included tools in our comparison that support targeted as well as untargeted analysis using direct infusion/shotgun (DI-MS), liquid chromatography-mass spectrometry, ion mobility or MS imaging approaches on MS1 and potentially higher MS levels. As a result, we determined that the Human Proteome Organization-Proteomics Standards Initiative standard data formats, mzML and mzTab-M, are already supported by a substantial number of recent software tools. We further discuss how mzTab-M can serve as a bridge between data acquisition and lipid bioinformatics tools for interpretation, capturing their output and transmitting rich annotated data for downstream processing. However, we identified several challenges of currently available tools and standards. Potential areas for improvement were: adaptation of common nomenclature and standardized reporting to enable high throughput lipidomics and improve its data handling. Finally, we suggest specific areas where tools and repositories need to improve to become FAIRer.
Affiliation(s)
- Nils Hoffmann
- Forschungszentrum Jülich GmbH, Institute for Bio- and Geosciences (IBG-5), 52425 Jülich, Germany
- Gerhard Mayer
- Institute of Medical Systems Biology, Ulm University, 89081 Ulm, Germany
- Canan Has
- Biological Mass Spectrometry, Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- University Hospital Carl Gustav Carus, 01307 Dresden, Germany
- CENTOGENE GmbH, 18055 Rostock, Germany
- Dominik Kopczynski
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
- Fadi Al Machot
- Faculty of Science and Technology, Norwegian University for Life Science (NMBU), 1433 Ås, Norway
- Dominik Schwudke
- Bioanalytical Chemistry, Forschungszentrum Borstel, Leibniz Lung Center, 23845 Borstel, Germany
- Airway Research Center North, German Center for Lung Research (DZL), 23845 Borstel, Germany
- German Center for Infection Research (DZIF), TTU Tuberculosis, 23845 Borstel, Germany
- Robert Ahrends
- Department of Analytical Chemistry, University of Vienna, 1090 Vienna, Austria
- Katrin Marcus
- Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
- Martin Eisenacher
- Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
- Faculty of Medicine, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
- Michael Turewicz
- Institute for Clinical Biochemistry and Pathobiochemistry, German Diabetes Center (DDZ), Leibniz Center for Diabetes Research at Heinrich-Heine-University Düsseldorf, 40225 Düsseldorf, Germany
- German Center for Diabetes Research (DZD), Partner Düsseldorf, 85764 Neuherberg, Germany
17
Simple, efficient and thorough shotgun proteomic analysis with PatternLab V. Nat Protoc 2022; 17:1553-1578. [DOI: 10.1038/s41596-022-00690-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/14/2021] [Accepted: 02/08/2022] [Indexed: 11/08/2022]
18
LeDuc RD, Deutsch EW, Binz PA, Fellers RT, Cesnik AJ, Klein JA, Van Den Bossche T, Gabriels R, Yalavarthi A, Perez-Riverol Y, Carver J, Bittremieux W, Kawano S, Pullman B, Bandeira N, Kelleher NL, Thomas PM, Vizcaíno JA. Proteomics Standards Initiative's ProForma 2.0: Unifying the Encoding of Proteoforms and Peptidoforms. J Proteome Res 2022; 21:1189-1195. [PMID: 35290070 PMCID: PMC7612572 DOI: 10.1021/acs.jproteome.1c00771] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 11/30/2022]
Abstract
It is important for the proteomics community to have a standardized manner to represent all possible variations of a protein or peptide primary sequence, including natural, chemically induced, and artifactual modifications. The Human Proteome Organization Proteomics Standards Initiative in collaboration with several members of the Consortium for Top-Down Proteomics (CTDP) has developed a standard notation called ProForma 2.0, which is a substantial extension of the original ProForma notation developed by the CTDP. ProForma 2.0 aims to unify the representation of proteoforms and peptidoforms. ProForma 2.0 supports use cases needed for bottom-up and middle-/top-down proteomics approaches and allows the encoding of highly modified proteins and peptides using a human- and machine-readable string. ProForma 2.0 can be used to represent protein modifications in a specified or ambiguous location, designated by mass shifts, chemical formulas, or controlled vocabulary terms, including cross-links (natural and chemical) and atomic isotopes. Notational conventions are based on public controlled vocabularies and ontologies. The most up-to-date full specification document and information about software implementations are available at http://psidev.info/proforma.
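As a small illustration of the mass-shift form of ProForma notation described in this abstract (for example `PEPT[+79.96633]IDE`, with a modification written as a bracketed mass delta after the residue it modifies), the Python sketch below computes the neutral monoisotopic mass of such a string. It is a deliberately limited sketch, not the ProForma reference implementation: `peptidoform_mass` is a hypothetical name, only standard residues and numeric mass shifts are handled, and named modifications, chemical formulas, terminal and labile modifications, ambiguous locations and cross-links are rejected rather than parsed. The residue masses are standard monoisotopic values. For real data, a complete ProForma implementation (such as the peptidoform handling in psm_utils, listed above) is the safer choice.

```python
import re

# Standard monoisotopic residue masses in daltons (amino acid minus H2O).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # monoisotopic H2O, added once for the free peptide termini

# One token: a single residue letter, or a bracketed numeric mass shift.
TOKEN = re.compile(r"([A-Z])|\[([+-]?\d+(?:\.\d+)?)\]")


def peptidoform_mass(proforma: str) -> float:
    """Neutral monoisotopic mass of a peptide with numeric mass-shift tags.

    Only a small ProForma subset is supported: standard residues, each
    optionally followed by one or more tags such as [+79.96633]. Anything
    else (e.g. [Phospho], formulas, ambiguity groups) raises ValueError.
    """
    if not proforma:
        raise ValueError("empty sequence")
    total = WATER
    seen_residue = False
    pos = 0
    while pos < len(proforma):
        match = TOKEN.match(proforma, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at {pos}: {proforma[pos:]!r}")
        residue, shift = match.groups()
        if residue is not None:
            if residue not in RESIDUE_MASS:
                raise ValueError(f"non-standard residue {residue!r}")
            total += RESIDUE_MASS[residue]
            seen_residue = True
        elif not seen_residue:
            raise ValueError("a mass shift must follow a residue in this sketch")
        else:
            total += float(shift)  # the tag's mass delta applies to the residue
        pos = match.end()
    return total


if __name__ == "__main__":
    print(peptidoform_mass("PEPTIDE"))
    print(peptidoform_mass("PEPT[+79.96633]IDE"))
```

For `PEPTIDE` this gives approximately 799.360 Da, and the tagged form is heavier by exactly the bracketed delta. Encoding the modification as a mass shift keeps the string machine-readable even when no controlled-vocabulary term is assigned, which is one of the use cases the notation is designed to cover.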
Affiliation(s)
- Richard D LeDuc
- National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, Illinois 60611, United States
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Pierre-Alain Binz
- Clinical Chemistry Service, Lausanne University Hospital, 1011 Lausanne, Switzerland
- Ryan T Fellers
- National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, Illinois 60611, United States
- Anthony J Cesnik
- Department of Genetics, Stanford University, Stanford, California 94305, United States
- Chan Zuckerberg Biohub, 499 Illinois Street, San Francisco, California 94158, United States
- SciLifeLab, School of Engineering Sciences in Chemistry Biotechnology and Health, KTH-Royal Institute of Technology, SE-171 21 Solna, Stockholm, Sweden 113 51
- Joshua A Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, Technologiepark 75-FSVM II, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Technologiepark 75-FSVM II, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9000 Ghent, Belgium
- Arshika Yalavarthi
- National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, Illinois 60611, United States
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- Shin Kawano
- Toyama University of International Studies, Toyama, 930-1292 Toyama, Higashikuromaki, 6 5-1, Japan
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa, Chiba 277-0871, Japan
- Neil L Kelleher
- National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, Illinois 60611, United States
- Paul M Thomas
- National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, Illinois 60611, United States
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
19
Hollas MAR, Robey M, Fellers R, LeDuc R, Thomas P, Kelleher N. The Human Proteoform Atlas: a FAIR community resource for experimentally derived proteoforms. Nucleic Acids Res 2022; 50:D526-D533. [PMID: 34986596 PMCID: PMC8728143 DOI: 10.1093/nar/gkab1086] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 08/15/2021] [Revised: 10/06/2021] [Accepted: 11/14/2021] [Indexed: 01/01/2023]
Abstract
The Human Proteoform Atlas (HPfA) is a web-based repository of experimentally verified human proteoforms on-line at http://human-proteoform-atlas.org and is a direct descendant of the Consortium of Top-Down Proteomics' (CTDP) Proteoform Atlas. Proteoforms are the specific forms of protein molecules expressed by our cells and include the unique combination of post-translational modifications (PTMs), alternative splicing and other sources of variation deriving from a specific gene. The HPfA uses a FAIR system to assign persistent identifiers to proteoforms which allows for redundancy calling and tracking from prior and future studies in the growing community of proteoform biology and measurement. The HPfA is organized around open ontologies and enables flexible classification of proteoforms. To achieve this, a public registry of experimentally verified proteoforms was also created. Submission of new proteoforms can be processed through email via nrtdphelp@northwestern.edu, and future iterations of these proteoform atlases will help to organize and assign function to proteoforms, their PTMs and their complexes in the years ahead.
Collapse
Affiliation(s)
- Michael A R Hollas
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
- Matthew T Robey
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
- Ryan T Fellers
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
- Richard D LeDuc
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
- Paul M Thomas
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
- Neil L Kelleher
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
20
Perez-Riverol Y, Bai J, Bandla C, García-Seisdedos D, Hewapathirana S, Kamatchinathan S, Kundu D, Prakash A, Frericks-Zipper A, Eisenacher M, Walzer M, Wang S, Brazma A, Vizcaíno J. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res 2022; 50:D543-D552. [PMID: 34723319 PMCID: PMC8728295 DOI: 10.1093/nar/gkab1038] [Received: 09/11/2021] [Revised: 10/12/2021] [Accepted: 10/14/2021]
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of datasets submitted to PRIDE Archive (the archival component of PRIDE) reached an average of around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the MAGE-TAB for proteomics file format has been developed to improve sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jingwen Bai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David García-Seisdedos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Selvakumar Kamatchinathan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ananth Prakash
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anika Frericks-Zipper
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany
- Martin Eisenacher
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany
- Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
21
A proteomics sample metadata representation for multiomics integration and big data analysis. Nat Commun 2021; 12:5854. [PMID: 34615866 PMCID: PMC8494749 DOI: 10.1038/s41467-021-26111-3] [Received: 05/19/2021] [Accepted: 09/16/2021]
Abstract
The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
22
Kurt LU, Clasen MA, Santos MDM, Lyra ESB, Santos LO, Ramos CHI, Lima DB, Gozzo FC, Carvalho PC. Characterizing protein conformers by cross-linking mass spectrometry and pattern recognition. Bioinformatics 2021; 37:3035-3037. [PMID: 33681984 DOI: 10.1093/bioinformatics/btab149] [Received: 11/19/2020] [Revised: 02/25/2021] [Accepted: 03/02/2021]
Abstract
MOTIVATION Chemical cross-linking coupled to mass spectrometry (XLMS) has emerged as a powerful technique for studying protein structures and large-scale protein-protein interactions. Nonetheless, XLMS lacks software tailored toward dealing with multiple conformers; this scenario can lead to high-quality identifications that are mutually exclusive. This limitation hampers the applicability of XLMS in structural experiments on dynamic protein systems, where less abundant conformers of the target protein are expected in the sample. RESULTS We present QUIN-XL, a software tool that uses unsupervised clustering to group cross-link identifications by their quantitative profile across multiple samples. QUIN-XL highlights regions of the protein or system that change conformation when different biological conditions are compared. We demonstrate our software's usefulness by revisiting the HSP90 protein, comparing three of its different conformers. QUIN-XL's clusters correlate directly with the known 3D structures of the conformers, thereby validating our software. AVAILABILITY AND IMPLEMENTATION QUIN-XL and a user tutorial are freely available at http://patternlabforproteomics.org/quinxl for academic users. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Louise U Kurt
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, Fiocruz, Paraná 81350-010, Brazil
- Milan A Clasen
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, Fiocruz, Paraná 81350-010, Brazil
- Marlon D M Santos
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, Fiocruz, Paraná 81350-010, Brazil
- Eduardo S B Lyra
- Institute of Chemistry, University of Campinas, São Paulo 13083-862, Brazil
- Luana O Santos
- Institute of Chemistry, University of Campinas, São Paulo 13083-862, Brazil
- Carlos H I Ramos
- Institute of Chemistry, University of Campinas, São Paulo 13083-862, Brazil
- Diogo B Lima
- Department of Chemical Biology, Leibniz - Forschungsinstitut für Molekulare Pharmakologie (FMP), Berlin 13125, Germany
- Fabio C Gozzo
- Institute of Chemistry, University of Campinas, São Paulo 13083-862, Brazil
- Paulo C Carvalho
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, Fiocruz, Paraná 81350-010, Brazil
23
Mayer G, Müller W, Schork K, Uszkoreit J, Weidemann A, Wittig U, Rey M, Quast C, Felden J, Glöckner FO, Lange M, Arend D, Beier S, Junker A, Scholz U, Schüler D, Kestler HA, Wibberg D, Pühler A, Twardziok S, Eils J, Eils R, Hoffmann S, Eisenacher M, Turewicz M. Implementing FAIR data management within the German Network for Bioinformatics Infrastructure (de.NBI) exemplified by selected use cases. Brief Bioinform 2021; 22:bbab010. [PMID: 33589928 PMCID: PMC8425304 DOI: 10.1093/bib/bbab010] [Received: 10/17/2020] [Revised: 12/21/2020] [Accepted: 01/06/2021]
Abstract
This article describes use case studies and self-assessments of the FAIR status of de.NBI services to illustrate the challenges and requirements of adhering to the FAIR (findable, accessible, interoperable and reusable) data principles in a large distributed bioinformatics infrastructure. We address the challenge of heterogeneity in wet lab technologies, data, metadata, software, computational workflows and the levels of implementation and monitoring of FAIR principles within the different bioinformatics sub-disciplines brought together in de.NBI. On the one hand, this broad service landscape and the excellent network of experts are a strong basis for the development of useful research data management plans. On the other hand, the large number of tools and techniques maintained by distributed teams renders FAIR compliance challenging.
Affiliation(s)
- Gerhard Mayer
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Ulm University, Institute of Medical Systems Biology, Ulm, Germany
- Wolfgang Müller
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Karin Schork
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Julian Uszkoreit
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Andreas Weidemann
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Ulrike Wittig
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Maja Rey
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Janine Felden
- Jacobs University Bremen gGmbH, Bremen, Germany
- University of Bremen, MARUM - Center for Marine Environmental Sciences, Bremen, Germany
- Frank Oliver Glöckner
- Jacobs University Bremen gGmbH, Bremen, Germany
- University of Bremen, MARUM - Center for Marine Environmental Sciences, Bremen, Germany
- Alfred Wegener Institute - Helmholtz Center for Polar- and Marine Research, Bremerhaven, Germany
- Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Sebastian Beier
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Astrid Junker
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Danuta Schüler
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Hans A Kestler
- Ulm University, Institute of Medical Systems Biology, Ulm, Germany
- Leibniz Institute on Ageing - Fritz Lipmann Institute, Jena, Germany
- Daniel Wibberg
- Bielefeld University, Center for Biotechnology (CeBiTec), Bielefeld, Germany
- Alfred Pühler
- Bielefeld University, Center for Biotechnology (CeBiTec), Bielefeld, Germany
- Sven Twardziok
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
- Jürgen Eils
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
- Roland Eils
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
- Heidelberg University Hospital and BioQuant, Health Data Science Unit, Heidelberg, Germany
- Steve Hoffmann
- Leibniz Institute on Ageing - Fritz Lipmann Institute, Jena, Germany
- Martin Eisenacher
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Michael Turewicz
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
24
Lima DB, Dupré M, Duchateau M, Gianetto QG, Rey M, Matondo M, Chamot-Rooke J. ProteoCombiner: integrating bottom-up with top-down proteomics data for improved proteoform assessment. Bioinformatics 2021; 37:2206-2208. [PMID: 33165572 DOI: 10.1093/bioinformatics/btaa958] [Received: 02/28/2020] [Revised: 10/26/2020] [Accepted: 11/02/2020]
Abstract
MOTIVATION We present high-performance software that integrates shotgun and top-down proteomic data. The tool can handle multiple experiments and search engines, and it enables rapid and easy visualization, manual validation and comparison of the identified proteoform sequences, including characterization of post-translational modifications. RESULTS We demonstrate the effectiveness of our approach on a large-scale Escherichia coli dataset; ProteoCombiner unambiguously shortlisted proteoforms among those identified by the multiple search engines. AVAILABILITY AND IMPLEMENTATION ProteoCombiner, a demonstration video and a user tutorial are freely available at https://proteocombiner.pasteur.fr for academic use; all data are available from the ProteomeXchange consortium (identifier PXD017618). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Diogo B Lima
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France
- Mathieu Dupré
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France
- Magalie Duchateau
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France
- Quentin Giai Gianetto
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France; Bioinformatics and Biostatistics HUB, Computational Biology Department, Institut Pasteur, CNRS USR 3756, Paris, France
- Martial Rey
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France
- Mariette Matondo
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France
- Julia Chamot-Rooke
- Mass Spectrometry for Biology Unit, Institut Pasteur, CNRS USR 2000, Paris, France
25
Dorfer V, Strobl M, Winkler S, Mechtler K. MS Amanda 2.0: Advancements in the standalone implementation. Rapid Commun Mass Spectrom 2021; 35:e9088. [PMID: 33759252 PMCID: PMC8244010 DOI: 10.1002/rcm.9088] [Received: 12/30/2020] [Revised: 02/27/2021] [Accepted: 03/18/2021]
Abstract
RATIONALE Database search engines are the preferred method to identify peptides in mass spectrometry data. However, in this context valuable software is defined not only by a powerful algorithm to separate correct from false identifications, but also by constant maintenance and continuous improvements. METHODS In 2014, we presented our peptide identification algorithm MS Amanda, showing its suitability for identifying peptides in high-resolution tandem mass spectrometry data and its ability to outperform widely used tools to identify peptides. Since then, we have continuously worked on improvements to enhance its usability and to support new trends and developments in this fast-growing field, while keeping the original scoring algorithm to assess the quality of a peptide spectrum match unchanged. RESULTS We present the outcome of these efforts, MS Amanda 2.0, a faster and more flexible standalone version with the original scoring algorithm. The new implementation has led to a 3-5× speedup, is able to handle new ion types and supports standard data formats. We also show that MS Amanda 2.0 works best when using only the most common ion types in a particular search instead of all possible ion types. CONCLUSIONS MS Amanda is available free of charge from https://ms.imp.ac.at/index.php?action=msamanda.
Affiliation(s)
- Viktoria Dorfer
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Softwarepark 11, 4232 Hagenberg, Austria
- Marina Strobl
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Softwarepark 11, 4232 Hagenberg, Austria
- Stephan Winkler
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Softwarepark 11, 4232 Hagenberg, Austria
- Karl Mechtler
- Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, Vienna 1030, Austria
- Institute of Molecular Biotechnology (IMBA), Austrian Academy of Sciences, Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, Vienna 1030, Austria
- Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna BioCenter (VBC), Dr. Bohr-Gasse 3, Vienna 1030, Austria
26
McCabe A, Jones AR. lcmsWorld: High-Performance 3D Visualization Software for Mass Spectrometry. J Proteome Res 2021; 20:1981-1985. [PMID: 33710902 DOI: 10.1021/acs.jproteome.0c00618]
Abstract
Complex biological samples, in particular, in proteomics and metabolomics research, are often analyzed using mass spectrometry paired with liquid chromatography or gas chromatography. The chromatography stage adds a third dimension (retention time) to the usual 2D mass spectrometry output (mass/charge, detected ion counts). Experimental results are often discovered by complex computational analysis, but it is not always possible to know if the data has been correctly interpreted. To perform quality-control checks, it can often be helpful to verify the results by manually examining the raw data, and it is typically easier to understand the data in a graphical, rather than numerical, form. 3D graphics hardware is present in most modern computers but is rarely utilized by bioinformatics software, even when the data to be viewed are naturally 3D. lcmsWorld is new software that uses graphics hardware to quickly and smoothly examine and compare LC-MS data. A preprocessing step allows the software to subsequently access any area of the data instantly at multiple levels of detail. The data can then be freely navigated while the software automatically selects, loads, and displays the most appropriate detail. lcmsWorld is open source. Releases, source code, and example data files are available via https://github.com/PGB-LIV/lcmsWorld.
Affiliation(s)
- Antony McCabe
- Computational Biology Facility, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Andrew R Jones
- Computational Biology Facility, University of Liverpool, Liverpool L69 7ZB, United Kingdom; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
27
Netz E, Dijkstra TMH, Sachsenberg T, Zimmermann L, Walzer M, Monecke T, Ficner R, Dybkov O, Urlaub H, Kohlbacher O. OpenPepXL: An Open-Source Tool for Sensitive Identification of Cross-Linked Peptides in XL-MS. Mol Cell Proteomics 2020; 19:2157-2168. [PMID: 33067342 PMCID: PMC7710140 DOI: 10.1074/mcp.tir120.002186] [Received: 06/20/2020] [Revised: 09/15/2020]
Abstract
Cross-linking MS (XL-MS) has been recognized as an effective source of information about protein structures and interactions. In contrast to regular peptide identification, XL-MS has to deal with a quadratic search space, where peptides from every protein could potentially be cross-linked to any other protein. To cope with this search space, most tools apply different heuristics for search space reduction. We introduce a new open-source XL-MS database search algorithm, OpenPepXL, which offers increased sensitivity compared with other tools. OpenPepXL searches the full search space of an XL-MS experiment without using heuristics to reduce it. Because of efficient data structures and built-in parallelization, OpenPepXL achieves excellent runtimes and can also be deployed on large compute clusters and cloud services while maintaining a slim memory footprint. We compared OpenPepXL to several other commonly used tools for identification of noncleavable labeled and label-free cross-linkers on a diverse set of XL-MS experiments. In our first comparison, we used a data set from a fraction of a cell lysate with a protein database of 128 targets and 128 decoys. At 5% FDR, OpenPepXL finds from 7% to over 50% more unique residue pairs (URPs) than other tools. On data sets with available high-resolution structures for cross-link validation, OpenPepXL reports from 7% to over 40% more structurally validated URPs than other tools. Additionally, we used a synthetic peptide data set that allows objective validation of cross-links without relying on structural information and found that OpenPepXL reports at least 12% more validated URPs than other tools. It has been built as part of the OpenMS suite of tools and supports Windows, macOS, and Linux operating systems. OpenPepXL also supports the mzIdentML 1.2 format for XL-MS identification results. It is freely available under a three-clause BSD license at https://openms.org/openpepxl.
Affiliation(s)
- Eugen Netz
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany; Applied Bioinformatics, Dept. of Computer Science, University of Tübingen, Tübingen, Germany.
- Tjeerd M H Dijkstra
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany; Applied Bioinformatics, Dept. of Computer Science, University of Tübingen, Tübingen, Germany; Center for Women's Health, University Clinic Tübingen, Tübingen, Germany
- Timo Sachsenberg
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany; Applied Bioinformatics, Dept. of Computer Science, University of Tübingen, Tübingen, Germany
- Lukas Zimmermann
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany; Institute for Translational Bioinformatics, University Hospital Tübingen, Tübingen, Germany
- Mathias Walzer
- Institute for Translational Bioinformatics, University Hospital Tübingen, Tübingen, Germany; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Thomas Monecke
- X-Ray Crystallography Facility, Institute of Pharmaceutical Biotechnology, University of Ulm, Ulm, Germany; Department of Molecular Structural Biology, Institute for Microbiology and Genetics, GZMB, Georg-August-University Göttingen, Göttingen, Germany
- Ralf Ficner
- Department of Molecular Structural Biology, Institute for Microbiology and Genetics, GZMB, Georg-August-University Göttingen, Göttingen, Germany
- Olexandr Dybkov
- Department for Cellular Biochemistry, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Henning Urlaub
- Bioanalytical Mass Spectrometry, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany; Bioanalytics, Institute for Clinical Chemistry, University Medical Center, Göttingen, Germany
- Oliver Kohlbacher
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany; Applied Bioinformatics, Dept. of Computer Science, University of Tübingen, Tübingen, Germany; Institute for Translational Bioinformatics, University Hospital Tübingen, Tübingen, Germany; Quantitative Biology Center, University of Tübingen, Tübingen, Germany.
28
Takan S, Allmer J. DNMSO; an ontology for representing de novo sequencing results from Tandem-MS data. PeerJ 2020; 8:e10216. [PMID: 33150092 PMCID: PMC7585381 DOI: 10.7717/peerj.10216] [Received: 03/11/2020] [Accepted: 09/28/2020]
Abstract
For the identification and sequencing of proteins, mass spectrometry (MS) has become the tool of choice and, as such, drives proteomics. MS/MS spectra need to be assigned a peptide sequence, for which two strategies exist: either database search or de novo sequencing can be employed to establish peptide spectrum matches. For database search, mzIdentML is the current community standard for data representation. There is no community standard for representing de novo sequencing results, but we previously proposed the de novo markup language (DNML). At the moment, each de novo sequencing solution uses a different data representation, complicating downstream data integration, which is crucial since ensemble predictions may be more useful than predictions of a single tool. We here propose the de novo MS Ontology (DNMSO), which can, for example, provide many-to-many mappings between spectra and peptide predictions. Additionally, we have implemented an application programming interface (API) that supports every file operation necessary for de novo sequencing, from spectra input to reading, writing and creating DNMSO files, as well as conversion from many other file formats. This API removes all overhead from the production of de novo sequencing tools and allows developers to concentrate completely on algorithm development. We make the API and formal descriptions of the format freely available at https://github.com/savastakan/dnmso.
Affiliation(s)
- Savaş Takan
- Department of Computer Engineering, Faculty of Engineering, Izmir Institute of Technology, Izmir, Turkey
- Jens Allmer
- Hochschule Ruhr West, University of Applied Sciences, Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Mülheim an der Ruhr, Germany
29
Toward Increased Reliability, Transparency, and Accessibility in Cross-linking Mass Spectrometry. Structure 2020; 28:1259-1268. [PMID: 33065067 DOI: 10.1016/j.str.2020.09.011] [Received: 06/26/2020] [Revised: 09/02/2020] [Accepted: 09/24/2020]
Abstract
Cross-linking mass spectrometry (MS) has substantially matured as a method over the past 2 decades through parallel development in multiple labs, demonstrating its applicability to protein structure determination, conformation analysis, and mapping protein interactions in complex mixtures. Cross-linking MS has become a much-appreciated and routinely applied tool, especially in structural biology. Therefore, it is timely that the community commits to the development of methodological and reporting standards. This white paper builds on an open process comprising a number of events at community conferences since 2015 and identifies aspects of Cross-linking MS for which guidelines should be developed as part of a Cross-linking MS standards initiative.
30
Ma J, Chen T, Wu S, Yang C, Bai M, Shu K, Li K, Zhang G, Jin Z, He F, Hermjakob H, Zhu Y. iProX: an integrated proteome resource. Nucleic Acids Res 2019; 47:D1211-D1217. [PMID: 30252093 PMCID: PMC6323926 DOI: 10.1093/nar/gky869] [Received: 08/14/2018] [Accepted: 09/14/2018]
Abstract
Sharing of research data in public repositories has become best practice in academia. With the accumulation of massive data, network bandwidth and storage requirements are rapidly increasing. The ProteomeXchange (PX) consortium implements a mode of centralized metadata and distributed raw data management, which promotes effective data sharing. To facilitate open access of proteome data worldwide, we have developed the integrated proteome resource iProX (http://www.iprox.org) as a public platform for collecting and sharing raw data, analysis results and metadata obtained from proteomics experiments. The iProX repository employs a web-based proteome data submission process and open sharing of mass spectrometry-based proteomics datasets. Also, it deploys extensive controlled vocabularies and ontologies to annotate proteomics datasets. Users can use a GUI to provide and access data through a fast Aspera-based transfer tool. iProX is a full member of the PX consortium; all released datasets are freely accessible to the public. iProX is based on a high availability architecture and has been deployed as part of the proteomics infrastructure of China, ensuring long-term and stable resource support. iProX will facilitate worldwide data analysis and sharing of proteomics experiments.
Affiliation(s)
- Jie Ma
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Tao Chen
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Songfeng Wu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Chunyuan Yang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Mingze Bai
- Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Kunxian Shu
- Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Kenli Li
- National Supercomputing Center in Changsha, Hunan University, Changsha 410082, China
- Guoqing Zhang
- Shanghai Center for Bioinformation Technology, Shanghai Institutes of Biomedicine, Shanghai Academy of Science and Technology, Shanghai 200235, China
- Zhong Jin
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
- Fuchu He
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Henning Hermjakob
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Yunping Zhu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
31
Perez-Riverol Y, Moreno P. Scalable Data Analysis in Proteomics and Metabolomics Using BioContainers and Workflows Engines. Proteomics 2020; 20:e1900147. [PMID: 31657527 PMCID: PMC7613303 DOI: 10.1002/pmic.201900147] [Received: 04/12/2019] [Revised: 09/30/2019]
Abstract
The recent improvements in mass spectrometry instruments and new analytical methods are increasing the intersection between proteomics and big data science. In addition, bioinformatics analysis is becoming increasingly complex and convoluted, involving multiple algorithms and tools. A wide variety of methods and software tools have been developed for computational proteomics and metabolomics during recent years, and this trend is likely to continue. However, most computational proteomics and metabolomics tools are designed as single-tiered software applications in which the analytics tasks cannot be distributed, limiting the scalability and reproducibility of the data analysis. In this paper, the key steps of metabolomics and proteomics data processing, including the main tools and software used to perform the data analysis, are summarized. The combination of software containers with workflow environments for large-scale metabolomics and proteomics analysis is discussed. Finally, a new approach for reproducible and large-scale data analysis based on BioContainers and two of the most popular workflow environments, Galaxy and Nextflow, is introduced to the proteomics and metabolomics communities.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
32
McGowan T, Johnson JE, Kumar P, Sajulga R, Mehta S, Jagtap PD, Griffin TJ. Multi-omics Visualization Platform: An extensible Galaxy plug-in for multi-omics data visualization and exploration. Gigascience 2020; 9:giaa025. [PMID: 32236523 PMCID: PMC7102281 DOI: 10.1093/gigascience/giaa025] [Received: 11/22/2019] [Revised: 02/13/2020] [Accepted: 02/24/2020]
Abstract
BACKGROUND: Proteogenomics integrates genomics, transcriptomics, and mass spectrometry (MS)-based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants. Proteogenomic data analysis requires integration of disparate 'omic software tools, as well as customized tools to view and interpret results. The flexible Galaxy platform has proven valuable for proteogenomic data analysis. Here, we describe a novel Multi-omics Visualization Platform (MVP) for organizing, visualizing, and exploring proteogenomic results, adding a critically needed tool for data exploration and interpretation. FINDINGS: MVP is built as an HTML Galaxy plug-in, primarily based on JavaScript. Via the Galaxy API, MVP uses SQLite databases as input: a custom data type (mzSQLite) containing MS-based peptide identification information, a variant annotation table, and a coding sequence table. Users can interactively filter identified peptides based on sequence and data quality metrics, view annotated peptide MS data, and visualize protein-level information, along with genomic coordinates. Peptides that pass the user-defined thresholds can be sent back to Galaxy via the API for further analysis; processed data and visualizations can also be saved and shared. MVP leverages the Integrated Genomics Viewer JavaScript framework, enabling interactive visualization of peptides and corresponding transcript and genomic coding information within the MVP interface. CONCLUSIONS: MVP provides a powerful, extensible platform for automated, interactive visualization of proteogenomic results within the Galaxy environment, adding a unique and critically needed tool for empowering exploration and interpretation of results. The platform is extensible, providing a basis for further development of new functionalities for proteogenomic data visualization.
Affiliation(s)
- Thomas McGowan
- Minnesota Supercomputing Institute, University of Minnesota, 599 Walter Library, 117 Pleasant Street SE, Minneapolis, MN 55455, USA
- James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, 599 Walter Library, 117 Pleasant Street SE, Minneapolis, MN 55455, USA
- Praveen Kumar
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
- Bioinformatics and Computational Biology program, University of Minnesota-Rochester, 111 South Broadway, Suite 300, Rochester, MN 55904, USA
- Ray Sajulga
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
- Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
- Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
- Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
33
Klein JA, Zaia J. A Perspective on the Confident Comparison of Glycoprotein Site-Specific Glycosylation in Sample Cohorts. Biochemistry 2020; 59:3089-3097. [PMID: 31833756 DOI: 10.1021/acs.biochem.9b00730]
Abstract
Protein glycosylation, resulting from glycosyl transferase reactions under complex control in the secretory pathway, consists of a distribution of related glycoforms at each glycosylation site. Because the biosynthetic substrate concentration and transport rates depend on architecture and other aspects of cellular phenotypes, site-specific glycosylation cannot be predicted accurately from genomic, transcriptomic, or proteomic information. Rather, it is necessary to quantify glycosylation at each protein site and how this changes among a sample cohort to provide information about disease mechanisms. At present, mature mass spectrometry-based methods allow for qualitative assignment of the glycan composition and glycosylation site of singly glycosylated proteolytic peptides. To make such quantitative comparisons, it is necessary to sample the glycosylation distribution with sufficient coverage and accuracy for confident assessment of the glycosylation changes that occur in the biological cohort. In this Perspective, we discuss the unmet needs for mass spectrometry acquisition methods and bioinformatics for the confident comparison of protein site-specific glycosylation among sample cohorts.
34
Shteynberg DD, Deutsch EW, Campbell DS, Hoopmann MR, Kusebauch U, Lee D, Mendoza L, Midha MK, Sun Z, Whetton AD, Moritz RL. PTMProphet: Fast and Accurate Mass Modification Localization for the Trans-Proteomic Pipeline. J Proteome Res 2019; 18:4262-4272. [PMID: 31290668 PMCID: PMC6898736 DOI: 10.1021/acs.jproteome.9b00205]
Abstract
Spectral matching sequence database search engines commonly used on mass spectrometry-based proteomics experiments excel at identifying peptide sequence ions, and in addition, possible sequence ions carrying post-translational modifications (PTMs), but most do not provide confidence metrics for the exact localization of those PTMs when several possible sites are available. Localization is absolutely required for downstream molecular cell biology analysis of PTM function in vitro and in vivo. Therefore, we developed PTMProphet, a free and open-source software tool integrated into the Trans-Proteomic Pipeline, which reanalyzes identified spectra from any search engine for which pepXML output is available to provide localization confidence to enable appropriate further characterization of biologic events. Localization of any type of mass modification (e.g., phosphorylation) is supported. PTMProphet applies Bayesian mixture models to compute probabilities for each site/peptide spectrum match where a PTM has been identified. These probabilities can be combined to compute a global false localization rate at any threshold to guide downstream analysis. We describe the PTMProphet tool, its underlying algorithms, and demonstrate its performance on ground-truth synthetic peptide reference data sets, one previously published small data set, one new larger data set, and also on a previously published phosphoenriched data set where the correct sites of modification are unknown. Data have been deposited to ProteomeXchange with identifier PXD013210.
Affiliation(s)
- Dave Lee
- Stoller Biomarker Discovery Centre, University of Manchester, Manchester, M13 9PL, UK
- Luis Mendoza
- Institute for Systems Biology, Seattle, WA, 98008, USA
- Zhi Sun
- Institute for Systems Biology, Seattle, WA, 98008, USA
- Anthony D. Whetton
- Stoller Biomarker Discovery Centre, University of Manchester, Manchester, M13 9PL, UK
35
Berman HM, Adams PD, Bonvin AA, Burley SK, Carragher B, Chiu W, DiMaio F, Ferrin TE, Gabanyi MJ, Goddard TD, Griffin PR, Haas J, Hanke CA, Hoch JC, Hummer G, Kurisu G, Lawson CL, Leitner A, Markley JL, Meiler J, Montelione GT, Phillips GN, Prisner T, Rappsilber J, Schriemer DC, Schwede T, Seidel CAM, Strutzenberg TS, Svergun DI, Tajkhorshid E, Trewhella J, Vallat B, Velankar S, Vuister GW, Webb B, Westbrook JD, White KL, Sali A. Federating Structural Models and Data: Outcomes from A Workshop on Archiving Integrative Structures. Structure 2019; 27:1745-1759. [PMID: 31780431 DOI: 10.1016/j.str.2019.11.002] [Received: 09/20/2019] [Revised: 10/31/2019] [Accepted: 11/06/2019]
Abstract
Structures of biomolecular systems are increasingly computed by integrative modeling. In this approach, a structural model is constructed by combining information from multiple sources, including varied experimental methods and prior models. In 2019, a Workshop was held as a Biophysical Society Satellite Meeting to assess progress and discuss further requirements for archiving integrative structures. The primary goal of the Workshop was to build consensus for addressing the challenges involved in creating common data standards, building methods for federated data exchange, and developing mechanisms for validating integrative structures. The summary of the Workshop and the recommendations that emerged are presented here.
Affiliation(s)
- Helen M Berman
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA; Bridge Institute, Michelson Center, University of Southern California, Los Angeles, CA 90089, USA
- Paul D Adams
- Physical Biosciences Division, Lawrence Berkeley Laboratory, Berkeley, CA 94720-8235, USA; Department of Bioengineering, University of California-Berkeley, Berkeley, CA 94720, USA
- Alexandre A Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, the Netherlands
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences and San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA; Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA
- Bridget Carragher
- Simons Electron Microscopy Center, New York Structural Biology Center, New York, NY 10027, USA; Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Wah Chiu
- Department of Bioengineering, Department of Microbiology and Immunology, Stanford University, Stanford, CA 94305-5447, USA; SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
- Frank DiMaio
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Thomas E Ferrin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
- Margaret J Gabanyi
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Thomas D Goddard
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
- Juergen Haas
- Swiss Institute of Bioinformatics and Biozentrum, University of Basel, 4056 Basel, Switzerland
- Christian A Hanke
- Molecular Physical Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Jeffrey C Hoch
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT 06030, USA
- Gerhard Hummer
- Department of Theoretical Biophysics, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany; Institute for Biophysics, Goethe University Frankfurt, 60438 Frankfurt am Main, Germany
- Genji Kurisu
- Protein Data Bank Japan (PDBj), Institute for Protein Research, Osaka University, Osaka 565-0871, Japan
- Catherine L Lawson
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Alexander Leitner
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- John L Markley
- BioMagResBank (BMRB), Biochemistry Department, University of Wisconsin-Madison, Madison, WI 53706, USA
- Jens Meiler
- Center for Structural Biology, Vanderbilt University, 465 21st Avenue South, Nashville, TN 37221, USA
- Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Biochemistry, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytech Institute, Troy, NY 12180, USA
- George N Phillips
- BioSciences at Rice and Department of Chemistry, Rice University, Houston, TX 77251, USA
- Thomas Prisner
- Institute of Physical and Theoretical Chemistry and Center of Biomolecular Magnetic Resonance, Goethe University Frankfurt, 60438 Frankfurt am Main, Germany
- Juri Rappsilber
- Wellcome Trust Centre for Cell Biology, Edinburgh EH9 3JR, Scotland
- David C Schriemer
- Department of Biochemistry & Molecular Biology, Robson DNA Science Centre, University of Calgary, Calgary, AB T2N 4N1, Canada
- Torsten Schwede
- Swiss Institute of Bioinformatics and Biozentrum, University of Basel, 4056 Basel, Switzerland
- Claus A M Seidel
- Molecular Physical Chemistry, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Dmitri I Svergun
- European Molecular Biology Laboratory (EMBL), Hamburg Outstation, Notkestrasse 85, 22607 Hamburg, Germany
- Emad Tajkhorshid
- Department of Biochemistry, NIH Center for Macromolecular Modeling and Bioinformatics, Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Jill Trewhella
- School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia; Department of Chemistry, University of Utah, Salt Lake City, UT 84112, USA
- Brinda Vallat
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Sameer Velankar
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire CB10 1SD, UK
- Geerten W Vuister
- Department of Molecular and Cell Biology, Leicester Institute of Structural and Chemical Biology, University of Leicester, Leicester LE1 9HN, UK
- Benjamin Webb
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
- John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Kate L White
- Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA; Bridge Institute, Michelson Center, University of Southern California, Los Angeles, CA 90089, USA
- Andrej Sali
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA; Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA; California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, CA 94158, USA
36
Review of Issues and Solutions to Data Analysis Reproducibility and Data Quality in Clinical Proteomics. Methods Mol Biol 2019. [PMID: 31552637 DOI: 10.1007/978-1-4939-9744-2_15]
Abstract
In any analytical discipline, data analysis reproducibility is closely interlinked with data quality. In this book chapter focused on mass spectrometry-based proteomics approaches, we introduce how both data analysis reproducibility and data quality can influence each other and how data quality and data analysis designs can be used to increase robustness and improve reproducibility. We first introduce methods and concepts to design and maintain robust data analysis pipelines such that reproducibility can be increased in parallel. The technical aspects related to data analysis reproducibility are challenging, and current ways to increase the overall robustness are multifaceted. Software containerization and cloud infrastructures play an important part. We will also show how quality control (QC) and quality assessment (QA) approaches can be used to spot analytical issues, reduce the experimental variability, and increase confidence in the analytical results of (clinical) proteomics studies, since experimental variability plays a substantial role in analysis reproducibility. Therefore, we give an overview of existing solutions for QC/QA, including different quality metrics, and methods for longitudinal monitoring. The efficient use of both types of approaches undoubtedly provides a way to improve the experimental reliability, reproducibility, and level of consistency in proteomics analytical measurements.
37
Kolbowski L, Combe C, Rappsilber J. xiSPEC: web-based visualization, analysis and sharing of proteomics data. Nucleic Acids Res 2018; 46:W473-W478. [PMID: 29741719 PMCID: PMC6030980 DOI: 10.1093/nar/gky353] [Received: 02/17/2018] [Accepted: 04/24/2018]
Abstract
We present xiSPEC, a standards-compliant, next-generation web-based spectrum viewer for visualizing, analyzing and sharing mass spectrometry data. Peptide-spectrum matches from standard proteomics and cross-linking experiments are supported. xiSPEC is to date the only browser-based tool supporting the standardized file formats mzML and mzIdentML defined by the Proteomics Standards Initiative. Users can either upload data directly or select files from the PRIDE data repository as input. xiSPEC allows users to save and share their datasets publicly or password protected for providing access to collaborators or readers and reviewers of manuscripts. The identification table features advanced interaction controls and spectra are presented in three interconnected views: (i) annotated mass spectrum, (ii) peptide sequence fragmentation key and (iii) quality control error plots of matched fragments. Highlighting or selecting data points in any view is represented in all other views. Views are interactive scalable vector graphic elements, which can be exported, e.g. for use in publication. xiSPEC allows for re-annotation of spectra for easy hypothesis testing by modifying input data. xiSPEC is freely accessible at http://spectrumviewer.org and the source code is openly available on https://github.com/Rappsilber-Laboratory/xiSPEC.
Affiliation(s)
- Lars Kolbowski
- Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, UK; Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Colin Combe
- Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, UK
- Juri Rappsilber
- Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, UK; Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
38
Martínez-Bartolomé S, Bamberger C, Lavallée-Adam M, McClatchy DB, Yates JR. Proteomics INTegrator (PINT): An Online Tool To Store, Query, and Visualize Large Proteomics Experiment Results. J Proteome Res 2019; 18:2999-3008. [PMID: 31260318 PMCID: PMC8278777 DOI: 10.1021/acs.jproteome.8b00711]
Abstract
The characterization of complex biological systems based on high-throughput protein quantification through mass spectrometry commonly involves differential expression analysis between replicate samples originating from different experimental conditions. Here we present Proteomics INTegrator (PINT), a new user-friendly Web-based platform-independent system to store, visualize, and query proteomics experiment results. PINT provides an extremely flexible query interface that allows advanced Boolean algebra-based data filtering of many different proteomics features such as confidence values, abundance levels or ratios, data set overlaps, sample characteristics, as well as UniProtKB annotations, which are transparently incorporated into the system. In addition, PINT allows developers to incorporate data visualization and analysis tools, such as PSEA-Quant and Reactome pathway analysis, for data set enrichment analysis. PINT serves as a centralized hub for large-scale proteomics data and as a platform for data analysis, facilitating the interpretation of proteomics results and expediting biologically relevant conclusions.
Affiliation(s)
- Salvador Martínez-Bartolomé
- Department of Molecular Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California, 92037, United States
- Casimir Bamberger
- Department of Molecular Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California, 92037, United States
- Mathieu Lavallée-Adam
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, K1H 8M5, Canada
- Daniel B. McClatchy
- Department of Molecular Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California, 92037, United States
- John R. Yates
- Department of Molecular Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California, 92037, United States
39
Iacobucci C, Piotrowski C, Aebersold R, Amaral BC, Andrews P, Bernfur K, Borchers C, Brodie NI, Bruce JE, Cao Y, Chaignepain S, Chavez JD, Claverol S, Cox J, Davis T, Degliesposti G, Dong MQ, Edinger N, Emanuelsson C, Gay M, Götze M, Gomes-Neto F, Gozzo FC, Gutierrez C, Haupt C, Heck AJR, Herzog F, Huang L, Hoopmann MR, Kalisman N, Klykov O, Kukačka Z, Liu F, MacCoss MJ, Mechtler K, Mesika R, Moritz RL, Nagaraj N, Nesati V, Neves-Ferreira AGC, Ninnis R, Novák P, O’Reilly FJ, Pelzing M, Petrotchenko E, Piersimoni L, Plasencia M, Pukala T, Rand KD, Rappsilber J, Reichmann D, Sailer C, Sarnowski CP, Scheltema RA, Schmidt C, Schriemer DC, Shi Y, Skehel JM, Slavin M, Sobott F, Solis-Mezarino V, Stephanowitz H, Stengel F, Stieger CE, Trabjerg E, Trnka M, Vilaseca M, Viner R, Xiang Y, Yilmaz S, Zelter A, Ziemianowicz D, Leitner A, Sinz A. First Community-Wide, Comparative Cross-Linking Mass Spectrometry Study. Anal Chem 2019; 91:6953-6961. [PMID: 31045356 PMCID: PMC6625963 DOI: 10.1021/acs.analchem.9b00658]
Abstract
The number of publications in the field of chemical cross-linking combined with mass spectrometry (XL-MS) to derive constraints for protein three-dimensional structure modeling and to probe protein-protein interactions has increased in recent years. As the technique is now becoming routine for in vitro and in vivo applications in proteomics and structural biology, there is a pressing need to define protocols as well as data analysis and reporting formats. Such consensus formats should become accepted in the field and be shown to lead to reproducible results. This first, community-based harmonization study on XL-MS is based on the results of 32 groups participating worldwide. The aim of this paper is to summarize the status quo of XL-MS and to compare and evaluate existing cross-linking strategies. Our study therefore builds the framework for establishing best practice guidelines to conduct cross-linking experiments, perform data analysis, and define reporting formats with the ultimate goal of assisting scientists to generate accurate and reproducible XL-MS results.
Affiliation(s)
- Claudio Iacobucci
- Department of Pharmaceutical Chemistry and Bioanalytics, Institute of Pharmacy, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Strasse 3a, 06120 Halle/Saale, Germany
- Christine Piotrowski
- Department of Pharmaceutical Chemistry and Bioanalytics, Institute of Pharmacy, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Strasse 3a, 06120 Halle/Saale, Germany
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Otto-Stern-Weg 3, 8093 Zurich, Switzerland
- Faculty of Science, University of Zurich, 8006 Zurich, Switzerland
- Bruno C. Amaral
- Institute of Chemistry, University of Campinas, Campinas São Paulo 13083-970, Brazil
- Philip Andrews
- Departments of Biological Chemistry, Bioinformatics, and Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Katja Bernfur
- Department of Biochemistry and Structural Biology, Center for Molecular Protein Science, Lund University, 221 00 Lund, Sweden
- Christoph Borchers
- University of Victoria–Genome British Columbia Proteomics Centre, Vancouver Island Technology Park, Victoria, British Columbia V8Z 7X8, Canada
- Department of Biochemistry and Microbiology, University of Victoria, Petch Building, Room 270d, 3800 Finnerty Road, Victoria, British Columbia V8P 5C2, Canada
- Gerald Bronfman Department of Oncology, Jewish General Hospital, McGill University, 3755 Côte Ste-Catherine Road, Montréal, Quebec H3T 1E2, Canada
- Proteomics Centre, Segal Cancer Centre, Lady Davis Institute, Jewish General Hospital, McGill University, 3755 Côte Ste-Catherine Road, Montréal, Quebec H3T 1E2, Canada
- Nicolas I. Brodie
- University of Victoria–Genome British Columbia Proteomics Centre, Vancouver Island Technology Park, Victoria, British Columbia V8Z 7X8, Canada
- James E. Bruce
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Yong Cao
- National Institute of Biological Sciences, Beijing 7 Science Park Road, ZGC Life Science Park, 102206 Beijing, China
- Stéphane Chaignepain
- CBMN, UMR 5248, CNRS, Université de Bordeaux, INP Bordeaux, Pessac 33607, France
- Juan D. Chavez
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Stéphane Claverol
- Centre de Génomique Fonctionnelle, Plateforme Protéome, Université de Bordeaux, Bordeaux 33000, France
- Jürgen Cox
- Computational Systems Biochemistry Research Group, Max-Planck-Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany
- Trisha Davis
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, United States
- Gianluca Degliesposti
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Avenue, Cambridge CB2 0QH, U.K.
- Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing 7 Science Park Road, ZGC Life Science Park, 102206 Beijing, China
- Nufar Edinger
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, Safra Campus Givat Ram, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Cecilia Emanuelsson
- Department of Biochemistry and Structural Biology, Center for Molecular Protein Science, Lund University, 221 00 Lund, Sweden
- Marina Gay
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 10, 08028 Barcelona, Spain
- Michael Götze
- Institute for Biochemistry and Biotechnology, Charles Tanford Protein Center, Martin Luther University Halle-Wittenberg, Kurt-Mothes-Strasse 3a, 06120 Halle/Saale, Germany
- Francisco Gomes-Neto
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Avenida Brasil 4365 (Moorish Castle), Manguinhos, Rio de Janeiro, Rio de Janeiro 21040-900, Brazil
- Fabio C. Gozzo
- Institute of Chemistry, University of Campinas, Campinas São Paulo 13083-970, Brazil
- Craig Gutierrez
- Department of Physiology & Biophysics, University of
California, Irvine, California 92697, United States
| | - Caroline Haupt
- Interdisciplinary Research Center HALOmem, Institute for
Biochemistry and Biotechnology, Charles Tanford Protein Center, Martin Luther
University Halle-Wittenberg, Kurt-Mothes-Strasse 3a, 06120 Halle/Saale,
Germany
| | - Albert J. R. Heck
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for
Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University
of Utrecht and Netherlands Proteomics Centre, Padualaan 8, 3584 CH Utrecht, The
Netherlands
| | - Franz Herzog
- Gene Center Munich, Department of Biochemistry, Faculty of Chemistry
and Pharmacy, Ludwig Maximilians University of Munich, Feodor-Lynen-Strasse 25,
81377 Munich, Germany
| | - Lan Huang
- Department of Physiology & Biophysics, University of
California, Irvine, California 92697, United States
| | - Michael R. Hoopmann
- Institute for Systems Biology, 401 Terry Avenue North, Seattle,
Washington 98109, United States
| | - Nir Kalisman
- Department of Biological Chemistry, The Alexander Silberman
Institute of Life Sciences, Safra Campus Givat Ram, The Hebrew University of
Jerusalem, Jerusalem 91904, Israel
| | - Oleg Klykov
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for
Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University
of Utrecht and Netherlands Proteomics Centre, Padualaan 8, 3584 CH Utrecht, The
Netherlands
| | - Zdeněk Kukačka
- Institute of Microbiology, BIOCEV, Prumyslova 595, 252 50 Vestec,
Czech Republic
| | - Fan Liu
- Leibniz Institute of Molecular Pharmacology (FMP),
Robert-Rössle-Strasse 10, 13125 Berlin, Germany
| | - Michael J. MacCoss
- Department of Genome Sciences, University of Washington, Seattle,
Washington 98195, United States
| | - Karl Mechtler
- Protein Chemistry Facility, Research Institute of Molecular
Pathology (IMP) and Institute of Molecular Biotechnology (IMBA), Vienna Biocenter
(VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
| | - Ravit Mesika
- Department of Biological Chemistry, The Alexander Silberman
Institute of Life Sciences, Safra Campus Givat Ram, The Hebrew University of
Jerusalem, Jerusalem 91904, Israel
| | - Robert L. Moritz
- Institute for Systems Biology, 401 Terry Avenue North, Seattle,
Washington 98109, United States
| | - Nagarjuna Nagaraj
- Biochemistry Core Facility, Max-Planck-Institute of Biochemistry,
Am Klopferspitz 18, 82152 Martinsried, Germany
| | - Victor Nesati
- Analytical Biochemistry, CSL Limited, Bio21 Institute, 30
Flemington Road, 3010 Parkville, Melbourne, Australia
| | - Ana G. C. Neves-Ferreira
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Avenida
Brasil 4365 (Moorish Castle), Manguinhos, Rio de Janeiro, Rio de Janeiro 21040-900,
Brazil
| | - Robert Ninnis
- Analytical Biochemistry, CSL Limited, Bio21 Institute, 30
Flemington Road, 3010 Parkville, Melbourne, Australia
| | - Petr Novák
- Institute of Microbiology, BIOCEV, Prumyslova 595, 252 50 Vestec,
Czech Republic
| | - Francis J. O’Reilly
- Chair of Bioanalytics, Institute of Biotechnology Technische
Universität Berlin, 13355 Berlin, Germany
| | - Matthias Pelzing
- Analytical Biochemistry, CSL Limited, Bio21 Institute, 30
Flemington Road, 3010 Parkville, Melbourne, Australia
| | - Evgeniy Petrotchenko
- University of Victoria–Genome British Columbia Proteomics
Centre, Vancouver Island Technology Park, Victoria, British Columbia V8Z 7X8,
Canada
| | - Lolita Piersimoni
- Departments of Biological Chemistry, Bioinformatics, and Chemistry,
University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Manolo Plasencia
- Departments of Biological Chemistry, Bioinformatics, and Chemistry,
University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Tara Pukala
- Discipline of Chemistry, Faculty of Sciences, University of
Adelaide, North Terrace, Adelaide, South Australia 5005, Australia
| | - Kasper D. Rand
- Department of Pharmacy, University of Copenhagen, 2100 Copenhagen,
Denmark
| | - Juri Rappsilber
- Chair of Bioanalytics, Institute of Biotechnology Technische
Universität Berlin, 13355 Berlin, Germany
- Wellcome Trust Centre for Cell Biology, School of Biological
Sciences, University of Edinburgh, EH9 3BF Edinburgh, U.K
| | - Dana Reichmann
- Department of Biological Chemistry, The Alexander Silberman
Institute of Life Sciences, Safra Campus Givat Ram, The Hebrew University of
Jerusalem, Jerusalem 91904, Israel
| | - Carolin Sailer
- University of Konstanz, Department of Biology,
Universitätsstrasse 10, 78457 Konstanz, Germany
| | - Chris P. Sarnowski
- Department of Biology, Institute of Molecular Systems Biology, ETH
Zurich, Otto-Stern-Weg 3, 8093 Zurich, Switzerland
- PhD Program in Systems Biology, University of Zurich and ETH
Zurich, 8092 Zurich, Switzerland
| | - Richard A. Scheltema
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for
Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University
of Utrecht and Netherlands Proteomics Centre, Padualaan 8, 3584 CH Utrecht, The
Netherlands
| | - Carla Schmidt
- Interdisciplinary Research Center HALOmem, Institute for
Biochemistry and Biotechnology, Charles Tanford Protein Center, Martin Luther
University Halle-Wittenberg, Kurt-Mothes-Strasse 3a, 06120 Halle/Saale,
Germany
| | - David C. Schriemer
- Department of Biochemistry & Molecular Biology, Robson DNA
Science Centre, University of Calgary, 3330 Hospital Drive North West, Calgary,
Alberta T2N 4N1, Canada
| | - Yi Shi
- Department of Cell Biology, University of Pittsburgh, School of
Medicine, Pittsburgh, Pennsylvania 15213, United States
| | - J. Mark Skehel
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus,
Francis Crick Avenue, Cambridge CB2 0QH, U.K
| | - Moriya Slavin
- Department of Biological Chemistry, The Alexander Silberman
Institute of Life Sciences, Safra Campus Givat Ram, The Hebrew University of
Jerusalem, Jerusalem 91904, Israel
| | - Frank Sobott
- Department of Chemistry, University of Antwerp, Groenenborgerlaan
171, 2020 Antwerp, Belgium
- The Astbury Centre for Structural Molecular Biology and School of
Molecular and Cellular Biology, University of Leeds, LS2 9JT Leeds, U.K
| | - Victor Solis-Mezarino
- Gene Center Munich, Department of Biochemistry, Faculty of Chemistry
and Pharmacy, Ludwig Maximilians University of Munich, Feodor-Lynen-Strasse 25,
81377 Munich, Germany
| | - Heike Stephanowitz
- Leibniz Institute of Molecular Pharmacology (FMP),
Robert-Rössle-Strasse 10, 13125 Berlin, Germany
| | - Florian Stengel
- University of Konstanz, Department of Biology,
Universitätsstrasse 10, 78457 Konstanz, Germany
| | - Christian E. Stieger
- Protein Chemistry Facility, Research Institute of Molecular
Pathology (IMP) and Institute of Molecular Biotechnology (IMBA), Vienna Biocenter
(VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
| | - Esben Trabjerg
- Department of Pharmacy, University of Copenhagen, 2100 Copenhagen,
Denmark
| | - Michael Trnka
- UCSF Mass Spectrometry Facility, Genentech Hall, 600 16th Street,
San Francisco, California 94158, United States
| | - Marta Vilaseca
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona
Institute of Science and Technology (BIST), Baldiri Reixac 10, 08028 Barcelona,
Spain
| | - Rosa Viner
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose,
California 95134, United States
| | - Yufei Xiang
- Department of Cell Biology, University of Pittsburgh, School of
Medicine, Pittsburgh, Pennsylvania 15213, United States
| | - Sule Yilmaz
- Computational Systems Biochemistry Research Group,
Max-Planck-Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried,
Germany
| | - Alex Zelter
- Department of Biochemistry, University of Washington, Seattle,
Washington 98195, United States
| | - Daniel Ziemianowicz
- Department of Biochemistry & Molecular Biology, Robson DNA
Science Centre, University of Calgary, 3330 Hospital Drive North West, Calgary,
Alberta T2N 4N1, Canada
| | - Alexander Leitner
- Department of Biology, Institute of Molecular Systems Biology, ETH
Zurich, Otto-Stern-Weg 3, 8093 Zurich, Switzerland
| | - Andrea Sinz
- Department of Pharmaceutical Chemistry and Bioanalytics, Institute
of Pharmacy, Charles Tanford Protein Center, Martin Luther University
Halle-Wittenberg, Kurt-Mothes-Strasse 3a, 06120 Halle/Saale, Germany
40
Binz PA, Shofstahl J, Vizcaíno JA, Barsnes H, Chalkley RJ, Menschaert G, Alpi E, Clauser K, Eng JK, Lane L, Seymour SL, Sánchez LFH, Mayer G, Eisenacher M, Perez-Riverol Y, Kapp EA, Mendoza L, Baker PR, Collins A, Van Den Bossche T, Deutsch EW. Proteomics Standards Initiative Extended FASTA Format. J Proteome Res 2019; 18:2686-2692. [PMID: 31081335 DOI: 10.1021/acs.jproteome.9b00064]
Abstract
Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs) in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backward compatible, and as such, existing FASTA parsers will require little or no changes to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All the related documentation, including the detailed file format specification and example files, is available at http://www.psidev.info/peff.
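PEFF keeps the FASTA `>` header line and appends key/value metadata to it. As an illustration only, the following minimal Python sketch splits such a header, assuming that keys are introduced by a backslash and joined to their values with `=` (the `\Key=Value` layout used in PEFF headers). The function name and the example header are hypothetical and simplified; real PEFF files also carry a file-level header block and structured values (for example variant and PTM annotations), which this sketch returns as raw strings.

```python
def parse_peff_header(line):
    """Split a PEFF-style '>' header into (identifier, {key: raw_value}).

    Assumes keys are written as '\\Key=Value' and separated by ' \\'. Values may
    contain spaces (e.g. protein names), so the split is on the key marker, not
    on whitespace. Structured values are returned unparsed.
    """
    if not line.startswith(">"):
        raise ValueError("PEFF/FASTA header lines must start with '>'")
    body = line[1:].rstrip("\r\n")
    # Everything before the first ' \' is the database-prefixed identifier.
    identifier, marker, rest = body.partition(" \\")
    fields = {}
    if marker:
        for chunk in rest.split(" \\"):
            key, eq, value = chunk.partition("=")
            if not eq:
                raise ValueError(f"malformed PEFF field: {chunk!r}")
            fields[key] = value.strip()
    return identifier.strip(), fields


if __name__ == "__main__":
    # Simplified, hypothetical PEFF header for illustration.
    peff = r">nxp:NX_P04637-1 \PName=Cellular tumor antigen p53 \GName=TP53 \Length=393"
    print(parse_peff_header(peff))
    # A plain FASTA header simply yields no metadata.
    print(parse_peff_header(">sp|P04637|P53_HUMAN Cellular tumor antigen p53"))
```

Splitting on the backslash marker rather than on whitespace is the main design point: protein names and other values routinely contain spaces.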
Affiliation(s)
- Pierre-Alain Binz
- CHUV Centre Hospitalier Universitaire Vaudois, CH-1011 Lausanne 14, Switzerland
- Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Harald Barsnes
- Proteomics Unit, Department of Biomedicine, University of Bergen, N-5009 Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, N-5008 Bergen, Norway
- Robert J Chalkley
- University of California at San Francisco, San Francisco, California 94143, United States
- Gerben Menschaert
- Biobix, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
- Emanuele Alpi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Karl Clauser
- Broad Institute, Cambridge, Massachusetts 02142, United States
- Jimmy K Eng
- University of Washington, Seattle, Washington 98195, United States
- Lydie Lane
- SIB Swiss Institute of Bioinformatics, CH-1211 Geneva 4, Switzerland
- Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, CH-1211 Geneva 4, Switzerland
- Sean L Seymour
- Seymour Data Science, LLC, San Francisco, California 95000, United States
- Luis Francisco Hernández Sánchez
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5021 Bergen, Norway
- Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5021 Bergen, Norway
- Gerhard Mayer
- Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, D-44801 Bochum, Germany
- Martin Eisenacher
- Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, D-44801 Bochum, Germany
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Eugene A Kapp
- Walter & Eliza Hall Institute of Medical Research and the University of Melbourne, Melbourne, VIC 3052, Australia
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Peter R Baker
- University of California at San Francisco, San Francisco, California 94143, United States
- Andrew Collins
- Department of Functional and Comparative Genomics, Institute of Integrated Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
41
Schiebenhoefer H, Van Den Bossche T, Fuchs S, Renard BY, Muth T, Martens L. Challenges and promise at the interface of metaproteomics and genomics: an overview of recent progress in metaproteogenomic data analysis. Expert Rev Proteomics 2019; 16:375-390. [PMID: 31002542 DOI: 10.1080/14789450.2019.1609944]
Abstract
Introduction: The study of microbial communities based on the combined analysis of genomic and proteomic data, called metaproteogenomics, has gained increasing research attention in recent years. This relatively young field aims to elucidate the functional and taxonomic interplay of proteins in microbiomes and its implications for human health and the environment. Areas covered: This article reviews bioinformatics methods and software tools dedicated to the analysis of data from metaproteomics and metaproteogenomics experiments. In particular, it focuses on the creation of tailored protein sequence databases, on the optimal use of database search algorithms including methods of error rate estimation, and finally on taxonomic and functional annotation of peptide and protein identifications. Expert opinion: Recently, various promising strategies and software tools have been proposed for handling typical data analysis issues in metaproteomics. However, severe challenges remain that are highlighted and discussed in this article; these include: (i) robust false-positive assessment of peptide and protein identifications, (ii) complex protein inference against a background of highly redundant data, (iii) taxonomic and functional post-processing of identification data, and finally, (iv) the assessment and provision of metrics and tools for quantitative analysis.
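Challenge (i) concerns false-positive assessment. Most such assessments start from target-decoy estimation, so the sketch below shows only that generic baseline, not the metaproteomics-specific strategies the review discusses. It assumes that higher scores are better, ignores score ties, estimates the FDR at each score cutoff as decoys/targets, and converts those estimates to q-values; the function name and the `(score, is_decoy)` input layout are illustrative assumptions.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for PSMs with the basic target-decoy approach.

    psms: iterable of (score, is_decoy) pairs; a higher score means a better match.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    Score ties are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything accepted at this score cutoff.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest estimated FDR at which this PSM would still be accepted,
    # i.e. a running minimum taken from the bottom of the ranked list upwards.
    qvalues = [0.0] * len(fdrs)
    best = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        best = min(best, fdrs[i])
        qvalues[i] = best
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


if __name__ == "__main__":
    psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
    accepted = [s for s, d, q in target_decoy_qvalues(psms) if not d and q <= 0.01]
    print(accepted)  # [10.0, 9.0]
```

In highly redundant metaproteomic search spaces this simple estimate is exactly what the review flags as fragile, so it should be read as a point of reference rather than a recommended procedure.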
Affiliation(s)
- Henning Schiebenhoefer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Stephan Fuchs
- FG13 Division of Nosocomial Pathogens and Antibiotic Resistances, Robert Koch Institute, Wernigerode, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Thilo Muth
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
42
Klein J, Zaia J. psims - A Declarative Writer for mzML and mzIdentML for Python. Mol Cell Proteomics 2019; 18:571-575. [PMID: 30563850 PMCID: PMC6398200 DOI: 10.1074/mcp.rp118.001070] [Received: 09/12/2018] [Revised: 12/12/2018]
Abstract
mzML and mzIdentML are commonly used, powerful tools for representing mass spectrometry data and derived identification information. These formats are complex, requiring non-trivial logic to translate data into the appropriate representation. Most published implementations are tightly coupled to data structures. The most complete implementations are written in compiled languages that cannot expose the complete flexibility of the implementation to external programs or bindings. To our knowledge, there are no complete implementations for mzML or mzIdentML available to scripting languages like Python or R. We present psims, a library written in Python for writing mzML and mzIdentML. The library allows writing either XML format using built-in Python data structures. It includes a controlled vocabulary resolution system to simplify the encoding process and an identity tracking system to manage entity relationships. The source code is available at https://github.com/mobiusklein/psims, and through the Python Package Index as psims, licensed under the Apache License 2.0.
Affiliation(s)
- Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215
- Joseph Zaia
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215
- Department of Biochemistry, Boston University, Boston, Massachusetts 02118
43
Levitsky LI, Klein JA, Ivanov MV, Gorshkov MV. Pyteomics 4.0: Five Years of Development of a Python Proteomics Framework. J Proteome Res 2019; 18:709-714. [PMID: 30576148 DOI: 10.1021/acs.jproteome.8b00717]
Abstract
Many of the novel ideas that drive today's proteomic technologies are focused essentially on experimental or data-processing workflows. The latter are implemented and published in a number of ways, from custom scripts and programs, to projects built using general-purpose or specialized workflow engines; a large part of routine data processing is performed manually or with custom scripts that remain unpublished. Facilitating the development of reproducible data-processing workflows becomes essential for increasing the efficiency of proteomic research. To assist in overcoming the bioinformatics challenges in the daily practice of proteomic laboratories, 5 years ago we developed and announced Pyteomics, a freely available open-source library providing Python interfaces to proteomic data. We summarize the new functionality of Pyteomics developed during the time since its introduction.
Affiliation(s)
- Lev I Levitsky
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Joshua A Klein
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, United States
- Mark V Ivanov
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Mikhail V Gorshkov
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
44
Ren Z, Qi D, Pugh N, Li K, Wen B, Zhou R, Xu S, Liu S, Jones AR. Improvements to the Rice Genome Annotation Through Large-Scale Analysis of RNA-Seq and Proteomics Data Sets. Mol Cell Proteomics 2019; 18:86-98. [PMID: 30293062 PMCID: PMC6317475 DOI: 10.1074/mcp.ra118.000832] [Received: 05/03/2018] [Revised: 08/31/2018]
Abstract
Rice (Oryza sativa) is one of the world's most important crops. Its genome has been available for over 10 years and has undergone several rounds of annotation. We created a comprehensive database of transcripts from 29 public RNA sequencing data sets, officially predicted genes from Ensembl Plants, and common contaminants in which to search for protein-level evidence. We re-analyzed nine publicly accessible rice proteomics data sets. In total, we identified 420K peptide-spectrum matches from 47K peptides and 8,187 protein groups. A total of 4,168 peptides were initially classed as putative novel peptides (not matching official genes). Following a strict filtration scheme to rule out other possible explanations, we discovered 1,584 high-confidence novel peptides. The novel peptides were clustered into 692 genomic loci where our results suggest annotation improvements. Of the novel peptides, 80% had an ortholog match in the curated protein sequence set from at least one other plant species. For the peptides clustering in intergenic regions (and thus potentially new genes), 101 loci were identified, of which 43 had a high-confidence hit for a protein domain. Our results can be displayed as tracks on the Ensembl genome browser or other browsers supporting Track Hubs, to support re-annotation of the rice genome.
Affiliation(s)
- Zhe Ren
- BGI-Shenzhen, Shenzhen 518083, China
- Da Qi
- BGI-Shenzhen, Shenzhen 518083, China
- Nina Pugh
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
- Kai Li
- BGI-Shenzhen, Shenzhen 518083, China
- Bo Wen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030
- Ruo Zhou
- BGI-Shenzhen, Shenzhen 518083, China
- Shaohang Xu
- BGI-Shenzhen, Shenzhen 518083, China
- Siqi Liu
- BGI-Shenzhen, Shenzhen 518083, China
- Andrew R Jones
- Institute of Integrative Biology, University of Liverpool, Liverpool L69 7ZB, UK
45
Uszkoreit J, Perez-Riverol Y, Eggers B, Marcus K, Eisenacher M. Protein Inference Using PIA Workflows and PSI Standard File Formats. J Proteome Res 2018; 18:741-747. [DOI: 10.1021/acs.jproteome.8b00723]
Affiliation(s)
- Julian Uszkoreit
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, Universitaetsstrasse 150, D-44801 Bochum, Germany
- Yasset Perez-Riverol
- EMBL Outstation, European Bioinformatics Institute, Proteomics Services, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Britta Eggers
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, Universitaetsstrasse 150, D-44801 Bochum, Germany
- Katrin Marcus
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, Universitaetsstrasse 150, D-44801 Bochum, Germany
- Martin Eisenacher
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, Universitaetsstrasse 150, D-44801 Bochum, Germany
46
Cross-linking mass spectrometry: methods and applications in structural, molecular and systems biology. Nat Struct Mol Biol 2018; 25:1000-1008. [PMID: 30374081 DOI: 10.1038/s41594-018-0147-0] [Received: 06/04/2018] [Accepted: 09/19/2018]
Abstract
Over the past decade, cross-linking mass spectrometry (CLMS) has developed into a robust and flexible tool that provides medium-resolution structural information. CLMS data provide a measure of the proximity of amino acid residues and thus offer information on the folds of proteins and the topology of their complexes. Here, we highlight notable successes of this technique as well as common pipelines. Novel CLMS applications, such as in-cell cross-linking, probing conformational changes and tertiary-structure determination, are now beginning to make contributions to molecular biology and the emerging fields of structural systems biology and interactomics.
47
Martínez-Bartolomé S, Medina-Aunon JA, López-García MÁ, González-Tejedo C, Prieto G, Navajas R, Salazar-Donate E, Fernández-Costa C, Yates JR, Albar JP. PACOM: A Versatile Tool for Integrating, Filtering, Visualizing, and Comparing Multiple Large Mass Spectrometry Proteomics Data Sets. J Proteome Res 2018; 17:1547-1558. [DOI: 10.1021/acs.jproteome.7b00858]
Affiliation(s)
- Salvador Martínez-Bartolomé
- Proteomics Laboratory, National Center for Biotechnology, CSIC, Madrid 28049, Spain
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
- Gorka Prieto
- Department of Communications Engineering, University of the Basque Country (UPV/EHU), Bilbao 48013, Spain
- Rosana Navajas
- Proteomics Laboratory, National Center for Biotechnology, CSIC, Madrid 28049, Spain
- Carolina Fernández-Costa
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
- Immunology, Centro de Investigaciones Biomédicas (CINBIO), Centro singular de Investigación de Galicia: Instituto de Investigación Sanitaria Galicia Sur (IIS-GS), University of Vigo, Campus Universitario, s/n, Vigo 36310, Spain
- John R. Yates
- Department of Chemical Physiology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
- Juan Pablo Albar
- Proteomics Laboratory, National Center for Biotechnology, CSIC, Madrid 28049, Spain
48
Abstract
The recent establishment of cloud computing, high-throughput networking, and more versatile web standards and browsers has led to a renewed interest in web-based applications. While big data has traditionally been the domain of optimized desktop and server applications, it is now possible to store vast amounts of data and perform the necessary calculations offsite with cloud storage and computing providers, with the results visualized in a high-quality cross-platform interface via a web browser. There are a number of emerging platforms for cloud-based mass spectrometry data analysis; however, there is limited pre-existing code accessible to web developers, especially those constrained to a shared hosting environment where Java and C applications are often forbidden by the hosting provider. To remedy this, we provide an open-source mass spectrometry library for one of the most commonly used web development languages, PHP. Our new library, phpMs, provides objects for storing and manipulating spectra and identification data as well as utilities for file reading, file writing, calculations, peptide fragmentation, and protein digestion, as well as a software interface for controlling search engines. We provide a working demonstration of some of the capabilities at http://pgb.liv.ac.uk/phpMs.
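phpMs itself is a PHP library, and its API is not reproduced here. To illustrate what in-silico protein digestion involves, the following minimal Python sketch applies a simplified trypsin rule (cleave C-terminal to K or R unless the next residue is P) and optionally returns peptides with missed cleavages. The function `tryptic_digest` and its parameters are hypothetical, and production tools typically use more detailed enzyme rules.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Return peptides from an in-silico digest with a simplified trypsin rule.

    Cleaves C-terminal to K or R unless the next residue is P. Peptides spanning
    up to `missed_cleavages` internal cleavage sites are also returned.
    """
    # Positions just after each K/R that is not followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of missed cleavages inside the peptide.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptide = sequence[bounds[i]:bounds[j]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("AKPEKRGTR"))                      # ['AKPEK', 'R', 'GTR']
    print(tryptic_digest("AKPEKRGTR", missed_cleavages=1))  # ['AKPEK', 'AKPEKR', 'R', 'RGTR', 'GTR']
```

Note that the K in `AKP` is skipped because it is followed by proline, which is why the first peptide is `AKPEK` rather than `AK`.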
Affiliation(s)
- Andrew Collins
- Department of Functional and Comparative Genomics, Institute of Integrated Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Andrew R Jones
- Department of Functional and Comparative Genomics, Institute of Integrated Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
49
Muth T, Kohrs F, Heyer R, Benndorf D, Rapp E, Reichl U, Martens L, Renard BY. MPA Portable: A Stand-Alone Software Package for Analyzing Metaproteome Samples on the Go. Anal Chem 2017; 90:685-689. [PMID: 29215871 PMCID: PMC5757220 DOI: 10.1021/acs.analchem.7b03544]
Abstract
Metaproteomics, the mass spectrometry-based analysis of proteins from multispecies samples, faces severe challenges concerning data analysis and results interpretation. To overcome these shortcomings, we here introduce the MetaProteomeAnalyzer (MPA) Portable software. In contrast to the original server-based MPA application, this newly developed tool no longer requires computational expertise for installation and is now independent of any relational database system. In addition, MPA Portable now supports state-of-the-art database search engines and a convenient command line interface for high-performance data processing tasks. While search engine results can easily be combined to increase the protein identification yield, an additional two-step workflow is implemented to provide sufficient analysis resolution for further postprocessing steps, such as protein grouping as well as taxonomic and functional annotation. Our new application has been developed with a focus on intuitive usability, adherence to data standards, and adaptation to Web-based workflow platforms. The open source software package can be found at https://github.com/compomics/meta-proteome-analyzer.
Affiliation(s)
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Fabian Kohrs
- Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany
- Robert Heyer
- Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany
- Dirk Benndorf
- Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, 39106 Magdeburg, Germany
- Erdmann Rapp
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, 39106 Magdeburg, Germany
- Udo Reichl
- Bioprocess Engineering, Otto von Guericke University Magdeburg, 39106 Magdeburg, Germany
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, 39106 Magdeburg, Germany
- Lennart Martens
- Department of Biochemistry, Ghent University, 9000 Ghent, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
50
Deutsch EW, Orchard S, Binz PA, Bittremieux W, Eisenacher M, Hermjakob H, Kawano S, Lam H, Mayer G, Menschaert G, Perez-Riverol Y, Salek RM, Tabb DL, Tenzer S, Vizcaíno JA, Walzer M, Jones AR. Proteomics Standards Initiative: Fifteen Years of Progress and Future Work. J Proteome Res 2017; 16:4288-4298. [PMID: 28849660 PMCID: PMC5715286 DOI: 10.1021/acs.jproteome.7b00370] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Received: 06/02/2017] [Indexed: 12/21/2022]
Abstract
The Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) has now been developing and promoting open community standards and software tools in the field of proteomics for 15 years. Under the guidance of the chair, cochairs, and other leadership positions, the PSI working groups are tasked with the development and maintenance of community standards via special workshops and ongoing work. Among the existing ratified standards, the PSI working groups continue to update PSI-MI XML, MITAB, mzML, mzIdentML, mzQuantML, mzTab, and the MIAPE (Minimum Information About a Proteomics Experiment) guidelines with the advance of new technologies and techniques. Furthermore, new standards are currently either in the final stages of completion (proBed and proBAM for proteogenomics results as well as PEFF) or in early stages of design (a spectral library standard format, a universal spectrum identifier, the qcML quality control format, and the Protein Expression Interface (PROXI) web services Application Programming Interface). In this work we review the current status of all of these aspects of the PSI, describe synergies with other efforts such as the ProteomeXchange Consortium, the Human Proteome Project, and the metabolomics community, and provide a look at future directions of the PSI.
Affiliation(s)
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Pierre-Alain Binz
- CHUV Centre Hospitalier Universitaire Vaudois, 1011 Lausanne, Switzerland
- Wout Bittremieux
- Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, 2020 Antwerp, Belgium
- Martin Eisenacher
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, D-44801 Bochum, Germany
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, National Center for Protein Sciences, Beijing, Beijing 102206, China
- Shin Kawano
- Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Kashiwa, Chiba 277-0871, Japan
- Henry Lam
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, P. R. China
- Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, P. R. China
- Gerhard Mayer
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, D-44801 Bochum, Germany
- Gerben Menschaert
- Lab of Bioinformatics and Computational Genomics (BioBix), Faculty of Bioscience Engineering, Ghent University, 9000 Ghent, Belgium
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Reza M. Salek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- David L. Tabb
- SA MRC Centre for TB Research, DST/NRF Centre of Excellence for Biomedical TB Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Stefan Tenzer
- Institute for Immunology, University Medical Center of the Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Andrew R. Jones
- Institute of Integrative Biology, University of Liverpool, South Wirral L64 4AY, United Kingdom