1
Rusconi F. Free Open Source Software for Protein and Peptide Mass Spectrometry-based Science. Curr Protein Pept Sci 2021; 22:134-147. [PMID: 33461461 DOI: 10.2174/1389203722666210118160946] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/09/2020] [Revised: 10/12/2020] [Accepted: 01/04/2021] [Indexed: 12/28/2022]
Abstract
In the field of biology, and specifically in protein and peptide science, the power of mass spectrometry is that it is applicable to a vast spectrum of applications. Mass spectrometry can be applied to identify proteins and peptides in complex mixtures, to identify and locate post-translational modifications, to characterize the structure of proteins and peptides to the most detailed level or to detect protein-ligand non-covalent interactions. Thanks to the Free and Open Source Software (FOSS) movement, scientists have limitless opportunities to deepen their skills in software development to code software that solves mass spectrometric data analysis problems. After the conversion of raw data files into open standard format files, the entire spectrum of data analysis tasks can now be performed integrally on FOSS platforms, like GNU/Linux, and only with FOSS solutions. This review presents a brief history of mass spectrometry open file formats and goes on with the description of FOSS projects that are commonly used in protein and peptide mass spectrometry fields of endeavor: identification projects that involve mostly automated pipelines, like proteomics and peptidomics, and bio-structural characterization projects that most often involve manual scrutiny of the mass data. Projects of the last kind usually involve software that allows the user to delve into the mass data in an interactive graphics-oriented manner. Software projects are thus categorized on the basis of these criteria: software libraries for software developers vs desktop-based graphical user interface, software for the end-user and automated pipeline-based data processing vs interactive graphics-based mass data scrutiny.
Affiliation(s)
- Filippo Rusconi
- PAPPSO, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
2
Deutsch EW, Albar JP, Binz PA, Eisenacher M, Jones AR, Mayer G, Omenn GS, Orchard S, Vizcaíno JA, Hermjakob H. Development of data representation standards by the human proteome organization proteomics standards initiative. J Am Med Inform Assoc 2015; 22:495-506. [PMID: 25726569 PMCID: PMC4457114 DOI: 10.1093/jamia/ocv001] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 07/14/2014] [Revised: 09/29/2014] [Accepted: 01/05/2015] [Indexed: 11/22/2022] Open
Abstract
OBJECTIVE To describe the goals of the Proteomics Standards Initiative (PSI) of the Human Proteome Organization, the methods that the PSI has employed to create data standards, the resulting output of the PSI, lessons learned from the PSI's evolution, and future directions and synergies for the group. MATERIALS AND METHODS The PSI has 5 categories of deliverables that have guided the group. These are minimum information guidelines, data formats, controlled vocabularies, resources and software tools, and dissemination activities. These deliverables are produced via the leadership and working group organization of the initiative, driven by frequent workshops and ongoing communication within the working groups. Official standards are subjected to a rigorous document process that includes several levels of peer review prior to release. RESULTS We have produced and published minimum information guidelines describing what information should be provided when making data public, either via public repositories or other means. The PSI has produced a series of standard formats covering mass spectrometer input, mass spectrometer output, results of informatics analysis (both qualitative and quantitative analyses), reports of molecular interaction data, and gel electrophoresis analyses. We have produced controlled vocabularies that ensure that concepts are uniformly annotated in the formats and engaged in extensive software development and dissemination efforts so that the standards can efficiently be used by the community. CONCLUSION In its first dozen years of operation, the PSI has produced many standards that have accelerated the field of proteomics by facilitating data exchange and deposition to data repositories. We look to the future to continue developing standards for new proteomics technologies and workflows and mechanisms for integration with other omics data types. Our products facilitate the translation of genomics and proteomics findings to clinical and biological phenotypes. The PSI website can be accessed at http://www.psidev.info.
Affiliation(s)
- Juan Pablo Albar
- Died July 18, 2014. Proteomics Facility, Centro Nacional de Biotecnología - CSIC, Madrid, Spain; ProteoRed Consortium, Spanish National Institute of Proteomics, Madrid, Spain
- Pierre-Alain Binz
- CHUV Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Martin Eisenacher
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, Bochum, Germany
- Andrew R Jones
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Gerhard Mayer
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, Bochum, Germany
- Gilbert S Omenn
- Institute for Systems Biology, Seattle, USA; Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, USA
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
3
Martínez-Bartolomé S, Binz PA, Albar JP. The Minimal Information about a Proteomics Experiment (MIAPE) from the Proteomics Standards Initiative. Methods Mol Biol 2014; 1072:765-80. [PMID: 24136562 DOI: 10.1007/978-1-62703-631-3_53] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
During the last 10 years, the Proteomics Standards Initiative from the Human Proteome Organization (HUPO-PSI) has worked on defining standards for proteomics data representation as well as guidelines that state the minimum information that should be included when reporting a proteomics experiment (MIAPE). Such minimum information must describe the complete experiment, including both experimental protocols and data processing methods, allowing a critical evaluation of the whole process and the potential recreation of the work. In this chapter we describe the standardization work performed by the HUPO-PSI, and then we concentrate on the MIAPE guidelines, highlighting its importance when publishing proteomics experiments particularly in specialized proteomics journals. Finally, we describe existing bioinformatics resources that generate MIAPE compliant reports or that check proteomics data files for MIAPE compliance.
4
Walzer M, Qi D, Mayer G, Uszkoreit J, Eisenacher M, Sachsenberg T, Gonzalez-Galarza FF, Fan J, Bessant C, Deutsch EW, Reisinger F, Vizcaíno JA, Medina-Aunon JA, Albar JP, Kohlbacher O, Jones AR. The mzQuantML data standard for mass spectrometry-based quantitative studies in proteomics. Mol Cell Proteomics 2013; 12:2332-40. [PMID: 23599424 PMCID: PMC3734589 DOI: 10.1074/mcp.o113.028506] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Indexed: 11/28/2022] Open
Abstract
The range of heterogeneous approaches available for quantifying protein abundance via mass spectrometry (MS) leads to considerable challenges in modeling, archiving, exchanging, or submitting experimental data sets as supplemental material to journals. To date, there has been no widely accepted format for capturing the evidence trail of how quantitative analysis has been performed by software, for transferring data between software packages, or for submitting to public databases. In the context of the Proteomics Standards Initiative, we have developed the mzQuantML data standard. The standard can represent quantitative data about regions in two-dimensional retention time versus mass/charge space (called features), peptides, and proteins and protein groups (where there is ambiguity regarding peptide-to-protein inference), and it offers limited support for small molecule (metabolomic) data. The format has structures for representing replicate MS runs, grouping of replicates (for example, as study variables), and capturing the parameters used by software packages to arrive at these values. The format has the capability to reference other standards such as mzML and mzIdentML, and thus the evidence trail for the MS workflow as a whole can now be described. Several software implementations are available, and we encourage other bioinformatics groups to use mzQuantML as an input, internal, or output format for quantitative software and for structuring local repositories. All project resources are available in the public domain from the HUPO Proteomics Standards Initiative http://www.psidev.info/mzquantml.
Affiliation(s)
- Mathias Walzer
- Quantitative Biology Center and Department of Computer Science, Center for Bioinformatics, University of Tübingen, Sand 14, 72076 Tübingen, Germany
5
Medina-Aunon JA, Krishna R, Ghali F, Albar JP, Jones AJ. A guide for integration of proteomic data standards into laboratory workflows. Proteomics 2013; 13:480-92. [DOI: 10.1002/pmic.201200268] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/29/2012] [Revised: 08/14/2012] [Accepted: 09/10/2012] [Indexed: 01/28/2023]
Affiliation(s)
- Ritesh Krishna
- Institute of Integrative Biology; University of Liverpool; Liverpool; UK
- Fawaz Ghali
- Institute of Integrative Biology; University of Liverpool; Liverpool; UK
- Juan P. Albar
- Centro Nacional de Biotecnología; CSIC; Madrid; Spain
- Andrew J. Jones
- Institute of Integrative Biology; University of Liverpool; Liverpool; UK
6
Deutsch EW. File formats commonly used in mass spectrometry proteomics. Mol Cell Proteomics 2012; 11:1612-21. [PMID: 22956731 PMCID: PMC3518119 DOI: 10.1074/mcp.r112.019695] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.3] [Received: 04/16/2012] [Revised: 08/06/2012] [Indexed: 11/06/2022] Open
Abstract
The application of mass spectrometry (MS) to the analysis of proteomes has enabled the high-throughput identification and abundance measurement of hundreds to thousands of proteins per experiment. However, the formidable informatics challenge associated with analyzing MS data has required a wide variety of data file formats to encode the complex data types associated with MS workflows. These formats encompass the encoding of input instruction for instruments, output products of the instruments, and several levels of information and results used by and produced by the informatics analysis tools. A brief overview of the most common file formats in use today is presented here, along with a discussion of related topics.
7
Hoekman B, Breitling R, Suits F, Bischoff R, Horvatovich P. msCompare: a framework for quantitative analysis of label-free LC-MS data for comparative candidate biomarker studies. Mol Cell Proteomics 2012; 11:M111.015974. [PMID: 22318370 DOI: 10.1074/mcp.m111.015974] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 12/27/2022] Open
Abstract
Data processing forms an integral part of biomarker discovery and contributes significantly to the ultimate result. To compare and evaluate various publicly available open source label-free data processing workflows, we developed msCompare, a modular framework that allows the arbitrary combination of different feature detection/quantification and alignment/matching algorithms in conjunction with a novel scoring method to evaluate their overall performance. We used msCompare to assess the performance of workflows built from modules of publicly available data processing packages such as SuperHirn, OpenMS, and MZmine and our in-house developed modules on peptide-spiked urine and trypsin-digested cerebrospinal fluid (CSF) samples. We found that the quality of results varied greatly among workflows, and interestingly, heterogeneous combinations of algorithms often performed better than the homogenous workflows. Our scoring method showed that the union of feature matrices of different workflows outperformed the original homogenous workflows in some cases. msCompare is open source software (https://trac.nbic.nl/mscompare), and we provide a web-based data processing service for our framework by integration into the Galaxy server of the Netherlands Bioinformatics Center (http://galaxy.nbic.nl/galaxy) to allow scientists to determine which combination of modules provides the most accurate processing for their particular LC-MS data sets.
Affiliation(s)
- Berend Hoekman
- Department of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
8
Utility of gel-free, label-free shotgun proteomics approaches to investigate microorganisms. Appl Microbiol Biotechnol 2011; 90:407-16. [DOI: 10.1007/s00253-011-3172-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 10/26/2010] [Revised: 02/03/2011] [Accepted: 02/04/2011] [Indexed: 10/18/2022]
9
Ahmad I, Suits F, Hoekman B, Swertz MA, Byelas H, Dijkstra M, Hooft R, Katsubo D, van Breukelen B, Bischoff R, Horvatovich P. A high-throughput processing service for retention time alignment of complex proteomics and metabolomics LC-MS data. Bioinformatics 2011; 27:1176-8. [PMID: 21349866 DOI: 10.1093/bioinformatics/btr094] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
Warp2D is a novel time alignment approach, which uses the overlapping peak volume of the reference and sample peak lists to correct misleading peak shifts. Here, we present an easy-to-use web interface for high-throughput Warp2D batch processing time alignment service using the Dutch Life Science Grid, reducing processing time from days to hours. This service provides the warping function, the sample chromatogram peak list with adjusted retention times and normalized quality scores based on the sum of overlapping peak volume of all peaks. Heat maps before and after time alignment are created from the arithmetic mean of the sum of overlapping peak area rearranged with hierarchical clustering, allowing the quality control of the time alignment procedure. Taverna workflow and command line tool are provided for remote processing of local user data. AVAILABILITY Online data processing service is available at http://www.nbpp.nl/warp2d.html. Taverna workflow is available at myExperiment with title '2D Time Alignment-Webservice and Workflow' at http://www.myexperiment.org/workflows/1283.html. Command line tool is available at http://www.nbpp.nl/Warp2D_commandline.zip. CONTACT p.l.horvatovich@rug.nl SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Isthiaq Ahmad
- Analytical Biochemistry, Department of Pharmacy, University of Groningen, Groningen, The Netherlands
10
Rezeli M, Végvári Á, Fehniger TE, Laurell T, Marko-Varga G. Moving towards high density clinical signature studies with a human proteome catalogue developing multiplexing mass spectrometry assay panels. J Clin Bioinforma 2011; 1:7. [PMID: 21884626 PMCID: PMC3164614 DOI: 10.1186/2043-9113-1-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 11/03/2010] [Accepted: 02/08/2011] [Indexed: 11/10/2022] Open
Abstract
A perspective overview is given describing the current development of multiplex mass spectrometry assay technology platforms utilized for high throughput clinical sample analysis. The development of targeted therapies with novel personalized medicine drugs will require new tools for monitoring efficacy and outcome that will rely on both the quantification of disease progression related biomarkers as well as the measurement of disease specific pathway/signaling proteins. The bioinformatics developments play a key central role in the area of clinical proteomics where targeted peptide expressions in health and disease are investigated in small-, medium- and large-scaled clinical studies. An outline is presented describing applications of the selected reaction monitoring (SRM) mass spectrometry assay principle. This assay form enables the simultaneous description of multiple protein biomarkers and is an area under a fast and progressive development throughout the community. The Human Proteome Organization, HUPO, recently launched the Human Proteome Project (HPP) that will map the organization of proteins on specific chromosomes, on a chromosome-by-chromosome basis utilizing the SRM technology platform. Specific examples of an SRM-multiplex quantitative assay platform dedicated to the cardiovascular disease area, screening Apo A1, Apo A4, Apo B, Apo CI, Apo CII, Apo CIII, Apo D, Apo E, Apo H, and CRP biomarkers used in daily diagnosis routines in clinical hospitals globally, are presented. We also provide data on prostate cancer studies that have identified a variety of PSA isoforms characterized by high-resolution separation interfaced to mass spectrometry.
Affiliation(s)
- Melinda Rezeli
- Div. Clinical Protein Science & Imaging, Biomedical Center, Dept. of Measurement Technology and Industrial Electrical Engineering, Lund University, BMC C13, SE-221 84 Lund, Sweden
- Ákos Végvári
- Div. Clinical Protein Science & Imaging, Biomedical Center, Dept. of Measurement Technology and Industrial Electrical Engineering, Lund University, BMC C13, SE-221 84 Lund, Sweden
- Thomas E Fehniger
- Div. Clinical Protein Science & Imaging, Biomedical Center, Dept. of Measurement Technology and Industrial Electrical Engineering, Lund University, BMC C13, SE-221 84 Lund, Sweden
- Institute of Clinical Medicine, Tallinn University of Technology, Akadeemia tee 15, 12618 Tallinn, Estonia
- Thomas Laurell
- Div. Clinical Protein Science & Imaging, Biomedical Center, Dept. of Measurement Technology and Industrial Electrical Engineering, Lund University, BMC C13, SE-221 84 Lund, Sweden
- György Marko-Varga
- Div. Clinical Protein Science & Imaging, Biomedical Center, Dept. of Measurement Technology and Industrial Electrical Engineering, Lund University, BMC C13, SE-221 84 Lund, Sweden
- First Department of Surgery, Tokyo Medical University, 6-7-1 Nishishinjiku Shinjiku-ku, Tokyo, 160-0023 Japan