1
Révész Á, Hevér H, Steckel A, Schlosser G, Szabó D, Vékey K, Drahos L. Collision energies: Optimization strategies for bottom-up proteomics. Mass Spectrometry Reviews 2023; 42:1261-1299. [PMID: 34859467 DOI: 10.1002/mas.21763] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 06/11/2021] [Revised: 11/17/2021] [Accepted: 11/17/2021] [Indexed: 06/07/2023]
Abstract
Mass spectrometry coupled to liquid chromatography is an indispensable tool in the field of proteomics. In the last decades, more and more complex and diverse biochemical and biomedical questions have arisen. Problems to be solved involve protein identification, quantitative analysis, screening of low-abundance modifications, handling matrix effects, and concentrations differing by orders of magnitude. This led to the development of more tailored protocols and problem-centered proteomics workflows, including advanced choice of experimental parameters. In the most widespread bottom-up approach, the choice of collision energy in tandem mass spectrometric experiments has an outstanding role. This review presents the collision energy optimization strategies in the field of proteomics that can help fully exploit the potential of MS-based proteomics techniques. A systematic collection of use case studies is then presented to serve as a starting point for related further scientific work. Finally, this article discusses the issue of comparing results from different studies or obtained on different instruments, and it gives some hints on methodology transfer between laboratories based on measurement of reference species.
Affiliation(s)
- Ágnes Révész
  - MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
- Helga Hevér
  - Chemical Works of Gedeon Richter Plc, Budapest, Hungary
- Arnold Steckel
  - Department of Analytical Chemistry, MTA-ELTE Lendület Ion Mobility Mass Spectrometry Research Group, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
- Gitta Schlosser
  - Department of Analytical Chemistry, MTA-ELTE Lendület Ion Mobility Mass Spectrometry Research Group, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
- Dániel Szabó
  - MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
- Károly Vékey
  - MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
- László Drahos
  - MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
2
Marissen R, Palmblad M. mzRecal: universal MS1 recalibration in mzML using identified peptides in mzIdentML as internal calibrants. Bioinformatics 2021; 37:2768-2769. [PMID: 33538780 DOI: 10.1093/bioinformatics/btab056] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/29/2020] [Revised: 12/31/2020] [Accepted: 01/26/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
Summary: In mass spectrometry-based proteomics, accurate peptide masses improve identifications, alignment and quantitation. Getting the most out of any instrument therefore requires proper calibration. Here we present a new stand-alone software tool, mzRecal, for universal automatic recalibration of data from all common mass analyzers using standard open formats and based on physical principles. Availability and implementation: mzRecal is implemented in Go and freely available at https://github.com/524D/mzRecal. Supplementary information: Supplementary data are available at Bioinformatics online.
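The idea of using identified peptides as internal calibrants can be illustrated with a minimal Python sketch. It assumes the simplest possible error model, a single constant ppm offset estimated as the median error of the calibrants. This is an assumption made for illustration only; it is not the analyzer-specific calibration used by mzRecal, and the function names and the `(observed, theoretical)` input format are hypothetical.

```python
from statistics import median


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error of one calibrant in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def recalibrate(all_mz, calibrants):
    """Correct every m/z value by the median ppm error of the calibrants.

    calibrants: list of (observed_mz, theoretical_mz) pairs taken from
    confidently identified peptides (hypothetical input format).
    Assumes a constant ppm offset across the whole m/z range.
    """
    if not calibrants:
        raise ValueError("at least one calibrant is required")
    offset_ppm = median(ppm_error(obs, theo) for obs, theo in calibrants)
    # observed = true * (1 + offset), so true = observed / (1 + offset)
    return [mz / (1 + offset_ppm * 1e-6) for mz in all_mz]
```

The median is used rather than the mean so that a few wrongly identified calibrants do not dominate the estimate. A real recalibration would typically let the correction depend on m/z and on the type of mass analyzer.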
Affiliation(s)
- Rob Marissen
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, RC Leiden, The Netherlands
- Magnus Palmblad
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, RC Leiden, The Netherlands
3
Révész Á, Milley MG, Nagy K, Szabó D, Kalló G, Csősz É, Vékey K, Drahos L. Tailoring to Search Engines: Bottom-Up Proteomics with Collision Energies Optimized for Identification Confidence. J Proteome Res 2020; 20:474-484. [PMID: 33284634 PMCID: PMC7786379 DOI: 10.1021/acs.jproteome.0c00518] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/27/2022]
Abstract
Bottom-up proteomics relies on identification of peptides from tandem mass spectra, usually via matching against sequence databases. Confidence in a peptide–spectrum match can be characterized by a score value given by the database search engines, and it depends on the information content and the quality of the spectrum. The latter are influenced by experimental parameters, of which the collision energy is the most important one in the case of collision-induced dissociation. We examined how the identification score of the Byonic and Andromeda (MaxQuant) engines varies with collision energy for more than a thousand individual peptides from a HeLa tryptic digest on a QTof instrument. We thereby extended our earlier study on Mascot scores and corroborated its findings on the potential bimodal nature of this energy dependence. Optimal energies as a function of m/z show comparable linear trends for the three engines. On the basis of peptide-level results, we designed methods with one or two liquid chromatography–tandem mass spectrometry (LC-MS/MS) runs and various collision energy settings and assessed their practical performance in peptide and protein identification from the HeLa standard sample. A 10–40% gain in various measures, such as the number of identified proteins or sequence coverage, was obtained over the factory default settings. Best performing methods differ for the three engines, suggesting that the experimental parameters should be fine-tuned to the choice of the engine. We also recommend a simple approach and provide reference data to ease the transfer of the optimized methods to other mass spectrometers relevant for proteomics. We demonstrate the utility of this approach on an Orbitrap instrument. Data sets can be accessed via the MassIVE repository (MSV000086379).
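The linear dependence of optimal collision energy on m/z mentioned in the abstract can be sketched as an ordinary least-squares line fit. The Python below is an illustration only: the function names are hypothetical, and no slope or intercept from the paper is reproduced. Any numbers passed in would be the user's own per-peptide optima.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def collision_energy(mz, slope, intercept):
    """Predicted collision energy for a precursor at the given m/z."""
    return slope * mz + intercept
```

In use, `xs` would hold precursor m/z values and `ys` the collision energies that gave the best score for each peptide. The fitted pair can then be passed to `collision_energy` to build an m/z-dependent acquisition setting. Fitting each search engine and each charge state separately would be a reasonable refinement, since the abstract reports engine-dependent optima.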
Affiliation(s)
- Ágnes Révész
  - MS Proteomics Research Group, Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, H-1117 Budapest, Hungary
- Márton Gyula Milley
  - MS Proteomics Research Group, Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, H-1117 Budapest, Hungary
- Kinga Nagy
  - MS Proteomics Research Group, Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, H-1117 Budapest, Hungary
- Dániel Szabó
  - MS Proteomics Research Group, Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, H-1117 Budapest, Hungary
  - Faculty of Science, Institute of Chemistry, Hevesy György PhD School of Chemistry, ELTE, Eötvös Loránd University, Pázmány Péter Sétány 1/A, H-1117 Budapest, Hungary
- Gergő Kalló
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Egyetem tér 1, 4032 Debrecen, Hungary
- Éva Csősz
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Egyetem tér 1, 4032 Debrecen, Hungary
- Károly Vékey
  - MS Proteomics Research Group, Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, H-1117 Budapest, Hungary
- László Drahos
  - MS Proteomics Research Group, Research Centre for Natural Sciences, Magyar Tudósok Körútja 2, H-1117 Budapest, Hungary
4
Palmblad M, Lamprecht AL, Ison J, Schwämmle V. Automated workflow composition in mass spectrometry-based proteomics. Bioinformatics 2019; 35:656-664. [PMID: 30060113 PMCID: PMC6378944 DOI: 10.1093/bioinformatics/bty646] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 01/29/2018] [Revised: 07/06/2018] [Accepted: 07/26/2018] [Indexed: 11/28/2022] [Open Access]
Abstract
Motivation: Numerous software utilities operating on mass spectrometry (MS) data are described in the literature and provide specific operations as building blocks for the assembly of on-purpose workflows. Working out which tools and combinations are applicable or optimal in practice is often hard. Thus, researchers face difficulties in selecting practical and effective data analysis pipelines for a specific experimental design. Results: We provide a toolkit to support researchers in identifying, comparing and benchmarking multiple workflows from individual bioinformatics tools. Automated workflow composition is enabled by the tools’ semantic annotation in terms of the EDAM ontology. To demonstrate the practical use of our framework, we created and evaluated a number of logically and semantically equivalent workflows for four use cases representing frequent tasks in MS-based proteomics. Indeed, we found that the results computed by the workflows could vary considerably, emphasizing the benefits of a framework that facilitates their systematic exploration. Availability and implementation: The project files and workflows are available from https://github.com/bio-tools/biotoolsCompose/tree/master/Automatic-Workflow-Composition. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Magnus Palmblad
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, RC Leiden, The Netherlands
- Anna-Lena Lamprecht
  - Department of Information and Computing Sciences, Utrecht University, CC Utrecht, The Netherlands
- Jon Ison
  - National Life Science Supercomputing Center, Technical University of Denmark, Kongens Lyngby, Denmark
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Odense, Denmark
5
Svensson D, Sjögren R, Sundell D, Sjödin A, Trygg J. doepipeline: a systematic approach to optimizing multi-level and multi-step data processing workflows. BMC Bioinformatics 2019; 20:498. [PMID: 31615395 PMCID: PMC6794737 DOI: 10.1186/s12859-019-3091-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/21/2018] [Accepted: 09/10/2019] [Indexed: 12/30/2022] [Open Access]
Abstract
Background: Selecting the proper parameter settings for bioinformatic software tools is challenging. Not only will each parameter have an individual effect on the outcome, but there are also potential interaction effects between parameters. Both of these effects may be difficult to predict. To make the situation even more complex, multiple tools may be run in a sequential pipeline where the final output depends on the parameter configuration for each tool in the pipeline. Because of the complexity and difficulty of predicting outcomes, in practice parameters are often left at default settings or set based on personal or peer experience obtained in a trial and error fashion. To allow for the reliable and efficient selection of parameters for bioinformatic pipelines, a systematic approach is needed. Results: We present doepipeline, a novel approach to optimizing bioinformatic software parameters, based on core concepts of the Design of Experiments methodology and recent advances in subset designs. Optimal parameter settings are first approximated in a screening phase using a subset design that efficiently spans the entire search space, then optimized in the subsequent phase using response surface designs and OLS modeling. Doepipeline was used to optimize parameters in four use cases: 1) de-novo assembly, 2) scaffolding of a fragmented genome assembly, 3) k-mer taxonomic classification of Oxford Nanopore Technologies MinION reads, and 4) genetic variant calling. In all four cases, doepipeline found parameter settings that produced a better outcome with respect to the characteristic measured when compared to using default values. Our approach is implemented and available in the Python package doepipeline. Conclusions: Our proposed methodology provides a systematic and robust framework for optimizing software parameter settings, in contrast to labor- and time-intensive manual parameter tweaking. Implementation in doepipeline makes our methodology accessible and user-friendly, and allows for automatic optimization of tools in a wide range of cases. The source code of doepipeline is available at https://github.com/clicumu/doepipeline and it can be installed through conda-forge.
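The optimization phase described above (a response surface fitted by OLS, then a search for its optimum) can be illustrated for a single factor with plain Python. This sketch does not use or reproduce the doepipeline API. All function names are hypothetical, and a one-factor quadratic model is assumed purely for brevity.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: design has too few distinct levels")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_quadratic(xs, ys):
    """OLS fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear(xtx, xty)


def stationary_point(b1, b2):
    """Setting where the fitted curve is flat; a maximum when b2 < 0."""
    if b2 == 0:
        raise ValueError("model has no curvature, so no stationary point")
    return -b1 / (2 * b2)
```

Here `xs` would be the tested levels of one tool parameter and `ys` the measured pipeline response at each level. With several parameters, the same normal-equation approach extends to a model with cross terms, which is how interaction effects between parameters enter a response surface.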
Affiliation(s)
- Daniel Svensson
  - Department of Chemistry, Computational Life Science Cluster (CLiC), Umeå University, Umeå, Sweden
- Rickard Sjögren
  - Department of Chemistry, Computational Life Science Cluster (CLiC), Umeå University, Umeå, Sweden
  - Corporate Research, Sartorius AG, Umeå, Sweden
- David Sundell
  - Division of CBRN Security and Defence, FOI - Swedish Defence Research Agency, Umeå, Sweden
- Andreas Sjödin
  - Division of CBRN Security and Defence, FOI - Swedish Defence Research Agency, Umeå, Sweden
- Johan Trygg
  - Department of Chemistry, Computational Life Science Cluster (CLiC), Umeå University, Umeå, Sweden
  - Corporate Research, Sartorius AG, Umeå, Sweden
6
Karim MR, Michel A, Zappa A, Baranov P, Sahay R, Rebholz-Schuhmann D. Improving data workflow systems with cloud services and use of open data for bioinformatics research. Brief Bioinform 2019; 19:1035-1050. [PMID: 28419324 PMCID: PMC6169675 DOI: 10.1093/bib/bbx039] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/21/2016] [Indexed: 11/22/2022] [Open Access]
Abstract
Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community.
Affiliation(s)
- Md Rezaul Karim
  - Semantics in eHealth and Life Sciences (SeLS), Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
- Audrey Michel
  - School of Biochemistry and Cell Biology, University College Cork, Ireland
- Achille Zappa
  - Insight Centre for Data Analytics, National University of Ireland Galway, Dangan, Galway, Ireland
- Pavel Baranov
  - School of Biochemistry and Cell Biology, University College Cork, Ireland
- Ratnesh Sahay
  - Semantics in eHealth and Life Sciences (SeLS), Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
7
Révész Á, Rokob TA, Jeanne Dit Fouque D, Turiák L, Memboeuf A, Vékey K, Drahos L. Selection of Collision Energies in Proteomics Mass Spectrometry Experiments for Best Peptide Identification: Study of Mascot Score Energy Dependence Reveals Double Optimum. J Proteome Res 2018; 17:1898-1906. [PMID: 29607649 DOI: 10.1021/acs.jproteome.7b00912] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 01/06/2023]
Abstract
Collision energy is a key parameter determining the information content of beam-type collision-induced dissociation tandem mass spectrometry (MS/MS) spectra, and its optimal choice largely affects successful peptide and protein identification in MS-based proteomics. For an MS/MS spectrum, quality of peptide match based on sequence database search, often characterized in terms of a single score, is a complex function of spectrum characteristics, and its collision energy dependence has remained largely unexplored. We carried out electrospray ionization-quadrupole-time-of-flight (ESI-Q-TOF)-MS/MS measurements on 2807 peptides from tryptic digests of HeLa and E. coli at 21 different collision energies. Agglomerative clustering of the resulting Mascot score versus energy curves revealed that only a few of them display a single, well-defined maximum; rather, they feature either a broad plateau or two clear peaks. Nonlinear least-squares fitting of one or two Gaussian functions allowed the characteristic energies to be determined. We found that the double peaks and the plateaus in Mascot score can be associated with the different energy dependence of b- and y-type fragment ion intensities. We determined that the energies for optimum Mascot scores follow separate linear trends for the unimodal and bimodal cases with rather large residual variance even after differences in proton mobility are taken into account. This leaves room for experiment optimization and points to the possible influence of further factors beyond m/z.
Affiliation(s)
- Dany Jeanne Dit Fouque
  - UMR CNRS 6521, CEMCA, Université de Bretagne Occidentale, 6 Av. Le Gorgeu, 29238 Brest Cedex 3, France
- Antony Memboeuf
  - UMR CNRS 6521, CEMCA, Université de Bretagne Occidentale, 6 Av. Le Gorgeu, 29238 Brest Cedex 3, France