1
Zhao Y, Wang S, Huang J, Meng B, An D, Fang X, Wei Y, Dai X. A transformer-based semi-autoregressive framework for high-speed and accurate de novo peptide sequencing. Commun Biol 2025; 8:234. [PMID: 39948275 PMCID: PMC11825679 DOI: 10.1038/s42003-025-07584-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Accepted: 01/21/2025] [Indexed: 02/16/2025] Open
Abstract
De novo peptide sequencing directly identifies peptides from mass spectrometry data, playing a critical role in discovering novel proteins and analyzing complex biological samples without reliance on existing databases. To address challenges in both speed and accuracy, a transformer-based model, TSARseqNovo, incorporates two key innovations: a Semi-Autoregressive decoder for parallel prediction of multiple amino acids and a Masking Refinement decoder for refining low-confidence predictions. These features significantly enhance sequencing efficiency and accuracy. Evaluations on the Nine-Species, Aggregated, and Glycoproteomic datasets demonstrate that TSARseqNovo outperforms state-of-the-art models, including CasaNovo, NovoB, InstaNovo+, and π-HelixNovo. Specifically, TSARseqNovo achieves up to a 2-fold speed increase over CasaNovo and π-HelixNovo, and approximately a 10-fold increase over NovoB and InstaNovo+, while also showing substantial improvements in peptide prediction precision, especially for long peptides. These advancements position TSARseqNovo as a powerful tool for accelerating high-throughput proteomics research and addressing increasingly complex biological questions.
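The idea of semi-autoregressive decoding mentioned in this abstract can be illustrated with a toy loop that emits several tokens per step instead of one. This is only a sketch of the general technique, not the TSARseqNovo implementation: the function `semi_autoregressive_decode`, the `predict_block` callback, the `<eos>` marker, and the toy predictor are all hypothetical names introduced here for illustration.

```python
from typing import Callable, List


def semi_autoregressive_decode(
    predict_block: Callable[[List[str], int], List[str]],
    block_size: int,
    max_len: int,
    stop: str = "<eos>",
) -> List[str]:
    """Decode a sequence by requesting `block_size` tokens per step.

    `predict_block(prefix, k)` is a hypothetical model call that returns
    the next k tokens given the tokens decoded so far.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    sequence: List[str] = []
    while len(sequence) < max_len:
        # One model call yields a whole block, so a peptide of length n
        # needs roughly n / block_size calls instead of n.
        for token in predict_block(sequence, block_size):
            if token == stop:
                return sequence
            sequence.append(token)
            if len(sequence) == max_len:
                break
    return sequence


# Toy stand-in for a trained decoder (assumption: it simply replays a
# fixed target sequence and pads the final block with the stop marker).
TARGET = list("PEPTIDEK")


def toy_predictor(prefix: List[str], k: int) -> List[str]:
    nxt = TARGET[len(prefix):len(prefix) + k]
    return nxt + ["<eos>"] * (k - len(nxt))


decoded = semi_autoregressive_decode(toy_predictor, block_size=3, max_len=20)
print("".join(decoded))
```

With `block_size=1` the loop reduces to ordinary autoregressive decoding, which is the trade-off between speed and per-token conditioning that a refinement pass over low-confidence positions is meant to offset.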
Affiliation(s)
- Yang Zhao
  - College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100029, China
- Shuo Wang
  - College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China
- Jinze Huang
  - College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100029, China
- Bo Meng
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100029, China
- Dong An
  - College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China
- Xiang Fang
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100029, China
- Yaoguang Wei
  - College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China
- Xinhua Dai
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100029, China
2
Klein J, Lam H, Mak TD, Bittremieux W, Perez-Riverol Y, Gabriels R, Shofstahl J, Hecht H, Binz PA, Kawano S, Van Den Bossche T, Carver J, Neely BA, Mendoza L, Suomi T, Claeys T, Payne T, Schulte D, Sun Z, Hoffmann N, Zhu Y, Neumann S, Jones AR, Bandeira N, Vizcaíno JA, Deutsch EW. The Proteomics Standards Initiative Standardized Formats for Spectral Libraries and Fragment Ion Peak Annotations: mzSpecLib and mzPAF. Anal Chem 2024; 96:18491-18501. [PMID: 39514576 PMCID: PMC11579979 DOI: 10.1021/acs.analchem.4c04091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2024] [Revised: 10/16/2024] [Accepted: 11/01/2024] [Indexed: 11/16/2024]
Abstract
Mass spectral libraries are collections of reference spectra, usually associated with specific analytes from which the spectra were generated, that are used for further downstream analysis of new spectra. There are many different formats used for encoding spectral libraries, but none have undergone a standardization process to ensure broad applicability to many applications. As part of the Human Proteome Organization Proteomics Standards Initiative (PSI), we have developed a standardized format for encoding spectral libraries, called mzSpecLib (https://psidev.info/mzSpecLib). It is primarily a data model that flexibly encodes metadata about the library entries using the extensible PSI-MS controlled vocabulary and can be encoded in and converted between different serialization formats. We have also developed a standardized data model and serialization for fragment ion peak annotations, called mzPAF (https://psidev.info/mzPAF). It is defined as a separate standard, since it may be used for other applications besides spectral libraries. The mzSpecLib and mzPAF standards are compatible with existing PSI standards such as ProForma 2.0 and the Universal Spectrum Identifier. The mzSpecLib and mzPAF standards have been primarily defined for peptides in proteomics applications with basic small molecule support. They could be extended in the future to other fields that need to encode spectral libraries for nonpeptidic analytes.
Affiliation(s)
- Joshua Klein
  - Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
- Henry Lam
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, 999077 Hong Kong, P. R. China
- Tytus D. Mak
  - Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Jim Shofstahl
  - Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Helge Hecht
  - RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 60200 Brno, Czech Republic
- Shin Kawano
  - Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
  - School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
- Tim Van Den Bossche
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Jeremy Carver
  - Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, United States
- Benjamin A. Neely
  - National Institute of Standards and Technology (NIST) Charleston, Charleston, South Carolina 29412, United States
- Luis Mendoza
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Tomi Suomi
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
- Tine Claeys
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Thomas Payne
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Douwe Schulte
  - Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584, CH, Utrecht, The Netherlands
- Zhi Sun
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Nils Hoffmann
  - Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Yunping Zhu
  - National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
- Steffen Neumann
  - Computational Plant Biochemistry, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
- Andrew R. Jones
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, United Kingdom
- Nuno Bandeira
  - Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, United States
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Eric W. Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States
3
Sun Y, Xing Z, Liang S, Miao Z, Zhuo LB, Jiang W, Zhao H, Gao H, Xie Y, Zhou Y, Yue L, Cai X, Chen YM, Zheng JS, Guo T. metaExpertPro: A Computational Workflow for Metaproteomics Spectral Library Construction and Data-Independent Acquisition Mass Spectrometry Data Analysis. Mol Cell Proteomics 2024; 23:100840. [PMID: 39278598 PMCID: PMC11795700 DOI: 10.1016/j.mcpro.2024.100840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 08/04/2024] [Accepted: 09/11/2024] [Indexed: 09/18/2024] Open
Abstract
Analysis of large-scale data-independent acquisition mass spectrometry metaproteomics data remains a computational challenge. Here, we present a computational pipeline called metaExpertPro for metaproteomics data analysis. This pipeline encompasses spectral library generation using data-dependent acquisition MS, protein identification and quantification using data-independent acquisition mass spectrometry, functional and taxonomic annotation, as well as quantitative matrix generation for both microbiota and hosts. By integrating FragPipe and DIA-NN, metaExpertPro offers compatibility with both Orbitrap and timsTOF MS instruments. To evaluate the depth and accuracy of identification and quantification, we conducted extensive assessments using human fecal samples and benchmark tests. Performance tests conducted on human fecal samples indicated that metaExpertPro quantified an average of 45,000 peptides in a 60-min diaPASEF injection. Notably, metaExpertPro outperformed three existing software tools by characterizing a higher number of peptides and proteins. Importantly, metaExpertPro maintained a low factual false discovery rate of approximately 5% for protein groups across four benchmark tests. Applying a filter of five peptides per genus, metaExpertPro achieved relatively high accuracy (F-score = 0.67-0.90) in genus diversity and showed a high correlation (rSpearman = 0.73-0.82) between the measured and true genus relative abundance in benchmark tests. Additionally, the quantitative results at the protein, taxonomy, and function levels exhibited high reproducibility and consistency across the commonly adopted public human gut microbial protein databases IGC and UHGP. In a metaproteomic analysis of dyslipidemia patients, metaExpertPro revealed characteristic alterations in microbial functions and potential interactions between the microbiota and the host.
Affiliation(s)
- Yingying Sun
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Ziyuan Xing
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Shuang Liang
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China; State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
- Zelei Miao
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China; Key Laboratory of Growth Regulation and Translational Research of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China
- Lai-Bao Zhuo
  - Department of Epidemiology, Guangdong Provincial Key Laboratory of Food, Nutrition and Health, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Wenhao Jiang
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Hui Zhao
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China; Key Laboratory of Growth Regulation and Translational Research of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China
- Huanhuan Gao
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Yuting Xie
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Yan Zhou
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Liang Yue
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Xue Cai
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Yu-Ming Chen
  - Department of Epidemiology, Guangdong Provincial Key Laboratory of Food, Nutrition and Health, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Ju-Sheng Zheng
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China; Key Laboratory of Growth Regulation and Translational Research of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China
- Tiannan Guo
  - Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China; School of Medicine, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China; Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
4
Flender D, Vilenne F, Adams C, Boonen K, Valkenborg D, Baggerman G. Exploring the dynamic landscape of immunopeptidomics: Unravelling posttranslational modifications and navigating bioinformatics terrain. MASS SPECTROMETRY REVIEWS 2024. [PMID: 39152539 DOI: 10.1002/mas.21905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/19/2024]
Abstract
Immunopeptidomics is becoming an increasingly important field of study. The capability to identify immunopeptides with pivotal roles in the human immune system is essential to shift the current curative medicine towards personalized medicine. Throughout the years, the field has matured, giving insight into the current pitfalls. Nowadays, it is commonly accepted that generalizing shotgun proteomics workflows is malpractice because immunopeptidomics faces numerous challenges. While many of these difficulties have been addressed, the road towards the ideal workflow remains complicated. Although the presence of posttranslational modifications (PTMs) in the immunopeptidome has been demonstrated, their identification remains highly challenging despite their significance for immunotherapies. The large number of unpredictable modifications in the immunopeptidome plays a pivotal role both in its functionality and in these challenges. This review provides a comprehensive overview of the current advancements in immunopeptidomics. We delve into the challenges associated with identifying PTMs within the immunopeptidome, aiming to address the current state of the field.
Affiliation(s)
- Daniel Flender
  - Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
  - Health Unit, VITO, Mol, Belgium
- Frédérique Vilenne
  - Health Unit, VITO, Mol, Belgium
  - Data Science Institute, University of Hasselt, Hasselt, Belgium
- Charlotte Adams
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
  - ImmuneSpec, Niel, Belgium
- Dirk Valkenborg
  - Data Science Institute, University of Hasselt, Hasselt, Belgium
- Geert Baggerman
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - ImmuneSpec, Niel, Belgium
5
Zakopcanik M, Kavan D, Kukacka Z, Novak P, Loginov DS. Data-Independent Acquisition Represents a Promising Alternative for Fast Photochemical Oxidation of Proteins (FPOP) Samples Analysis. Anal Chem 2024; 96:11273-11279. [PMID: 38967040 PMCID: PMC11256011 DOI: 10.1021/acs.analchem.4c01084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 06/27/2024] [Accepted: 06/28/2024] [Indexed: 07/06/2024]
Abstract
Fast Photochemical Oxidation of Proteins (FPOP) is a protein footprinting method utilizing hydroxyl radicals to provide valuable information on the solvent-accessible surface area. The extensive number of oxidative modifications that are created by FPOP is both advantageous, leading to great spatial resolution, and challenging, increasing the complexity of data processing. The precise localization of the modification together with the appropriate reproducibility is crucial to obtain relevant structural information. In this paper, we propose a novel approach combining validated spectral libraries together with utilizing DIA data. First, the DDA data searched by FragPipe are subsequently validated using Skyline software to form a spectral library. This library is then matched against the DIA data to filter out nonrepresentative IDs. In comparison with FPOP data processing using only a search engine followed by generally applied filtration steps, the manually validated spectral library offers higher confidence in identifications and increased spatial resolution. Furthermore, the reproducibility of quantification was compared for DIA, DDA, and MS-only acquisition modes on timsTOF SCP. Comparison of coefficients of variation (CV) showed that the DIA and MS acquisition modes exhibit significantly better reproducibility in quantification (CV medians 0.1233 and 0.1494, respectively) compared to the DDA mode (CV median 0.2104).
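The reproducibility comparison in this abstract rests on the coefficient of variation (CV), the sample standard deviation divided by the mean, summarized as a median across analytes. A minimal sketch of that arithmetic follows; the function names and the replicate intensities are hypothetical and are not data from the paper.

```python
from statistics import mean, median, stdev
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """Return sample standard deviation divided by the mean."""
    # Needs at least two replicates and a non-zero mean.
    return stdev(values) / mean(values)


def median_cv(replicates_by_peptide: Dict[str, List[float]]) -> float:
    """Median CV over all peptides measured in replicate."""
    return median(
        coefficient_of_variation(v) for v in replicates_by_peptide.values()
    )


# Hypothetical replicate intensities for three modified peptides.
example = {
    "peptide_A": [100.0, 110.0, 90.0],
    "peptide_B": [50.0, 52.0, 48.0],
    "peptide_C": [200.0, 260.0, 140.0],
}
print(round(median_cv(example), 4))
```

A lower median CV for one acquisition mode than another, computed this way over the same set of peptides, is what the abstract reports as better quantitative reproducibility.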
Affiliation(s)
- Marek Zakopcanik
  - Institute of Microbiology, The Czech Academy of Sciences, 14220 Prague, Czech Republic
  - Department of Biochemistry, Faculty of Science, Charles University, 12820 Prague, Czech Republic
- Daniel Kavan
  - Institute of Microbiology, The Czech Academy of Sciences, 14220 Prague, Czech Republic
- Zdenek Kukacka
  - Institute of Microbiology, The Czech Academy of Sciences, 14220 Prague, Czech Republic
- Petr Novak
  - Institute of Microbiology, The Czech Academy of Sciences, 14220 Prague, Czech Republic
- Dmitry S. Loginov
  - Institute of Microbiology, The Czech Academy of Sciences, 14220 Prague, Czech Republic
6
Baeza J, Coons BE, Lin Z, Riley J, Mendoza M, Peranteau WH, Garcia BA. In utero pulse injection of isotopic amino acids quantifies protein turnover rates during murine fetal development. CELL REPORTS METHODS 2024; 4:100713. [PMID: 38412836 PMCID: PMC10921036 DOI: 10.1016/j.crmeth.2024.100713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 12/20/2023] [Accepted: 01/29/2024] [Indexed: 02/29/2024]
Abstract
Protein translational control is critical for ensuring that the fetus develops correctly and that necessary organs and tissues are formed and functional. We developed an in utero method to quantify tissue-specific protein dynamics by monitoring amino acid incorporation into the proteome after pulse injection. Fetuses of pregnant mice were injected with isotopically labeled lysine and arginine via the vitelline vein at various embryonic days, and organs and tissues were harvested. By analyzing the nascent proteome, unique signatures of each tissue were identified by hierarchical clustering. In addition, the quantified proteome-wide turnover rates ranged from 3.81 × 10⁻⁵ to 0.424 h⁻¹. We observed similar protein turnover profiles for analyzed organs (e.g., liver vs. brain); however, their distributions of turnover rates vary significantly. The translational kinetic profiles of developing organs displayed differentially expressed protein pathways and synthesis rates, which correlated with known physiological changes during mouse development.
Affiliation(s)
- Josue Baeza
  - Department of Biochemistry & Biophysics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Barbara E Coons
  - The Center for Fetal Research, Division of Pediatric General, Thoracic and Fetal Surgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Zongtao Lin
  - Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, St. Louis, MO 63110, USA
- John Riley
  - The Center for Fetal Research, Division of Pediatric General, Thoracic and Fetal Surgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Mariel Mendoza
  - Department of Biochemistry & Biophysics, University of Pennsylvania, Philadelphia, PA 19104, USA
- William H Peranteau
  - The Center for Fetal Research, Division of Pediatric General, Thoracic and Fetal Surgery, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Benjamin A Garcia
  - Department of Biochemistry & Biophysics, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, St. Louis, MO 63110, USA
7
Lapin J, Yan X, Dong Q. UniSpec: Deep Learning for Predicting the Full Range of Peptide Fragment Ion Series to Enhance the Proteomics Data Analysis Workflow. Anal Chem 2024. [PMID: 38329031 DOI: 10.1021/acs.analchem.3c02321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
We present UniSpec, an attention-driven deep neural network designed to predict comprehensive collision-induced fragmentation spectra, thereby improving peptide identification in shotgun proteomics. Utilizing a training data set of 1.8 million unique high-quality tandem mass spectra (MS2) from 0.8 million unique peptide ions, UniSpec learned with a peptide fragmentation dictionary encompassing 7919 fragment peaks. Among these, 5712 are neutral loss peaks, with 2310 corresponding to modification-specific neutral losses. Remarkably, UniSpec can predict 73%-77% of fragment intensities based on our NIST reference library spectra, a significant leap from the 35%-45% coverage of only b and y ions. Comparative studies with Prosit elucidate that while both models are strong at predicting their respective fragment ion series, UniSpec particularly shines in generating more complex MS2 spectra with diverse ion annotations. The integration of UniSpec's predictions into shotgun proteomics data analysis boosts the identification rate of tryptic peptides by 48% at a 1% false discovery rate (FDR) and 60% at a more confident 0.1% FDR. Using UniSpec's predicted in-silico spectral library, the search results closely matched those from search engines and experimental spectral libraries used in peptide identification, highlighting its potential as a stand-alone identification tool. The source code and Python scripts are available on GitHub (https://github.com/usnistgov/UniSpec) and Zenodo (https://zenodo.org/records/10452792), and all data sets and analysis results generated in this work were deposited in Zenodo (https://zenodo.org/records/10052268).
Affiliation(s)
- Joel Lapin
  - Department of Physics, Georgetown University, Washington, D.C. 20057, United States
  - Associate, Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Xinjian Yan
  - Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Qian Dong
  - Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
8
Ni X, Murray NB, Archer-Hartmann S, Pepi LE, Helm RF, Azadi P, Hong P. Toward Automatic Inference of Glycan Linkages Using MSn and Machine Learning─Proof of Concept Using Sialic Acid Linkages. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2023; 34:2127-2135. [PMID: 37621000 PMCID: PMC10557947 DOI: 10.1021/jasms.3c00132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Revised: 08/10/2023] [Accepted: 08/15/2023] [Indexed: 08/26/2023]
Abstract
Glycosidic linkages in oligosaccharides play essential roles in determining their chemical properties and biological activities. MSn has been widely used to infer glycosidic linkages but requires a substantial amount of starting material, which limits its application. In addition, there is a lack of rigorous research on which MSn protocols are appropriate for characterizing glycosidic linkages. In this work, to deliver high-quality experimental data and analysis results, we propose a machine learning-based framework to establish appropriate MSn protocols and build effective data analysis methods. We demonstrate the proof-of-principle by applying our approach to elucidate sialic acid linkages (α2'-3' and α2'-6') in a set of sialyllactose standards and NIST sialic acid-containing N-glycans as well as identify several protocol configurations for producing high-quality experimental data. Our companion data analysis method achieves nearly 100% accuracy in classifying α2'-3' vs α2'-6' using MS5, MS4, MS3, or even MS2 spectra alone. The ability to determine glycosidic linkages using MS2 or MS3 is significant because it requires substantially less sample, enabling linkage analysis for quantity-limited natural glycans and synthesized materials, and shortens the overall experimental time. MS2 is also more amenable than MS3/4/5 to automation when coupled to direct infusion or LC-MS. Additionally, our method can predict the ratio of α2'-3' and α2'-6' in a mixture with 8.6% RMSE (root-mean-square error) across data sets using MS5 spectra. We anticipate that our framework will be generally applicable to analysis of other glycosidic linkages.
Affiliation(s)
- Xinyi Ni
  - Computer Science, Brandeis University, Waltham, Massachusetts 02453, United States
- Nathan B. Murray
  - Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602, United States
- Lauren E. Pepi
  - Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602, United States
- Richard F. Helm
  - Department of Biochemistry, Virginia Tech, Blacksburg, Virginia 24061, United States
- Parastoo Azadi
  - Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602, United States
- Pengyu Hong
  - Computer Science, Brandeis University, Waltham, Massachusetts 02453, United States
9
Chandel S, Bhattacharya A, Gautam A, Zeng W, Alka O, Sachsenberg T, Gupta GD, Narang RK, Ravichandiran V, Singh R. Investigation of the anti-cancer potential of epoxyazadiradione in neuroblastoma: experimental assays and molecular analysis. J Biomol Struct Dyn 2023; 42:11377-11395. [PMID: 37753734 DOI: 10.1080/07391102.2023.2262593] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/11/2023] [Accepted: 09/15/2023] [Indexed: 09/28/2023]
Abstract
Neuroblastoma, the most common childhood solid tumor, originates from primitive sympathetic nervous system cells. Epoxyazadiradione (EAD) is a limonoid derived from Azadirachta indica, which belongs to the family Meliaceae. In this study, we isolated EAD from Azadirachta indica seed and studied its anti-cancer potential against neuroblastoma. EAD demonstrated significant efficacy against neuroblastoma by suppressing cell proliferation, enhancing the rate of apoptosis, and inducing cell cycle arrest at the SubG0 and G2/M phases. EAD enhanced the pro-apoptotic Caspase 3 and Caspase 9 and inhibited NF-κB translocation in a dose-dependent manner. To identify the specific EAD target, a gel-free quantitative proteomics study on SH-SY5Y cells using liquid chromatography with tandem mass spectrometry was performed in a dose-dependent manner, followed by detailed bioinformatics analysis to identify effects on proteins. Proteomics data identified that Enolase1 and HSP90 were up-regulated in neuroblastoma. EAD inhibited the expression of Enolase1 and HSP90, as validated by mRNA expression, immunoblotting, Enolase1 and HSP90 kit assays, and a flow-cytometry-based bioassay. Molecular docking, molecular dynamics simulation, and molecular mechanics/Poisson-Boltzmann surface area analysis also suggested that EAD binds at the active site of the proteins and that the complexes were stable throughout the 100 ns molecular dynamics simulation. Overall, this study suggested that EAD exhibited anti-cancer activity against neuroblastoma by targeting the Enolase1 and HSP90 pathways. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Shivani Chandel
- Department of Pharmacognosy, ISF College of Pharmacy, Moga, Punjab, India
- Arka Bhattacharya
- Department of Pharmaceutical Chemistry, ISF College of Pharmacy, Moga, Punjab, India
- Anupam Gautam
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- International Max Planck Research School "From Molecules to Organisms", Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Cluster of Excellence: EXC 2124: Controlling Microbes to Fight Infection, University of Tübingen, Tübingen, Germany
- Wenhuan Zeng
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Oliver Alka
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Timo Sachsenberg
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- G D Gupta
- Department of Pharmaceutics, ISF College of Pharmacy, Moga, Punjab, India
- Raj Kumar Narang
- Department of Pharmaceutics, ISF College of Pharmacy, Moga, Punjab, India
- V Ravichandiran
- Department of Natural Products, National Institute of Pharmaceutical Education and Research, Kolkata, India
- Rajveer Singh
- Department of Pharmacognosy, ISF College of Pharmacy, Moga, Punjab, India
|
10
|
Hao C, Elias JE, Lee PKH, Lam H. metaSpectraST: an unsupervised and database-independent analysis workflow for metaproteomic MS/MS data using spectrum clustering. Microbiome 2023; 11:176. [PMID: 37550758 PMCID: PMC10405559 DOI: 10.1186/s40168-023-01602-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 06/18/2023] [Indexed: 08/09/2023]
Abstract
BACKGROUND The high diversity and complexity of the microbial community make it a formidable challenge to identify and quantify the large number of proteins expressed in the community. Conventional metaproteomics approaches largely rely on accurate identification of the MS/MS spectra to their corresponding short peptides in the digested samples, followed by protein inference and subsequent taxonomic and functional analysis of the detected proteins. These approaches are dependent on the availability of protein sequence databases derived either from sample-specific metagenomic data or from public repositories. Due to the incompleteness and imperfections of these protein sequence databases, and the preponderance of homologous proteins expressed by different bacterial species in the community, this computational process of peptide identification and protein inference is challenging and error-prone, which hinders the comparison of metaproteomes across multiple samples. RESULTS We developed metaSpectraST, an unsupervised and database-independent metaproteomics workflow, which quantitatively profiles and compares metaproteomics samples by clustering experimentally observed MS/MS spectra based on their spectral similarity. We applied metaSpectraST to fecal samples collected from littermates of two different mother mice right after weaning. Quantitative proteome profiles of the microbial communities of different mice were obtained without any peptide-spectrum identification and used to evaluate the overall similarity between samples and highlight any differentiating markers. Compared to the conventional database-dependent metaproteomics analysis, metaSpectraST is more successful in classifying the samples and detecting the subtle microbiome changes of mouse gut microbiomes post-weaning. metaSpectraST could also be used as a tool to select the suitable biological replicates from samples with wide inter-individual variation. 
CONCLUSIONS metaSpectraST enables rapid profiling of metaproteomic samples quantitatively, without the need for constructing the protein sequence database or identification of the MS/MS spectra. It maximally preserves information contained in the experimental MS/MS spectra by clustering all of them first and thus is able to better profile the complex microbial communities and highlight their functional changes, as compared with conventional approaches.
Affiliation(s)
- Chunlin Hao
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- School of Energy and Environment, City University of Hong Kong, Hong Kong SAR, China
- Patrick K. H. Lee
- School of Energy and Environment, City University of Hong Kong, Hong Kong SAR, China
- State Key Laboratory of Marine Pollution, City University of Hong Kong, Hong Kong SAR, China
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
|
11
|
Baeza J, Coons BE, Lin Z, Riley J, Mendoza M, Peranteau WH, Garcia BA. In utero pulse injection of isotopic amino acids quantifies protein turnover rates during murine fetal development. bioRxiv 2023:2023.05.18.541242. [PMID: 37293076 PMCID: PMC10245746 DOI: 10.1101/2023.05.18.541242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Protein translational control is a highly regulated step in the gene expression program during mammalian development that is critical for ensuring that the fetus develops correctly and that all of the necessary organs and tissues are formed and functional. Defects in protein expression during fetal development can lead to severe developmental abnormalities or premature death. Currently, quantitative techniques to monitor protein synthesis rates in a developing fetus (in utero) are limited. Here, we developed a novel in utero stable isotope labeling approach to quantify tissue-specific protein dynamics of the nascent proteome during mouse fetal development. Fetuses of pregnant C57BL/6J mice were injected with isotopically labeled lysine (Lys8) and arginine (Arg10) via the vitelline vein at various gestational days. After treatment, fetal organs/tissues including brain, liver, lung, and heart were harvested for sample preparation and proteomic analysis. We show that the mean incorporation rate for injected amino acids into all organs was 17.50 ± 0.6%. By analyzing the nascent proteome, unique signatures of each tissue were identified by hierarchical clustering. In addition, the quantified proteome-wide turnover rates (k_obs) ranged between 3.81E-5 and 0.424 hour⁻¹. We observed similar protein turnover profiles for the analyzed organs (e.g., liver versus brain); however, their distributions of turnover rates vary significantly. The translational kinetic profiles of developing organs displayed differentially expressed protein pathways and synthesis rates, which correlated with known physiological changes during mouse development.
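To give a feel for the range of turnover rates quoted above, the following minimal sketch converts a rate constant into a half-life. It assumes simple first-order (single-exponential) kinetics, which the abstract does not state explicitly; the function names `half_life_hours` and `fraction_labeled` are illustrative and not taken from the paper.

```python
import math


def half_life_hours(k_obs_per_hour: float) -> float:
    """Half-life implied by a first-order rate constant (assumed kinetic model)."""
    if k_obs_per_hour <= 0:
        raise ValueError("rate constant must be positive")
    return math.log(2) / k_obs_per_hour


def fraction_labeled(k_obs_per_hour: float, hours: float) -> float:
    """Fraction of a protein pool replaced by labeled protein after `hours`,
    under the same first-order assumption."""
    return 1.0 - math.exp(-k_obs_per_hour * hours)


# The two extremes of k_obs reported in the abstract (per hour).
fastest = half_life_hours(0.424)    # on the order of hours
slowest = half_life_hours(3.81e-5)  # on the order of years

print(f"fastest half-life: {fastest:.2f} h")
print(f"slowest half-life: {slowest:.0f} h")
```

Under this assumption the reported range spans roughly four orders of magnitude in half-life, from under two hours to well over a year.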
Affiliation(s)
- Josue Baeza
- Department of Biochemistry & Biophysics, University of Pennsylvania, Philadelphia, PA 19104
- Contributed equally to this work
- Barbara E. Coons
- The Center for Fetal Research, Division of Pediatric General, Thoracic and Fetal Surgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104
- Contributed equally to this work
- Zongtao Lin
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, St. Louis, MO 63110
- John Riley
- The Center for Fetal Research, Division of Pediatric General, Thoracic and Fetal Surgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104
- Mariel Mendoza
- Department of Biochemistry & Biophysics, University of Pennsylvania, Philadelphia, PA 19104
- William H. Peranteau
- The Center for Fetal Research, Division of Pediatric General, Thoracic and Fetal Surgery, Children’s Hospital of Philadelphia, Philadelphia, PA 19104
- Benjamin A Garcia
- Department of Biochemistry & Biophysics, University of Pennsylvania, Philadelphia, PA 19104
- Department of Biochemistry and Molecular Biophysics, Washington University in St. Louis, St. Louis, MO 63110
|
12
|
Madej D, Lam H. Modeling Lower-Order Statistics to Enable Decoy-Free FDR Estimation in Proteomics. J Proteome Res 2023; 22:1159-1171. [PMID: 36962508 DOI: 10.1021/acs.jproteome.2c00604] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 03/26/2023]
Abstract
One of the chief objectives in mass spectrometry-based peptide identification in proteomics is the statistical validation of top-scoring peptide-spectrum matches (PSMs) in the form of false discovery rate (FDR) estimation. Existing methods construct a null model that captures the characteristics of incorrect target PSMs to estimate the FDR, most often with the help of decoys. Decoy-based methods, however, increase the computational cost and rely on the difficult-to-verify assumption that decoy PSMs constitute a sufficient and representative sample of the population of possible incorrect target PSMs. On the other hand, the possibility of FDR estimation assisted by the plentiful non-top-scoring PSMs, which are almost always incorrect, has been scarcely explored. In this work, we propose a novel decoy-free procedure for developing null models for top-scoring PSMs using the transformed e-value (TEV) score and the distributions of non-top-scoring target PSMs. The method relies on a theoretically derivable relationship between the parameters of the distributions of lower-order statistics of the TEV score and a necessary empirical optimization to fit a single parameter to actual data. The framework was tested on multiple different data sets and two search engines. We present evidence that our method is comparable to and occasionally outperforms popular decoy-free and decoy-based methods in FDR estimation.
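For context, the conventional decoy-based estimate that the abstract contrasts itself with can be sketched in a few lines. This is the standard target-decoy ratio, not the decoy-free TEV procedure proposed by the authors; it assumes separate target and decoy searches of equal database size, and the function name and scores below are made up for illustration.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Conventional target-decoy FDR estimate at a score threshold.

    Assumes decoy PSMs scoring at or above the threshold approximate the
    number of incorrect target PSMs above it (the assumption the abstract
    calls difficult to verify).
    """
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return min(1.0, decoys / targets)


# Hypothetical top-scoring PSM scores, for illustration only.
target_scores = [9.1, 8.4, 7.7, 6.0, 5.2, 4.9]
decoy_scores = [6.1, 4.0, 3.5]

fdr_at_5 = decoy_fdr(target_scores, decoy_scores, threshold=5.0)
print(fdr_at_5)  # one decoy against five accepted targets
```

A decoy-free method replaces the decoy count in the numerator with an expected count derived from a fitted null model, which is where the lower-order statistics described above come in.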
Affiliation(s)
- Dominik Madej
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China
|
13
|
Arab I, Fondrie WE, Laukens K, Bittremieux W. Semisupervised Machine Learning for Sensitive Open Modification Spectral Library Searching. J Proteome Res 2023; 22:585-593. [PMID: 36688569 DOI: 10.1021/acs.jproteome.2c00616] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 01/24/2023]
Abstract
A key analysis task in mass spectrometry proteomics is matching the acquired tandem mass spectra to their originating peptides by sequence database searching or spectral library searching. Machine learning is an increasingly popular postprocessing approach to maximize the number of confident spectrum identifications that can be obtained at a given false discovery rate threshold. Here, we have integrated semisupervised machine learning in the ANN-SoLo tool, an efficient spectral library search engine that is optimized for open modification searching to identify peptides with any type of post-translational modification. We show that machine learning rescoring boosts the number of spectra that can be identified for both standard searching and open searching, and we provide insights into relevant spectrum characteristics harnessed by the machine learning model. The semisupervised machine learning functionality has now been fully integrated into ANN-SoLo, which is available as open source under the permissive Apache 2.0 license on GitHub at https://github.com/bittremieux/ANN-SoLo.
Affiliation(s)
- Issar Arab
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Kris Laukens
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
|
14
|
Dorl S, Winkler S, Mechtler K, Dorfer V. MS Ana: Improving Sensitivity in Peptide Identification with Spectral Library Search. J Proteome Res 2023; 22:462-470. [PMID: 36688604 PMCID: PMC9903325 DOI: 10.1021/acs.jproteome.2c00658] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/24/2023]
Abstract
Spectral library search can enable more sensitive peptide identification in tandem mass spectrometry experiments. However, its drawbacks are the limited availability of high-quality libraries and the added difficulty of creating decoy spectra for result validation. We describe MS Ana, a new spectral library search engine that enables high sensitivity peptide identification using either curated or predicted spectral libraries as well as robust false discovery control through its own decoy library generation algorithm. MS Ana identifies on average 36% more spectrum matches and 4% more proteins than database search in a benchmark test on single-shot human cell-line data. Further, we demonstrate the quality of the result validation with tests on synthetic peptide pools and show the importance of library selection through a comparison of library search performance with different configurations of publicly available human spectral libraries.
Affiliation(s)
- Sebastian Dorl
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Department of Computer Science, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
- Phone: +43 (0) 50804 27145
- Stephan Winkler
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Department of Computer Science, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
- Karl Mechtler
- Research Institute of Molecular Pathology (IMP), Protein Chemistry, Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
- Institute of Molecular Biotechnology (IMBA), Protein Chemistry, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
- Gregor Mendel Institute of Molecular Plant Biology of the Austrian Academy of Sciences (GMI), Dr. Bohr Gasse 3, 1030 Vienna, Austria
- Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
- Phone: +43 (0) 50804 22740
|
15
|
Bittremieux W, Wang M, Dorrestein PC. The critical role that spectral libraries play in capturing the metabolomics community knowledge. Metabolomics 2022; 18:94. [PMID: 36409434 PMCID: PMC10284100 DOI: 10.1007/s11306-022-01947-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 08/01/2022] [Accepted: 10/19/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments. AIM OF REVIEW We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries. KEY SCIENTIFIC CONCEPTS OF REVIEW This review focuses on the current state of spectral libraries for untargeted LC-MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future.
Affiliation(s)
- Wout Bittremieux
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
- Mingxun Wang
- Department of Computer Science, University of California Riverside, Riverside, CA, 92507, USA
- Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
|
16
|
Dai Y, Millikin R, Rolfs Z, Shortreed MR, Smith LM. A Hybrid Spectral Library and Protein Sequence Database Search Strategy for Bottom-Up and Top-Down Proteomic Data Analysis. J Proteome Res 2022; 21:2609-2618. [PMID: 36206157 PMCID: PMC9869658 DOI: 10.1021/acs.jproteome.2c00305] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/26/2023]
Abstract
Tandem mass spectrometry (MS/MS) is widely employed for the analysis of complex proteomic samples. While protein sequence database searching and spectral library searching are both well-established peptide identification methods, each has shortcomings. Protein sequence databases lack fragment peak intensity information, which can result in poor discrimination between correct and incorrect spectrum assignments. Spectral libraries usually contain fewer peptides than protein sequence databases, which limits the number of peptides that can be identified. Notably, few post-translationally modified peptides are represented in spectral libraries. This is because few search engines can both identify a broad spectrum of PTMs and create corresponding spectral libraries. Also, programs that generate spectral libraries using deep learning approaches are not yet able to accurately predict spectra for the vast majority of PTMs. Here, we address these limitations through use of a hybrid search strategy that combines protein sequence database and spectral library searches to improve identification success rates and sensitivity. This software uses Global PTM Discovery (G-PTM-D) to produce spectral libraries for a wide variety of different PTMs. These features, along with a new spectrum annotation and visualization tool, have been integrated into the freely available and open-source search engine MetaMorpheus.
Affiliation(s)
- Yuling Dai
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Robert Millikin
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Zach Rolfs
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Michael R. Shortreed
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Lloyd M. Smith
- Department of Chemistry, University of Wisconsin, 1101 University Avenue, Madison, Wisconsin 53706, United States
|
17
|
MAVEN2: An Updated Open-Source Mass Spectrometry Exploration Platform. Metabolites 2022; 12:metabo12080684. [PMID: 35893250 PMCID: PMC9330773 DOI: 10.3390/metabo12080684] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/06/2022] [Revised: 07/21/2022] [Accepted: 07/21/2022] [Indexed: 11/17/2022]
Abstract
MAVEN, an open-source software program for analysis of LC-MS metabolomics data, was originally released in 2010. As mass spectrometry has advanced in the intervening years, MAVEN has been periodically updated to reflect this advancement. This manuscript describes a major update to the program, MAVEN2, which supports LC-MS/MS analysis of metabolomics and lipidomics samples. We have developed algorithms to support MS/MS spectral matching and efficient search of large-scale fragmentation libraries. We explore the ability of our approach to separate authentic from spurious metabolite identifications using a set of standards spiked into water and yeast backgrounds. To support our improved lipid identification workflow, we introduce a novel in-silico lipidomics library covering major lipid classes and compare searches using our novel library to searches with existing in-silico lipidomics libraries. MAVEN2 source code and cross-platform application installers are freely available for download from GitHub under a GNU permissive license [ver 3], as are the in silico lipidomics libraries and corresponding code repository.
|
18
|
Lee H, Kim SI. Review of Liquid Chromatography-Mass Spectrometry-Based Proteomic Analyses of Body Fluids to Diagnose Infectious Diseases. Int J Mol Sci 2022; 23:ijms23042187. [PMID: 35216306 PMCID: PMC8878692 DOI: 10.3390/ijms23042187] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/13/2021] [Revised: 02/11/2022] [Accepted: 02/14/2022] [Indexed: 01/27/2023]
Abstract
Rapid and precise diagnostic methods are required to control emerging infectious diseases effectively. Human body fluids are attractive clinical samples for discovering diagnostic targets because they reflect the clinical statuses of patients and most of them can be obtained with minimally invasive sampling processes. Body fluids are good reservoirs for infectious parasites, bacteria, and viruses. Therefore, recent clinical proteomics methods have focused on body fluids when aiming to discover human- or pathogen-originated diagnostic markers. Cutting-edge liquid chromatography-mass spectrometry (LC-MS)-based proteomics has been applied in this regard; it is considered one of the most sensitive and specific proteomics approaches. Here, the clinical characteristics of each body fluid, recent tandem mass spectroscopy (MS/MS) data-acquisition methods, and applications of body fluids for proteomics regarding infectious diseases (including the coronavirus disease of 2019 [COVID-19]), are summarized and discussed.
Affiliation(s)
- Hayoung Lee
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute (KBSI), Ochang 28119, Korea
- Bio-Analytical Science Division, University of Science and Technology (UST), Daejeon 34113, Korea
- Seung Il Kim
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute (KBSI), Ochang 28119, Korea
- Bio-Analytical Science Division, University of Science and Technology (UST), Daejeon 34113, Korea
|
19
|
Chen Z, de Boves Harrington P, Rearden P, Shetty V, Noyola A. A quantitative reliability metric for querying large database. Forensic Sci Int 2021; 331:111155. [PMID: 34972050 DOI: 10.1016/j.forsciint.2021.111155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2021] [Revised: 10/28/2021] [Accepted: 12/18/2021] [Indexed: 11/25/2022]
Abstract
A redesigned quantitative reliability metric based on the F-distribution (QRMf) is reported for evaluating the reliability of library search. The QRMf provides orthogonal information to the comparison metric (e.g., dot product) and yields a probabilistic result. An intralibrary search can be considered as an idealized search because the top hit, i.e., the closest matching object, will match perfectly. If the search of an unknown object yields the same hit list as the intralibrary search, it would indicate good reliability. For each object in the hit list, a QRMf compares the order of an intralibrary and interlibrary search results and calculates a variance of interlibrary similarity metrics between the records of the intralibrary search and records in the corresponding positions of the interlibrary search. This variance that measures the discordance of the intra and interlibrary search can simply be compared to the variance of the similarity metrics within the interlibrary search results. The ratio of these variances follows an F-distribution that can be used to determine if the discordance is statistically significant and generates the probability based on the cumulative distribution function. The QRMf works for both similarity and dissimilarity and can be used for any queried object and comparison metric that is searched against a database. In this work, the QRMf was used along with the dot product similarity to query the mass spectra of novel synthetic opioids measured by gas chromatography-mass spectrometry (GC/MS). An automated pipeline was devised that used a basis set correction to assist peak detection. The basis was constructed by mass spectra obtained from the blank measurement preceding the analytical run to remove interferences from column bleed and septum degradation. After peak detection, the pipeline applied multivariate curve resolution to the chromatographic peak window to remove background components from the mass spectra. The corrected mass spectra were searched against a customized library for identification. The QRMf can be used along with the similarity metric to detect misidentifications and assist in finding the correct identification when it is not the closest match.
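The comparison metric that the QRMf complements can be illustrated with a minimal sketch of a normalized dot product (cosine) search that returns a ranked hit list. This covers only the similarity search, not the QRMf variance ratio itself. Spectra are assumed to be dictionaries mapping unit-resolution m/z values to intensities, and the library entries, compound names, and function names are hypothetical.

```python
import math


def dot_product_similarity(a, b):
    """Normalized dot product (cosine) between two spectra.

    Each spectrum is a dict {m/z: intensity}; peaks absent from one
    spectrum contribute zero to the numerator.
    """
    shared = set(a) & set(b)
    numerator = sum(a[mz] * b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return numerator / (norm_a * norm_b)


def hit_list(query, library):
    """Library entries ranked by decreasing similarity to the query."""
    scored = [(name, dot_product_similarity(query, spectrum))
              for name, spectrum in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical two-entry library and a query identical to the first entry.
library = {
    "compound_A": {57: 100, 71: 40, 105: 20},
    "compound_B": {57: 10, 91: 100, 105: 60},
}
query = {57: 100, 71: 40, 105: 20}

hits = hit_list(query, library)
print(hits[0][0])
```

Searching a library record against its own library in this way corresponds to the idealized intralibrary search described above, where the top hit matches perfectly.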
Affiliation(s)
- Zewei Chen
- Chemistry Laboratories, Department of Chemistry and Biochemistry, Ohio University, Athens, OH 45701, USA
- Peter de Boves Harrington
- Chemistry Laboratories, Department of Chemistry and Biochemistry, Ohio University, Athens, OH 45701, USA
- Preshious Rearden
- Research and Development Department, Houston Forensic Science Center, Houston, TX 77002, USA
- Vivekananda Shetty
- Research and Development Department, Houston Forensic Science Center, Houston, TX 77002, USA
- Angelica Noyola
- Seized Drugs Section, Houston Forensic Science Center, Houston, TX 77002, USA
|
20
|
Wojtkiewicz M, Berg Luecke L, Castro C, Burkovetskaya M, Mesidor R, Gundry RL. Bottom-up proteomic analysis of human adult cardiac tissue and isolated cardiomyocytes. J Mol Cell Cardiol 2021; 162:20-31. [PMID: 34437879 PMCID: PMC9620472 DOI: 10.1016/j.yjmcc.2021.08.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/15/2021] [Revised: 07/07/2021] [Accepted: 08/04/2021] [Indexed: 12/30/2022]
Abstract
The heart is composed of multiple cell types, each with a specific function. Cell-type-specific approaches are necessary for defining the intricate molecular mechanisms underlying cardiac development, homeostasis, and pathology. While single-cell RNA-seq studies are beginning to define the chamber-specific cellular composition of the heart, our views of the proteome are more limited because most proteomics studies have utilized homogenized human cardiac tissue. To promote future cell-type specific analyses of the human heart, we describe the first method for cardiomyocyte isolation from cryopreserved human cardiac tissue followed by flow cytometry for purity assessment. We also describe a facile method for preparing isolated cardiomyocytes and whole cardiac tissue homogenate for bottom-up proteomic analyses. Prior experience in dissociating cardiac tissue or proteomics is not required to execute these methods. We compare different sample preparation workflows and analysis methods to demonstrate how these can impact the depth of proteome coverage achieved. We expect this how-to guide will serve as a starting point for investigators interested in general and cell-type-specific views of the cardiac proteome.
Affiliation(s)
- Melinda Wojtkiewicz
- CardiOmics Program, Center for Heart and Vascular Research, Division of Cardiovascular Medicine, Department of Cellular and Integrative Physiology, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Linda Berg Luecke
- CardiOmics Program, Center for Heart and Vascular Research, Division of Cardiovascular Medicine, Department of Cellular and Integrative Physiology, University of Nebraska Medical Center, Omaha, NE 68198, USA; Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI, USA
- Chase Castro
- CardiOmics Program, Center for Heart and Vascular Research, Division of Cardiovascular Medicine, Department of Cellular and Integrative Physiology, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Maria Burkovetskaya
- CardiOmics Program, Center for Heart and Vascular Research, Division of Cardiovascular Medicine, Department of Cellular and Integrative Physiology, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Roneldine Mesidor
- CardiOmics Program, Center for Heart and Vascular Research, Division of Cardiovascular Medicine, Department of Cellular and Integrative Physiology, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Rebekah L Gundry
- CardiOmics Program, Center for Heart and Vascular Research, Division of Cardiovascular Medicine, Department of Cellular and Integrative Physiology, University of Nebraska Medical Center, Omaha, NE 68198, USA
|
21
|
Cassidy L, Kaulich PT, Maaß S, Bartel J, Becher D, Tholey A. Bottom-up and top-down proteomic approaches for the identification, characterization, and quantification of the low molecular weight proteome with focus on short open reading frame-encoded peptides. Proteomics 2021; 21:e2100008. [PMID: 34145981 DOI: 10.1002/pmic.202100008] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 05/03/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 01/14/2023]
Abstract
The recent discovery of alternative open reading frames creates a need for suitable analytical approaches to verify their translation and to characterize the corresponding gene products at the molecular level. As the analysis of small proteins within a background proteome by means of classical bottom-up proteomics is challenging, method development for the analysis of small open reading frame-encoded peptides (SEPs) has become a focal point for research. Here, we highlight bottom-up and top-down proteomics approaches established for the analysis of SEPs in both pro- and eukaryotes. Major steps of analysis, including sample preparation and (small) proteome isolation, separation and mass spectrometry, data interpretation and quality control, quantification, the analysis of post-translational modifications, and exploration of functional aspects of the SEPs by means of proteomics technologies, are described. These methods do not exclusively cover the analytics of SEPs but also include the low molecular weight proteome and can moreover be used for the proteome-wide analysis of proteolytic processing events.
Affiliation(s)
- Liam Cassidy
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Philipp T Kaulich
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Sandra Maaß
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald, Germany
- Jürgen Bartel
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald, Germany
- Dörte Becher
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Greifswald, Germany
- Andreas Tholey
- Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany

22
Important Issues in Planning a Proteomics Experiment: Statistical Considerations of Quantitative Proteomic Data. Methods Mol Biol 2021; 2228:1-20. [PMID: 33950479 DOI: 10.1007/978-1-0716-1024-4_1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 03/15/2023]
Abstract
Mass spectrometry is frequently used in quantitative proteomics to detect differentially regulated proteins. A very important but unfortunately oftentimes neglected part in detecting differential proteins is the statistical analysis. Data from proteomics experiments are usually high-dimensional and hence require profound statistical methods. It is especially important to design a proteomic experiment correctly before it is conducted in the laboratory. Only this can ensure that the statistical analysis is capable of detecting truly differential proteins afterward. This chapter thus covers aspects of both statistical planning as well as the actual analysis of quantitative proteomic experiments.

23
Abstract
Proteomics, the large-scale study of all proteins of an organism or system, is a powerful tool for studying biological systems. It can provide a holistic view of the physiological and biochemical states of given samples through identification and quantification of large numbers of peptides and proteins. In forensic science, proteomics can be used as a confirmatory and orthogonal technique for well-built genomic analyses. Proteomics is highly valuable in cases where nucleic acids are absent or degraded, such as hair and bone samples. It can be used to identify body fluids, ethnic group, gender, individual, and estimate post-mortem interval using bone, muscle, and decomposition fluid samples. Compared to genomic analysis, proteomics can provide a better global picture of a sample. It has been used in forensic science for a wide range of sample types and applications. In this review, we briefly introduce proteomic methods, including sample preparation techniques, data acquisition using liquid chromatography-tandem mass spectrometry, and data analysis using database search, spectral library search, and de novo sequencing. We also summarize recent applications in the past decade of proteomics in forensic science with a special focus on human samples, including hair, bone, body fluids, fingernail, muscle, brain, and fingermark, and address the challenges, considerations, and future developments of forensic proteomics.

24
Abstract
Mass spectrometry (MS)-based proteomics is currently the most successful approach to measure and compare peptides and proteins in a large variety of biological samples. Modern mass spectrometers, equipped with high-resolution analyzers, provide large amounts of data output. This is the case of shotgun/bottom-up proteomics, which consists in the enzymatic digestion of proteins into peptides that are then measured by MS instruments through a data-dependent acquisition (DDA) mode. Dedicated bioinformatic tools and platforms have been developed to face the increasing size and complexity of raw MS data that need to be processed and interpreted for large-scale protein identification and quantification. This chapter illustrates the most popular bioinformatics solutions for the analysis of shotgun MS-proteomics data. A general description will be provided on the data preprocessing options and the different search engines available, including practical suggestions on how to optimize the parameters for peptide search, based on hands-on experience.
Affiliation(s)
- Avinash Yadav
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Federica Marini
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Alessandro Cuomo
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Tiziana Bonaldi
- Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy.

25
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE Access: Practical Innovations, Open Solutions 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most such tools have not focused on scalability issues that are inherent in proteogenomic data analysis where the size of the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, providing a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make a case for how high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic data analysis.
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Muhammad Haseeb
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Mohammed Aledhari
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Rehma Razzak
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Reza M Parizi
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Fahad Saeed
- School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA

26
The challenge of detecting modifications on proteins. Essays Biochem 2020; 64:135-153. [PMID: 31957791 DOI: 10.1042/ebc20190055] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/15/2019] [Revised: 12/17/2019] [Accepted: 12/19/2019] [Indexed: 12/16/2022]
Abstract
Post-translational modifications (PTMs) are integral to the regulation of protein function; characterising their role in this process is vital to understanding how cells work in both healthy and diseased states. Mass spectrometry (MS) facilitates the mass determination and sequencing of peptides, and thereby also the detection of site-specific PTMs. However, numerous challenges in this field persist. The diverse chemical properties, low abundance, labile nature and instability of many PTMs, in combination with the more practical issues of compatibility with MS and bioinformatics challenges, contribute to the arduous nature of their analysis. In this review, we present an overview of the established MS-based approaches for analysing PTMs and the common complications associated with their investigation, including examples of specific challenges focusing on phosphorylation, lysine acetylation and redox modifications.

27
Yang Y, Horvatovich P, Qiao L. Fragment Mass Spectrum Prediction Facilitates Site Localization of Phosphorylation. J Proteome Res 2020; 20:634-644. [PMID: 32985198 DOI: 10.1021/acs.jproteome.0c00580] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 01/08/2023]
Abstract
Liquid chromatography tandem mass spectrometry (LC-MS/MS) has been the most widely used technology for phosphoproteomics studies. As an alternative to database searching and probability-based phosphorylation site localization approaches, spectral library searching has been proved to be effective in the identification of phosphopeptides. However, the incompleteness of experimental spectral libraries limits the identification capability. Herein, we utilize MS/MS spectrum prediction coupled with spectral matching for site localization of phosphopeptides. In silico MS/MS spectra are generated from peptide sequences by deep learning/machine learning models trained with nonphosphopeptides. Then, mass shift according to phosphorylation sites, phosphoric acid neutral loss, and a "budding" strategy are adopted to adjust the in silico mass spectra. In silico MS/MS spectra can also be generated in one step for phosphopeptides using models trained with phosphopeptides. The method is benchmarked on data sets of synthetic phosphopeptides and is used to process real biological samples. It is demonstrated to be a method requiring only computational resources that supplements the probability-based approaches for phosphorylation site localization of singly and multiply phosphorylated peptides.
Affiliation(s)
- Yi Yang
- Department of Chemistry and Shanghai Stomatological Hospital, Fudan University, Handan Road 220, Shanghai 200000, China
- Peter Horvatovich
- Department of Pharmacy, University of Groningen, Antonius Deusinglaan 1, Groningen 9700 AD, The Netherlands
- Liang Qiao
- Department of Chemistry and Shanghai Stomatological Hospital, Fudan University, Handan Road 220, Shanghai 200000, China

28
Label-Free Mass Spectrometry-Based Quantification of Linker Histone H1 Variants in Clinical Samples. Int J Mol Sci 2020; 21:ijms21197330. [PMID: 33020374 PMCID: PMC7582528 DOI: 10.3390/ijms21197330] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/27/2020] [Revised: 09/25/2020] [Accepted: 09/28/2020] [Indexed: 12/21/2022] Open
Abstract
Epigenetic aberrations have been recognized as important contributors to cancer onset and development, and increasing evidence suggests that linker histone H1 variants may serve as biomarkers useful for patient stratification, as well as play an important role as drivers in cancer. Although traditionally histone H1 levels have been studied using antibody-based methods and RNA expression, these approaches suffer from limitations. Mass spectrometry (MS)-based proteomics represents the ideal tool to accurately quantify relative changes in protein abundance within complex samples. In this study, we used a label-free quantification approach to simultaneously analyze all somatic histone H1 variants in clinical samples and verified its applicability to laser micro-dissected tissue areas containing as few as 1000 cells. We then applied it to breast cancer patient samples, identifying differences in linker histone variant patterns in primary triple-negative breast tumors with and without relapse after chemotherapy. This study highlights how label-free quantitation by MS is a valuable option to accurately quantitate histone H1 levels in different types of clinical samples, including very low-abundance patient tissues.

29
O'Bryon I, Jenson SC, Merkley ED. Flying blind, or just flying under the radar? The underappreciated power of de novo methods of mass spectrometric peptide identification. Protein Sci 2020; 29:1864-1878. [PMID: 32713088 PMCID: PMC7454419 DOI: 10.1002/pro.3919] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/15/2020] [Revised: 07/21/2020] [Accepted: 07/23/2020] [Indexed: 12/15/2022]
Abstract
Mass spectrometry-based proteomics is a popular and powerful method for precise and highly multiplexed protein identification. The most common method of analyzing untargeted proteomics data is called database searching, where the database is simply a collection of protein sequences from the target organism, derived from genome sequencing. Experimental peptide tandem mass spectra are compared to simplified models of theoretical spectra calculated from the translated genomic sequences. However, in several interesting application areas, such as forensics, archaeology, venomics, and others, a genome sequence may not be available, or the correct genome sequence to use is not known. In these cases, de novo peptide identification can play an important role. De novo methods infer peptide sequence directly from the tandem mass spectrum without reference to a sequence database, usually using graph-based or machine learning algorithms. In this review, we provide a basic overview of de novo peptide identification methods and applications, briefly covering de novo algorithms and tools, and focusing in more depth on recent applications from venomics, metaproteomics, forensics, and characterization of antibody drugs.
Affiliation(s)
- Isabelle O'Bryon
- Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
- Sarah C. Jenson
- Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
- Eric D. Merkley
- Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA

30
Zhang F, Ge W, Ruan G, Cai X, Guo T. Data‐Independent Acquisition Mass Spectrometry‐Based Proteomics and Software Tools: A Glimpse in 2020. Proteomics 2020; 20:e1900276. [DOI: 10.1002/pmic.201900276] [Citation(s) in RCA: 116] [Impact Index Per Article: 23.2] [Received: 11/26/2019] [Revised: 03/27/2020] [Indexed: 01/02/2023]
Affiliation(s)
- Fangfei Zhang
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Weigang Ge
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Guan Ruan
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Xue Cai
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Tiannan Guo
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang Province 310024, China

31
Hentschker C, Maaß S, Junker S, Hecker M, Hammerschmidt S, Otto A, Becher D. Comprehensive Spectral Library from the Pathogenic Bacterium Streptococcus pneumoniae with Focus on Phosphoproteins. J Proteome Res 2020; 19:1435-1446. [DOI: 10.1021/acs.jproteome.9b00615] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
Affiliation(s)
- Christian Hentschker
- Department of Microbial Proteomics, Institute of Microbiology; University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Sandra Maaß
- Department of Microbial Proteomics, Institute of Microbiology; University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Sabryna Junker
- Department of Microbial Proteomics, Institute of Microbiology; University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Michael Hecker
- Department of Microbial Physiology and Molecular Biology, Institute of Microbiology; University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Sven Hammerschmidt
- Department of Molecular Genetics and Infection Biology, Interfaculty Institute for Genetics and Functional Genomics, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Andreas Otto
- Department of Microbial Proteomics, Institute of Microbiology; University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Dörte Becher
- Department of Microbial Proteomics, Institute of Microbiology; University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany

32
Duggan BM, Cullum R, Fenical W, Amador LA, Rodríguez AD, La Clair JJ. Searching for Small Molecules with an Atomic Sort. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.201911862] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
Affiliation(s)
- Brendan M. Duggan
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Reiko Cullum
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA 92093-0204, USA
- William Fenical
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA 92093-0204, USA
- Luis A. Amador
- Molecular Sciences Research Center, University of Puerto Rico, 1390 Ponce de León Avenue, San Juan 00926, Puerto Rico
- Abimael D. Rodríguez
- Molecular Sciences Research Center, University of Puerto Rico, 1390 Ponce de León Avenue, San Juan 00926, Puerto Rico
- James J. La Clair
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA

33
Duggan BM, Cullum R, Fenical W, Amador LA, Rodríguez AD, La Clair JJ. Searching for Small Molecules with an Atomic Sort. Angew Chem Int Ed Engl 2020; 59:1144-1148. [PMID: 31696595 PMCID: PMC6942196 DOI: 10.1002/anie.201911862] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/17/2019] [Revised: 10/24/2019] [Indexed: 12/14/2022]
Abstract
The discovery of biologically active small molecules requires sifting through large amounts of data to identify unique or unusual arrangements of atoms. Here, we develop, test and evaluate an atom-based sort to identify novel features of secondary metabolites and demonstrate its use to evaluate novelty in marine microbial and sponge extracts. This study outlines an important ongoing advance towards the translation of autonomous systems to identify, and ultimately elucidate, atomic novelty within a complex mixture of small molecules.
Affiliation(s)
- Brendan M Duggan
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Reiko Cullum
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, 92093-0204, USA
- William Fenical
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, 92093-0204, USA
- Luis A Amador
- Molecular Sciences Research Center, University of Puerto Rico, 1390 Ponce de León Avenue, San Juan, 00926, Puerto Rico
- Abimael D Rodríguez
- Molecular Sciences Research Center, University of Puerto Rico, 1390 Ponce de León Avenue, San Juan, 00926, Puerto Rico
- James J La Clair
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA

34
Vizcaíno JA, Kubiniok P, Kovalchik KA, Ma Q, Duquette JD, Mongrain I, Deutsch EW, Peters B, Sette A, Sirois I, Caron E. The Human Immunopeptidome Project: A Roadmap to Predict and Treat Immune Diseases. Mol Cell Proteomics 2020; 19:31-49. [PMID: 31744855 PMCID: PMC6944237 DOI: 10.1074/mcp.r119.001743] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 08/23/2019] [Revised: 11/18/2019] [Indexed: 12/11/2022] Open
Abstract
The science that investigates the ensembles of all peptides associated with human leukocyte antigen (HLA) molecules is termed "immunopeptidomics" and is typically driven by mass spectrometry (MS) technologies. Recent advances in MS technologies, neoantigen discovery and cancer immunotherapy have catalyzed the launch of the Human Immunopeptidome Project (HIPP) with the goal of providing a complete map of the human immunopeptidome and making the technology so robust that it will be available in every clinic. Here, we provide a long-term perspective of the field and we use this framework to explore how we think the completion of the HIPP will truly impact society in the future. In this context, we introduce the concept of immunopeptidome-wide association studies (IWAS). We highlight the importance of large cohort studies for the future and how applying quantitative immunopeptidomics at population scale may provide a new look at individual predisposition to common immune diseases as well as responsiveness to vaccines and immunotherapies. Through this vision, we aim to provide a fresh view of the field to stimulate new discussions within the community, and present what we see as the key challenges for the future for unlocking the full potential of immunopeptidomics in this era of precision medicine.
Affiliation(s)
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Peter Kubiniok
- CHU Sainte-Justine Research Center, Montreal, QC H3T 1C5, Canada
- Qing Ma
- CHU Sainte-Justine Research Center, Montreal, QC H3T 1C5, Canada; School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Ian Mongrain
- Université de Montréal Beaulieu-Saucier Pharmacogenomics Centre, Montreal, QC, Canada; Montreal Heart Institute, Montreal, QC, Canada
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington, 98109
- Bjoern Peters
- La Jolla Institute for Allergy and Immunology, La Jolla, California, 92037
- Alessandro Sette
- La Jolla Institute for Allergy and Immunology, La Jolla, California, 92037
- Isabelle Sirois
- CHU Sainte-Justine Research Center, Montreal, QC H3T 1C5, Canada
- Etienne Caron
- CHU Sainte-Justine Research Center, Montreal, QC H3T 1C5, Canada; Department of Pathology and Cellular Biology, Faculty of Medicine, Université de Montréal, QC H3T 1J4, Canada.

35
Klein JA, Zaia J. A Perspective on the Confident Comparison of Glycoprotein Site-Specific Glycosylation in Sample Cohorts. Biochemistry 2019; 59:3089-3097. [PMID: 31833756 DOI: 10.1021/acs.biochem.9b00730] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/14/2022]
Abstract
Protein glycosylation, resulting from glycosyl transferase reactions under complex control in the secretory pathway, consists of a distribution of related glycoforms at each glycosylation site. Because the biosynthetic substrate concentration and transport rates depend on architecture and other aspects of cellular phenotypes, site-specific glycosylation cannot be predicted accurately from genomic, transcriptomic, or proteomic information. Rather, it is necessary to quantify glycosylation at each protein site and how this changes among a sample cohort to provide information about disease mechanisms. At present, mature mass spectrometry-based methods allow for qualitative assignment of the glycan composition and glycosylation site of singly glycosylated proteolytic peptides. To make such quantitative comparisons, it is necessary to sample the glycosylation distribution with sufficient coverage and accuracy for confident assessment of the glycosylation changes that occur in the biological cohort. In this Perspective, we discuss the unmet needs for mass spectrometry acquisition methods and bioinformatics for the confident comparison of protein site-specific glycosylation among sample cohorts.

36
den Ridder M, Daran-Lapujade P, Pabst M. Shot-gun proteomics: why thousands of unidentified signals matter. FEMS Yeast Res 2019; 20:5682490. [DOI: 10.1093/femsyr/foz088] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/01/2019] [Accepted: 12/19/2019] [Indexed: 12/14/2022] Open
Abstract
Mass spectrometry-based proteomics has become a constitutional part of the multi-omics toolbox in yeast research, advancing fundamental knowledge of molecular processes and guiding decisions in strain and product developmental pipelines. Nevertheless, post-translational protein modifications (PTMs) continue to challenge the field of proteomics. PTMs are not directly encoded in the genome; therefore, they require a sensitive analysis of the proteome itself. In yeast, the relevance of post-translational regulators has already been established, such as for phosphorylation, which can directly affect the reaction rates of metabolic enzymes. Whereas the selective analysis of single modifications has become a broadly employed technique, the sensitive analysis of a comprehensive set of modifications remains a challenge. At the same time, a large number of fragmentation spectra in a typical shot-gun proteomics experiment remain unidentified. It has been estimated that a good proportion of those unidentified spectra originates from unexpected modifications or natural peptide variants. In this review, recent advancements in microbial proteomics for unrestricted protein modification discovery are reviewed, and recent research integrating this additional layer of information to elucidate protein interaction and regulation in yeast is briefly discussed.
Affiliation(s)
- Maxime den Ridder
- Delft University of Technology, Department of Biotechnology, van der Maasweg 9, 2629 HZ Delft, The Netherlands
- Pascale Daran-Lapujade
- Delft University of Technology, Department of Biotechnology, van der Maasweg 9, 2629 HZ Delft, The Netherlands
- Martin Pabst
- Delft University of Technology, Department of Biotechnology, van der Maasweg 9, 2629 HZ Delft, The Netherlands

37
Optimization of TripleTOF spectral simulation and library searching for confident localization of phosphorylation sites. PLoS One 2019; 14:e0225885. [PMID: 31790495 PMCID: PMC6886777 DOI: 10.1371/journal.pone.0225885] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/21/2019] [Accepted: 11/14/2019] [Indexed: 12/31/2022] Open
Abstract
Tandem mass spectrometry (MS/MS) has been used in the analysis of proteins and their post-translational modifications. A recently developed data analysis method, which simulates MS/MS spectra of phosphopeptides and performs spectral library searching using SpectraST, facilitates confident localization of phosphorylation sites. However, its performance has been evaluated only on MS/MS spectra acquired using Orbitrap HCD mass spectrometers so far. In this study, we have investigated whether this approach would be applicable to another type of mass spectrometer, and optimized the simulation and search conditions to achieve sensitive and confident site localization. Synthetic phosphopeptides and enriched K562 cell phosphopeptides were analyzed using a TripleTOF 6600 mass spectrometer before and after enzymatic dephosphorylation. Dephosphorylated peptides identified by X!Tandem database searching were subjected to spectral simulation of all possible single phosphorylations using SimPhospho software. Phosphopeptides were identified and localized by SpectraST searching against a library of the simulated spectra. Although no synthetic phosphopeptide was localized at 1% false localization rate under the previous conditions, optimization of the spectral simulation and search conditions for the TripleTOF datasets achieved the localization and improved the sensitivity. Furthermore, the optimized conditions enabled sensitive localization of K562 phosphopeptides at 1% false discovery and localization rates. These results suggest that accurate phosphopeptide simulation of TripleTOF MS/MS spectra is possible and the simulated spectral libraries can be used in SpectraST searching for confident localization of phosphorylation sites.

38
Pino L, Lin A, Bittremieux W. 2018 YPIC Challenge: A Case Study in Characterizing an Unknown Protein Sample. J Proteome Res 2019; 18:3936-3943. [PMID: 31556620 PMCID: PMC6824964 DOI: 10.1021/acs.jproteome.9b00384] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
For the 2018 YPIC Challenge, contestants were invited to try to decipher two unknown English questions encoded by a synthetic protein expressed in Escherichia coli. In addition to deciphering the sentence, contestants were asked to determine the three-dimensional structure and detect any post-translational modifications left by the host organism. We present our experimental and computational strategy to characterize this sample by identifying the unknown protein sequence and detecting the presence of post-translational modifications. The sample was acquired with dynamic exclusion disabled to increase the signal-to-noise ratio of the measured molecules, after which spectral clustering was used to generate high-quality consensus spectra. De novo spectrum identification was used to determine the synthetic protein sequence, and any post-translational modifications introduced by E. coli on the synthetic protein were analyzed via spectral networking. This workflow resulted in a de novo sequence coverage of 70%, on par with sequence database searching performance. Additionally, the spectral networking analysis indicated that no systematic modifications were introduced on the synthetic protein by E. coli. The strategy presented here can be directly used to analyze samples for which no protein sequence information is available or when the identity of the sample is unknown. All software and code to perform the bioinformatics analysis are available as open source, and self-contained Jupyter notebooks are provided to fully recreate the analysis.
Affiliation(s)
- Lindsay Pino
- Department of Genome Sciences, University of Washington, Seattle WA 98195, USA
| | - Andy Lin
- Department of Genome Sciences, University of Washington, Seattle WA 98195, USA
| | - Wout Bittremieux
- Department of Genome Sciences, University of Washington, Seattle WA 98195, USA
- Department of Mathematics and Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
|
39
|
Wang X, Shen S, Rasam SS, Qu J. MS1 ion current-based quantitative proteomics: A promising solution for reliable analysis of large biological cohorts. MASS SPECTROMETRY REVIEWS 2019; 38:461-482. [PMID: 30920002 PMCID: PMC6849792 DOI: 10.1002/mas.21595] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 10/19/2018] [Accepted: 02/28/2019] [Indexed: 05/04/2023]
Abstract
The rapidly advancing field of pharmaceutical and clinical research calls for systematic, molecular-level characterization of complex biological systems. To this end, quantitative proteomics represents a powerful tool, but an optimal solution for reliable large-cohort proteomics analysis, as frequently involved in pharmaceutical/clinical investigations, is urgently needed. Large-cohort analysis remains challenging owing to the deteriorating quantitative quality and snowballing missing data and false-positive discovery of altered proteins when sample size increases. MS1 ion current-based methods, which have become an important class of label-free quantification techniques during the past decade, show considerable potential to achieve reproducible protein measurements in large cohorts with high quantitative accuracy/precision. Nonetheless, in order to fully unleash this potential, several critical prerequisites should be met. Here we provide an overview of the rationale of MS1-based strategies and then important considerations for experimental and data processing techniques, with the emphasis on (i) efficient and reproducible sample preparation and LC separation; (ii) sensitive, selective and high-resolution MS detection; (iii) accurate chromatographic alignment; (iv) sensitive and selective generation of quantitative features; and (v) optimal post-feature-generation data quality control. Prominent technical developments in these aspects are discussed. Finally, we review applications of MS1-based strategies in disease mechanism studies, biomarker discovery, and pharmaceutical investigations.
Affiliation(s)
- Xue Wang
- Department of Cell Stress Biology, Roswell Park Cancer Institute, Buffalo, New York
| | - Shichen Shen
- Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, New York, New York
| | - Sailee Suryakant Rasam
- Department of Biochemistry, University at Buffalo, State University of New York, New York, New York
| | - Jun Qu
- Department of Cell Stress Biology, Roswell Park Cancer Institute, Buffalo, New York
- Department of Pharmaceutical Sciences, University at Buffalo, State University of New York, New York, New York
- Department of Biochemistry, University at Buffalo, State University of New York, New York, New York
|
40
|
O’Bryon I, Tucker AE, Kaiser BLD, Wahl KL, Merkley ED. Constructing a Tandem Mass Spectral Library for Forensic Ricin Identification. J Proteome Res 2019; 18:3926-3935. [DOI: 10.1021/acs.jproteome.9b00377] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022]
Affiliation(s)
- Isabelle O’Bryon
- Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
| | - Abigail E. Tucker
- Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
| | - Brooke L. D. Kaiser
- Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
| | - Karen L. Wahl
- Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
| | - Eric D. Merkley
- Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
|
41
|
Buchowiecka AK. Modified cysteine S-phosphopeptide standards for mass spectrometry-based proteomics. Amino Acids 2019; 51:1365-1375. [DOI: 10.1007/s00726-019-02773-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/03/2018] [Accepted: 08/18/2019] [Indexed: 02/06/2023]
|
42
|
Zelanis A, Silva DA, Kitano ES, Liberato T, Fukushima I, Serrano SMT, Tashima AK. A first step towards building spectral libraries as complementary tools for snake venom proteome/peptidome studies. COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY D-GENOMICS & PROTEOMICS 2019; 31:100599. [PMID: 31181499 DOI: 10.1016/j.cbd.2019.100599] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/28/2019] [Revised: 05/29/2019] [Accepted: 05/29/2019] [Indexed: 01/31/2023]
Abstract
Snake venoms are complex mixtures of a large number of distinct proteins and peptides with biological activity. Peptide spectral libraries are compilations of previously identified MS/MS spectra obtained from proteomics experiments. Here we present the generation and use of a Venom Peptidome and a Venom Proteome spectral library for the analysis of venom proteomes and peptidomes from distinct snake species.
Affiliation(s)
- André Zelanis
- Functional Proteomics Laboratory, Department of Science and Technology, Universidade Federal de São Paulo (ICT-UNIFESP), São José dos Campos, SP, Brazil.
| | - Débora A Silva
- Laboratório Especial de Toxinologia Aplicada, Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Instituto Butantan, São Paulo, Brazil
| | - Eduardo S Kitano
- Laboratório Especial de Toxinologia Aplicada, Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Instituto Butantan, São Paulo, Brazil; Laboratório de Imunologia, Hospital de Clínicas, Faculdade de Medicina, Universidade de São Paulo (HCFMUSP), São Paulo, Brazil
| | - Tarcísio Liberato
- Functional Proteomics Laboratory, Department of Science and Technology, Universidade Federal de São Paulo (ICT-UNIFESP), São José dos Campos, SP, Brazil
| | - Isabella Fukushima
- Functional Proteomics Laboratory, Department of Science and Technology, Universidade Federal de São Paulo (ICT-UNIFESP), São José dos Campos, SP, Brazil
| | - Solange M T Serrano
- Laboratório Especial de Toxinologia Aplicada, Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Instituto Butantan, São Paulo, Brazil
| | - Alexandre K Tashima
- Laboratório Especial de Toxinologia Aplicada, Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Instituto Butantan, São Paulo, Brazil; Departamento de Bioquímica, Escola Paulista de Medicina, Universidade Federal de São Paulo (EPM-UNIFESP), São Paulo, SP, Brazil
|
43
|
Schiebenhoefer H, Van Den Bossche T, Fuchs S, Renard BY, Muth T, Martens L. Challenges and promise at the interface of metaproteomics and genomics: an overview of recent progress in metaproteogenomic data analysis. Expert Rev Proteomics 2019; 16:375-390. [PMID: 31002542 DOI: 10.1080/14789450.2019.1609944] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Indexed: 02/07/2023]
Abstract
Introduction: The study of microbial communities based on the combined analysis of genomic and proteomic data - called metaproteogenomics - has gained increased research attention in recent years. This relatively young field aims to elucidate the functional and taxonomic interplay of proteins in microbiomes and its implications on human health and the environment. Areas covered: This article reviews bioinformatics methods and software tools dedicated to the analysis of data from metaproteomics and metaproteogenomics experiments. In particular, it focuses on the creation of tailored protein sequence databases, on the optimal use of database search algorithms including methods of error rate estimation, and finally on taxonomic and functional annotation of peptide and protein identifications. Expert opinion: Recently, various promising strategies and software tools have been proposed for handling typical data analysis issues in metaproteomics. However, severe challenges remain that are highlighted and discussed in this article; these include: (i) robust false-positive assessment of peptide and protein identifications, (ii) complex protein inference against a background of highly redundant data, (iii) taxonomic and functional post-processing of identification data, and finally, (iv) the assessment and provision of metrics and tools for quantitative analysis.
Affiliation(s)
- Henning Schiebenhoefer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
| | - Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
| | - Stephan Fuchs
- FG13 Division of Nosocomial Pathogens and Antibiotic Resistances, Robert Koch Institute, Wernigerode, Germany
| | - Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
| | - Thilo Muth
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
|
44
|
Applications and challenges of forensic proteomics. Forensic Sci Int 2019; 297:350-363. [DOI: 10.1016/j.forsciint.2019.01.022] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 11/16/2018] [Revised: 01/09/2019] [Accepted: 01/13/2019] [Indexed: 12/23/2022]
|
45
|
Ammar C, Berchtold E, Csaba G, Schmidt A, Imhof A, Zimmer R. Multi-Reference Spectral Library Yields Almost Complete Coverage of Heterogeneous LC-MS/MS Data Sets. J Proteome Res 2019; 18:1553-1566. [DOI: 10.1021/acs.jproteome.8b00819] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/26/2022]
Affiliation(s)
- Constantin Ammar
- Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
- Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81337 München, Germany
| | - Evi Berchtold
- Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
| | - Gergely Csaba
- Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
| | - Andreas Schmidt
- Zentrallabor für Proteinanalytik (Protein Analysis Unit), Ludwig-Maximilians-Universität München, Grosshaderner Strasse 9, 82152 Planegg-Martinsried, Germany
| | - Axel Imhof
- Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81337 München, Germany
- Zentrallabor für Proteinanalytik (Protein Analysis Unit), Ludwig-Maximilians-Universität München, Grosshaderner Strasse 9, 82152 Planegg-Martinsried, Germany
| | - Ralf Zimmer
- Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
- Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81337 München, Germany
|
46
|
Deutsch EW, Perez-Riverol Y, Chalkley RJ, Wilhelm M, Tate S, Sachsenberg T, Walzer M, Käll L, Delanghe B, Böcker S, Schymanski EL, Wilmes P, Dorfer V, Kuster B, Volders PJ, Jehmlich N, Vissers JP, Wolan DW, Wang AY, Mendoza L, Shofstahl J, Dowsey AW, Griss J, Salek RM, Neumann S, Binz PA, Lam H, Vizcaíno JA, Bandeira N, Röst H. Expanding the Use of Spectral Libraries in Proteomics. J Proteome Res 2018; 17:4051-4060. [PMID: 30270626 PMCID: PMC6443480 DOI: 10.1021/acs.jproteome.8b00485] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 12/15/2022]
Abstract
The 2017 Dagstuhl Seminar on Computational Proteomics provided an opportunity for a broad discussion on the current state and future directions of the generation and use of peptide tandem mass spectrometry spectral libraries. Their use in proteomics is growing slowly, but there are multiple challenges in the field that must be addressed to further increase the adoption of spectral libraries and related techniques. The primary bottlenecks are the paucity of high quality and comprehensive libraries and the general difficulty of adopting spectral library searching into existing workflows. There are several existing spectral library formats, but none captures a satisfactory level of metadata; therefore, a logical next improvement is to design a more advanced, Proteomics Standards Initiative-approved spectral library format that can encode all of the desired metadata. The group discussed a series of metadata requirements organized into three designations of completeness or quality, tentatively dubbed bronze, silver, and gold. The metadata can be organized at four different levels of granularity: at the collection (library) level, at the individual entry (peptide ion) level, at the peak (fragment ion) level, and at the peak annotation level. Strategies for encoding mass modifications in a consistent manner and the requirement for encoding high-quality and commonly seen but as-yet-unidentified spectra were discussed. The group also discussed related topics, including strategies for comparing two spectra, techniques for generating representative spectra for a library, approaches for selection of optimal signature ions for targeted workflows, and issues surrounding the merging of two or more libraries into one. We present here a review of this field and the challenges that the community must address in order to accelerate the adoption of spectral libraries in routine analysis of proteomics datasets.
Affiliation(s)
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington, 98109, United States
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Robert J. Chalkley
- University of California San Francisco, San Francisco, 94158, California, United States
| | - Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
| | | | - Timo Sachsenberg
- Department of Computer Science, Center for Bioinformatics, University of Tübingen, Sand 14, Tübingen, 72076, Germany
| | - Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH − Royal Institute of Technology, Stockholm 114 28, Sweden
| | - Bernard Delanghe
- Thermo Fisher Scientific Bremen, Hanna-Kunath Str. 11, 28199 Bremen, Germany
| | - Sebastian Böcker
- Chair for Bioinformatics, Friedrich-Schiller-University Jena, 07743 Jena, Germany
| | - Emma L. Schymanski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Paul Wilmes
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
| | - Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Hagenberg, 4232, Austria
| | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
- Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich, Freising, 85354, Germany
| | | | - Nico Jehmlich
- Helmholtz-Centre for Environmental Research - UFZ, Leipzig, Germany
| | | | - Dennis W. Wolan
- Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
| | - Ana Y. Wang
- Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington, 98109, United States
| | - Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway San Jose, CA 95134
| | - Andrew W. Dowsey
- Department of Population Health Sciences and Bristol Veterinary School, Faculty of Health Sciences, University of Bristol, Bristol BS9 1BN, UK
| | - Johannes Griss
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Währinger Gürtel 18-20, Vienna 1090, Austria
| | - Reza M. Salek
- The International Agency for Research on Cancer (IARC), 150 Cours Albert Thomas, 69372 Lyon CEDEX 08, France
| | - Steffen Neumann
- Leibniz Institute of Plant Biochemistry, Department of Stress and Developmental Biology, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
| | - Pierre-Alain Binz
- Clinical Chemistry Service, Centre Hospitalier Universitaire Vaudois, 1011 Lausanne, Switzerland
| | - Henry Lam
- Department of Chemical and Biological Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Nuno Bandeira
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 92093-0404, USA
| | - Hannes Röst
- The Donnelly Centre, University of Toronto, 160 College St., Toronto, ON, M5S 3E1, Canada
|
47
|
Bittremieux W, Meysman P, Noble WS, Laukens K. Fast Open Modification Spectral Library Searching through Approximate Nearest Neighbor Indexing. J Proteome Res 2018; 17:3463-3474. [PMID: 30184435 PMCID: PMC6173621 DOI: 10.1021/acs.jproteome.8b00359] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 12/21/2022]
Abstract
Open modification searching (OMS) is a powerful search strategy that identifies peptides carrying any type of modification by allowing a modified spectrum to match against its unmodified variant by using a very wide precursor mass window. A drawback of this strategy, however, is that it leads to a large increase in search time. Although performing an open search can be done using existing spectral library search engines by simply setting a wide precursor mass window, none of these tools have been optimized for OMS, leading to excessive runtimes and suboptimal identification results. We present the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. This approach is combined with a cascade search strategy, which maximizes the number of identified unmodified and modified spectra while strictly controlling the false discovery rate, and with a shifted dot product score, which sensitively matches modified spectra to their unmodified counterparts. ANN-SoLo achieves state-of-the-art performance in terms of speed and the number of identifications. On a previously published human cell line data set, ANN-SoLo confidently identifies more spectra than SpectraST or MSFragger and achieves a speedup of an order of magnitude compared with SpectraST. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.
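The idea behind a shifted dot product can be sketched in a few lines. The following is an illustrative toy, not ANN-SoLo's implementation: the function names, the greedy one-to-one peak matching, the fixed m/z tolerance, and the assumption of singly charged fragments (so that the precursor mass difference equals the fragment m/z shift) are all assumptions made here.

```python
import math


def normalize(peaks):
    """Scale intensities so the spectrum has unit Euclidean norm."""
    norm = math.sqrt(sum(intensity * intensity for _, intensity in peaks))
    return [(mz, intensity / norm) for mz, intensity in peaks]


def shifted_dot(query, library, mass_shift, tol=0.02):
    """Toy shifted dot product between two peak lists of (m/z, intensity).

    A query peak may pair with a library peak at the same m/z or at the
    m/z obtained after subtracting mass_shift (the precursor mass difference).
    """
    query, library = normalize(query), normalize(library)
    candidates = []
    for qi, (q_mz, q_int) in enumerate(query):
        for li, (l_mz, l_int) in enumerate(library):
            direct = abs(q_mz - l_mz) <= tol
            shifted = abs(q_mz - mass_shift - l_mz) <= tol
            if direct or shifted:
                candidates.append((q_int * l_int, qi, li))
    # Greedy assignment: take the strongest pairs first, each peak used once.
    candidates.sort(reverse=True)
    used_query, used_library, score = set(), set(), 0.0
    for product, qi, li in candidates:
        if qi not in used_query and li not in used_library:
            used_query.add(qi)
            used_library.add(li)
            score += product
    return score


library_spectrum = [(100.0, 1.0), (200.0, 1.0), (300.0, 1.0)]
query_spectrum = [(100.0, 1.0), (200.0, 1.0), (380.0, 1.0)]  # last peak carries +80
plain_score = shifted_dot(query_spectrum, library_spectrum, mass_shift=0.0)
shifted_score = shifted_dot(query_spectrum, library_spectrum, mass_shift=80.0)
print(plain_score, shifted_score)
```

In this toy case the peak at m/z 380 only finds a partner once the 80-unit shift is allowed, so the shifted score is higher than the plain dot product.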
Affiliation(s)
- Wout Bittremieux
- Department of Mathematics and Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Pieter Meysman
- Department of Mathematics and Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
| | - Kris Laukens
- Department of Mathematics and Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
|
48
|
Bittremieux W, Tabb DL, Impens F, Staes A, Timmerman E, Martens L, Laukens K. Quality control in mass spectrometry-based proteomics. MASS SPECTROMETRY REVIEWS 2018; 37:697-711. [PMID: 28802010 DOI: 10.1002/mas.21544] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Received: 03/04/2017] [Revised: 07/24/2017] [Accepted: 07/24/2017] [Indexed: 05/21/2023]
Abstract
Mass spectrometry is a highly complex analytical technique and mass spectrometry-based proteomics experiments can be subject to a large variability, which forms an obstacle to obtaining accurate and reproducible results. Therefore, a comprehensive and systematic approach to quality control is an essential requirement to inspire confidence in the generated results. A typical mass spectrometry experiment consists of multiple different phases including the sample preparation, liquid chromatography, mass spectrometry, and bioinformatics stages. We review potential sources of variability that can impact the results of a mass spectrometry experiment occurring in all of these steps, and we discuss how to monitor and remedy the negative influences on the experimental results. Furthermore, we describe how specialized quality control samples of varying sample complexity can be incorporated into the experimental workflow and how they can be used to rigorously assess detailed aspects of the instrument performance.
Affiliation(s)
- Wout Bittremieux
- Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (Biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
| | - David L Tabb
- Division of Molecular Biology and Human Genetics, Stellenbosch University Faculty of Medicine and Health Sciences, Tygerberg Hospital, Cape Town, South Africa
| | - Francis Impens
- VIB Proteomics Core, Ghent, Belgium
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium
| | - An Staes
- VIB Proteomics Core, Ghent, Belgium
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium
| | - Evy Timmerman
- VIB Proteomics Core, Ghent, Belgium
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Zwijnaarde, Belgium
| | - Kris Laukens
- Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (Biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
|
49
|
Muth T, Hartkopf F, Vaudel M, Renard BY. A Potential Golden Age to Come-Current Tools, Recent Use Cases, and Future Avenues for De Novo Sequencing in Proteomics. Proteomics 2018; 18:e1700150. [PMID: 29968278 DOI: 10.1002/pmic.201700150] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 03/23/2018] [Revised: 05/23/2018] [Indexed: 01/15/2023]
Abstract
In shotgun proteomics, peptide and protein identification is most commonly conducted using database search engines, the method of choice when reference protein sequences are available. Despite its widespread use the database-driven approach is limited, mainly because of its static search space. In contrast, de novo sequencing derives peptide sequence information in an unbiased manner, using only the fragment ion information from the tandem mass spectra. In recent years, with the improvements in MS instrumentation, various new methods have been proposed for de novo sequencing. This review article provides an overview of existing de novo sequencing algorithms and software tools ranging from peptide sequencing to sequence-to-protein mapping. Various use cases are described for which de novo sequencing was successfully applied. Finally, limitations of current methods are highlighted and new directions are discussed for a wider acceptance of de novo sequencing in the community.
Affiliation(s)
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
| | - Felix Hartkopf
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
| | - Marc Vaudel
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020, Bergen, Norway; Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5020, Bergen, Norway
| | - Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
|
50
|
Perez‐Riverol Y, Vizcaíno JA, Griss J. Future Prospects of Spectral Clustering Approaches in Proteomics. Proteomics 2018; 18:e1700454. [PMID: 29882266 PMCID: PMC6099476 DOI: 10.1002/pmic.201700454] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/21/2018] [Revised: 05/23/2018] [Indexed: 12/14/2022]
Abstract
In this article, current and future applications of spectral clustering are discussed in the context of mass spectrometry-based proteomics approaches. First, the main algorithms and tools that can currently be used to perform spectral clustering are introduced. In addition, its main applications and their use in current computational proteomics workflows are explained, including the generation of spectral libraries and spectral archives. Finally, possible future directions for spectral clustering are outlined, including its potential use to achieve a deeper coverage of the proteome and the discovery of novel post-translational modifications and single amino acid variants.
Affiliation(s)
- Yasset Perez‐Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, 1090 Vienna, Austria
|