1
Fields L, Dang TC, Tran VNH, Ibarra AE, Li L. Decoding Neuropeptide Complexity: Advancing Neurobiological Insights from Invertebrates to Vertebrates through Evolutionary Perspectives. ACS Chem Neurosci 2025. [PMID: 40261092 DOI: 10.1021/acschemneuro.5c00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/24/2025]
Abstract
Neuropeptides are vital signaling molecules involved in neural communication, hormonal regulation, and stress response across diverse taxa. Despite their critical roles, neuropeptide research remains challenging due to their low abundance, complex post-translational modifications (PTMs), and dynamic expression patterns. Mass spectrometry (MS)-based neuropeptidomics has revolutionized peptide identification and quantification, enabling the high-throughput characterization of neuropeptides and their PTMs. However, the complexity of vertebrate neural networks poses significant challenges for functional studies. Invertebrate models, such as Cancer borealis, Drosophila melanogaster, and Caenorhabditis elegans, offer simplified neural circuits, well-characterized systems, and experimental tools for elucidating the functional roles of neuropeptides. These models have revealed conserved neuropeptide families, including allatostatins, RFamides, and tachykinin-related peptides, whose vertebrate homologues regulate analogous physiological functions. Recent advancements in MS techniques, including ion mobility spectrometry and MALDI MS imaging, have further enhanced the spatial and temporal resolution of neuropeptide analysis, allowing for insights into peptide signaling systems. Invertebrate neuropeptide research not only expands our understanding of conserved neuropeptide functions but also informs translational applications including the development of peptide-based therapeutics. This review highlights the utility of invertebrate models in neuropeptide discovery, emphasizing their contributions to uncovering fundamental biological principles and their relevance to vertebrate systems.
Affiliation(s)
- Lauren Fields
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Tina C Dang
  - School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
- Vu Ngoc Huong Tran
  - School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
- Angel E Ibarra
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Lingjun Li
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
  - School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
  - Lachman Institute for Pharmaceutical Development, School of Pharmacy, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - Wisconsin Center for NanoBioSystems, School of Pharmacy, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
2
Lu XY, Wu HP, Ma H, Li H, Li J, Liu YT, Pan ZY, Xie Y, Wang L, Ren B, Liu GK. Deep Learning-Assisted Spectrum-Structure Correlation: State-of-the-Art and Perspectives. Anal Chem 2024; 96:7959-7975. [PMID: 38662943 DOI: 10.1021/acs.analchem.4c01639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Spectrum-structure correlation is playing an increasingly crucial role in spectral analysis and has undergone significant development in recent decades. With the advancement of spectrometers, high-throughput detection has triggered explosive growth in spectral data, and the extension of research from small molecules to biomolecules brings with it a massive chemical space. Facing this evolving landscape of spectrum-structure correlation, conventional chemometrics has become ill-equipped, and deep learning-assisted chemometrics has rapidly emerged as a flourishing approach with a superior ability to extract latent features and make precise predictions. In this review, molecular and spectral representations and the fundamentals of deep learning are first introduced. We then summarize how, over the past 5 years, deep learning has helped establish the correlation between spectrum and molecular structure by empowering spectral prediction (i.e., forward structure-spectrum correlation) and further enabling library matching and de novo molecular generation (i.e., inverse spectrum-structure correlation). Finally, we highlight the most important persisting open issues together with potential solutions. Given the fast development of deep learning, an ultimate solution for establishing spectrum-structure correlation is expected soon, which would trigger substantial development across various disciplines.
Affiliation(s)
- Xin-Yu Lu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hao-Ping Wu
  - State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
- Hao Ma
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Hui Li
  - Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, P. R. China
- Jia Li
  - Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
- Yan-Ti Liu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Zheng-Yan Pan
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Yi Xie
  - School of Informatics, Xiamen University, Xiamen 361005, P. R. China
- Lei Wang
  - Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, P. R. China
- Bin Ren
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
  - Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
- Guo-Kun Liu
  - State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
3
Walmsley SJ, Guo J, Tarifa A, DeCaprio AP, Cooke MS, Turesky RJ, Villalta PW. Mass Spectral Library for DNA Adductomics. Chem Res Toxicol 2024; 37:302-310. [PMID: 38231175 PMCID: PMC10939812 DOI: 10.1021/acs.chemrestox.3c00302] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/18/2024]
Abstract
Endogenous electrophiles, ionizing and non-ionizing radiation, and hazardous chemicals present in the environment and diet can damage DNA by forming covalent adducts. DNA adducts can form in critical cancer driver genes and, if not repaired, may induce mutations during cell division, potentially leading to the onset of cancer. The detection and quantification of specific DNA adducts are some of the first steps in studying their role in carcinogenesis, the physiological conditions that lead to their production, and the risk assessment of exposure to specific genotoxic chemicals. Hundreds of different DNA adducts have been reported in the literature, and there is a critical need to establish a DNA adduct mass spectral database to facilitate the detection of previously observed DNA adducts and characterize newly discovered DNA adducts. We have collected synthetic DNA adduct standards from the research community, acquired MSn (n = 2, 3) fragmentation spectra using Orbitrap and Quadrupole-Time-of-Flight (Q-TOF) MS instrumentation, processed the spectral data and incorporated it into the MassBank of North America (MoNA) database, and created a DNA adduct portal Web site (https://sites.google.com/umn.edu/dnaadductportal) to serve as a central location for the DNA adduct mass spectra and metadata, including the spectral database downloadable in different formats. This spectral library should prove to be a valuable resource for the DNA adductomics community, accelerating research and improving our understanding of the role of DNA adducts in disease.
Affiliation(s)
- Scott J Walmsley
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota 55455, United States
  - Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Jingshu Guo
  - Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota 55455, United States
  - Department of Medicinal Chemistry, College of Pharmacy, University of Minnesota, Minneapolis, Minnesota 55455, United States
  - Thermo Fisher Scientific, San Jose, California 95134, United States
- Anamary Tarifa
  - Forensic & Analytical Toxicology Facility, Department of Chemistry and Biochemistry, Florida International University, Miami, Florida 33199, United States
- Anthony P DeCaprio
  - Forensic & Analytical Toxicology Facility, Department of Chemistry and Biochemistry, Florida International University, Miami, Florida 33199, United States
- Marcus S Cooke
  - Oxidative Stress Group, Department of Molecular Biosciences, University of South Florida, Tampa, Florida 33620, United States
- Robert J Turesky
  - Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota 55455, United States
  - Department of Medicinal Chemistry, College of Pharmacy, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Peter W Villalta
  - Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota 55455, United States
  - Department of Medicinal Chemistry, College of Pharmacy, University of Minnesota, Minneapolis, Minnesota 55455, United States
4
Ye J, He X, Wang S, Dong MQ, Wu F, Lu S, Feng F. Test-Time Training for Deep MS/MS Spectrum Prediction Improves Peptide Identification. J Proteome Res 2024; 23:550-559. [PMID: 38153036 DOI: 10.1021/acs.jproteome.3c00229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
In bottom-up proteomics, peptide-spectrum matching is critical for peptide and protein identification. Recently, deep learning models have been used to predict tandem mass spectra of peptides, enabling the calculation of similarity scores between the predicted and experimental spectra for peptide-spectrum matching. These models follow the supervised learning paradigm, which trains a general model using paired peptides and spectra from standard data sets and directly employs the model on experimental data. However, this approach can lead to inaccurate predictions due to differences between the training data and the experimental data, such as sample types, enzyme specificity, and instrument calibration. To tackle this problem, we developed a test-time training paradigm, PepT3, that adapts the pretrained model to generate experimental data-specific models. PepT3 yields a 10-40% increase in peptide identification, depending on the variability between training and experimental data. Intriguingly, when applied to a patient-derived immunopeptidomic sample, PepT3 increases the identification of tumor-specific immunopeptide candidates by 60%. Two-thirds of the newly identified candidates are predicted to bind to the patient's human leukocyte antigen isoforms. To facilitate access to the model and all the results, we have archived all the intermediate files in Zenodo.org with identifier 8231084.
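The similarity-scoring step described in this abstract (comparing a predicted spectrum against an experimental one) can be illustrated with a minimal sketch. It assumes cosine similarity over fixed-width m/z bins, with a 0.02 Da bin width chosen arbitrarily; the function names are hypothetical, and this is not PepT3's actual scoring code.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=0.02):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins (bin_width in Da, an assumed value)."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[math.floor(mz / bin_width)] += intensity
    return bins

def cosine_similarity(predicted, experimental, bin_width=0.02):
    """Cosine similarity of two binned spectra: 1.0 means proportional peak
    patterns, 0.0 means the spectra share no bins."""
    a = bin_spectrum(predicted, bin_width)
    b = bin_spectrum(experimental, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)
```

Fixed bin edges can split two nearly identical peaks that straddle a boundary; tolerance-based peak matching avoids this at the cost of a slightly more involved pairing step.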
Affiliation(s)
- Jianbai Ye
  - MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
- Xiangnan He
  - MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
- Shujuan Wang
  - National Institute of Biological Sciences, Beijing 102206, China
- Meng-Qiu Dong
  - National Institute of Biological Sciences, Beijing 102206, China
- Feng Wu
  - MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
- Shan Lu
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, California 92093, United States
- Fuli Feng
  - MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
5
Révész Á, Hevér H, Steckel A, Schlosser G, Szabó D, Vékey K, Drahos L. Collision energies: Optimization strategies for bottom-up proteomics. Mass Spectrom Rev 2023; 42:1261-1299. [PMID: 34859467 DOI: 10.1002/mas.21763] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/11/2021] [Revised: 11/17/2021] [Accepted: 11/17/2021] [Indexed: 06/07/2023]
Abstract
Mass spectrometry coupled to liquid chromatography is an indispensable tool in the field of proteomics. In recent decades, increasingly complex and diverse biochemical and biomedical questions have arisen. The problems to be solved involve protein identification, quantitative analysis, screening of low-abundance modifications, handling of matrix effects, and concentrations differing by orders of magnitude. This has led to the development of more tailored protocols and problem-centered proteomics workflows, including a more advanced choice of experimental parameters. In the most widespread, bottom-up approach, the choice of collision energy in tandem mass spectrometric experiments plays an outstanding role. This review presents collision energy optimization strategies in the field of proteomics that can help fully exploit the potential of MS-based proteomics techniques. A systematic collection of use case studies is then presented to serve as a starting point for further related scientific work. Finally, this article discusses the issue of comparing results from different studies or obtained on different instruments, and it gives some hints on methodology transfer between laboratories based on the measurement of reference species.
Affiliation(s)
- Ágnes Révész
  - MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
- Helga Hevér
  - Chemical Works of Gedeon Richter Plc, Budapest, Hungary
- Arnold Steckel
  - Department of Analytical Chemistry, MTA-ELTE Lendület Ion Mobility Mass Spectrometry Research Group, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
- Gitta Schlosser
  - Department of Analytical Chemistry, MTA-ELTE Lendület Ion Mobility Mass Spectrometry Research Group, Institute of Chemistry, ELTE Eötvös Loránd University, Budapest, Hungary
- Dániel Szabó
  - MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
- Károly Vékey
  - MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
- László Drahos
  - MS Proteomics Research Group, Institute of Organic Chemistry, Research Centre for Natural Sciences, Budapest, Hungary
6
Affiliation(s)
- Bruna Gomes
  - Departments of Medicine, Genetics, and Biomedical Data Science, Stanford University, Stanford, CA
  - Department of Cardiology, Pneumology, and Angiology, Heidelberg University Hospital, Heidelberg, Germany
- Euan A Ashley
  - Departments of Medicine, Genetics, and Biomedical Data Science, Stanford University, Stanford, CA
7
Cox J. Prediction of peptide mass spectral libraries with machine learning. Nat Biotechnol 2023; 41:33-43. [PMID: 36008611 DOI: 10.1038/s41587-022-01424-w] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 03/01/2022] [Accepted: 07/11/2022] [Indexed: 01/21/2023]
Abstract
The recent development of machine learning methods to identify peptides in complex mass spectrometric data constitutes a major breakthrough in proteomics. Longstanding methods for peptide identification, such as search engines and experimental spectral libraries, are being superseded by deep learning models that allow the fragmentation spectra of peptides to be predicted from their amino acid sequence. These new approaches, including recurrent neural networks and convolutional neural networks, use predicted in silico spectral libraries rather than experimental libraries to achieve higher sensitivity and/or specificity in the analysis of proteomics data. Machine learning is galvanizing applications that involve large search spaces, such as immunopeptidomics and proteogenomics. Current challenges in the field include the prediction of spectra for peptides with post-translational modifications and for cross-linked pairs of peptides. Permeation of machine-learning-based spectral prediction into search engines and spectrum-centric data-independent acquisition workflows for diverse peptide classes and measurement conditions will continue to push sensitivity and dynamic range in proteomics applications in the coming years.
Affiliation(s)
- Jürgen Cox
  - Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany
  - Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway
8
Kong W, Hui HWH, Peng H, Goh WWB. Dealing with missing values in proteomics data. Proteomics 2022; 22:e2200092. [PMID: 36349819 DOI: 10.1002/pmic.202200092] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 07/14/2022] [Revised: 09/15/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022]
Abstract
Proteomics data are often plagued by missingness issues. These missing values (MVs) threaten the integrity of subsequent statistical analyses by reducing statistical power, introducing bias, and failing to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods rely on different prior assumptions (e.g., that data are normally or independently distributed) and operating principles (e.g., an algorithm built to address random missingness only), resulting in varying levels of performance even on the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide this decision, we provide a decision chart that facilitates strategic considerations for datasets presenting different characteristics. We also draw attention to other issues that can affect proper MVI, such as the presence of confounders (e.g., batch effects) that can influence MVI performance; these, too, should be considered during or before MVI.
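Two simple imputation rules illustrate how the assumed cause of missingness drives the choice of method. This is a minimal sketch, not the authors' decision chart: mean imputation implicitly assumes values are missing at random, while filling with a fraction of the smallest observed value (0.5 here, an arbitrary choice) is a common heuristic when values are missing because they fall below the detection limit. Both functions are hypothetical and operate on one protein's raw (non-log) intensities across samples, with `None` marking a missing value.

```python
from statistics import mean

def _observed(values):
    """Return the non-missing values, refusing to impute from nothing."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    return observed

def impute_mean(values):
    """Fill None with the mean of the observed values
    (implicitly assumes values are missing at random)."""
    fill = mean(_observed(values))
    return [fill if v is None else v for v in values]

def impute_min(values, fraction=0.5):
    """Fill None with a fraction of the smallest observed value, a left-censored
    heuristic for values missing because they fall below the detection limit."""
    fill = min(_observed(values)) * fraction
    return [fill if v is None else v for v in values]
```

For example, `impute_mean([1.0, None, 3.0])` gives `[1.0, 2.0, 3.0]`, whereas `impute_min([4.0, None, 8.0])` gives `[4.0, 2.0, 8.0]`; the two rules can move the same protein's estimate in opposite directions, which is why the choice matters.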
Affiliation(s)
- Weijia Kong
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Harvard Wai Hann Hui
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Hui Peng
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Wilson Wen Bin Goh
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
  - Centre for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
9
Shin H, Park Y, Ahn K, Kim S. Accurate Prediction of y Ions in Beam-Type Collision-Induced Dissociation Using Deep Learning. Anal Chem 2022; 94:7752-7758. [PMID: 35609248 PMCID: PMC9178553 DOI: 10.1021/acs.analchem.1c03184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Peptide fragmentation spectra contain critical information for the identification of peptides by mass spectrometry. In this study, we developed an algorithm that more accurately predicts the high-intensity peaks in peptide spectra. The training data comprise 180,833 peptides from the National Institute of Standards and Technology and the Proteomics Identification database, fragmented by either quadrupole time-of-flight or triple-quadrupole collision-induced dissociation. Exploratory analysis of the peptide fragmentation pattern focused on the highest-intensity peaks and identified proline, peptide length, and four-amino-acid combinations within a sliding window as key features that can be exploited. The amino acid sequence of each peptide and each of the key features were allocated to different layers of the model, in which recurrent, convolutional, and fully connected neural networks were used. The trained model, PrAI-frag, predicts fragmentation spectra more accurately than previous machine learning-based prediction algorithms. The model excels at high-intensity peak prediction, which is advantageous for selected/multiple reaction monitoring applications. PrAI-frag is provided via a Web server and can be used for peptides of length 6-15.
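PrAI-frag predicts y-ion intensities; the positions of those ions follow from simple mass arithmetic, m/z(y_i) = (sum of the last i residue masses + H2O + z × proton) / z. The following is a minimal sketch assuming unmodified peptides (no fixed cysteine modification), monoisotopic masses, and singly charged fragments by default; `y_ions` is a hypothetical helper, not part of PrAI-frag.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton

def y_ions(peptide, charge=1):
    """Return m/z values of y1 .. y(n-1) for an unmodified peptide sequence."""
    ions = []
    neutral = WATER  # a y ion carries the C-terminal residues plus one water
    for residue in reversed(peptide[1:]):  # build from the C-terminus inward
        neutral += RESIDUE_MASS[residue]
        ions.append((neutral + charge * PROTON) / charge)
    return ions
```

For a tryptic peptide such as `PEPTIDEK`, `y_ions` returns seven values, the first (y1, the C-terminal lysine) being about 147.113; a model like the one described above then supplies which of these positions carry the most intensity.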
Affiliation(s)
- HyeonSeok Shin
  - Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea
- Youngmin Park
  - Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea
- Kyunggeun Ahn
  - Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea
- Sungsoo Kim
  - Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea
10
Yang Y, Lin L, Qiao L. Deep learning approaches for data-independent acquisition proteomics. Expert Rev Proteomics 2021; 18:1031-1043. [PMID: 34918987 DOI: 10.1080/14789450.2021.2020654] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 01/30/2023]
Abstract
INTRODUCTION: Data-independent acquisition (DIA) is an emerging technology for large-scale proteomic studies. DIA data analysis methods are evolving rapidly, and deep learning has come to play a prominent role in this field. AREAS COVERED: This review provides an overview of the deep learning methods used for DIA data analysis, including spectral library prediction, feature scoring, and statistical control in peptide-centric analysis, as well as de novo peptide sequencing. Literature searches covering articles, including preprints, published up to December 2021 were performed in the PubMed, Scopus, and Web of Science databases. EXPERT OPINION: While spectral library prediction has overcome the limited proteome coverage of experimental libraries, the statistical burden arising from the large query space remains the outstanding challenge in using proteome-wide predicted libraries. Analysis of post-translational modifications is another promising direction for deep learning-based DIA methods.
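Statistical control in peptide-centric analysis is commonly handled with a target-decoy strategy; the abstract does not say which method each reviewed tool uses, so the following is only a generic sketch. It assumes higher scores are better, estimates the false discovery rate (FDR) at each score threshold as decoys divided by targets, and converts that to q-values. Tied scores are not treated specially, and `q_values` is a hypothetical name.

```python
def q_values(scores, is_decoy):
    """Target-decoy q-values (higher score = better match).
    FDR at a threshold is estimated as (#decoys) / (#targets) at or above it."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * len(scores)
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)  # avoid division by zero before the first target
    # q-value: the lowest FDR at which this match would still be accepted,
    # i.e. the running minimum of FDR taken from the worst score upward.
    q = [0.0] * len(scores)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q
```

Decoys also receive q-values here but are normally discarded before reporting. A larger query space tends to push more random, high-scoring decoy matches above any fixed threshold, which is one way to read the statistical burden mentioned above.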
Affiliation(s)
- Yi Yang
  - Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
- Ling Lin
  - Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
- Liang Qiao
  - Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
11
Abstract
Mass-spectrometry-based proteomics enables quantitative analysis of thousands of human proteins. However, experimental and computational challenges restrict progress in the field. This review summarizes the recent flurry of machine-learning strategies using artificial deep neural networks (or "deep learning") that have started to break barriers and accelerate progress in the field of shotgun proteomics. Deep learning now accurately predicts physicochemical properties of peptides from their sequence, including tandem mass spectra and retention time. Furthermore, deep learning methods exist for nearly every aspect of the modern proteomics workflow, enabling improved feature selection, peptide identification, and protein inference.
Affiliation(s)
- Jesse G. Meyer
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226, USA
12
Borges R, Colby SM, Das S, Edison AS, Fiehn O, Kind T, Lee J, Merrill AT, Merz KM, Metz TO, Nunez JR, Tantillo DJ, Wang LP, Wang S, Renslow RS. Quantum Chemistry Calculations for Metabolomics. Chem Rev 2021; 121:5633-5670. [PMID: 33979149 PMCID: PMC8161423 DOI: 10.1021/acs.chemrev.0c00901] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 08/24/2020] [Indexed: 02/07/2023]
Abstract
A primary goal of metabolomics studies is to fully characterize the small-molecule composition of complex biological and environmental samples. However, despite advances in analytical technologies over the past two decades, the majority of small molecules in complex samples are not readily identifiable due to the immense structural and chemical diversity present within the metabolome. Current gold-standard identification methods rely on reference libraries built using authentic chemical materials ("standards"), which are not available for most molecules. Computational quantum chemistry methods, which can be used to calculate chemical properties that are then measured by analytical platforms, offer an alternative route for building reference libraries, i.e., in silico libraries for "standards-free" identification. In this review, we cover the major roadblocks currently facing metabolomics and discuss applications where quantum chemistry calculations offer a solution. Several successful examples for nuclear magnetic resonance spectroscopy, ion mobility spectrometry, infrared spectroscopy, and mass spectrometry methods are reviewed. Finally, we consider current best practices, sources of error, and provide an outlook for quantum chemistry calculations in metabolomics studies. We expect this review will inspire researchers in the field of small-molecule identification to accelerate adoption of in silico methods for generation of reference libraries and to add quantum chemistry calculations as another tool at their disposal to characterize complex samples.
Affiliation(s)
- Ricardo M. Borges
  - Walter Mors Institute of Research on Natural Products, Federal University of Rio de Janeiro, Rio de Janeiro 21941-901, Brazil
- Sean M. Colby
  - Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Susanta Das
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Arthur S. Edison
  - Departments of Genetics and Biochemistry and Molecular Biology, Complex Carbohydrate Research Center and Institute of Bioinformatics, University of Georgia, Athens, Georgia 30602, United States
- Oliver Fiehn
  - West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California, Davis, California 95616, United States
- Tobias Kind
  - West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California, Davis, California 95616, United States
- Jesi Lee
  - West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California, Davis, California 95616, United States
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Amy T. Merrill
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Kenneth M. Merz
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Thomas O. Metz
  - Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Jamie R. Nunez
  - Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Dean J. Tantillo
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Lee-Ping Wang
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Shunyang Wang
  - West Coast Metabolomics Center for Compound Identification, UC Davis Genome Center, University of California, Davis, California 95616, United States
  - Department of Chemistry, University of California, Davis, California 95616, United States
- Ryan S. Renslow
  - Biological Science Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
13
Chen ZL, Mao PZ, Zeng WF, Chi H, He SM. pDeepXL: MS/MS Spectrum Prediction for Cross-Linked Peptide Pairs by Deep Learning. J Proteome Res 2021; 20:2570-2582. [PMID: 33821641 DOI: 10.1021/acs.jproteome.0c01004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
In cross-linking mass spectrometry, the identification of cross-linked peptide pairs relies heavily on the ability of a database search engine to measure the similarity between experimental and theoretical MS/MS spectra. However, the lack of accurate ion intensities in theoretical spectra impairs the performance of search engines, particularly at the proteome scale. Here we introduce pDeepXL, a deep neural network that predicts MS/MS spectra of cross-linked peptide pairs. We trained pDeepXL with transfer learning, which facilitates training when only limited benchmark data of cross-linked peptide pairs are available. Test results on more than ten data sets showed that pDeepXL accurately predicted the spectra of both noncleavable DSS/BS3/Leiker cross-linked peptide pairs (>80% of predicted spectra have Pearson's r values higher than 0.9) and cleavable DSSO/DSBU cross-linked peptide pairs (>75% of predicted spectra have Pearson's r values higher than 0.9). pDeepXL also predicted spectra accurately on unseen data sets using an online fine-tuning technique. Lastly, integrating pDeepXL into a database search engine increased the number of identified cross-link spectra by 18% on average.
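The accuracy figures above count how many predicted spectra reach a Pearson's r above 0.9 against the experimental spectrum. A minimal Python sketch of that metric follows, assuming both spectra have already been reduced to intensity vectors over the same matched fragment ions. The alignment step is not shown, and the function names are illustrative, not pDeepXL's own code.

```python
import math

def pearson_r(predicted, observed):
    """Pearson's r between two intensity vectors over the same fragment ions."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mp = sum(predicted) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    if sp == 0 or so == 0:
        return 0.0  # a constant vector has no defined correlation; treat as no match
    return cov / (sp * so)

def fraction_above(spectrum_pairs, threshold=0.9):
    """Share of (predicted, observed) pairs whose Pearson's r exceeds threshold."""
    scores = [pearson_r(p, o) for p, o in spectrum_pairs]
    return sum(1 for s in scores if s > threshold) / len(scores)

# Hypothetical toy data: one close match and one poor match.
pairs = [
    ([0.1, 0.5, 1.0, 0.2], [0.12, 0.48, 0.95, 0.25]),
    ([0.1, 0.5, 1.0, 0.2], [1.0, 0.2, 0.1, 0.5]),
]
print(fraction_above(pairs))
```

Reporting the fraction above a fixed threshold, rather than a mean r, makes the metric robust to a few badly predicted outlier spectra.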
Affiliation(s)
- Zhen-Lin Chen
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Peng-Zhi Mao
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Wen-Feng Zeng
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Hao Chi
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Si-Min He
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
14
Wilburn DB, Richards AL, Swaney DL, Searle BC. CIDer: A Statistical Framework for Interpreting Differences in CID and HCD Fragmentation. J Proteome Res 2021; 20:1951-1965. [PMID: 33729787 DOI: 10.1021/acs.jproteome.0c00964]
Abstract
Library searching is a powerful technique for detecting peptides using either data-independent or data-dependent acquisition. While both large-scale spectrum library curators and deep learning prediction approaches have focused on beam-type CID fragmentation (HCD), resonance CID fragmentation remains a popular technique. Here we demonstrate an approach to model the differences between HCD and CID spectra, and present a software tool, CIDer, for converting libraries between the two fragmentation methods. We demonstrate that, using only a combination of simple linear models and basic principles of peptide fragmentation, we can explain up to 43% of the variation between ions fragmented by HCD and CID across an array of collision energy settings. We further show that, in some circumstances, searching converted CID libraries can detect more peptides than searching existing CID libraries or libraries of machine learning predictions from FASTA databases. These results suggest that leveraging information in existing libraries by converting between HCD and CID libraries may be an effective interim solution while large-scale CID libraries are being developed.
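The 43% figure is a variance-explained (R²) statement about linear models. The sketch below is a deliberately simplified illustration: it fits a single-predictor least-squares line from HCD to CID fragment intensities and reports R². CIDer's actual models combine several fragmentation-derived terms; the single predictor and the function names here are assumptions made only for illustration.

```python
def fit_linear(x, y):
    """Ordinary least squares for y ~ a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # R^2 = 1 - residual sum of squares / total sum of squares
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return a, b, r_squared

def convert_library(hcd_intensities, a, b):
    """Map HCD library intensities to predicted CID intensities, clipping negatives to 0."""
    return [max(0.0, a + b * v) for v in hcd_intensities]
```

In this toy framing, `fit_linear(hcd, cid)` is trained on fragment ions observed under both methods, and `convert_library` then applies the fitted line to an HCD-only library; the returned R² plays the role of the "fraction of variation explained".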
Affiliation(s)
- Damien B Wilburn
- Institute for Systems Biology, Seattle, Washington 98109, United States; Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Alicia L Richards
- Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California 94158, United States; J. David Gladstone Institutes, San Francisco, California 94158, United States; Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, California 94158, United States
- Danielle L Swaney
- Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California 94158, United States; J. David Gladstone Institutes, San Francisco, California 94158, United States; Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, California 94158, United States
- Brian C Searle
- Institute for Systems Biology, Seattle, Washington 98109, United States
15
Wang L, Liu K, Li S, Tang H. A Fast and Memory-Efficient Spectral Library Search Algorithm Using Locality-Sensitive Hashing. Proteomics 2020; 20:e2000002. [PMID: 32415809 PMCID: PMC7669687 DOI: 10.1002/pmic.202000002] [Received: 01/01/2020] [Revised: 04/17/2020]
Abstract
With the accumulation of MS/MS spectra in spectral libraries, spectral library searching has emerged as an important approach for peptide identification in proteomics, complementary to the commonly used protein database searching approach, particularly for proteomic analyses of well-studied model organisms such as human. Existing spectral library searching algorithms compare a query MS/MS spectrum with each library spectrum of matching precursor mass and charge state, which may become computationally intensive as library size grows rapidly. Here, the software msSLASH, which implements a fast spectral library searching algorithm based on the Locality-Sensitive Hashing (LSH) technique, is presented. The algorithm first converts the library and query spectra into bit-strings using LSH functions, and then computes the similarity only between spectra with highly similar bit-strings. In spectral library searches of large real-world MS/MS spectra datasets, the algorithm significantly reduced the number of spectral comparisons and, as a result, achieved a 2-9× speedup compared with the existing spectral library searching algorithm SpectraST. The spectral searching algorithm is implemented in C/C++ and is ready to be used in proteomic data analyses.
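The bit-string filtering step can be illustrated with random-hyperplane (SimHash-style) LSH, a standard LSH family for cosine similarity. The abstract does not specify msSLASH's exact hash functions, so this choice, the assumption that spectra are already binned into fixed-length vectors, and all function names are illustrative only.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Random Gaussian hyperplanes; each one contributes one bit to a signature."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vec, planes):
    """Bit i is 1 when vec lies on the non-negative side of hyperplane i."""
    bits = 0
    for i, plane in enumerate(planes):
        if sum(v * w for v, w in zip(vec, plane)) >= 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")

def candidate_indices(query, library_signatures, planes, max_hamming=2):
    """Indices of library spectra whose signature is close enough to score in full."""
    q = lsh_signature(query, planes)
    return [i for i, s in enumerate(library_signatures) if hamming(q, s) <= max_hamming]

# Usage: library signatures are computed once; each query is hashed and compared bitwise.
planes = make_hyperplanes(dim=3, n_bits=16)
library = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
signatures = [lsh_signature(s, planes) for s in library]
print(candidate_indices([1.0, 0.0, 0.0], signatures, planes, max_hamming=0))
```

With random hyperplanes, the probability that two vectors disagree on a bit equals the angle between them divided by π, so a small Hamming distance signals a high cosine similarity; only those candidates would then be scored with the full spectral similarity.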
Affiliation(s)
- Lei Wang
- School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA
- Kaiyuan Liu
- School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA
- Sujun Li
- School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA
- Haixu Tang
- School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA
16
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Received: 05/27/2020] [Revised: 09/14/2020]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate this vast amount of data into novel biological insights. Deep learning automatically extracts high-level, abstract representations from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics is provided, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction. Limitations and future directions of deep learning in proteomics are also discussed. This review provides readers with an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen‐Feng Zeng
- Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
- Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Sara R. Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
17
Wang S, Li W, Hu L, Cheng J, Yang H, Liu Y. NAguideR: performing and prioritizing missing value imputations for consistent bottom-up proteomic analyses. Nucleic Acids Res 2020; 48:e83. [PMID: 32526036 PMCID: PMC7641313 DOI: 10.1093/nar/gkaa498] [Received: 02/18/2020] [Revised: 04/20/2020] [Accepted: 06/08/2020]
Abstract
Mass spectrometry (MS)-based quantitative proteomics experiments frequently generate data with missing values, which may profoundly affect downstream analyses. A wide variety of imputation methods have been established to deal with the missing-value issue. To date, however, there is a scarcity of efficient, systematic, and easy-to-use tools tailored for the proteomics community. Herein, we developed a user-friendly and powerful stand-alone software, NAguideR, to enable the implementation and evaluation of 23 widely used missing-value imputation algorithms. NAguideR further evaluates data imputation results through classic computational criteria and, unprecedentedly, proteomic empirical criteria, such as quantitative consistency between different charge states of the same peptide, between different peptides belonging to the same protein, and between individual proteins participating in protein complexes and functional interactions. We applied NAguideR to three label-free proteomic datasets featuring peptide-level, protein-level, and phosphoproteomic variables, respectively, all generated by data-independent acquisition mass spectrometry (DIA-MS) with substantial biological replicates. The results indicate that NAguideR is able to distinguish the imputation methods best suited to DIA-MS experiments from sub-optimal and low-performing algorithms. NAguideR also provides downloadable tables and figures supporting flexible data analysis and interpretation. NAguideR is freely available at http://www.omicsolution.org/wukong/NAguideR/ and the source code at https://github.com/wangshisheng/NAguideR/.
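k-nearest-neighbour (KNN) imputation is one widely used family of the methods such a tool compares. The following Python sketch shows the basic idea on a small matrix whose rows are features (peptides or proteins) and whose columns are samples, with `None` marking a missing value. The distance measure, the choice of k, and the function name are assumptions for illustration, not NAguideR's implementation.

```python
import math

def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k most similar rows.
    Similarity is the root-mean-square difference over columns observed in both rows."""
    out = [list(r) for r in rows]  # copy so the input matrix is left untouched
    for i, row in enumerate(rows):
        for j, val in enumerate(row):
            if val is not None:
                continue
            neighbours = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor row must have column j observed
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                neighbours.append((dist, other[j]))
            neighbours.sort(key=lambda t: t[0])
            nearest = neighbours[:k]
            if nearest:  # leave None in place when no donor row exists
                out[i][j] = sum(v for _, v in nearest) / len(nearest)
    return out

data = [[1.0, 2.0, None], [1.1, 2.1, 3.0], [5.0, 6.0, 9.0], [1.0, 2.0, 3.2]]
print(knn_impute(data, k=2))
```

Because donors are chosen by similarity of their observed profiles, the distant third row does not influence the imputed value; comparing such choices against the proteomic consistency criteria described above is what a benchmarking tool adds on top.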
Affiliation(s)
- Shisheng Wang
- West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Wenxue Li
- Yale Cancer Biology Institute, Yale University, West Haven, CT 06516, USA
- Liqiang Hu
- West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Jingqiu Cheng
- West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Hao Yang
- West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Yansheng Liu
- Yale Cancer Biology Institute, Yale University, West Haven, CT 06516, USA; Department of Pharmacology, Yale University School of Medicine, New Haven, CT 06520, USA
18
Xu R, Sheng J, Bai M, Shu K, Zhu Y, Chang C. A Comprehensive Evaluation of MS/MS Spectrum Prediction Tools for Shotgun Proteomics. Proteomics 2020; 20:e1900345. [DOI: 10.1002/pmic.201900345] [Received: 12/31/2019] [Revised: 04/29/2020]
Affiliation(s)
- Rui Xu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Jie Sheng
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Mingze Bai
- Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Kunxian Shu
- Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Yunping Zhu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Cheng Chang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
19
A new opening for the tricky untargeted investigation of natural and modified short peptides. Talanta 2020; 219:121262. [PMID: 32887153 DOI: 10.1016/j.talanta.2020.121262] [Received: 03/16/2020] [Revised: 06/08/2020] [Accepted: 06/09/2020]
Abstract
Short peptides are of great interest in clinical and food research, yet they remain a crucial analytical challenge. The main aim of this paper was to develop an analytical platform that considerably advances short peptide identification. For the first time, short sequences containing both natural and post-translationally modified amino acids were comprehensively studied thanks to the generation of specific databases. The short peptide databases served a dual purpose. First, they were employed as inclusion lists for a suspect screening mass-spectrometric analysis, overcoming the limits of data-dependent acquisition mode and allowing the fragmentation of such low-abundance substances. Second, the databases were implemented in Compound Discoverer 3.0, a software package dedicated to the analysis of small molecules, to create a data processing workflow specifically dedicated to the tentative identification of short peptides. For this purpose, a detailed study of short peptide fragmentation pathways was carried out for the first time. The proposed method was applied to the study of short peptide sequences in enriched urine samples and led to the tentative identification of more than 200 natural and modified short peptides, the highest number ever reported.
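Using a short-peptide database as an inclusion list amounts to precomputing precursor m/z values for candidate sequences. A minimal sketch for unmodified di- and tripeptides follows, assuming standard monoisotopic residue masses and protonated precursors. Post-translationally modified residues, which the study also covered, would need their own mass entries and are omitted; the function names are illustrative.

```python
from itertools import product

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # added once per peptide for the free N- and C-termini
PROTON = 1.007276

def precursor_mz(seq, charge=1):
    """m/z of the [M + zH]z+ ion of an unmodified linear peptide."""
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    return (mass + charge * PROTON) / charge

def inclusion_list(lengths=(2, 3), charge=1):
    """All sequences of the given lengths with their precursor m/z, sorted by m/z."""
    entries = []
    for n in lengths:
        for residues in product(RESIDUE_MASS, repeat=n):
            seq = "".join(residues)
            entries.append((seq, round(precursor_mz(seq, charge), 4)))
    entries.sort(key=lambda e: e[1])
    return entries

print(inclusion_list(lengths=(2,))[:3])
```

Leucine and isoleucine share a mass, so an m/z-based list cannot separate sequences that differ only by L/I, and permutations such as AG and GA also coincide; telling those apart is left to fragmentation.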
20
Ramachandran S, Thomas T. A Frequency-Based Approach to Predict the Low-Energy Collision-Induced Dissociation Fragmentation Spectra. ACS Omega 2020; 5:12615-12622. [PMID: 32548445 PMCID: PMC7288360 DOI: 10.1021/acsomega.9b03935] [Received: 11/18/2019] [Accepted: 05/12/2020]
Abstract
Peptide identification algorithms rely on comparing the experimental tandem mass spectrum with a theoretical spectrum to identify a peptide. Hence, it is important to understand the fragmentation process and to predict tandem mass spectra for high-throughput proteomics research. In this study, a novel method was developed to predict the theoretical ion trap collision-induced dissociation (CID) tandem mass spectra of singly, doubly, and triply charged tryptic peptides. The fragmentation statistics of ion trap CID spectra were used to predict the theoretical tandem mass spectrum of a peptide sequence. The study estimated the relative cleavage frequency for each pair of adjacent amino acids along the peptide length and showed that these cleavage frequencies can be used directly to predict tandem mass spectra. The predicted spectra show a high correlation with the experimental spectra used in this study: 99.73% of the high-quality reference spectra have correlation scores greater than 0.8. Spectra predicted by the new method correlate significantly better with the experimental spectra than those from the existing prediction tools OpenMS_Simulator, MS2PIP, and MS2PBPI, for which only 80, 85.76, and 85.80% of spectra, respectively, have a correlation score greater than 0.8.
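The core idea, estimating how often each adjacent amino acid pair is cleaved and using those frequencies directly as predicted intensities, can be sketched as below. This is a simplified reading of the abstract: training data are assumed to be (sequence, per-bond intensity) pairs, intensities are normalized per spectrum, and charge state, ion type, and position along the peptide, which the published method may take into account, are ignored. The function names are hypothetical.

```python
from collections import defaultdict

def learn_pair_frequencies(training):
    """training: iterable of (sequence, bond_intensities), where bond_intensities[i]
    is the summed fragment intensity from cleaving between sequence[i] and sequence[i+1]."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for seq, bonds in training:
        if len(bonds) != len(seq) - 1:
            raise ValueError("need exactly one intensity per peptide bond")
        spectrum_total = sum(bonds)
        if spectrum_total == 0:
            continue  # skip empty spectra
        for i, intensity in enumerate(bonds):
            pair = seq[i:i + 2]
            totals[pair] += intensity / spectrum_total  # relative to this spectrum
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}

def predict_bond_profile(seq, freqs, default=None):
    """Predicted relative intensity for each bond of seq, normalized to sum to 1."""
    if default is None:
        # unseen pairs fall back to the mean learned frequency
        default = sum(freqs.values()) / len(freqs) if freqs else 1.0
    raw = [freqs.get(seq[i:i + 2], default) for i in range(len(seq) - 1)]
    total = sum(raw)
    return [r / total for r in raw]

freqs = learn_pair_frequencies([("AAP", [1.0, 3.0]), ("GAP", [1.0, 1.0])])
print(predict_bond_profile("GAAP", freqs))
```

In this toy data the A-P bond, that is, cleavage N-terminal to proline, receives the highest predicted intensity, mirroring the well-known proline effect in low-energy CID.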
21
Software-aided detection and structural characterization of cyclic peptide metabolites in biological matrix by high-resolution mass spectrometry. J Pharm Anal 2020; 10:240-246. [PMID: 32612870 PMCID: PMC7322757 DOI: 10.1016/j.jpha.2020.05.012] [Received: 03/28/2020] [Revised: 05/25/2020] [Accepted: 05/25/2020]
Abstract
Compared to their linear counterparts, cyclic peptides show better biological activities, such as antibacterial, immunosuppressive, and anti-tumor activities, and better pharmaceutical properties owing to their conformational rigidity. However, cyclic peptides can form numerous putative metabolites through potential hydrolytic cleavages, and their fragments are very difficult to interpret. These characteristics pose a great challenge when analyzing metabolites of cyclic peptides by mass spectrometry. This study aimed to assess and apply a software-aided analytical workflow for the detection and structural characterization of cyclic peptide metabolites. Insulin and atrial natriuretic peptide (ANP), as model cyclic peptides, were incubated with trypsin/chymotrypsin and/or rat liver S9, followed by data acquisition using a TripleTOF® 5600. The resultant full-scan MS and MS/MS datasets were automatically processed through a combination of targeted and untargeted peak finding strategies. MS/MS spectra of predicted metabolites were interrogated against putative metabolite sequences in light of the a, b, y, and internal fragment series. The resulting fragment assignments led to the confirmation and ranking of the metabolite sequences and to the identification of metabolic modifications. As a result, 29 metabolites with linear or cyclic structures were detected in the insulin incubation with the hydrolytic enzymes. The sequences of twenty insulin metabolites were further determined, and these were consistent with the hydrolytic sites of the enzymes. In the same manner, multiple metabolites of insulin and ANP formed in rat liver S9 incubation were detected and structurally characterized, some of which have not been previously reported. The results demonstrated the utility of a software-aided data processing tool in the detection and identification of cyclic peptide metabolites.
A software-aided workflow enabling detection and characterization of cyclic peptide metabolites by LC/HRMS. Automatic data processing through a combination of targeted and untargeted peak finding strategies. MS/MS spectra of predicted metabolites interrogated against putative metabolite sequences. Rapid determination of the metabolite profiles of insulin and atrial natriuretic peptide in rat liver S9. Potentially applicable to metabolic soft spot analysis and in vitro metabolism across species in drug discovery.
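Interrogating MS/MS spectra against putative metabolite sequences relies on computing expected fragment masses. A minimal sketch for the singly charged a, b, and y series of a linear sequence follows, assuming standard monoisotopic residue masses. Internal fragments and the ring-opening fragmentation of intact cyclic structures, which the workflow also handles, are not modeled, and the function name is illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276
CO = 27.994915  # an a ion is the corresponding b ion minus carbon monoxide

def fragment_ions(seq):
    """Singly charged a, b and y ion m/z values for a linear peptide.
    b_i and a_i contain the first i residues; y_i contains the last i residues."""
    masses = [RESIDUE_MASS[aa] for aa in seq]
    ions = {"a": [], "b": [], "y": []}
    for i in range(1, len(seq)):
        b = sum(masses[:i]) + PROTON
        ions["b"].append(b)
        ions["a"].append(b - CO)
        ions["y"].append(sum(masses[-i:]) + WATER + PROTON)
    return ions

for series, values in fragment_ions("PEPTIDE").items():
    print(series, [round(v, 4) for v in values])
```

A useful sanity check when assigning fragments is complementarity: for a peptide of n residues, b_i + y_(n-i) equals the neutral peptide mass plus two protons.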
22
Liu K, Li S, Wang L, Ye Y, Tang H. Full-Spectrum Prediction of Peptides Tandem Mass Spectra using Deep Neural Network. Anal Chem 2020; 92:4275-4283. [PMID: 32053352 DOI: 10.1021/acs.analchem.9b04867]
Abstract
The ability to predict tandem mass (MS/MS) spectra from peptide sequences can significantly enhance our understanding of the peptide fragmentation process and could improve peptide identification in proteomics. However, current approaches for predicting higher-energy collisional dissociation (HCD) spectra are limited to predicting the intensities of expected ion types, that is, the a/b/c/x/y/z ions and their neutral loss derivatives (referred to as backbone ions). In practice, backbone ions account for less than 70% of the total ion intensity in HCD spectra, indicating that many intense ions are ignored by current predictors. In this paper, we present a deep learning approach that can predict the complete spectra (both backbone and nonbackbone ions) directly from peptide sequences. We made no assumptions about which kinds of ions to predict, but instead predicted the intensities for all possible m/z values. Training this model requires neither fragment ion annotations nor any prior knowledge of fragmentation rules. Our analyses show that the predicted 2+ and 3+ HCD spectra are highly similar to the experimental spectra, with average full-spectrum cosine similarities of 0.820 (±0.088) and 0.786 (±0.085), respectively, very close to the similarities between experimental replicate spectra. In contrast, the best-performing backbone-only models achieve average similarities below 0.75 and 0.70 for 2+ and 3+ spectra, respectively. Furthermore, we developed a multitask learning (MTL) approach for predicting spectra with insufficient training samples, which allows our model to make accurate predictions for electron transfer dissociation (ETD) spectra and for HCD spectra of less abundant charge states (1+ and 4+).
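A full-spectrum cosine similarity of the kind reported here is typically computed after binning both spectra along m/z, so that every peak, backbone or not, contributes. A minimal sketch follows, assuming spectra given as (m/z, intensity) pairs and a 1.0 m/z bin width; the bin width is an illustrative choice, not necessarily the paper's.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities into fixed-width m/z bins; returns {bin_index: intensity}."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)] += intensity
    return bins

def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity of two spectra over all binned peaks (0.0 if either is empty)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

predicted = [(100.2, 50.0), (200.7, 100.0)]
observed = [(100.4, 50.0), (200.1, 100.0), (350.3, 20.0)]
print(round(cosine_similarity(predicted, observed), 3))
```

Because no ion annotation is involved, an unexplained intense peak in the observed spectrum lowers the score exactly as a mispredicted backbone ion would, which is why backbone-only predictors are capped below the full-spectrum model on this metric.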
Affiliation(s)
- Kaiyuan Liu
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
- Sujun Li
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
- Lei Wang
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
- Yuzhen Ye
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
- Haixu Tang
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
23
Yang Y, Liu X, Shen C, Lin Y, Yang P, Qiao L. In silico spectral libraries by deep learning facilitate data-independent acquisition proteomics. Nat Commun 2020; 11:146. [PMID: 31919359 PMCID: PMC6952453 DOI: 10.1038/s41467-019-13866-z] [Received: 06/16/2019] [Accepted: 12/04/2019]
Abstract
Data-independent acquisition (DIA) is an emerging technology for quantitative proteomic analysis of large cohorts of samples. However, sample-specific spectral libraries built from data-dependent acquisition (DDA) experiments are required prior to DIA analysis, which is time-consuming and limits DIA identification/quantification to the peptides identified by DDA. Herein, we propose DeepDIA, a deep learning-based approach to generate in silico spectral libraries for DIA analysis. We demonstrate that the quality of in silico libraries predicted by instrument-specific models using DeepDIA is comparable to that of experimental libraries and exceeds that of libraries generated by global models. With peptide detectability prediction, in silico libraries can be built directly from protein sequence databases. We further show that DeepDIA can overcome the limitation of DDA on peptide/protein detection and enhance DIA analysis of human serum samples compared with the state-of-the-art protocol using a DDA library. We expect this work to expand the toolbox for DIA proteomics.
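Building an in silico library directly from a protein sequence database starts with digesting each protein in silico; DeepDIA then filters the resulting peptides by predicted detectability and predicts their spectra, neither of which is attempted here. The cleavage rule below (after K or R, not before P), the missed-cleavage limit, and the length window are common defaults assumed for illustration, and the function name is hypothetical.

```python
def tryptic_digest(protein, missed_cleavages=0, min_len=7, max_len=30):
    """Peptides from an in silico trypsin digest, cleaving after K/R unless followed by P."""
    # Cleavage positions are indices *after* which the chain is cut;
    # the protein's own C-terminus is not treated as a cleavage site.
    sites = [i + 1 for i, aa in enumerate(protein[:-1])
             if aa in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = set()
    for i in range(len(bounds) - 1):
        # j - i - 1 internal sites are skipped, up to the allowed missed cleavages
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            pep = protein[bounds[i]:bounds[j]]
            if min_len <= len(pep) <= max_len:
                peptides.add(pep)
    return sorted(peptides)

print(tryptic_digest("AAKPGGRCCCK", missed_cleavages=1, min_len=1))
```

The length window matters in practice: very short peptides are rarely unique to one protein, and very long ones are seldom observed, so both are usually excluded before any detectability model is applied.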
Affiliation(s)
- Yi Yang
- Department of Chemistry, Shanghai Stomatological Hospital, and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200000, China
- Xiaohui Liu
- Department of Chemistry, Shanghai Stomatological Hospital, and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200000, China
- Chengpin Shen
- Shanghai Omicsolution Co., Ltd., Shanghai, 200000, China
- Yu Lin
- College of Engineering and Computer Science, The Australian National University, Canberra, ACT 0200, Australia
- Pengyuan Yang
- Department of Chemistry, Shanghai Stomatological Hospital, and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200000, China
- Liang Qiao
- Department of Chemistry, Shanghai Stomatological Hospital, and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200000, China
24
Lin YM, Chen CT, Chang JM. MS2CNN: predicting MS/MS spectrum based on protein sequence using deep convolutional neural networks. BMC Genomics 2019; 20:906. [PMID: 31874640 PMCID: PMC6929458 DOI: 10.1186/s12864-019-6297-6] [Received: 10/24/2019] [Accepted: 11/15/2019]
Abstract
Background Tandem mass spectrometry allows biologists to identify and quantify protein samples in the form of digested peptide sequences. When performing peptide identification, spectral library search is more sensitive than traditional database search but is limited to peptides that have been previously identified. An accurate tandem mass spectrum prediction tool is thus crucial in expanding the peptide space and increasing the coverage of spectral library search. Results We propose MS2CNN, a non-linear regression model based on deep convolutional neural networks, a deep learning algorithm. The features for our model are amino acid composition, predicted secondary structure, and physical-chemical features such as isoelectric point, aromaticity, helicity, hydrophobicity, and basicity. MS2CNN was trained with five-fold cross validation on a three-way data split on the large-scale human HCD MS2 dataset of Orbitrap LC-MS/MS downloaded from the National Institute of Standards and Technology. It was then evaluated on a publicly available independent test dataset of human HeLa cell lysate from LC-MS experiments. On average, our model shows better cosine similarity and Pearson correlation coefficient (0.690 and 0.632) than MS2PIP (0.647 and 0.601) and is comparable with pDeep (0.692 and 0.642). Notably, for the more complex MS2 spectra of 3+ peptides, MS2CNN is significantly better than both MS2PIP and pDeep. Conclusions We showed that MS2CNN outperforms MS2PIP for 2+ and 3+ peptides and pDeep for 3+ peptides. This implies that MS2CNN, the proposed convolutional neural network model, generates highly accurate MS2 spectra for LC-MS/MS experiments using Orbitrap machines, which can be of great help in protein and peptide identifications. The results suggest that incorporating more data into the deep learning model may improve performance.
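Two of the sequence-derived features listed above, amino acid composition and aromaticity, are simple to compute. A minimal sketch follows; aromaticity is taken as the fraction of F, W, and Y residues (the definition used by Biopython's `ProteinAnalysis.aromaticity`). The exact encoding MS2CNN feeds to its network, including the secondary-structure and other physicochemical features, is not reproduced, and the function names are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Relative frequency of each of the 20 standard amino acids, in fixed order."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]

def aromaticity(seq):
    """Fraction of aromatic residues (F, W, Y)."""
    return sum(seq.count(aa) for aa in "FWY") / len(seq)

def feature_vector(seq):
    """Composition (20 values) followed by aromaticity, as one flat list."""
    return composition(seq) + [aromaticity(seq)]

print(len(feature_vector("PEPTIDE")), aromaticity("FWYA"))
```

A fixed residue order keeps every position in the vector tied to the same amino acid across peptides, which is what lets a downstream model learn per-residue effects.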
Affiliation(s)
- Yang-Ming Lin
- Department of Computer Science, National Chengchi University, 11605, Taipei City, Taiwan
- Ching-Tai Chen
- Institute of Information Science, Academia Sinica, 115, Taipei City, Taiwan
- Jia-Ming Chang
- Department of Computer Science, National Chengchi University, 11605, Taipei City, Taiwan
25
Zeng WF, Zhou XX, Zhou WJ, Chi H, Zhan J, He SM. MS/MS Spectrum Prediction for Modified Peptides Using pDeep2 Trained by Transfer Learning. Anal Chem 2019; 91:9724-9731. [DOI: 10.1021/acs.analchem.9b01262]
Affiliation(s)
- Wen-Feng Zeng
- University of Chinese Academy of Sciences, 100190 Beijing, China
- Xie-Xuan Zhou
- University of Chinese Academy of Sciences, 100190 Beijing, China
- Wen-Jing Zhou
- University of Chinese Academy of Sciences, 100190 Beijing, China
- Hao Chi
- University of Chinese Academy of Sciences, 100190 Beijing, China
- Jianfeng Zhan
- University of Chinese Academy of Sciences, 100190 Beijing, China
- Si-Min He
- University of Chinese Academy of Sciences, 100190 Beijing, China
26
Liang T, Leung LM, Opene B, Fondrie WE, Lee YI, Chandler CE, Yoon SH, Doi Y, Ernst RK, Goodlett DR. Rapid Microbial Identification and Antibiotic Resistance Detection by Mass Spectrometric Analysis of Membrane Lipids. Anal Chem 2019; 91:1286-1294. [PMID: 30571097 DOI: 10.1021/acs.analchem.8b02611]
Abstract
Infectious diseases have a substantial global health impact. Clinicians need rapid and accurate diagnoses of infections to direct patient treatment and improve antibiotic stewardship. Current technologies employed in routine diagnostics are based on bacterial culture followed by morphological trait differentiation and biochemical testing, which can be time-consuming and labor-intensive. With advances in mass spectrometry (MS) for clinical diagnostics, the U.S. Food and Drug Administration has approved two microbial identification platforms based on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS analysis of microbial proteins. We recently reported a novel and complementary approach by comparing MALDI-TOF mass spectra of microbial membrane lipid fingerprints to identify ESKAPE pathogens. However, this lipid-based approach used a sample preparation method that required more than a working day from sample collection to identification. Here, we report a new method that extracts lipids efficiently and rapidly from microbial membranes using an aqueous sodium acetate (SA) buffer that can be used to identify clinically relevant Gram-positive and -negative pathogens and fungal species in less than an hour. The SA method also has the ability to differentiate antibiotic-susceptible and antibiotic-resistant strains, directly identify microbes from biological specimens, and detect multiple pathogens in a mixed sample. These results should have positive implications for the manner in which bacteria and fungi are identified in general hospital settings and intensive care units.
Affiliation(s)
- Tao Liang
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, Maryland 20742, United States
- Lisa M Leung
- Department of Microbial Pathogenesis, School of Dentistry, University of Maryland, Baltimore, Maryland 20742, United States
- Divisions of Microbiology and Molecular Biology, Laboratories Administration, Maryland Department of Health, Baltimore, Maryland 21215, United States
- Belita Opene
- Department of Microbial Pathogenesis, School of Dentistry, University of Maryland, Baltimore, Maryland 20742, United States
- William E Fondrie
- Center for Vascular and Inflammatory Diseases, University of Maryland, Baltimore, Maryland 20742, United States
- Young In Lee
- Department of Microbial Pathogenesis, School of Dentistry, University of Maryland, Baltimore, Maryland 20742, United States
- Courtney E Chandler
- Department of Microbial Pathogenesis, School of Dentistry, University of Maryland, Baltimore, Maryland 20742, United States
- Sung Hwan Yoon
- Department of Microbial Pathogenesis, School of Dentistry, University of Maryland, Baltimore, Maryland 20742, United States
- Yohei Doi
- Division of Infectious Diseases, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- Robert K Ernst
- Department of Microbial Pathogenesis, School of Dentistry, University of Maryland, Baltimore, Maryland 20742, United States
- David R Goodlett
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, Maryland 20742, United States
27
Wang H, Leeming MG, Ho J, Donald WA. Origin and Prediction of Highly Specific Bond Cleavage Sites in the Thermal Activation of Intact Protein Ions. Chemistry 2018; 25:823-834. [DOI: 10.1002/chem.201804668] [Received: 09/12/2018]
Affiliation(s)
- Huixin Wang
- School of Chemistry, University of New South Wales, Sydney, New South Wales, Australia
- Michael G. Leeming
- School of Chemistry, Bio21 Institute of Molecular Science and Biotechnology, The University of Melbourne, Melbourne, Victoria, Australia
- Junming Ho
- School of Chemistry, University of New South Wales, Sydney, New South Wales, Australia
- William A. Donald
- School of Chemistry, University of New South Wales, Sydney, New South Wales, Australia
28
Zhou XX, Zeng WF, Chi H, Luo C, Liu C, Zhan J, He SM, Zhang Z. pDeep: Predicting MS/MS Spectra of Peptides with Deep Learning. Anal Chem 2017; 89:12690-12697. [DOI: 10.1021/acs.analchem.7b02566]
Affiliation(s)
- Xie-Xuan Zhou
- State Key Laboratory of Computer Architecture, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing, China
- Wen-Feng Zeng
- University of Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing of CAS, ICT, Chinese Academy of Sciences, Beijing 100190, China
- Hao Chi
- University of Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing of CAS, ICT, Chinese Academy of Sciences, Beijing 100190, China
- Chunjie Luo
- State Key Laboratory of Computer Architecture, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing, China
- Chao Liu
- University of Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing of CAS, ICT, Chinese Academy of Sciences, Beijing 100190, China
- Jianfeng Zhan
- State Key Laboratory of Computer Architecture, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing, China
- Si-Min He
- University of Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing of CAS, ICT, Chinese Academy of Sciences, Beijing 100190, China
- Zhifei Zhang
- Capital Medical University, Beijing 100069, China
29
Abstract
Scoring functions that assess spectrum similarity play a crucial role in many computational mass spectrometry algorithms. These functions are used to compare an experimentally acquired fragmentation (MS/MS) spectrum against one of two types of target MS/MS spectrum: a theoretical MS/MS spectrum derived from a peptide in a sequence database, or another, previously acquired MS/MS spectrum. The former comparison is typically encountered in database searching, while the latter is used in spectrum clustering and spectral library searching. Acquired spectra are most commonly compared with theoretical spectra using cross-correlation or probability-derived scoring functions, whereas two acquired MS/MS spectra are typically compared using a normalized dot product, especially in spectral library search algorithms. In addition to these scoring functions, Pearson's or Spearman's correlation coefficients, mean squared error, or median absolute deviation scores can also be used for the same purpose. Here, we describe and evaluate these scoring functions with regard to their ability to assess spectrum similarity for theoretical versus acquired, and acquired versus acquired, spectra.
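Several of the acquired-versus-acquired measures named above can be illustrated with a minimal sketch that sums two peak lists into shared m/z bins and then scores the resulting intensity vectors. The 1.0 Da bin width, the m/z range, and all function names are illustrative assumptions, not the evaluated implementations; median absolute deviation is omitted because its exact definition in this setting is not given here.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Sum (m/z, intensity) peaks into fixed-width bins (width and range are assumptions)."""
    n_bins = int(math.ceil(max_mz / bin_width))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    return vec


def normalized_dot_product(a, b):
    """Cosine of the angle between two intensity vectors (1.0 = same relative pattern)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def _ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(_ranks(a), _ranks(b))


def mean_squared_error(a, b):
    """Average squared intensity difference per bin (0.0 = identical vectors)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

Shared empty bins contribute nothing to the dot product but do enter the Pearson and Spearman calculations (as a large group of tied zeros in the latter), so these scores are not interchangeable even on the same pair of spectra.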
Affiliation(s)
- Şule Yilmaz
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
- Elien Vandermarliere
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
- Lennart Martens
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
30
Matsuda F. Technical Challenges in Mass Spectrometry-Based Metabolomics. Mass Spectrom (Tokyo) 2016; 5:S0052. [PMID: 27900235 DOI: 10.5702/massspectrometry.s0052] [Received: 08/31/2016] [Accepted: 10/05/2016]
Abstract
Metabolomics is a strategy for the analysis and quantification of the complete collection of metabolites present in biological samples. It is an emerging area of scientific research with many application areas, including clinical, agricultural, and medical research for biomarker discovery and metabolic system analysis, that employ widely targeted analysis of a few hundred preselected metabolites from 10-100 biological samples. Further improvements in mass spectrometry technologies, in terms of experimental design for larger-scale analysis, computational methods for tandem mass spectrometry-based elucidation of metabolites, and specific instrumentation for advanced bioanalysis, will enable more comprehensive metabolome analysis for exploring the hidden secrets of metabolism.
Affiliation(s)
- Fumio Matsuda
- Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University; RIKEN Center for Sustainable Resource Science
31
Li S, Dabir A, Misal SA, Tang H, Radivojac P, Reilly JP. Impact of Amidination on Peptide Fragmentation and Identification in Shotgun Proteomics. J Proteome Res 2016; 15:3656-3665. [PMID: 27615690 DOI: 10.1021/acs.jproteome.6b00468]
Abstract
Peptide amidination labeling using S-methyl thioacetimidate (SMTA) is investigated in an attempt to increase the number and types of peptides that can be detected in a bottom-up proteomics experiment. This derivatization method affects the basicity of lysine residues and is shown here to significantly impact the idiosyncrasies of peptide fragmentation and peptide detectability. The unique and highly reproducible fragmentation properties of SMTA-labeled peptides, such as the strong propensity for forming b1 fragment ions, can be further exploited to modify the scoring of peptide-spectrum pairs and improve peptide identification. To this end, we have developed a supervised postprocessing algorithm to exploit these characteristics of peptides labeled by SMTA. Our experiments show that although the overall number of identifications is similar, 16-26% of the peptides detected after SMTA modification had not been observed in comparable CID/HCD tandem mass spectrometry experiments without SMTA labeling.
Affiliation(s)
- Sujun Li
- School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405, United States
- Aditi Dabir
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Santosh A Misal
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Haixu Tang
- School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405, United States
- Predrag Radivojac
- School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405, United States
- James P Reilly
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
32
Griss J. Spectral library searching in proteomics. Proteomics 2016; 16:729-40. [PMID: 26616598 DOI: 10.1002/pmic.201500296] [Received: 07/16/2015] [Revised: 10/15/2015] [Accepted: 10/29/2015]
Abstract
Spectral library searching has become a mature method to identify tandem mass spectra in proteomics data analysis. This review provides a comprehensive overview of available spectral library search engines and highlights their distinct features. Additionally, resources providing spectral libraries are summarized, and tools that extend experimental spectral libraries by simulating spectra are presented. Finally, spectrum clustering algorithms are discussed; these use the same spectrum-to-spectrum matching algorithms as spectral library search engines and enable novel methods of analysing proteomics data.
Affiliation(s)
- Johannes Griss
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
33
Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat Protoc 2015; 10:426-41. [PMID: 25675208 DOI: 10.1038/nprot.2015.015]
Abstract
Targeted proteomics by selected/multiple reaction monitoring (S/MRM) or, on a larger scale, by SWATH (sequential window acquisition of all theoretical spectra) MS (mass spectrometry) typically relies on spectral reference libraries for peptide identification. Quality and coverage of these libraries are therefore of crucial importance for the performance of the methods. Here we present a detailed protocol that has been successfully used to build high-quality, extensive reference libraries supporting targeted proteomics by SWATH MS. We describe each step of the process, including data acquisition by discovery proteomics, assertion of peptide-spectrum matches (PSMs), generation of consensus spectra and compilation of MS coordinates that uniquely define each targeted peptide. Crucial steps such as false discovery rate (FDR) control, retention time normalization and handling of post-translationally modified peptides are detailed. Finally, we show how to use the library to extract SWATH data with the open-source software Skyline. The protocol takes 2-3 d to complete, depending on the extent of the library and the computational resources available.
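FDR control of peptide-spectrum matches is commonly performed with a target-decoy strategy. The sketch below is a minimal illustration of that general technique, assuming that higher scores are better and that each match is flagged as target or decoy; the function name and input format are hypothetical, and the protocol itself may rely on different software and FDR estimators.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for PSMs with the target-decoy approach.

    psms: list of (score, is_decoy) pairs; a higher score means a better match.
    Returns (score, is_decoy, q_value) triples sorted by descending score.
    The FDR at each threshold is estimated as #decoys / #targets at or above it.
    Tied scores are not grouped here (a simplification).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)
    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of the FDR taken from the bottom of the list upward.
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]
```

For scores 10, 9, 8 (decoy), 7, 6 and 5 (decoy), only the two top-scoring target matches pass a 1% q-value cutoff; the target scored 7 receives a q-value of 0.25.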
34
Dong NP, Liang YZ, Xu QS, Mok DKW, Yi LZ, Lu HM, He M, Fan W. Prediction of Peptide Fragment Ion Mass Spectra by Data Mining Techniques. Anal Chem 2014; 86:7446-54. [DOI: 10.1021/ac501094m]
Affiliation(s)
- Daniel K. W. Mok
- Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), Shenzhen, 518000, P. R. China
- Lun-zhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming, 650500, P. R. China
- Min He
- Department of Pharmaceutical Engineering, School of Chemical Engineering, Xiangtan University, Xiangtan, 411105, P. R. China
- Wei Fan
- College of Bioscience and Biotechnology, Hunan Agricultural University, Changsha, 410083, P. R. China
35
Kelchtermans P, Bittremieux W, De Grave K, Degroeve S, Ramon J, Laukens K, Valkenborg D, Barsnes H, Martens L. Machine learning applications in proteomics research: how the past can boost the future. Proteomics 2014; 14:353-66. [PMID: 24323524 DOI: 10.1002/pmic.201300289] [Received: 07/12/2013] [Revised: 09/24/2013] [Accepted: 10/14/2013]
Abstract
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm. Since MS-based proteomics has no shortage of complex problems, and since public data are becoming available in ever-growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.
Affiliation(s)
- Pieter Kelchtermans
- Department of Medical Protein Research, VIB, Ghent, Belgium; Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium; Flemish Institute for Technological Research (VITO), Boeretang, Mol, Belgium
36
Abstract
MOTIVATION Tandem mass spectrometry provides the means to match mass spectrometry signal observations with the chemical entities that generated them. The technology produces signal spectra that contain information about the chemical dissociation pattern of a peptide that was forced to fragment using methods like collision-induced dissociation. The ability to predict these MS(2) signals and to understand this fragmentation process is important for sensitive high-throughput proteomics research. RESULTS We present a new tool called MS(2)PIP for predicting the intensity of the most important fragment ion signal peaks from a peptide sequence. MS(2)PIP pre-processes a large dataset with confident peptide-to-spectrum matches to facilitate data-driven model induction using a random forest regression learning algorithm. The intensity predictions of MS(2)PIP were evaluated on several independent evaluation sets and found to correlate significantly better with the observed fragment-ion intensities than those of the current state-of-the-art PeptideART tool. AVAILABILITY MS(2)PIP code is available for both training and predicting at http://compomics.com/.
Affiliation(s)
- Sven Degroeve
- Department of Medical Protein Research, VIB, Ghent 9000, Belgium and Department of Biochemistry, Ghent University, Ghent 9000, Belgium
37
Dong NP, Liang YZ, Yi LZ, Lu HM. Investigation of scrambled ions in tandem mass spectra, part 2. On the influence of the ions on peptide identification. J Am Soc Mass Spectrom 2013; 24:857-867. [PMID: 23504644 DOI: 10.1007/s13361-013-0591-3] [Received: 05/31/2012] [Revised: 01/19/2013] [Accepted: 01/20/2013]
Abstract
A comprehensive investigation was performed to understand the influence of sequence scrambling in peptide ions on peptide identification results. To achieve this, four tandem mass spectrometry datasets were analyzed by Crux, X!Tandem, SpectraST, Lutefisk, and PepNovo, both with scrambled ions included and with them excluded. Although the algorithms differed in performance, removing scrambled ions generally increased the number of correctly identified peptides, with the exception of SpectraST. However, the change in match scores upon removal was unpredictable. Following these investigations, an interpretation was given of how the scrambled ions affect peptide identification. Lastly, a simulated theoretical mass spectral library derived from the NIST peptide libraries was constructed and searched by SpectraST to study whether scrambled ions in predicted mass spectra could affect peptide identification. Consistent with the peptide library search results, no significant variations in dot product scores or peptide identification results were observed when these ions were included in the theoretical MS/MS spectra. Of the five algorithms, SpectraST and Crux provided the most robust results, whereas X!Tandem, PepNovo, and Lutefisk were sensitive to the presence of scrambled ions, especially the latter two, which are de novo sequencing algorithms.
Affiliation(s)
- Nai-ping Dong
- College of Chemistry and Chemical Engineering, Central South University, Changsha, People's Republic of China
38
Wang D, Dasari S, Chambers MC, Holman JD, Chen K, Liebler DC, Orton DJ, Purvine SO, Monroe ME, Chung CY, Rose KL, Tabb DL. Basophile: accurate fragment charge state prediction improves peptide identification rates. Genomics Proteomics Bioinformatics 2013; 11:86-95. [PMID: 23499924 PMCID: PMC3737598 DOI: 10.1016/j.gpb.2012.11.004] [Received: 09/12/2012] [Revised: 11/03/2012] [Accepted: 11/22/2012]
Abstract
In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
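The Naive baseline described above can be sketched directly: every peptide bond is cleaved once, and each resulting b and y fragment is enumerated at every charge from 1+ up to one less than the precursor charge. This is a minimal illustration of that baseline only, not of Basophile's ordinal-regression model; the function name is hypothetical, the masses are standard monoisotopic values, and giving singly charged precursors 1+ fragments is an assumption.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # proton mass (Da)
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def naive_fragments(peptide, precursor_charge):
    """Enumerate b/y ions for every peptide bond at every charge below the precursor's.

    Returns a list of (ion_label, charge, m_over_z) tuples. A singly charged
    precursor is given 1+ fragments (assumption: a strict "below" would give none).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    max_charge = max(1, precursor_charge - 1)
    fragments = []
    for i in range(1, len(peptide)):          # cleave each peptide bond once
        b_neutral = sum(masses[:i])           # N-terminal piece (b ion, no water)
        y_neutral = sum(masses[i:]) + WATER   # C-terminal piece (y ion keeps water)
        for z in range(1, max_charge + 1):    # every allowed fragment charge
            fragments.append((f"b{i}", z, (b_neutral + z * PROTON) / z))
            fragments.append((f"y{len(peptide) - i}", z, (y_neutral + z * PROTON) / z))
    return fragments
```

For PEPTIDE, `naive_fragments("PEPTIDE", 3)` yields 24 fragments (2 ion types, 6 bonds, 2 charges), and the count doubles to 48 for a 5+ precursor, which shows how highly charged precursors accumulate predicted fragments under this model.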
Affiliation(s)
- Dong Wang
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
39
Ji C, Arnold RJ, Sokoloski KJ, Hardy RW, Tang H, Radivojac P. Extending the coverage of spectral libraries: a neighbor-based approach to predicting intensities of peptide fragmentation spectra. Proteomics 2013; 13:756-65. [PMID: 23303707 DOI: 10.1002/pmic.201100670] [Received: 12/27/2011] [Revised: 10/19/2012] [Accepted: 11/11/2012]
Abstract
Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well-studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor-based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K-nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20-60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
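The neighbor-based idea can be sketched as a similarity-weighted average of library intensities. In this minimal sketch the peptide similarity function (fraction of identical positions between equal-length sequences), the default k, and the intensity representation (a dict from ion label to normalized intensity) are all hypothetical choices for illustration; the published method's neighbor-selection criteria are not reproduced.

```python
def sequence_similarity(p, q):
    """Hypothetical similarity: fraction of identical positions; 0 if lengths differ."""
    if len(p) != len(q) or not p:
        return 0.0
    return sum(a == b for a, b in zip(p, q)) / len(p)


def predict_intensities(target, library, k=3):
    """Predict fragment intensities for `target` from its k most similar library peptides.

    library maps peptide -> {ion_label: normalized_intensity}.
    Each neighbor's intensities are weighted by its similarity to the target;
    neighbors with zero similarity are ignored.
    """
    scored = sorted(
        ((sequence_similarity(target, pep), pep) for pep in library if pep != target),
        reverse=True,
    )
    neighbors = [(w, pep) for w, pep in scored[:k] if w > 0]
    total = sum(w for w, _ in neighbors)
    if total == 0:
        return {}  # no usable neighbors: no prediction
    ions = sorted({ion for _, pep in neighbors for ion in library[pep]})
    return {
        ion: sum(w * library[pep].get(ion, 0.0) for w, pep in neighbors) / total
        for ion in ions
    }
```

With PEPTIDE (y3 = 1.0) and PEPTIDK (y3 = 0.5) both sharing six of seven positions with PEPTIDR, the predicted y3 intensity for PEPTIDR is their equal-weight average, 0.75.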
Affiliation(s)
- Chao Ji
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
40
Modzel M, Stefanowicz P, Szewczuk Z. Hydrogen scrambling in non-covalent complexes of peptides. Rapid Commun Mass Spectrom 2012; 26:2739-2744. [PMID: 23124664 DOI: 10.1002/rcm.6396]
Abstract
RATIONALE Mass spectrometry combined with hydrogen-deuterium exchange (HDX-MS) is emerging as a tool for rapid analysis of native protein conformation. However, during collision-induced dissociation (CID) the spatial distribution of deuterium is not always conserved. It is therefore important to determine how hydrogen scrambling occurs; this study concentrates on the possibility of scrambling between amino acid residues that are spatially close together but not connected by covalent bonds. METHODS Peptides used in this study were synthesized by the Fmoc strategy. Deuteration was carried out in ammonium formate solution in D2O. Non-covalent complexes consisting of a deuterated and a non-deuterated peptide were analyzed by electrospray ionization (ESI) Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) with a quadrupole mass filter. Low-energy CID was used for complex dissociation. RESULTS The complexes were isolated on a quadrupole and subjected to CID to cause dissociation. The deuterium distribution before and after the dissociation of a non-covalent complex into its components was measured. The study revealed that no significant scrambling occurred between the constituents of the complexes: the degree of scrambling did not exceed 10%. CONCLUSIONS The results obtained for the complexes should be similar to those for protein regions that are spatially close together, so hydrogen scrambling between them should be negligible. The knowledge that almost all scrambling occurs along peptide chains gives better insight into the mechanism of HDX inside a protein.
Affiliation(s)
- Maciej Modzel
- Faculty of Chemistry, University of Wrocław, Joliot-Curie 14, Wroclaw, Poland
41
Abstract
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area.
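One combinatorial-optimization formulation of step (2), the parsimony principle of reporting the smallest set of proteins that explains every identified peptide, is a minimum set-cover problem, which can be written as an integer program. The minimal sketch below swaps in a greedy set-cover approximation instead of solving that program exactly; it assumes peptide-to-protein mappings are already known, ignores peptide scores, and uses a hypothetical function name and input format.

```python
def parsimonious_proteins(protein_to_peptides):
    """Greedy approximation of the minimal protein set explaining all peptides.

    protein_to_peptides maps protein ID -> set of identified peptides it contains.
    At each step, pick the protein that covers the most still-unexplained
    peptides (ties broken by protein ID order) until every peptide is explained.
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        best = max(
            sorted(protein_to_peptides),
            key=lambda prot: len(protein_to_peptides[prot] & uncovered),
        )
        gained = protein_to_peptides[best] & uncovered
        if not gained:  # defensive: cannot happen when uncovered comes from the map
            break
        chosen.append(best)
        uncovered -= gained
    return chosen
```

For example, if P1 contains peptides a, b and c, P2 contains only b and c, and P3 contains d, the sketch reports P1 and P3 and drops P2, whose peptides are all explained by P1. A greedy cover is not guaranteed to be minimal, which is one reason exact integer-programming and probabilistic formulations are used instead.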
Affiliation(s)
- Yong Fuga Li
- School of Informatics and Computing, Indiana University, 150 S. Woodlawn Avenue, Bloomington, Indiana 47405, USA
42
Abstract
Selected reaction monitoring (SRM) has a long history of use in the area of quantitative MS. In recent years, the approach has seen increased application to quantitative proteomics, facilitating multiplexed relative and absolute quantification studies in a variety of organisms. After introducing the context in which SRM finds most application, quantitative proteomics (primarily absolute quantification), this article discusses SRM and considers topics such as the theory and advantages of SRM, the selection of peptide surrogates for protein quantification, the design of optimal SRM co-ordinates and the handling of SRM data. A number of published studies are also discussed to demonstrate the impact that SRM has had on the field of quantitative proteomics.
43
Niedermeyer THJ, Strohalm M. mMass as a software tool for the annotation of cyclic peptide tandem mass spectra. PLoS One 2012; 7:e44913. [PMID: 23028676 PMCID: PMC3441486 DOI: 10.1371/journal.pone.0044913] [Received: 05/15/2012] [Accepted: 08/09/2012]
Abstract
Natural or synthetic cyclic peptides often possess pronounced bioactivity. Their mass spectrometric characterization is difficult due to the predominant occurrence of non-proteinogenic monomers and the complex fragmentation patterns observed. Even though several software tools for cyclic peptide tandem mass spectra annotation have been published, these tools are still unable to annotate a majority of the signals observed in experimentally obtained mass spectra. They are thus not suitable for extensive mass spectrometric characterization of these compounds. This lack of advanced and user-friendly software tools has motivated us to extend the fragmentation module of a freely available open-source software, mMass (http://www.mmass.org), to allow for cyclic peptide tandem mass spectra annotation and interpretation. The resulting software has been tested on several cyanobacterial and other naturally occurring peptides. It has been found to be superior to other currently available tools concerning both usability and annotation extensiveness. Thus it is highly useful for accelerating the structure confirmation and elucidation of cyclic as well as linear peptides and depsipeptides.
44
Sun S, Yang F, Yang Q, Zhang H, Wang Y, Bu D, Ma B. MS-Simulator: Predicting Y-Ion Intensities for Peptides with Two Charges Based on the Intensity Ratio of Neighboring Ions. J Proteome Res 2012; 11:4509-16. [DOI: 10.1021/pr300235v]
Affiliation(s)
- Shiwei Sun
- Advanced Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences, Beijing, China
- Fuquan Yang
- Proteomics Platform, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Qing Yang
- Advanced Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- School of Computer Science, University of Science and Technology, Beijing, China
- Hong Zhang
- College of Food Science and Biological Engineering, Zhejiang Gongshang University, Hangzhou, China
- Yaojun Wang
- Advanced Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Dongbo Bu
- Advanced Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences, Beijing, China
- Bin Ma
- School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
45
Niedermeyer THJ, Strohalm M. mMass as a software tool for the annotation of cyclic peptide tandem mass spectra. PLoS One 2012. [PMID: 23028676 DOI: 10.1055/s-0032-1321299]
46
Ahrné E, Ohta Y, Nikitin F, Scherl A, Lisacek F, Müller M. An improved method for the construction of decoy peptide MS/MS spectra suitable for the accurate estimation of false discovery rates. Proteomics 2011; 11:4085-95. [DOI: 10.1002/pmic.201000665] [Received: 10/19/2010] [Revised: 07/13/2011] [Accepted: 07/29/2011]