1
Li K, Tang H, Liu X. TopLib: Building and Searching Top-Down Mass Spectral Libraries for Proteoform Identification. Anal Chem 2025; 97:11443-11453. [PMID: 40440726; PMCID: PMC12163883; DOI: 10.1021/acs.analchem.4c06627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2024] [Revised: 05/14/2025] [Accepted: 05/15/2025] [Indexed: 06/11/2025]
Abstract
Mass spectral library search is a widely used approach for spectral identification in mass spectrometry (MS)-based proteomics. While numerous methods exist for building and searching bottom-up mass spectral libraries, there is a lack of software tools for top-down mass spectral libraries. To fill the gap, we introduce TopLib, a new software package designed for building and searching top-down spectral libraries. TopLib utilizes an efficient spectral representation technique to reduce database size and improve query speed and performance. We systematically evaluated various spectral representation techniques and scoring functions for top-down spectral clustering and search. Our results demonstrate that TopLib is significantly faster and yields higher reproducibility in proteoform identification compared to conventional database search methods in top-down MS.
Affiliation(s)
- Kun Li
  - Deming Department of Medicine, Tulane University, New Orleans, Louisiana 70112, United States
- Haixu Tang
  - Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, United States
- Xiaowen Liu
  - Deming Department of Medicine, Tulane University, New Orleans, Louisiana 70112, United States
2
Smith BJ, Guest PC, Martins-de-Souza D. Maximizing Analytical Performance in Biomolecular Discovery with LC-MS: Focus on Psychiatric Disorders. Annual Review of Analytical Chemistry (Palo Alto, Calif.) 2024; 17:25-46. [PMID: 38424029; DOI: 10.1146/annurev-anchem-061522-041154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2024]
Abstract
In this review, we discuss the cutting-edge developments in mass spectrometry proteomics and metabolomics that have brought improvements for the identification of new disease-based biomarkers. A special focus is placed on psychiatric disorders, for example, schizophrenia, because they are considered to be not a single disease entity but rather a spectrum of disorders with many overlapping symptoms. This review includes descriptions of various types of commonly used mass spectrometry platforms for biomarker research, as well as complementary techniques to maximize data coverage, reduce sample heterogeneity, and work around potentially confounding factors. Finally, we summarize the different statistical methods that can be used for improving data quality to aid in reliability and interpretation of proteomics findings, as well as to enhance their translatability into clinical use and generalizability to new data sets.
Affiliation(s)
- Bradley J Smith
  - Laboratory of Neuroproteomics, Department of Biochemistry and Tissue Biology, Institute of Biology, University of Campinas, São Paulo, Brazil
- Paul C Guest
  - Laboratory of Neuroproteomics, Department of Biochemistry and Tissue Biology, Institute of Biology, University of Campinas, São Paulo, Brazil
  - Department of Psychiatry, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
  - Laboratory of Translational Psychiatry, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
- Daniel Martins-de-Souza
  - Laboratory of Neuroproteomics, Department of Biochemistry and Tissue Biology, Institute of Biology, University of Campinas, São Paulo, Brazil
  - Experimental Medicine Research Cluster, University of Campinas, São Paulo, Brazil
  - National Institute of Biomarkers in Neuropsychiatry, National Council for Scientific and Technological Development, São Paulo, Brazil
  - D'Or Institute for Research and Education, São Paulo, Brazil
  - INCT in Modelling Human Complex Diseases with 3D Platforms (Model3D), São Paulo, Brazil
3
Palstrøm NB, Campbell AJ, Lindegaard CA, Cakar S, Matthiesen R, Beck HC. Spectral library search for improved TMTpro labelled peptide assignment in human plasma proteomics. Proteomics 2024; 24:e2300236. [PMID: 37706597; DOI: 10.1002/pmic.202300236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 09/15/2023]
Abstract
Clinical biomarker discovery is often based on the analysis of human plasma samples. However, the high dynamic range and complexity of plasma pose significant challenges to mass spectrometry-based proteomics. Current methods for improving protein identifications require laborious pre-analytical sample preparation. In this study, we developed and evaluated a TMTpro-specific spectral library for improved protein identification in human plasma proteomics. The library was constructed by LC-MS/MS analysis of highly fractionated TMTpro-tagged human plasma, human cell lysates, and relevant arterial tissues. The library was curated using several quality filters to ensure reliable peptide identifications. Our results show that spectral library searching using the TMTpro spectral library improves the identification of proteins in plasma samples compared to conventional sequence database searching. Protein identifications made by the spectral library search engine demonstrated a high degree of complementarity with the sequence database search engine, indicating the feasibility of increasing the number of protein identifications without additional pre-analytical sample preparation. The TMTpro-specific spectral library provides a resource for future plasma proteomics research and optimization of search algorithms for greater accuracy and speed in protein identifications in human plasma proteomics, and is made publicly available to the research community via ProteomeXchange with identifier PXD042546.
Affiliation(s)
- Nicolai B Palstrøm
  - Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Amanda J Campbell
  - Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Samir Cakar
  - Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
- Rune Matthiesen
  - Computational and Experimental Biology Group, CEDOC, Chronic Diseases Research Centre, NOVA Medical School, Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisbon, Portugal
- Hans C Beck
  - Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark
4
Jia W, Peng J, Zhang Y, Zhu J, Qiang X, Zhang R, Shi L. Exploring novel ANGICon-EIPs through ameliorated peptidomics techniques: Can deep learning strategies as a core breakthrough in peptide structure and function prediction? Food Res Int 2023; 174:113640. [PMID: 37986483; DOI: 10.1016/j.foodres.2023.113640] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/05/2023] [Revised: 10/23/2023] [Accepted: 10/24/2023] [Indexed: 11/22/2023]
Abstract
Dairy-derived angiotensin-I-converting enzyme inhibitory peptides (ANGICon-EIPs) have been regarded as a relatively safe supplementary diet-therapy strategy for individuals with hypertension, and short-chain peptides may have more relevant antihypertensive benefits due to their direct intestinal absorption. Our previous explorations have confirmed that endogenous goat milk short-chain peptides are also an essential source of ANGICon-EIPs. Nonetheless, there are limited explorations on endogenous ANGICon-EIPs owing to the limitations of the extraction and enrichment of endogenous peptides, currently. This review outlined ameliorated pre-treatment strategies, data acquisition methods, and tools for the prediction of peptide structure and function, aiming to provide creative ideas for discovering novel ANGICon-EIPs. Currently, deep learning-based peptide structure and function prediction algorithms have achieved significant advancements. The convolutional neural network (CNN) and peptide sequence-based multi-label deep learning approach for determining the multi-functionalities of bioactive peptides (MLBP) can predict multiple peptide functions with absolute true value and accuracy of 0.699 and 0.708, respectively. Utilizing peptide sequence input, torsion angles, and inter-residue distance to train neural networks, APPTEST predicted the average backbone root mean square deviation (RMSD) value of peptide (5-40 aa) structures as low as 1.96 Å. Overall, with the exploration of more neural network architectures, deep learning could be considered a critical research tool to reduce the cost and improve the efficiency of identifying novel endogenous ANGICon-EIPs.
Affiliation(s)
- Wei Jia
  - School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China
  - Inspection and Testing Center of Fuping County (Shaanxi goat milk product quality supervision and Inspection Center), Weinan 711700, China
  - Shaanxi Research Institute of Agricultural Products Processing Technology, Xi'an 710021, China
- Jian Peng
  - School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China
- Yan Zhang
  - Inspection and Testing Center of Fuping County (Shaanxi goat milk product quality supervision and Inspection Center), Weinan 711700, China
- Jiying Zhu
  - School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China
- Xin Qiang
  - Inspection and Testing Center of Fuping County (Shaanxi goat milk product quality supervision and Inspection Center), Weinan 711700, China
- Rong Zhang
  - School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China
- Lin Shi
  - School of Food and Bioengineering, Shaanxi University of Science and Technology, Xi'an 710021, China
5
Abstract
Spectrum library searching is a powerful alternative to database searching for data dependent acquisition experiments, but has been historically limited to identifying previously observed peptides in libraries. Here we present Scribe, a new library search engine designed to leverage deep learning fragmentation prediction software such as Prosit. Rather than relying on highly curated DDA libraries, this approach predicts fragmentation and retention times for every peptide in a FASTA database. Scribe embeds Percolator for false discovery rate correction and an interference tolerant, label-free quantification integrator for an end-to-end proteomics workflow. By leveraging expected relative fragmentation and retention time values, we find that library searching with Scribe can outperform traditional database searching tools both in terms of sensitivity and quantitative precision. Scribe and its graphical interface are easy to use, freely accessible, and fully open source.
Affiliation(s)
- Brian C Searle
  - Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
  - Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
  - Proteome Software Inc., Portland, Oregon 97219, United States
- Ariana E Shannon
  - Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
  - Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
- Damien Beau Wilburn
  - Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
  - Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States