1
Benoist É, Jean G, Rogniaux H, Fertin G, Tessier D. SpecPeptidOMS Directly and Rapidly Aligns Mass Spectra on Whole Proteomes and Identifies Peptides That Are Not Necessarily Tryptic: Implications for Peptidomics. J Proteome Res 2025; 24:2159-2172. [PMID: 40146164 DOI: 10.1021/acs.jproteome.4c00870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2025]
Abstract
SpecPeptidOMS directly aligns peptide fragmentation spectra to whole and undigested protein sequences. The algorithm was specifically and initially designed for peptidomics, where the aim is to identify peptides that do not result from the hydrolysis of a known protein and therefore, whose termini cannot be predicted. Thus, SpecPeptidOMS can perform alignments starting and ending anywhere in the protein sequence. The underlying computational method of SpecPeptidOMS, which is based on a dynamic programming approach, was drastically optimized. As a result, SpecPeptidOMS can process around 12,000 spectra per hour on an ordinary laptop, with alignment performed against the entire human proteome. The performance of SpecPeptidOMS was first evaluated on a publicly available data set of (nontryptic) synthetic mass spectra. Accuracy was estimated by considering the results obtained by MaxQuant on the same data set as the "ground truth". A second series of tests on a larger, well-known proteomics data set (HEK293) highlighted SpecPeptidOMS' additional ability to search for open modifications, a feature of interest in peptidomics but also more broadly in conventional proteomics. SpecPeptidOMS is open-source, cross-platform (written in Java), and freely available.
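To illustrate why unpredictable termini force a search that can start and end anywhere in a protein, here is a minimal Python sketch. It is not the SpecPeptidOMS dynamic programming algorithm; it is a much simpler stand-in that slides a window over an undigested protein sequence and reports every substring whose monoisotopic mass matches a given neutral peptide mass. The function name `candidate_windows`, the tolerance, and the example protein string are assumptions made for illustration only.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # mass of H2O added to the residue sum of a free peptide


def candidate_windows(protein, target_mass, tol=0.01):
    """Return (start, end, substring) for every window of `protein` whose
    neutral mass is within `tol` Da of `target_mass` (hypothetical helper)."""
    hits = []
    start = 0
    window = 0.0  # sum of residue masses in protein[start:end]
    for end, aa in enumerate(protein, 1):
        window += MONO[aa]
        # Residue masses are positive, so shrinking from the left is enough.
        while start < end and window + WATER > target_mass + tol:
            window -= MONO[protein[start]]
            start += 1
        if start < end and abs(window + WATER - target_mass) <= tol:
            hits.append((start, end, protein[start:end]))
    return hits


# Made-up protein containing the peptide PEPTIDE (neutral mass about 799.36 Da).
hits = candidate_windows("MKPEPTIDERAG", 799.36)
```

Because no cleavage rule is applied, every position is a possible terminus. A real tool would then score fragment peaks for each candidate, which this sketch leaves out.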
Affiliation(s)
- Émile Benoist
  - Nantes Université, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Géraldine Jean
  - Nantes Université, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
- Hélène Rogniaux
  - INRAE, PROBE Research Infrastructure, BIBS Facility, F-44300 Nantes, France
  - INRAE, UR1268 Biopolymères Interactions Assemblages, F-44316 Nantes, France
- Dominique Tessier
  - INRAE, PROBE Research Infrastructure, BIBS Facility, F-44300 Nantes, France
  - INRAE, UR1268 Biopolymères Interactions Assemblages, F-44316 Nantes, France

2
Yi Y, Li Z, Liu L, Wu HC. Towards Next Generation Protein Sequencing. Chembiochem 2025; 26:e202400824. [PMID: 39632614 DOI: 10.1002/cbic.202400824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Revised: 12/01/2024] [Accepted: 12/03/2024] [Indexed: 12/07/2024]
Abstract
Understanding the structure and function of proteins is a critical objective in the life sciences. Protein sequencing, a central aspect of this endeavor, was first accomplished through Edman degradation in the 1950s. Since the late 20th century, mass spectrometry has emerged as a prominent method for protein sequencing. In recent years, single-molecule technologies have increasingly been applied to this field, yielding numerous innovative results. Among these, nanopore sensing has proven to be a reliable single-molecule technology, enabling advancements in amino acid recognition, short peptide differentiation, and peptide sequence reading. These developments are set to elevate protein sequencing technology to new heights. The next generation of protein sequencing technologies is anticipated to revolutionize our understanding of molecular mechanisms in biological processes and significantly enhance clinical diagnostics and treatments.
Affiliation(s)
- Yakun Yi
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Analytical Chemistry for Living Biosystems, Institute of Chemistry, Chinese Academy of Sciences, 100190, Beijing, China
  - University of Chinese Academy of Sciences, 100049, Beijing, China
- Ziyi Li
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Analytical Chemistry for Living Biosystems, Institute of Chemistry, Chinese Academy of Sciences, 100190, Beijing, China
  - University of Chinese Academy of Sciences, 100049, Beijing, China
- Lei Liu
  - College of Food and Bioengineering, Xihua University, 610039, Chengdu, China
- Hai-Chen Wu
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Analytical Chemistry for Living Biosystems, Institute of Chemistry, Chinese Academy of Sciences, 100190, Beijing, China
  - University of Chinese Academy of Sciences, 100049, Beijing, China

3
Hu GS, Zheng ZZ, He YH, Wang DC, Nie RC, Liu W. Integrated Analysis of Proteome and Transcriptome Profiling Reveals Pan-Cancer-Associated Pathways and Molecular Biomarkers. Mol Cell Proteomics 2025; 24:100919. [PMID: 39884577 PMCID: PMC11907456 DOI: 10.1016/j.mcpro.2025.100919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Revised: 01/02/2025] [Accepted: 01/24/2025] [Indexed: 02/01/2025] Open
Abstract
Understanding dysregulated genes and pathways in cancer is critical for precision oncology. Integrating mass spectrometry-based proteomic data with transcriptomic data presents unique opportunities for systematic analyses of dysregulated genes and pathways in pan-cancer. Here, we compiled a comprehensive set of datasets, encompassing proteomic data from 2404 samples and transcriptomic data from 7752 samples across 13 cancer types. Comparisons between normal or adjacent normal tissues and tumor tissues identified several dysregulated pathways including mRNA splicing, interferon pathway, fatty acid metabolism, and complement coagulation cascade in pan-cancer. Additionally, pan-cancer upregulated and downregulated genes (PCUGs and PCDGs) were also identified. Notably, RRM2 and ADH1B, two genes which belong to PCUGs and PCDGs, respectively, were identified as robust pan-cancer diagnostic biomarkers. TNM stage-based comparisons revealed dysregulated genes and biological pathways involved in cancer progression, among which the dysregulation of complement coagulation cascade and epithelial-mesenchymal transition are frequent in multiple types of cancers. A group of pan-cancer continuously upregulated and downregulated proteins in different tumor stages (PCCUPs and PCCDPs) were identified. We further constructed prognostic risk stratification models for corresponding cancer types based on dysregulated genes, which effectively predict the prognosis for patients with these cancers. Drug prediction based on PCUGs and PCDGs as well as PCCUPs and PCCDPs revealed that small molecule inhibitors targeting CDK, HDAC, MEK, JAK, PI3K, and others might be effective treatments for pan-cancer, thereby supporting drug repurposing. We also developed web tools for cancer diagnosis, pathologic stage assessment, and risk evaluation. 
Overall, this study highlights the power of combining proteomic and transcriptomic data to identify valuable diagnostic and prognostic markers as well as drug targets and treatments for cancer.
Affiliation(s)
- Guo-Sheng Hu
  - Biomedical Research Center of South China, College of Life Sciences, Fujian Normal University, Fuzhou, China; State Key Laboratory of Cellular Stress Biology, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; Xiang An Biomedicine Laboratory, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China
- Zao-Zao Zheng
  - State Key Laboratory of Cellular Stress Biology, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; Xiang An Biomedicine Laboratory, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China
- Yao-Hui He
  - State Key Laboratory of Cellular Stress Biology, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; Xiang An Biomedicine Laboratory, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan, China
- Du-Chuang Wang
  - State Key Laboratory of Cellular Stress Biology, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; Xiang An Biomedicine Laboratory, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China
- Rui-Chao Nie
  - State Key Laboratory of Cellular Stress Biology, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; Xiang An Biomedicine Laboratory, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China
- Wen Liu
  - State Key Laboratory of Cellular Stress Biology, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; Xiang An Biomedicine Laboratory, School of Pharmaceutical Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, Fujian, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China

4
Liu YC, Lin TJ, Chong KY, Chen GY, Kuo CY, Lin YY, Chang CW, Hsiao TF, Wang CL, Shih YC, Yu CJ. Targeting the ERK1/2 and p38 MAPK pathways attenuates Golgi tethering factor golgin-97 depletion-induced cancer progression in breast cancer. Cell Commun Signal 2025; 23:22. [PMID: 39800687 PMCID: PMC11727508 DOI: 10.1186/s12964-024-02010-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2024] [Accepted: 12/22/2024] [Indexed: 01/16/2025] Open
Abstract
BACKGROUND: The Golgi apparatus is widely considered a secretory center and a hub for different signaling pathways. Abnormalities in Golgi dynamics can perturb the tumor microenvironment and influence cell migration. Therefore, unraveling the regulatory network of the Golgi and searching for pharmacological targets would facilitate the development of novel anticancer therapies. Previously, we reported an unconventional role for the Golgi tethering factor golgin-97 in inhibiting breast cell motility, and its downregulation was associated with poor patient prognosis. However, the specific role and regulatory mechanism of golgin-97 in cancer progression in vivo remain unclear.
METHODS: We integrated genetic knockout (KO) of golgin-97, animal models (zebrafish and xenograft mice), multi-omics analysis (next-generation sequencing and proteomics), bioinformatics analysis, and kinase inhibitor treatment to evaluate the effects of golgin-97 KO in triple-negative breast cancer cells. Gene knockdown and kinase inhibitor treatment followed by qRT‒PCR, Western blotting, cell viability, migration, and cytotoxicity assays were performed to elucidate the mechanisms of golgin-97 KO-mediated cancer invasion. A xenograft mouse model was used to investigate cancer progression and drug therapy.
RESULTS: We demonstrated that golgin-97 KO promoted breast cell metastasis in zebrafish and xenograft mouse models. Multi-omics analysis revealed that the Wnt signaling pathway, MAPK kinase cascades, and inflammatory cytokines are involved in golgin-97 KO-induced breast cancer progression. Targeting the ERK1/2 and p38 MAPK pathways effectively attenuated golgin-97-induced cancer cell migration, reduced the expression of inflammatory mediators, and enhanced the chemotherapeutic effect of paclitaxel in vitro and in vivo. Specifically, compared with the paclitaxel regimen, the combination of ERK1/2 and p38 MAPK inhibitors significantly prevented lung metastasis and lung injury. We further demonstrated that hypoxia is a physiological condition that reduces golgin-97 expression in cancer, revealing a novel and potential feedback loop between ERK/MAPK signaling and golgin-97.
CONCLUSION: Our results collectively support a novel regulatory role of golgin-97 in ERK/MAPK signaling and the tumor microenvironment, possibly providing new insights for anti-breast cancer drug development.
Affiliation(s)
- Yu-Chin Liu
  - Department of Cell and Molecular Biology, College of Medicine, Chang Gung University, 259 Wen-Hwa 1 road, Guishan District, Taoyuan, Taiwan
- Tsung-Jen Lin
  - Department of Cell and Molecular Biology, College of Medicine, Chang Gung University, 259 Wen-Hwa 1 road, Guishan District, Taoyuan, Taiwan
  - CardioVascular Research Center, Tzu Chi General Hospital, Hualien City, Hualien County, Taiwan
- Kowit-Yu Chong
  - Department of Medical Biotechnology and Laboratory Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
  - Graduate Institute of Biomedical Sciences Division of Biotechnology, College of Medicine, Chang Gung University, Taoyuan, Taiwan
  - Hyperbaric Oxygen Medical Research Lab, Bone and Joint Research Center, Linkou Chang Gung Memorial Hospital, Taoyuan, Taiwan
  - Centre for Stem Cell Research, Faculty of Medicine and Health Sciences, Universiti Tunku Abdul Rahman, Selangor, Malaysia
- Guan-Ying Chen
  - Department of Cell and Molecular Biology, College of Medicine, Chang Gung University, 259 Wen-Hwa 1 road, Guishan District, Taoyuan, Taiwan
- Chia-Yu Kuo
  - Department of Cell and Molecular Biology, College of Medicine, Chang Gung University, 259 Wen-Hwa 1 road, Guishan District, Taoyuan, Taiwan
- Yi-Yun Lin
  - Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Chia-Wei Chang
  - Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Ting-Feng Hsiao
  - Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Chih-Liang Wang
  - School of Medicine, College of Medicine, Chang Gung University, Taoyuan, Taiwan
  - Department of Thoracic Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Yo-Chen Shih
  - Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Chia-Jung Yu
  - Department of Cell and Molecular Biology, College of Medicine, Chang Gung University, 259 Wen-Hwa 1 road, Guishan District, Taoyuan, Taiwan
  - Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
  - Department of Thoracic Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan
  - Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan

5
Stastna M. Post-translational modifications of proteins in cardiovascular diseases examined by proteomic approaches. FEBS J 2025; 292:28-46. [PMID: 38440918 PMCID: PMC11705224 DOI: 10.1111/febs.17108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 01/22/2024] [Accepted: 02/20/2024] [Indexed: 03/06/2024]
Abstract
Over 400 different types of post-translational modifications (PTMs) have been reported and over 200 various types of PTMs have been discovered using mass spectrometry (MS)-based proteomics. MS-based proteomics has proven to be a powerful method capable of global PTM mapping with the identification of modified proteins/peptides, the localization of PTM sites and PTM quantitation. PTMs play regulatory roles in protein functions, activities and interactions in various heart related diseases, such as ischemia/reperfusion injury, cardiomyopathy and heart failure. The recognition of PTMs that are specific to cardiovascular pathology and the clarification of the mechanisms underlying these PTMs at molecular levels are crucial for discovery of novel biomarkers and application in a clinical setting. With sensitive MS instrumentation and novel biostatistical methods for precise processing of the data, low-abundance PTMs can be successfully detected and the beneficial or unfavorable effects of specific PTMs on cardiac function can be determined. Moreover, computational proteomic strategies that can predict PTM sites based on MS data have gained an increasing interest and can contribute to characterization of PTM profiles in cardiovascular disorders. More recently, machine learning- and deep learning-based methods have been employed to predict the locations of PTMs and explore PTM crosstalk. In this review article, the types of PTMs are briefly overviewed, approaches for PTM identification/quantitation in MS-based proteomics are discussed and recently published proteomic studies on PTMs associated with cardiovascular diseases are included.
Affiliation(s)
- Miroslava Stastna
  - Institute of Analytical Chemistry of the Czech Academy of Sciences, Brno, Czech Republic

6
Anderson LC, Bai DL, Blakney GT, Butcher DS, Reser L, Shabanowitz J. The Hunt Lab Guide to De Novo Peptide Sequence Analysis by Tandem Mass Spectrometry. Mol Cell Proteomics 2024; 23:100875. [PMID: 39515468 DOI: 10.1016/j.mcpro.2024.100875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Revised: 10/23/2024] [Accepted: 10/29/2024] [Indexed: 11/16/2024] Open
Abstract
Donald Hunt has made seminal contributions to the fields of proteomics, immunology, epigenetics, and glycobiology. The foundation of every important work to come out of the Hunt Laboratory is de novo peptide sequencing. For decades, he taught hundreds of students, postdocs, engineers, and scientists to directly interpret mass spectral data. To honor his legacy and ensure that the art of de novo sequencing is not lost, we have adapted his teaching materials into "The Hunt Lab Guide to De Novo Peptide Sequence Analysis by Tandem Mass Spectrometry". In addition to the de novo sequencing tutorials, we present two freely available software tools that facilitate manual interpretation of mass spectra and validation of search results. The first, "Hunt Lab Peptide Fragment Calculator", calculates precursor and fragment mass-to-charge ratios for any peptide. The second program, "Predator Protein Fragment Calculator", was inspired in part by the fragment calculator developed in the Hunt Lab. Its capabilities are enhanced to facilitate interpretation of mass spectral data derived from intact proteins. We hope that the combination of these educational tools will continue to benefit students and researchers by empowering them to interpret data on their own.
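The kind of calculation a peptide fragment calculator performs can be sketched in a few lines of Python. This is not the Hunt Lab tool or its interface; the function names are hypothetical, and the sketch covers only unmodified peptides and singly charged b and y ions, using standard monoisotopic residue masses.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # mass of a proton


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def precursor_mz(seq, charge=1):
    """m/z of the intact peptide carrying `charge` protons."""
    return (peptide_mass(seq) + charge * PROTON) / charge


def fragment_ions(seq):
    """Singly charged b ions (b1..b[n-1]) and y ions (y1..y[n-1])."""
    b_ions, y_ions = [], []
    prefix = 0.0
    for aa in seq[:-1]:            # N-terminal fragments
        prefix += MONO[aa]
        b_ions.append(prefix + PROTON)
    suffix = 0.0
    for aa in reversed(seq[1:]):   # C-terminal fragments keep the water
        suffix += MONO[aa]
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("PEPTIDE")
```

A useful sanity check when interpreting spectra by hand: for a peptide of n residues, the singly charged b(i) and y(n-i) values always sum to the neutral peptide mass plus two proton masses.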
Affiliation(s)
- Lissa C Anderson
  - National High Magnetic Field Laboratory, Florida State University, Tallahassee, Florida, USA; Department of Chemistry and Biochemistry, Florida State University, Tallahassee, Florida, USA
- Dina L Bai
  - Department of Chemistry, University of Virginia, Charlottesville, Virginia, USA
- Greg T Blakney
  - National High Magnetic Field Laboratory, Florida State University, Tallahassee, Florida, USA
- David S Butcher
  - National High Magnetic Field Laboratory, Florida State University, Tallahassee, Florida, USA
- Larry Reser
  - Department of Chemistry, University of Virginia, Charlottesville, Virginia, USA
- Jeffrey Shabanowitz
  - Department of Chemistry, University of Virginia, Charlottesville, Virginia, USA

7
Klein J, Lam H, Mak TD, Bittremieux W, Perez-Riverol Y, Gabriels R, Shofstahl J, Hecht H, Binz PA, Kawano S, Van Den Bossche T, Carver J, Neely BA, Mendoza L, Suomi T, Claeys T, Payne T, Schulte D, Sun Z, Hoffmann N, Zhu Y, Neumann S, Jones AR, Bandeira N, Vizcaíno JA, Deutsch EW. The Proteomics Standards Initiative Standardized Formats for Spectral Libraries and Fragment Ion Peak Annotations: mzSpecLib and mzPAF. Anal Chem 2024; 96:18491-18501. [PMID: 39514576 PMCID: PMC11579979 DOI: 10.1021/acs.analchem.4c04091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2024] [Revised: 10/16/2024] [Accepted: 11/01/2024] [Indexed: 11/16/2024]
Abstract
Mass spectral libraries are collections of reference spectra, usually associated with specific analytes from which the spectra were generated, that are used for further downstream analysis of new spectra. There are many different formats used for encoding spectral libraries, but none have undergone a standardization process to ensure broad applicability to many applications. As part of the Human Proteome Organization Proteomics Standards Initiative (PSI), we have developed a standardized format for encoding spectral libraries, called mzSpecLib (https://psidev.info/mzSpecLib). It is primarily a data model that flexibly encodes metadata about the library entries using the extensible PSI-MS controlled vocabulary and can be encoded in and converted between different serialization formats. We have also developed a standardized data model and serialization for fragment ion peak annotations, called mzPAF (https://psidev.info/mzPAF). It is defined as a separate standard, since it may be used for other applications besides spectral libraries. The mzSpecLib and mzPAF standards are compatible with existing PSI standards such as ProForma 2.0 and the Universal Spectrum Identifier. The mzSpecLib and mzPAF standards have been primarily defined for peptides in proteomics applications with basic small molecule support. They could be extended in the future to other fields that need to encode spectral libraries for nonpeptidic analytes.
Affiliation(s)
- Joshua Klein
  - Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
- Henry Lam
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, 999077 Hong Kong, P. R. China
- Tytus D. Mak
  - Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Jim Shofstahl
  - Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Helge Hecht
  - RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 60200 Brno, Czech Republic
- Shin Kawano
  - Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
  - School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
- Tim Van Den Bossche
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Jeremy Carver
  - Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, United States
- Benjamin A. Neely
  - National Institute of Standards and Technology (NIST) Charleston, Charleston, South Carolina 29412, United States
- Luis Mendoza
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Tomi Suomi
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
- Tine Claeys
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
- Thomas Payne
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Douwe Schulte
  - Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
- Zhi Sun
  - Institute for Systems Biology, Seattle, Washington 98109, United States
- Nils Hoffmann
  - Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- Yunping Zhu
  - National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
- Steffen Neumann
  - Computational Plant Biochemistry, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
- Andrew R. Jones
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, United Kingdom
- Nuno Bandeira
  - Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, United States
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Eric W. Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States

8
Tsour S, Machne R, Leduc A, Widmer S, Guez J, Karczewski K, Slavov N. Alternate RNA decoding results in stable and abundant proteins in mammals. bioRxiv 2024:2024.08.26.609665. [PMID: 39253435 PMCID: PMC11383030 DOI: 10.1101/2024.08.26.609665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Amino acid substitutions may substantially alter protein stability and function, but the contribution of substitutions arising from alternate translation (deviations from the genetic code) is unknown. To explore it, we analyzed deep proteomic and transcriptomic data from over 1,000 human samples, including 6 cancer types and 26 healthy human tissues. This global analysis identified 60,024 high confidence substitutions corresponding to 8,801 unique sites in proteins derived from 1,990 genes. Some substitutions are shared across samples, while others exhibit strong tissue-type and cancer specificity. Surprisingly, products of alternate translation are more abundant than their canonical counterparts for hundreds of proteins, suggesting sense codon recoding. Recoded proteins include transcription factors, proteases, signaling proteins, and proteins associated with neurodegeneration. Mechanisms contributing to substitution abundance include protein stability, codon frequency, codon-anticodon mismatches, and RNA modifications. We characterize sequence motifs around alternatively translated amino acids and how substitution ratios vary across protein domains, tissue types and cancers. The substitution ratios are positively associated with intrinsically disordered regions and genetic polymorphisms in gnomAD, though the polymorphisms cannot account for the substitutions. Both the sequence and the tissue-specificity of alternatively translated proteins are conserved between human and mouse. These results demonstrate the contribution of alternate translation to diversifying mammalian proteomes, and its association with protein stability, tissue-specific proteomes, and diseases.
Affiliation(s)
- Shira Tsour
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
  - Alnylam Pharmaceuticals, Cambridge, MA, USA
- Rainer Machne
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Andrew Leduc
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Simon Widmer
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Jeremy Guez
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Konrad Karczewski
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Nikolai Slavov
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
  - Parallel Squared Technology Institute, Watertown, MA, USA

9
He Q, Li X, Zhong J, Yang G, Han J, Shuai J. Dear-PSM: A deep learning-based peptide search engine enables full database search for proteomics. Smart Medicine 2024; 3:e20240014. [PMID: 39420951 PMCID: PMC11425048 DOI: 10.1002/smmd.20240014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Accepted: 07/01/2024] [Indexed: 10/19/2024]
Abstract
Peptide spectrum matching is the process of linking mass spectrometry data with peptide sequences. An experimental spectrum can match thousands of candidate peptides with variable modifications leading to an exponential increase in candidates. Completing the search within a limited time is a key challenge. Traditional searches expedite the process by restricting peptide mass errors and variable modifications, but this limits interpretive capability. To address this challenge, we propose Dear-PSM, a peptide search engine that supports full database searching. Dear-PSM does not restrict peptide mass errors, matching each spectrum to all peptides in the database and increasing the number of variable modifications per peptide from the conventional 3 to 20. Leveraging inverted index technology, Dear-PSM creates a high-performance index table of experimental spectra and utilizes deep learning algorithms for peptide validation. Through these techniques, Dear-PSM achieves a speed breakthrough 7 times faster than mainstream search engines on a regular desktop computer, with a remarkable 240-fold reduction in memory consumption. Benchmark test results demonstrate that Dear-PSM, in full database search mode, can reproduce over 90% of the results obtained by mainstream search engines when handling complex mass spectrometry data collected from different species using various instruments. Furthermore, it uncovers a substantial number of new peptides and proteins. Dear-PSM has been publicly released on the GitHub repository https://github.com/jianweishuai/Dear-PSM.
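The idea of an inverted index over experimental spectra can be sketched as follows. This is a generic illustration, not Dear-PSM's implementation: the function names, the 0.02 m/z bin width, and the toy peak lists are all assumptions. Each fragment m/z bin maps to the set of spectra containing a peak in that bin, so one lookup per theoretical fragment finds every matching spectrum without scanning them all.

```python
from collections import Counter, defaultdict


def build_index(spectra, bin_width=0.02):
    """Map each m/z bin to the ids of spectra that have a peak in it.

    `spectra` is a dict of spectrum id -> list of peak m/z values.
    """
    index = defaultdict(set)
    for spectrum_id, peaks in spectra.items():
        for mz in peaks:
            index[int(mz / bin_width)].add(spectrum_id)
    return index


def count_matches(index, theoretical_mz, bin_width=0.02):
    """Count, per spectrum, how many theoretical fragments land in an occupied bin."""
    counts = Counter()
    for mz in theoretical_mz:
        for spectrum_id in index.get(int(mz / bin_width), ()):
            counts[spectrum_id] += 1
    return counts


# Toy data: two experimental spectra and one candidate peptide's fragments.
spectra = {
    "s1": [100.005, 200.015, 300.01],
    "s2": [100.005, 450.51],
}
index = build_index(spectra)
counts = count_matches(index, [100.01, 200.01, 300.012])
```

Plain binning misses pairs of peaks that fall on opposite sides of a bin boundary; a fuller version would also probe the neighbouring bins. The shared-peak counts would then feed a scoring or validation step, which is out of scope here.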
Affiliation(s)
- Qingzu He
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
| | - Xiang Li
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
| | - Jinjin Zhong
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang, China
| | - Gen Yang
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- State Key Laboratory of Nuclear Physics and Technology, School of Physics, Peking University, Beijing, China
| | - Jiahuai Han
- State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, School of Life Sciences, Xiamen University, Xiamen, Fujian, China
| | - Jianwei Shuai
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, Zhejiang, China
| |
|
10
|
Flender D, Vilenne F, Adams C, Boonen K, Valkenborg D, Baggerman G. Exploring the dynamic landscape of immunopeptidomics: Unravelling posttranslational modifications and navigating bioinformatics terrain. MASS SPECTROMETRY REVIEWS 2024. [PMID: 39152539 DOI: 10.1002/mas.21905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/19/2024]
Abstract
Immunopeptidomics is becoming an increasingly important field of study. The capability to identify immunopeptides with pivotal roles in the human immune system is essential to shift the current curative medicine towards personalized medicine. Throughout the years, the field has matured, giving insight into the current pitfalls. Nowadays, it is commonly accepted that generalizing shotgun proteomics workflows is malpractice because immunopeptidomics faces numerous challenges. While many of these difficulties have been addressed, the road towards the ideal workflow remains complicated. Although the presence of posttranslational modifications (PTMs) in the immunopeptidome has been demonstrated, their identification remains highly challenging despite their significance for immunotherapies. The large number of unpredictable modifications in the immunopeptidome plays a pivotal role both in its functionality and in these challenges. This review provides a comprehensive overview of the current advancements in immunopeptidomics. We delve into the challenges associated with identifying PTMs within the immunopeptidome, aiming to address the current state of the field.
Affiliation(s)
- Daniel Flender
- Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
- Health Unit, VITO, Mol, Belgium
| | - Frédérique Vilenne
- Health Unit, VITO, Mol, Belgium
- Data Science Institute, University of Hasselt, Hasselt, Belgium
| | - Charlotte Adams
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
| | - Kurt Boonen
- Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
- ImmuneSpec, Niel, Belgium
| | - Dirk Valkenborg
- Data Science Institute, University of Hasselt, Hasselt, Belgium
| | - Geert Baggerman
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
- ImmuneSpec, Niel, Belgium
| |
|
11
|
Huang J, Wang C, Kuo C, Chang T, Liu Y, Hsiao T, Wang C, Yu C. Oxidative stress mediates nucleocytoplasmic shuttling of KPNA2 via AKT1-CDK1 axis-regulated S62 phosphorylation. FASEB Bioadv 2024; 6:276-288. [PMID: 39114447 PMCID: PMC11301272 DOI: 10.1096/fba.2024-00078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Revised: 06/25/2024] [Accepted: 06/28/2024] [Indexed: 08/10/2024] Open
Abstract
Karyopherin α 2 (KPNA2, importin α1), a transport factor shuttling between the nuclear and cytoplasmic compartments, is involved in the nuclear import of proteins and participates in cellular processes such as cell cycle regulation, apoptosis, and transcriptional regulation. However, it is still unclear which signaling regulates the nucleocytoplasmic distribution of KPNA2 in response to cellular stress. In this study, we report that oxidative stress increases nuclear retention of KPNA2 through alpha serine/threonine-protein kinase (AKT1)-mediated reduction of serine 62 (S62) phosphorylation. We first found that AKT1 activation was required for H2O2-induced nuclear accumulation of KPNA2. Immunoprecipitation and quantitative proteomic analysis revealed that the phosphorylation of KPNA2 at S62 was decreased under H2O2-induced oxidative stress. We showed that cyclin-dependent kinase 1 (CDK1), a kinase responsible for KPNA2 S62 phosphorylation, contributes to the localization of KPNA2 in the cytoplasm. AKT1 knockdown increased KPNA2 S62 phosphorylation and inhibited CDK1 activation. Furthermore, H2O2-induced AKT1 activation promoted nuclear KPNA2 interaction with nucleophosmin 1 (NPM1), resulting in attenuation of NPM1-mediated cyclin D1 gene transcription. Thus, we infer that the AKT1-CDK1 axis regulates the nucleocytoplasmic shuttling and function of KPNA2 through spatiotemporal regulation of KPNA2 S62 phosphorylation under oxidative stress conditions.
Affiliation(s)
- Jie‐Xin Huang
- Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
| | - Chun‐I Wang
- Department of Biochemistry, School of Medicine, China Medical University, Taichung, Taiwan
| | - Chia‐Yu Kuo
- Department of Cell and Molecular Biology, College of Medicine, Chang Gung University, Taoyuan, Taiwan
| | - Ting‐Wei Chang
- Institute of Molecular Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
| | - Yu‐Chin Liu
- Department of Cell and Molecular Biology, College of Medicine, Chang Gung University, Taoyuan, Taiwan
| | - Ting‐Feng Hsiao
- Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
| | - Chih‐Liang Wang
- School of Medicine, College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Department of Thoracic Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan
| | - Chia‐Jung Yu
- Graduate Institute of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Department of Cell and Molecular Biology, College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Department of Thoracic Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan
| |
|
12
|
Ragland JM, Place BJ. A Portable and Reusable Database Infrastructure for Mass Spectrometry, and Its Associated Toolkit (The DIMSpec Project). JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2024; 35:1282-1291. [PMID: 38704738 DOI: 10.1021/jasms.4c00073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2024]
Abstract
Nontargeted analysis (NTA) is a rapidly growing field of techniques that includes the identification of unknown chemical analytes in complex mixtures such as environmental, biological, and food matrices. The use of reference mass spectral databases is a key component of most NTA workflows, providing a high level of confidence for chemical identification when analytical standards are not available, yet effective interlaboratory sharing of research grade spectra remains challenging. The Database Infrastructure for Mass Spectrometry (DIMSpec) project focused on the creation of an open-source toolkit supporting storage and sharing of high-resolution mass spectra with attached sample and methodological metadata. As a demonstration of its utility, the DIMSpec toolkit was used to create a database of curated mass spectra for per- and polyfluoroalkyl substances (PFAS) generated from various sources. While the underlying toolkit is agnostic to analytical targets, this initial release (along with the database schema, mass spectral data, and database tools) should enable PFAS researchers to use these data for their own studies, including the identification of novel PFAS in the environment.
Affiliation(s)
- Jared M Ragland
- National Institute of Standards and Technology, Material Measurement Laboratory, Chemical Sciences Division, Gaithersburg, Maryland 20899, United States
| | - Benjamin J Place
- National Institute of Standards and Technology, Material Measurement Laboratory, Chemical Sciences Division, Gaithersburg, Maryland 20899, United States
| |
|
13
|
Chiang Y, Welker F, Collins MJ. Spectra without stories: reporting 94% dark and unidentified ancient proteomes. OPEN RESEARCH EUROPE 2024; 4:71. [PMID: 38903702 PMCID: PMC11187534 DOI: 10.12688/openreseurope.17225.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/15/2024] [Indexed: 06/22/2024]
Abstract
Background: Data-dependent, bottom-up proteomics is widely used for identifying proteins and peptides. However, one key challenge is that 70% of fragment ion spectra consistently fail to be assigned by conventional database searching. This 'dark matter' of bottom-up proteomics seems to affect fields where non-model organisms, low-abundance proteins, non-tryptic peptides, and complex modifications may be present. While palaeoproteomics may appear as a niche field, understanding and reporting unidentified ancient spectra require collaborative innovation in bioinformatics strategies. This may advance the analysis of complex datasets. Methods: 14.97 million high-impact ancient spectra published in Nature and Science portfolios were mined from public repositories. Identification rates, defined as the proportion of assigned fragment ion spectra, were collected as part of deposited database search outputs or parsed using open-source python packages. Results and Conclusions: We report that typically 94% of the published ancient spectra remain unidentified. This phenomenon may be caused by multiple factors, notably the limitations of database searching and the selection of user-defined reference data with advanced modification patterns. These 'spectra without stories' highlight the need for widespread data sharing to facilitate methodological development and minimise the loss of often irreplaceable ancient materials. Testing and validating alternative search strategies, such as open searching and de novo sequencing, may also improve overall identification rates. Hence, lessons learnt in palaeoproteomics may benefit other fields grappling with challenging data.
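The identification rate defined in this abstract (the proportion of assigned fragment ion spectra) reduces to a simple ratio. The sketch below is illustrative only: the function name and the example counts are hypothetical and are not taken from the study's data.

```python
def identification_rate(n_assigned, n_total):
    """Return the fraction of fragment ion spectra that received an assignment."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_assigned <= n_total:
        raise ValueError("n_assigned must lie between 0 and n_total")
    return n_assigned / n_total


# Hypothetical run: 6,000 of 100,000 spectra assigned by the database search.
rate = identification_rate(6_000, 100_000)
print(f"identified: {rate:.1%}, unidentified: {1 - rate:.1%}")
```

With these made-up counts the unidentified share is 94%, the order of magnitude the abstract reports as typical.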
Affiliation(s)
- Yun Chiang
- Globe Institute, University of Copenhagen, Copenhagen, Denmark
- The Nice Institute of Chemistry, Universite Cote d'Azur, Nice, France
| | - Frido Welker
- Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Matthew James Collins
- Globe Institute, University of Copenhagen, Copenhagen, Denmark
- McDonald Institute for Archaeological Research, University of Cambridge, Cambridge, England, UK
| |
|
14
|
Wakid M, Almeida D, Aouabed Z, Rahimian R, Davoli MA, Yerko V, Leonova-Erko E, Richard V, Zahedi R, Borchers C, Turecki G, Mechawar N. Universal method for the isolation of microvessels from frozen brain tissue: A proof-of-concept multiomic investigation of the neurovasculature. Brain Behav Immun Health 2023; 34:100684. [PMID: 37822873 PMCID: PMC10562768 DOI: 10.1016/j.bbih.2023.100684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 08/29/2023] [Accepted: 09/06/2023] [Indexed: 10/13/2023] Open
Abstract
The neurovascular unit (NVU), composed of vascular cell types that collectively regulate cerebral blood flow to meet the needs of coupled neurons, is paramount for the proper function of the central nervous system. The neurovascular unit gatekeeps blood-brain barrier properties, which are impaired in several central nervous system diseases associated with neuroinflammation; this impairment contributes to pathogenesis. To better understand function and dysfunction at the neurovascular unit and how it may contribute to inflammatory processes within the brain, isolation and characterization of the neurovascular unit is needed. Here, we describe a single, standardized protocol to enrich and isolate microvessels from archived snap-frozen human and frozen mouse cerebral cortex using mechanical homogenization and centrifugation-separation that preserves the structural integrity and multicellular composition of microvessel fragments. For the first time, microvessels are isolated from postmortem ventromedial prefrontal cortex tissue and are comprehensively investigated as a structural unit using both RNA sequencing and liquid chromatography with tandem mass spectrometry (LC-MS/MS). Both the transcriptome and proteome are obtained and compared, demonstrating that the isolated brain microvessel is a robust model for the NVU and can be used to generate highly informative datasets in both physiological and disease contexts.
Affiliation(s)
- Marina Wakid
- McGill Group for Suicide Studies, Douglas Research Centre, Montréal, Quebec, Canada
- Integrated Program in Neuroscience, McGill University, Montréal, Quebec, Canada
| | - Daniel Almeida
- McGill Group for Suicide Studies, Douglas Research Centre, Montréal, Quebec, Canada
- Integrated Program in Neuroscience, McGill University, Montréal, Quebec, Canada
| | - Zahia Aouabed
- McGill Group for Suicide Studies, Douglas Research Centre, Montréal, Quebec, Canada
| | - Reza Rahimian
- McGill Group for Suicide Studies, Douglas Research Centre, Montréal, Quebec, Canada
| | | | - Volodymyr Yerko
- McGill Group for Suicide Studies, Douglas Research Centre, Montréal, Quebec, Canada
| | - Elena Leonova-Erko
- McGill Group for Suicide Studies, Douglas Research Centre, Montréal, Quebec, Canada
| | - Vincent Richard
- Segal Cancer Proteomics Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, Montréal, Quebec, Canada
| | - René Zahedi
- Segal Cancer Proteomics Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, Montréal, Quebec, Canada
| | - Christoph Borchers
- Segal Cancer Proteomics Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, Montréal, Quebec, Canada
| | - Gustavo Turecki
- McGill Group for Suicide Studies, Douglas Research Centre, Montréal, Quebec, Canada
- Integrated Program in Neuroscience, McGill University, Montréal, Quebec, Canada
- Department of Psychiatry, McGill University, Montréal, Quebec, Canada
| | - Naguib Mechawar
- McGill Group for Suicide Studies, Douglas Research Centre, Montréal, Quebec, Canada
- Integrated Program in Neuroscience, McGill University, Montréal, Quebec, Canada
- Department of Psychiatry, McGill University, Montréal, Quebec, Canada
| |
|
15
|
Prunier G, Cherkaoui M, Lysiak A, Langella O, Blein-Nicolas M, Lollier V, Benoist E, Jean G, Fertin G, Rogniaux H, Tessier D. Fast alignment of mass spectra in large proteomics datasets, capturing dissimilarities arising from multiple complex modifications of peptides. BMC Bioinformatics 2023; 24:421. [PMID: 37940845 PMCID: PMC10631047 DOI: 10.1186/s12859-023-05555-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/13/2023] [Accepted: 10/30/2023] [Indexed: 11/10/2023] Open
Abstract
BACKGROUND: In proteomics, the interpretation of mass spectra representing peptides carrying multiple complex modifications remains challenging, as it is difficult to strike a balance between reasonable execution time, a limited number of false positives, and a huge search space allowing any number of modifications without a priori assumptions. The scientific community needs new developments in this area to aid in the discovery of novel post-translational modifications that may play important roles in disease. RESULTS: To make progress on this issue, we implemented SpecGlobX (SpecGlob eXTended to eXperimental spectra), a standalone Java application that quickly determines the best spectral alignments of a (possibly very large) list of Peptide-to-Spectrum Matches (PSMs) provided by any open modification search method, or generated by the user. As input, SpecGlobX reads a file containing spectra in MGF or mzML format and a semicolon-delimited spreadsheet describing the PSMs. SpecGlobX returns the best alignment for each PSM as output, splitting the mass difference between the spectrum and the peptide into one or more shifts while considering the possibility of non-aligned masses (a phenomenon resulting from many situations including neutral losses). SpecGlobX is fast, able to align one million PSMs in about 1.5 min on a standard desktop. First, we recall the foundations of the algorithm and detail how we adapted SpecGlob (the method we previously developed following the same aim, but limited to the interpretation of perfect simulated spectra) to the interpretation of imperfect experimental spectra. Then, we highlight the value of SpecGlobX as a complementary tool downstream of three open modification search methods on a large simulated spectra dataset. Finally, we ran SpecGlobX on a proteome-wide dataset downloaded from PRIDE to demonstrate that SpecGlobX functions just as well on simulated and experimental spectra. We then carefully analyzed a limited set of interpretations.
CONCLUSIONS: SpecGlobX is helpful as a decision support tool, providing keys to interpret peptides carrying complex modifications still poorly considered by current open modification search software. Better alignment of PSMs enhances confidence in the identification of spectra provided by open modification search methods and should improve the interpretation rate of spectra.
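The quantity that the abstract says is split into one or more shifts, the mass difference between the spectrum and the peptide, can be computed as in the sketch below. This is only the starting point of an alignment, not SpecGlobX's algorithm: the function names are hypothetical, the residue masses are standard monoisotopic values, and the precursor m/z in the example is made up so that the difference comes out near +79.966 Da.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added for the peptide termini
PROTON = 1.007276   # mass of one proton


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER


def neutral_mass(precursor_mz, charge):
    """Neutral precursor mass from the observed m/z and charge state."""
    return (precursor_mz - PROTON) * charge


def mass_delta(sequence, precursor_mz, charge):
    """Difference between the observed precursor and the unmodified peptide."""
    return neutral_mass(precursor_mz, charge) - peptide_mass(sequence)


# Hypothetical PSM: peptide PEPTIDE matched to a 2+ precursor at m/z 440.670416.
print(round(mass_delta("PEPTIDE", 440.670416, 2), 3))
```

An aligner then has to decide whether such a difference is explained by a single shift at one position or by several smaller shifts along the sequence, which is the step the abstract attributes to SpecGlobX.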
Affiliation(s)
- Grégoire Prunier
- INRAE, PROBE Research Infrastructure, BIBS Facility, 44300, Nantes, France
- INRAE, UR1268 Biopolymères Interactions Assemblages, 44316, Nantes, France
| | - Mehdi Cherkaoui
- INRAE, PROBE Research Infrastructure, BIBS Facility, 44300, Nantes, France
- INRAE, UR1268 Biopolymères Interactions Assemblages, 44316, Nantes, France
| | - Albane Lysiak
- INRAE, PROBE Research Infrastructure, BIBS Facility, 44300, Nantes, France
- Nantes Université, CNRS, LS2N, UMR 6004, 44000, Nantes, France
| | - Olivier Langella
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, PAPPSO, 91190, Gif-Sur-Yvette, France
| | - Mélisande Blein-Nicolas
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, PAPPSO, 91190, Gif-Sur-Yvette, France
| | - Virginie Lollier
- INRAE, PROBE Research Infrastructure, BIBS Facility, 44300, Nantes, France
- INRAE, UR1268 Biopolymères Interactions Assemblages, 44316, Nantes, France
| | - Emile Benoist
- Nantes Université, CNRS, LS2N, UMR 6004, 44000, Nantes, France
| | - Géraldine Jean
- Nantes Université, CNRS, LS2N, UMR 6004, 44000, Nantes, France
| | | | - Hélène Rogniaux
- INRAE, PROBE Research Infrastructure, BIBS Facility, 44300, Nantes, France
- INRAE, UR1268 Biopolymères Interactions Assemblages, 44316, Nantes, France
| | - Dominique Tessier
- INRAE, PROBE Research Infrastructure, BIBS Facility, 44300, Nantes, France.
- INRAE, UR1268 Biopolymères Interactions Assemblages, 44316, Nantes, France.
| |
|
16
|
Mao Y, Jia L, Dong L, Shu XE, Qian SB. Start codon-associated ribosomal frameshifting mediates nutrient stress adaptation. Nat Struct Mol Biol 2023; 30:1816-1825. [PMID: 37957305 DOI: 10.1038/s41594-023-01119-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 09/07/2023] [Indexed: 11/15/2023]
Abstract
A translating ribosome is typically thought to follow the reading frame defined by the selected start codon. Using super-resolution ribosome profiling, here we report pervasive out-of-frame translation immediately from the start codon. Start codon-associated ribosomal frameshifting (SCARF) stems from the slippage of ribosomes during the transition from initiation to elongation. Using a massively parallel reporter assay, we uncovered sequence elements acting as SCARF enhancers or repressors, implying that start codon recognition is coupled with reading frame fidelity. This finding explains thousands of mass spectrometry spectra that are unannotated in the human proteome. Mechanistically, we find that the eukaryotic initiation factor 5B (eIF5B) maintains the reading frame fidelity by stabilizing initiating ribosomes. Intriguingly, amino acid starvation induces SCARF by proteasomal degradation of eIF5B. The stress-induced SCARF protects cells from starvation by enabling amino acid recycling and selective mRNA translation. Our findings illustrate a beneficial effect of translational 'noise' in nutrient stress adaptation.
Affiliation(s)
- Yuanhui Mao
- Division of Nutritional Sciences, Cornell University, Ithaca, NY, USA
- Liangzhu Laboratory, Zhejiang University, Hangzhou, China
| | - Longfei Jia
- Division of Nutritional Sciences, Cornell University, Ithaca, NY, USA
| | - Leiming Dong
- Division of Nutritional Sciences, Cornell University, Ithaca, NY, USA
| | - Xin Erica Shu
- Division of Nutritional Sciences, Cornell University, Ithaca, NY, USA
| | - Shu-Bing Qian
- Division of Nutritional Sciences, Cornell University, Ithaca, NY, USA.
| |
|
17
|
Chen Y, Du Z, Zhao H, Fang W, Liu T, Zhang Y, Zhang W, Qin W. SPPUSM: An MS/MS spectra merging strategy for improved low-input and single-cell proteome identification. Anal Chim Acta 2023; 1279:341793. [PMID: 37827637 DOI: 10.1016/j.aca.2023.341793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Revised: 08/26/2023] [Accepted: 09/06/2023] [Indexed: 10/14/2023]
Abstract
Single and rare cell analysis provides unique insights into the investigation of biological processes and disease progress by resolving the cellular heterogeneity that is masked by bulk measurements. Although many efforts have been made, the techniques used to measure the proteome in trace amounts of samples or in single cells still lag behind those for DNA and RNA due to the inherent non-amplifiable nature of proteins and the sensitivity limitation of current mass spectrometry. Here, we report an MS/MS spectra merging strategy termed SPPUSM (same precursor-produced unidentified spectra merging) for improved low-input and single-cell proteome data analysis. In this method, all the unidentified MS/MS spectra from multiple test files are first extracted. Then, the corresponding MS/MS spectra produced by the same precursor ion from different files are matched according to their precursor mass and retention time (RT) and are merged into one new spectrum. The newly merged spectra with more fragment ions are next searched against the database to increase the MS/MS spectra identification and proteome coverage. Further improvement can be achieved by increasing the number of test files and spectra to be merged. Up to 18.2% improvement in protein identification was achieved for 1 ng HeLa peptides by SPPUSM. Reliability evaluation by the "entrapment database" strategy using merged spectra from human and E. coli revealed a marginal error rate for the proposed method. For application in single cell proteome (SCP) study, identification enhancement of 28%-61% was achieved for proteins for different SCP data. Furthermore, a lower abundance was found for the SPPUSM-identified peptides, indicating its potential for more sensitive low sample input and SCP studies.
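The matching and merging steps described in this abstract can be sketched as follows. This is a simplified illustration, not the SPPUSM implementation: the dictionary layout of a spectrum, the function names, and all three tolerances (`ppm_tol`, `rt_tol`, `frag_tol`) are assumptions, and the rule for combining nearby fragment peaks (summed intensity, intensity-weighted m/z) is one plausible choice among several.

```python
def same_precursor(a, b, ppm_tol=10.0, rt_tol=30.0):
    """True if two spectra agree in precursor m/z (ppm) and retention time (s)."""
    ppm = abs(a["precursor_mz"] - b["precursor_mz"]) / a["precursor_mz"] * 1e6
    return ppm <= ppm_tol and abs(a["rt"] - b["rt"]) <= rt_tol


def merge_spectra(spectra, frag_tol=0.02):
    """Pool the peaks of matched spectra into a single merged peak list.

    Peaks closer than `frag_tol` to the previous merged peak are combined:
    intensities are summed and the m/z is the intensity-weighted mean.
    Intensities are assumed to be positive.
    """
    peaks = sorted(p for s in spectra for p in s["peaks"])
    merged = []
    for mz, intensity in peaks:
        if merged and mz - merged[-1][0] <= frag_tol:
            prev_mz, prev_int = merged[-1]
            total = prev_int + intensity
            merged[-1] = ((prev_mz * prev_int + mz * intensity) / total, total)
        else:
            merged.append((mz, intensity))
    return merged


# Two unidentified spectra of the same precursor from different files (toy data).
a = {"precursor_mz": 500.2500, "rt": 1200.0,
     "peaks": [(175.119, 10.0), (262.151, 5.0)]}
b = {"precursor_mz": 500.2510, "rt": 1210.0,
     "peaks": [(175.120, 10.0), (375.235, 8.0)]}

if same_precursor(a, b):
    print(merge_spectra([a, b]))
```

In the toy data the merged spectrum carries three fragment peaks where each input had two, which is the effect the abstract relies on when the merged spectra are searched again.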
Affiliation(s)
- Yongle Chen
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China
| | - Zhuokun Du
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China
| | - Hongxian Zhao
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China
| | - Wei Fang
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China
| | - Tong Liu
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China
| | - Yangjun Zhang
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China
| | - Wanjun Zhang
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China; College of Chemistry and Materials Science, Hebei University, Baoding, 071002, China
| | - Weijie Qin
- State Key Laboratory of Proteomics, Beijing Institute of Lifeomics, National Center for Protein Sciences Beijing, Beijing Proteome Research Center, Beijing, 102206, PR China; College of Chemistry and Materials Science, Hebei University, Baoding, 071002, China.
| |
|
18
|
Huang H, Zhang Y, Gui L, Zhang L, Cai M, Sheng Y. Proteomic analyses reveal cystatin c is a promising biomarker for evaluation of systemic lupus erythematosus. Clin Proteomics 2023; 20:43. [PMID: 37853350 PMCID: PMC10583312 DOI: 10.1186/s12014-023-09434-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/10/2022] [Accepted: 10/04/2023] [Indexed: 10/20/2023] Open
Abstract
BACKGROUND: Systemic lupus erythematosus (SLE) is an autoimmune disease with multiple organ involvement, especially the kidneys. However, the underlying mechanism remains unclear, and accurate biomarkers are still lacking. This study aimed to identify biomarkers to assess organ damage and disease activity in patients with SLE using quantitative proteomics. METHODS: Proteomic analysis was performed using mass spectrometry in 15 patients with SLE and 15 age-matched healthy controls. Proteomic profiles were compared in four main subtypes: SLE with proteinuria (SLE-PN), SLE without proteinuria (SLE-non-PN), SLE with anti-dsDNA positivity (SLE-DP), and SLE with anti-dsDNA negativity (SLE-non-DP). Gene ontology biological process analysis revealed differentially expressed protein networks. Cystatin C (CysC) levels were measured in 200 patients with SLE using an immunoturbidimetric assay. Clinical and laboratory data were collected to assess their correlation with serum CysC levels. RESULTS: Proteomic analysis showed that upregulated proteins in both the SLE-PN and SLE-DP groups were mainly mapped to neutrophil activation networks. Moreover, CysC from neutrophil activation networks was upregulated in both the SLE-PN and SLE-DP groups. The associations of serum CysC level with proteinuria, anti-dsDNA positivity, lower complement C3 levels, and SLE disease activity index score in patients with SLE were further validated in a large independent cohort. CONCLUSIONS: Neutrophil activation is more prominent in SLE with proteinuria and anti-dsDNA positivity, and CysC is a promising marker for monitoring organ damage and disease activity in SLE.
Affiliation(s)
- He Huang
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China
| | - Yukun Zhang
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China
| | - Lan Gui
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China
| | - Li Zhang
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
| | - Minglong Cai
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China.
| | - Yujun Sheng
- Department of Dermatology, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China.
- Department of Dermatology, China-Japan Friendship Hospital, Beijing, China.
| |
|
19
|
Wu L, Hoque A, Lam H. Spectroscape enables real-time query and visualization of a spectral archive in proteomics. Nat Commun 2023; 14:6267. [PMID: 37805652 PMCID: PMC10560257 DOI: 10.1038/s41467-023-42006-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 09/26/2023] [Indexed: 10/09/2023] Open
Abstract
In proteomics, spectral archives organize the enormous amounts of publicly available peptide tandem mass spectra by similarity, offering opportunities for error correction and novel discoveries. Here we adapt an indexing algorithm developed by Facebook for organizing online multimedia resources to tandem mass spectra and achieve practically instantaneous retrieval and clustering of approximate nearest neighbors in a large spectral archive. An interactive web-based graphical user interface enables the user to view a query spectrum in its clustered neighborhood, which facilitates contextual validation of peptide identifications and exploration of the dark proteome.
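Retrieving the nearest neighbors of a query spectrum, as described in this abstract, presupposes that spectra are turned into vectors that can be compared. The sketch below shows that representation with an exact, brute-force cosine search in plain Python. It stands in for the approximate index the authors adapted and is not their method; the 1 Da binning, the 2000 m/z upper bound, and the function names are assumptions.

```python
import math


def vectorize(peaks, bin_width=1.0, max_mz=2000.0):
    """Bin (m/z, intensity) peaks into a unit-length intensity vector."""
    n_bins = int(max_mz / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        i = int(mz / bin_width)
        if 0 <= i < n_bins:
            vec[i] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def nearest(query_peaks, archive, k=1):
    """Return the k archive spectra with the highest cosine similarity."""
    q = vectorize(query_peaks)
    scored = [
        (sum(a * b for a, b in zip(q, vectorize(peaks))), name)
        for name, peaks in archive.items()
    ]
    return sorted(scored, reverse=True)[:k]


# Toy archive of two spectra and a query that resembles spectrum "A".
archive = {
    "A": [(175.1, 1.0), (262.2, 1.0)],
    "B": [(300.3, 1.0), (900.4, 1.0)],
}
print(nearest([(175.1, 0.9), (262.2, 1.1)], archive))
```

The brute-force scan costs one dot product per archived spectrum, which is why a large archive needs an approximate index of the kind the abstract mentions.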
Affiliation(s)
- Long Wu
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Department of Electrical and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Ayman Hoque
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong.
| |
|
20
|
Skeffington A, Fischer A, Sviben S, Brzezinka M, Górka M, Bertinetti L, Woehle C, Huettel B, Graf A, Scheffel A. A joint proteomic and genomic investigation provides insights into the mechanism of calcification in coccolithophores. Nat Commun 2023; 14:3749. [PMID: 37353496 PMCID: PMC10290126 DOI: 10.1038/s41467-023-39336-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Accepted: 06/05/2023] [Indexed: 06/25/2023] Open
Abstract
Coccolithophores are globally abundant, calcifying microalgae that have profound effects on marine biogeochemical cycles, the climate, and life in the oceans. They are characterized by a cell wall of CaCO3 scales called coccoliths, which may contribute to their ecological success. The intricate morphologies of coccoliths are of interest for biomimetic materials synthesis. Despite the global impact of coccolithophore calcification, we know little about the molecular machinery underpinning coccolithophore biology. Working on the model Emiliania huxleyi, a globally distributed bloom-former, we deploy a range of proteomic strategies to identify coccolithogenesis-related proteins. These analyses are supported by a new genome, with gene models derived from long-read transcriptome sequencing, which revealed many novel proteins specific to the calcifying haptophytes. Our experiments provide insights into proteins involved in various aspects of coccolithogenesis. Our improved genome, complemented with transcriptomic and proteomic data, constitutes a new resource for investigating fundamental aspects of coccolithophore biology.
Affiliation(s)
- Alastair Skeffington
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany
- Biological and Environmental Sciences, University of Stirling, Stirling, FK9 4LA, UK
| | - Axel Fischer
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany
| | - Sanja Sviben
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany
| | - Magdalena Brzezinka
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany
| | - Michał Górka
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany
| | - Luca Bertinetti
- Max Planck Institute of Colloids and Interfaces, Potsdam-Golm, 14476, Germany
| | - Christian Woehle
- Max Planck Institute for Plant Breeding Research, Max Planck-Genome-Centre Cologne, Cologne, 50829, Germany
| | - Bruno Huettel
- Max Planck Institute for Plant Breeding Research, Max Planck-Genome-Centre Cologne, Cologne, 50829, Germany
| | - Alexander Graf
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany
| | - André Scheffel
- Technische Universität Dresden, Faculty of Biology, 01307, Dresden, Germany.
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany.
|
21
|
Nurmohamed NS, Kraaijenhof JM, Mayr M, Nicholls SJ, Koenig W, Catapano AL, Stroes ESG. Proteomics and lipidomics in atherosclerotic cardiovascular disease risk prediction. Eur Heart J 2023; 44:1594-1607. [PMID: 36988179 PMCID: PMC10163980 DOI: 10.1093/eurheartj/ehad161] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 08/22/2022] [Revised: 01/04/2023] [Accepted: 03/04/2023] [Indexed: 03/30/2023] Open
Abstract
Given the limited accuracy of clinically used risk scores such as the Systematic COronary Risk Evaluation 2 system and the Second Manifestations of ARTerial disease 2 risk scores, novel risk algorithms determining an individual's susceptibility to incident or recurrent atherosclerotic cardiovascular disease (ASCVD) are urgently needed. Due to major improvements in assay techniques, multimarker proteomic and lipidomic panels hold the promise to be reliably assessed in a high-throughput routine. Novel machine learning-based approaches have facilitated the use of the high-dimensional data resulting from these analyses for ASCVD risk prediction. More than a dozen large-scale retrospective studies using different sets of biomarkers and different statistical methods have consistently demonstrated the additive prognostic value of these panels over traditionally used clinical risk scores. Prospective studies are needed to determine the clinical utility of a biomarker panel in clinical ASCVD risk stratification. When combined with the genetic predisposition captured with polygenic risk scores and the actual ASCVD phenotype observed with coronary artery imaging, proteomics and lipidomics can advance understanding of the complex multifactorial causes underlying an individual's ASCVD risk.
Affiliation(s)
- Nick S Nurmohamed
- Department of Vascular Medicine, Amsterdam University Medical Centers, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
- Department of Cardiology, Amsterdam University Medical Centers, Vrije Universiteit Amsterdam, De Boelelaan 1117, 1081 HV Amsterdam, The Netherlands
| | - Jordan M Kraaijenhof
- Department of Vascular Medicine, Amsterdam University Medical Centers, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
| | - Manuel Mayr
- School of Cardiovascular and Metabolic Medicine & Science, King’s College London, Strand, London WC2R 2LS, UK
- Department of Internal Medicine II, Division of Cardiology, Medical University of Vienna, Währinger Gürtel 18-20, 1090 Vienna, Austria
| | - Stephen J Nicholls
- Victorian Heart Institute, Monash University, 631 Blackburn Rd, Clayton, VIC 3168, Australia
| | - Wolfgang Koenig
- Deutsches Herzzentrum München, Technische Universität München, Lazarettstraße 36, 80636 München, Germany
- German Centre for Cardiovascular Research (DZHK e.V.), partner site Munich Heart Alliance, Pettenkoferstr. 8a & 9, 80336 Munich, Germany
- Institute of Epidemiology and Medical Biometry, University of Ulm, Helmholtzstr. 22, 89081 Ulm, Germany
| | - Alberico L Catapano
- Department of Pharmacological and Biomolecular Sciences, University of Milan, Via Balzaretti 9, 20133 Milan, Italy
- IRCCS Multimedica, Via Milanese, 300, 20099 Sesto San Giovanni (MI), Italy
| | - Erik S G Stroes
- Department of Vascular Medicine, Amsterdam University Medical Centers, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
|
22
|
DiagnoMass: A proteomics hub for pinpointing discriminative spectral clusters. J Proteomics 2023; 277:104853. [PMID: 36804625 DOI: 10.1016/j.jprot.2023.104853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/14/2023] [Indexed: 02/18/2023]
Abstract
MOTIVATION There are several well-established paradigms for identifying and pinpointing discriminative peptides/proteins using shotgun proteomic data; examples are peptide-spectrum matching, de novo sequencing, open searches, and even hybrid approaches. Such an arsenal of complementary paradigms can provide deep data coverage, although some discriminative peptides remain unidentified. RESULTS We present DiagnoMass, a software tool that groups similar spectra into spectral clusters and then shortlists those clusters that are discriminative for biological conditions. DiagnoMass then communicates with proteomic tools to attempt the identification of such clusters. We demonstrate the effectiveness of DiagnoMass by analyzing proteomic data from Escherichia coli, Salmonella, and Shigella, listing many high-quality discriminative spectral clusters that had thus far remained unidentified by widely adopted proteomic tools. DiagnoMass can also classify proteomic profiles. We anticipate the use of DiagnoMass as a vital tool for pinpointing biomarkers. AVAILABILITY DiagnoMass and related documentation, including a usage protocol, are available at http://www.diagnomass.com.
|
23
|
Joh Y, Lee K, Kim H, Park H. Progressive search in tandem mass spectrometry. BMC Bioinformatics 2023; 24:94. [PMID: 36918816 PMCID: PMC10015927 DOI: 10.1186/s12859-023-05222-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Accepted: 03/03/2023] [Indexed: 03/16/2023] Open
Abstract
BACKGROUND High-throughput proteomics has been accelerated by (tandem) mass spectrometry. However, the slow speed of mass spectra analysis prevents the analysis results from being up-to-date. Tandem mass spectrometry database search requires O(|S||D|) time, where S is the set of spectra and D is the set of peptides in a database. With usual values of |S| and |D|, database search is quite time consuming. Meanwhile, the database for search is usually updated every month, with 0.5-2% changes. Although the change in the database is usually very small, it may cause extensive changes in the overall analysis results because individual PSM scores such as deltaCn and E-value depend on the entire search results. Therefore, to keep the search results up-to-date, one needs to perform database search from scratch every time the database is updated, which is very inefficient. RESULTS We present a very efficient method to keep the search results up-to-date, where the results are the same as those achieved by the normal search from scratch. This method, called progressive search, runs in O(|S||ΔD|) time on average, where ΔD is the difference between the old and the new databases. The experimental results show that the progressive search is up to 53.9 times faster for PSM update only and up to 16.5 times faster for both PSM and E-value update. CONCLUSIONS Progressive search is a novel approach to efficiently obtain analysis results for an updated database in tandem mass spectrometry. Compared to performing a normal search from scratch, progressive search achieves the same results much faster. Progressive search is freely available at: https://isa.hanyang.ac.kr/ProgSearch.html.
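The incremental idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy `score` function, the set-based database, and the names `full_search` and `progressive_update` are all assumptions made for illustration, and only the top-scoring PSM is maintained (deltaCn and E-value updates are out of scope here). Each spectrum is rescored only against the peptides added to the database, falling back to a full rescoring only when its previous best hit was removed.

```python
def score(spectrum, peptide):
    """Toy similarity (assumption): residue letters shared by spectrum and peptide."""
    return len(set(spectrum) & set(peptide))


def full_search(spectra, database):
    """Baseline O(|S||D|) search: best (score, peptide) pair for every spectrum."""
    return {sid: max((score(s, p), p) for p in database)
            for sid, s in spectra.items()}


def progressive_update(spectra, best, old_db, new_db):
    """Update previous best hits using only the database difference."""
    added = new_db - old_db      # peptides that are new in this release
    removed = old_db - new_db    # peptides dropped from the database
    updated = {}
    for sid, s in spectra.items():
        top = best[sid]
        if top[1] in removed:
            # The old best hit no longer exists: rescore this spectrum fully.
            top = max((score(s, p), p) for p in new_db)
        else:
            # The old best hit survives: only the added peptides can beat it.
            for p in added:
                top = max(top, (score(s, p), p))
        updated[sid] = top
    return updated


spectra = {"s1": "ACDE", "s2": "KLM"}
old_db = {"ACD", "KLM", "GGG"}
new_db = {"ACDE", "KLM", "GGG", "KL"}
best_old = full_search(spectra, old_db)
best_new = progressive_update(spectra, best_old, old_db, new_db)
```

In this sketch the update yields the same top hits as a search from scratch, because a surviving best hit can only be displaced by an added peptide. The cost stays proportional to the size of the database difference as long as few best hits are removed.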
Affiliation(s)
- Yoonsung Joh
- Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea
| | - Kangbae Lee
- Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea
| | - Hyunwoo Kim
- Biomedical Informatics Team, Korea Institute of Science and Technology Information, Daejeon, 34141, Republic of Korea
| | - Heejin Park
- Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea.
|
24
|
Haukamp FJ, Hartmann ZM, Pich A, Kuhn J, Blasczyk R, Stieglitz F, Bade-Döding C. HLA-B*57:01/Carbamazepine-10,11-Epoxide Association Triggers Upregulation of the NFκB and JAK/STAT Pathways. Cells 2023; 12:676. [PMID: 36899812 PMCID: PMC10000580 DOI: 10.3390/cells12050676] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/21/2022] [Revised: 02/17/2023] [Accepted: 02/18/2023] [Indexed: 02/23/2023] Open
Abstract
Measures of drug-mediated immune reactions that depend on the patient's genotype determine individual medication protocols. Despite extensive clinical trials prior to the licensing of a specific drug, certain patient-specific immune reactions cannot be reliably predicted. The need to account for the actual proteomic state of selected individuals under drug administration becomes obvious. The well-established association between certain HLA molecules and drugs or their metabolites has been analyzed in recent years, yet the polymorphic nature of HLA makes a broad prediction unfeasible. Depending on the patient's genotype, carbamazepine (CBZ) hypersensitivities can cause diverse disease symptoms such as maculopapular exanthema, drug reaction with eosinophilia and systemic symptoms, or the more severe diseases Stevens-Johnson syndrome or toxic epidermal necrolysis. An association with CBZ administration has been demonstrated not only for HLA-B*15:02 and HLA-A*31:01 but also for HLA-B*57:01. This study aimed to illuminate the mechanism of HLA-B*57:01-mediated CBZ hypersensitivity by full proteome analysis. The main CBZ metabolite EPX introduced drastic proteomic alterations, such as the induction of inflammatory processes through the upstream kinase ERBB2 and the upregulation of the NFκB and JAK/STAT pathways, implying a pro-apoptotic, pro-necrotic shift in the cellular response. Anti-inflammatory pathways and associated effector proteins were downregulated. This disequilibrium of pro- and anti-inflammatory processes clearly explains fatal immune reactions following CBZ administration.
Affiliation(s)
- Funmilola Josephine Haukamp
- Institute of Transfusion Medicine and Transplant Engineering, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany
- Correspondence: Tel.: +49-511-532-9774; Fax: +49-511-532-2079
| | - Zoe Maria Hartmann
- Institute of Transfusion Medicine and Transplant Engineering, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany
| | - Andreas Pich
- Institute of Toxicology, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany
- Core Facility Proteomics, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany
| | - Joachim Kuhn
- Institute for Laboratory and Transfusion Medicine, Heart and Diabetes Center North Rhine-Westphalia, Ruhr University Bochum, Georgstraße 11, 32545 Bad Oeynhausen, Germany
| | - Rainer Blasczyk
- Institute of Transfusion Medicine and Transplant Engineering, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany
| | - Florian Stieglitz
- Institute of Transfusion Medicine and Transplant Engineering, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany
| | - Christina Bade-Döding
- Institute of Transfusion Medicine and Transplant Engineering, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany
|
25
|
Mao Y, Jia L, Dong L, Shu XE, Qian SB. Start codon-associated ribosomal frameshifting mediates nutrient stress adaptation. bioRxiv 2023:2023.02.15.528768. [PMID: 36824937 PMCID: PMC9949036 DOI: 10.1101/2023.02.15.528768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
A translating ribosome is typically thought to follow the reading frame defined by the selected start codon. Using super-resolution ribosome profiling, here we report pervasive out-of-frame translation immediately from the start codon. The start codon-associated ribosome frameshifting (SCARF) stems from the slippage of ribosomes during the transition from initiation to elongation. Using a massively parallel reporter assay, we uncovered sequence elements acting as SCARF enhancers or repressors, implying that start codon recognition is coupled with reading frame fidelity. This finding explains thousands of mass spectrometry spectra that remain unannotated in the human proteome. Mechanistically, we find that the eukaryotic initiation factor 5B (eIF5B) maintains the reading frame fidelity by stabilizing initiating ribosomes. Intriguingly, amino acid starvation induces SCARF by proteasomal degradation of eIF5B. The stress-induced SCARF protects cells from starvation by enabling amino acid recycling and selective mRNA translation. Our findings illustrate a beneficial effect of translational "noise" in nutrient stress adaptation.
|
26
|
Cox J. Prediction of peptide mass spectral libraries with machine learning. Nat Biotechnol 2023; 41:33-43. [PMID: 36008611 DOI: 10.1038/s41587-022-01424-w] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 03/01/2022] [Accepted: 07/11/2022] [Indexed: 01/21/2023]
Abstract
The recent development of machine learning methods to identify peptides in complex mass spectrometric data constitutes a major breakthrough in proteomics. Longstanding methods for peptide identification, such as search engines and experimental spectral libraries, are being superseded by deep learning models that allow the fragmentation spectra of peptides to be predicted from their amino acid sequence. These new approaches, including recurrent neural networks and convolutional neural networks, use predicted in silico spectral libraries rather than experimental libraries to achieve higher sensitivity and/or specificity in the analysis of proteomics data. Machine learning is galvanizing applications that involve large search spaces, such as immunopeptidomics and proteogenomics. Current challenges in the field include the prediction of spectra for peptides with post-translational modifications and for cross-linked pairs of peptides. Permeation of machine-learning-based spectral prediction into search engines and spectrum-centric data-independent acquisition workflows for diverse peptide classes and measurement conditions will continue to push sensitivity and dynamic range in proteomics applications in the coming years.
Affiliation(s)
- Jürgen Cox
- Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany.
- Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway.
|
27
|
McDonnell K, Howley E, Abram F. Critical evaluation of the use of artificial data for machine learning based de novo peptide identification. Comput Struct Biotechnol J 2023; 21:2732-2743. [PMID: 37168871 PMCID: PMC10165132 DOI: 10.1016/j.csbj.2023.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 04/16/2023] [Accepted: 04/16/2023] [Indexed: 05/13/2023] Open
Abstract
Proteins are essential components of all living cells and so the study of their in situ expression, proteomics, has wide reaching applications. Peptide identification in proteomics typically relies on matching high resolution tandem mass spectra to a protein database but can also be performed de novo. While artificial spectra have been successfully incorporated into database search pipelines to increase peptide identification rates, little work has been done to investigate the utility of artificial spectra in the context of de novo peptide identification. Here, we perform a critical analysis of the use of artificial data for the training and evaluation of de novo peptide identification algorithms. First, we classify the different fragment ion types present in real spectra and then estimate the number of spurious matches using random peptides. We then categorise the different types of noise present in real spectra. Finally, we transfer this knowledge to artificial data and test the performance of a state-of-the-art de novo peptide identification algorithm trained using artificial spectra with and without relevant noise addition. Noise supplementation increased artificial training data performance from 30% to 77% of real training data peptide recall. While real data performance was not fully replicated, this work provides the first steps towards an artificial spectrum framework for the training and evaluation of de novo peptide identification algorithms. Further enhanced artificial spectra may allow for more in depth analysis of de novo algorithms as well as alleviating the reliance on database searches for training data.
Affiliation(s)
- Kevin McDonnell
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland
- School of Computer Science, University of Galway, Ireland
- Corresponding author at: Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland.
| | - Enda Howley
- School of Computer Science, University of Galway, Ireland
| | - Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland
- Corresponding author.
|
28
|
Miller RM, Millikin RJ, Rolfs Z, Shortreed MR, Smith LM. Enhanced Proteomic Data Analysis with MetaMorpheus. Methods Mol Biol 2023; 2426:35-66. [PMID: 36308684 PMCID: PMC9623450 DOI: 10.1007/978-1-0716-1967-4_3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 05/29/2023]
Abstract
MetaMorpheus is a free and open-source software program dedicated to the comprehensive analysis of proteomic data. In bottom-up proteomics, protein samples are digested into peptides prior to chromatographic separation and tandem mass spectrometric analysis. The resulting fragmentation spectra are subsequently analyzed with search software programs to obtain peptide identifications and infer the presence of proteins in the samples. MetaMorpheus seeks to maximize the information gleaned from proteomic data through the use of (a) mass calibration, (b) post-translational modification discovery, (c) multiple search algorithms, which aid in the analysis of data from traditional, crosslinking, and glycoproteomic experiments, (d) isotope-based or label-free quantification, (e) multi-protease protein inference, and (f) spectral annotation and data visualization capabilities. This protocol provides detailed descriptions of how to use MetaMorpheus and how to customize data analysis workflows using MetaMorpheus tasks to meet the specific needs of the user.
Affiliation(s)
- Rachel M Miller
- University of Wisconsin-Madison, Department of Chemistry, Madison, WI, USA
| | - Robert J Millikin
- University of Wisconsin-Madison, Department of Chemistry, Madison, WI, USA
| | - Zach Rolfs
- University of Wisconsin-Madison, Department of Chemistry, Madison, WI, USA
| | | | - Lloyd M Smith
- University of Wisconsin-Madison, Department of Chemistry, Madison, WI, USA.
|
29
|
McDonnell K, Abram F, Howley E. Application of a Novel Hybrid CNN-GNN for Peptide Ion Encoding. J Proteome Res 2022; 22:323-333. [PMID: 36534699 PMCID: PMC9903319 DOI: 10.1021/acs.jproteome.2c00234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Almost all state-of-the-art de novo peptide sequencing algorithms now use machine learning models to encode fragment peaks and hence identify amino acids in mass spectrometry (MS) spectra. Previous work has highlighted how the inherent MS challenges of noise and missing peptide peaks detrimentally affect the performance of these models. In the present research we extracted and evaluated the encoding modules from 3 state-of-the-art de novo peptide sequencing algorithms. We also propose a convolutional neural network-graph neural network machine learning model for encoding peptide ions in tandem MS spectra. We compared the proposed encoding module to those used in the state-of-the-art de novo peptide sequencing algorithms by assessing their ability to identify b-ions and y-ions in MS spectra. This included a comprehensive evaluation in both real and artificial data across various levels of noise and missing peptide peaks. The proposed model performed best across all data sets using two different metrics (area under the receiver operating characteristic curve (AUC) and average precision). The work also highlighted the effect of including additional features such as intensity rank in these encoding modules as well as issues with using the AUC as a metric. This work is of significance to those designing future de novo peptide identification algorithms as it is the first step toward a new approach.
Affiliation(s)
- Kevin McDonnell
- Department of Information Technology, School of Computer Science, University of Galway, Galway H91 TK33, Ireland
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Galway H91 TK33, Ireland
| | - Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Galway H91 TK33, Ireland
| | - Enda Howley
- Department of Information Technology, School of Computer Science, University of Galway, Galway H91 TK33, Ireland
|
30
|
Bittremieux W, Wang M, Dorrestein PC. The critical role that spectral libraries play in capturing the metabolomics community knowledge. Metabolomics 2022; 18:94. [PMID: 36409434 PMCID: PMC10284100 DOI: 10.1007/s11306-022-01947-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 08/01/2022] [Accepted: 10/19/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments. AIM OF REVIEW We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries. KEY SCIENTIFIC CONCEPTS OF REVIEW This review focuses on the current state of spectral libraries for untargeted LC-MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future.
Affiliation(s)
- Wout Bittremieux
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA
| | - Mingxun Wang
- Department of Computer Science, University of California Riverside, Riverside, CA, 92507, USA
| | - Pieter C Dorrestein
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, CA, 92093, USA.
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, 92093, USA.
|
31
|
Mohaupt P, Roucou X, Delaby C, Vialaret J, Lehmann S, Hirtz C. The alternative proteome in neurobiology. Front Cell Neurosci 2022; 16:1019680. [PMID: 36467612 PMCID: PMC9712206 DOI: 10.3389/fncel.2022.1019680] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/15/2022] [Accepted: 11/02/2022] [Indexed: 10/13/2023] Open
Abstract
Translation involves the biosynthesis of a protein sequence following the decoding of the genetic information embedded in a messenger RNA (mRNA). Typically, the eukaryotic mRNA was considered to be inherently monocistronic, but this paradigm is not in agreement with the translational landscape of cells, tissues, and organs. Recent ribosome sequencing (Ribo-seq) and proteomics studies show that, in addition to currently annotated reference proteins (RefProt), other proteins termed alternative proteins (AltProts), and microproteins are encoded in regions of mRNAs thought to be untranslated or in transcripts annotated as non-coding. This experimental evidence expands the repertoire of functional proteins within a cell and potentially provides important information on biological processes. This review explores the hitherto overlooked alternative proteome in neurobiology and considers the role of AltProts in pathological and healthy neuromolecular processes.
Affiliation(s)
- Pablo Mohaupt
- LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
| | - Xavier Roucou
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC, Canada
| | - Constance Delaby
- LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
| | - Jérôme Vialaret
- LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
| | - Sylvain Lehmann
- LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
| | - Christophe Hirtz
- LBPC-PPC, Université de Montpellier, IRMB CHU de Montpellier, INM INSERM, Montpellier, France
|
32
|
Urbiola-Salvador V, Miroszewska D, Jabłońska A, Qureshi T, Chen Z. Proteomics approaches to characterize the immune responses in cancer. Biochim Biophys Acta Mol Cell Res 2022; 1869:119266. [PMID: 35390423 DOI: 10.1016/j.bbamcr.2022.119266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2021] [Revised: 03/01/2022] [Accepted: 03/28/2022] [Indexed: 06/14/2023]
Abstract
Despite the dynamic development of cancer research, annually millions of people die of cancer. The human immune system is the major 'guard' against tumor development. Unfortunately, cancer cells have the ability to evade the immune system and continue to grow. The proper understanding of the intricate immune response in tumorigenesis remains the holy grail of cancer immunology and designing effective immunotherapy. To decode the immune responses in cancer, in recent years, proteomics studies have received considerable attention. Proteomics studies focus on the detection and quantification of proteins, which are the effectors of biological functions, and as such, are proven to reflect the cell state more accurately, in comparison to genomic or transcriptomic studies. In this review, we discuss the proteomics studies applied to characterize the immune responses in cancer and tumor immune microenvironment heterogeneity. Further, we describe emerging single-cell proteomics approaches that have the potential to be applied in cancer immunity studies.
Affiliation(s)
- Víctor Urbiola-Salvador
- Intercollegiate Faculty of Biotechnology of University of Gdańsk and Medical University of Gdańsk, University of Gdańsk, Poland.
| | - Dominika Miroszewska
- Intercollegiate Faculty of Biotechnology of University of Gdańsk and Medical University of Gdańsk, University of Gdańsk, Poland.
| | - Agnieszka Jabłońska
- Intercollegiate Faculty of Biotechnology of University of Gdańsk and Medical University of Gdańsk, University of Gdańsk, Poland.
| | - Talha Qureshi
- Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland.
| | - Zhi Chen
- Intercollegiate Faculty of Biotechnology of University of Gdańsk and Medical University of Gdańsk, University of Gdańsk, Poland; Faculty of Biochemistry and Molecular Medicine, University of Oulu, Oulu, Finland.
|
33
|
Bob K, Teschner D, Kemmer T, Gomez-Zepeda D, Tenzer S, Schmidt B, Hildebrandt A. Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data. BMC Bioinformatics 2022; 23:287. [PMID: 35858828 PMCID: PMC9301846 DOI: 10.1186/s12859-022-04833-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Accepted: 07/08/2022] [Indexed: 11/10/2022] Open
Abstract
Background: Mass spectrometry is an important experimental technique in the field of proteomics. However, analysis of certain mass spectrometry data faces a combination of two challenges: first, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span along the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Furthermore, existing approaches for signal detection usually rely on strong assumptions concerning the signals' properties. Results: In this study, it is shown that locality-sensitive hashing enables signal classification in mass spectrometry raw data at scale. Through appropriate choice of algorithm parameters it is possible to balance false-positive and false-negative rates. On synthetic data, a superior performance compared to an intensity thresholding approach was achieved. Real data could be strongly reduced without losing relevant information. Our implementation scaled out up to 32 threads and supports acceleration by GPUs. Conclusions: Locality-sensitive hashing is a desirable approach for signal classification in mass spectrometry raw data. Availability: Generated data and code are available at https://github.com/hildebrandtlab/mzBucket. Raw data is available at https://zenodo.org/record/5036526. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04833-5.
Affiliation(s)
- Konstantin Bob
- Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany
| | - David Teschner
- Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany
| | - Thomas Kemmer
- Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany
| | - David Gomez-Zepeda
- Institute for Immunology, University Medical Center of the Johannes Gutenberg University Mainz, D-55128, Mainz, Germany; Immunoproteomics Unit, Helmholtz-Institute for Translational Oncology (HI-TRON) Mainz, D-55131, Mainz, Germany
| | - Stefan Tenzer
- Institute for Immunology, University Medical Center of the Johannes Gutenberg University Mainz, D-55128, Mainz, Germany; Immunoproteomics Unit, Helmholtz-Institute for Translational Oncology (HI-TRON) Mainz, D-55131, Mainz, Germany
| | - Bertil Schmidt
- Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany
| | - Andreas Hildebrandt
- Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.
| |
|
34
|
Perez-Riverol Y. Proteomic repository data submission, dissemination, and reuse: key messages. Expert Rev Proteomics 2022; 19:297-310. [PMID: 36529941 PMCID: PMC7614296 DOI: 10.1080/14789450.2022.2160324] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/28/2022] [Accepted: 12/07/2022] [Indexed: 12/23/2022]
Abstract
INTRODUCTION: The creation of ProteomeXchange data workflows in 2012 transformed the field of proteomics, consisting of the standardization of data submission and dissemination and enabling the widespread reanalysis of public MS proteomics data worldwide. ProteomeXchange has triggered a growing trend toward public dissemination of proteomics data, facilitating the assessment, reuse, comparative analyses, and extraction of new findings from public datasets. By 2022, the consortium is integrated by PRIDE, PeptideAtlas, MassIVE, jPOST, iProX, and Panorama Public. AREAS COVERED: Here, we review and discuss the current ecosystem of resources, guidelines, and file formats for proteomics data dissemination and reanalysis. Special attention is drawn to new exciting quantitative and post-translational modification-oriented resources. The challenges and future directions on data depositions including the lack of metadata and cloud-based and high-performance software solutions for fast and reproducible reanalysis of the available data are discussed. EXPERT OPINION: The success of ProteomeXchange and the amount of proteomics data available in the public domain have triggered the creation and/or growth of other protein knowledgebase resources. Data reuse is a leading, active, and evolving field; supporting the creation of new formats, tools, and workflows to rediscover and reshape the public proteomics data.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| |
|
35
|
Sánchez-Álvarez M, del Pozo MÁ, Bosch M, Pol A. Insights Into the Biogenesis and Emerging Functions of Lipid Droplets From Unbiased Molecular Profiling Approaches. Front Cell Dev Biol 2022; 10:901321. [PMID: 35756995 PMCID: PMC9213792 DOI: 10.3389/fcell.2022.901321] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/21/2022] [Accepted: 05/17/2022] [Indexed: 11/30/2022] Open
Abstract
Lipid droplets (LDs) are spherical, single sheet phospholipid-bound organelles that store neutral lipids in all eukaryotes and some prokaryotes. Initially conceived as relatively inert depots for energy and lipid precursors, these highly dynamic structures play active roles in homeostatic functions beyond metabolism, such as proteostasis and protein turnover, innate immunity and defense. A major share of the knowledge behind this paradigm shift has been enabled by the use of systematic molecular profiling approaches, capable of revealing and describing these non-intuitive systems-level relationships. Here, we discuss these advances and some of the challenges they entail, and highlight standing questions in the field.
Affiliation(s)
- Miguel Sánchez-Álvarez
- Cell and Developmental Biology Area, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
| | - Miguel Ángel del Pozo
- Cell and Developmental Biology Area, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
| | - Marta Bosch
- Lipid Trafficking and Disease Group, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Department of Biomedical Sciences, Faculty of Medicine, Universitat de Barcelona, Barcelona, Spain
| | - Albert Pol
- Lipid Trafficking and Disease Group, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Department of Biomedical Sciences, Faculty of Medicine, Universitat de Barcelona, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| |
|
36
|
A neural network for large-scale clustering of peptide mass spectra. Nat Methods 2022; 19:658-659. [DOI: 10.1038/s41592-022-01497-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
|
37
|
Bittremieux W, May DH, Bilmes J, Noble WS. A learned embedding for efficient joint analysis of millions of mass spectra. Nat Methods 2022; 19:675-678. [DOI: 10.1038/s41592-022-01496-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/29/2018] [Accepted: 04/14/2022] [Indexed: 11/09/2022]
|
38
|
Tian R, Feng X, Wei L, Dai D, Ma Y, Pan H, Ge S, Bai L, Ke C, Liu Y, Lang L, Zhu S, Sun H, Yu Y, Chen X. A genetic engineering strategy for editing near-infrared-II fluorophores. Nat Commun 2022; 13:2853. [PMID: 35606352 PMCID: PMC9127093 DOI: 10.1038/s41467-022-30304-9] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 05/12/2021] [Accepted: 04/26/2022] [Indexed: 01/03/2023] Open
Abstract
The second near-infrared (NIR-II) window is a fundamental modality for deep-tissue in vivo imaging. However, it is challenging to synthesize NIR-II probes with high quantum yields (QYs), good biocompatibility, satisfactory pharmacokinetics, and tunable biological properties. Conventional long-wavelength probes, such as inorganic probes (which often contain heavy metal atoms in their scaffolds) and organic dyes (which contain large π-conjugated groups), exhibit poor biosafety, low QYs, and/or uncontrollable pharmacokinetic properties. Herein, we present a bioengineering strategy that can replace the conventional chemical synthesis methods for generating NIR-II contrast agents. We use a genetic engineering technique to obtain a series of albumin fragments and recombinant proteins containing one or multiple domains that form covalent bonds with chloro-containing cyanine dyes. These albumin variants protect the inserted dyes and remarkably enhance their brightness. The albumin variants can also be genetically edited to develop size-tunable complexes with precisely tailored pharmacokinetics. The proteins can also be conjugated to biofunctional molecules without impacting the complexed dyes. This combination of albumin mutants and clinically-used cyanine dyes can help widen the clinical application prospects of NIR-II fluorophores.
|
39
|
Luo X, Bittremieux W, Griss J, Deutsch EW, Sachsenberg T, Levitsky LI, Ivanov MV, Bubis JA, Gabriels R, Webel H, Sanchez A, Bai M, Käll L, Perez-Riverol Y. A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics. J Proteome Res 2022; 21:1566-1574. [PMID: 35549218 PMCID: PMC9171829 DOI: 10.1021/acs.jproteome.2c00069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found the most reliable methods for consensus spectrum generation, including for data sets with post-translational modifications (PTM) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.
Affiliation(s)
- Xiyang Luo
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, 400065 Chongqing, China
| | - Wout Bittremieux
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, U.K.; Department of Dermatology, Medical University of Vienna, 1090 Vienna, Austria
| | - Eric W Deutsch
- Institute for Systems Biology (ISB), Seattle, Washington 98109, United States
| | - Timo Sachsenberg
- Applied Bioinformatics, Department for Computer Science, University of Tuebingen, Sand 14, 72076 Tuebingen, Germany
| | - Lev I Levitsky
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 142432, Russia
| | - Mark V Ivanov
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 142432, Russia
| | - Julia A Bubis
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 142432, Russia
| | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, B-9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, B-9000 Ghent, Belgium
| | - Henry Webel
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen DK-2200, Denmark
| | - Aniel Sanchez
- Section for Clinical Chemistry, Department of Translational Medicine, Lund University, Skåne University Hospital Malmö, 20502 Malmö, Sweden
| | - Mingze Bai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, 400065 Chongqing, China
| | - Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, Royal Institute of Technology - KTH, Box 1031, 17121 Solna, Sweden
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, U.K
| |
|
40
|
Rex DB, Patil AH, Modi PK, Kandiyil MK, Kasaragod S, Pinto SM, Tanneru N, Sijwali PS, Prasad TSK. Dissecting Plasmodium yoelii Pathobiology: Proteomic Approaches for Decoding Novel Translational and Post-Translational Modifications. ACS OMEGA 2022; 7:8246-8257. [PMID: 35309442 PMCID: PMC8928344 DOI: 10.1021/acsomega.1c03892] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/01/2021] [Accepted: 02/21/2022] [Indexed: 06/14/2023]
Abstract
Malaria is a vector-borne disease. It is caused by Plasmodium parasites. Plasmodium yoelii is a rodent model parasite, primarily used for studying parasite development in liver cells and vectors. To better understand parasite biology, we carried out a high-throughput-based proteomic analysis of P. yoelii. From the same mass spectrometry (MS)/MS data set, we also captured several post-translational modified peptides by following a bioinformatics analysis without any prior enrichment. Further, we carried out a proteogenomic analysis, which resulted in improvements to some of the existing gene models along with the identification of several novel genes. Analysis of proteome and post-translational modifications (PTMs) together resulted in the identification of 3124 proteins. The identified PTMs were found to be enriched in mitochondrial metabolic pathways. Subsequent bioinformatics analysis provided an insight into proteins associated with metabolic regulatory mechanisms. Among these, the tricarboxylic acid (TCA) cycle and the isoprenoid synthesis pathway are found to be essential for parasite survival and drug resistance. The proteogenomic analysis discovered 43 novel protein-coding genes. The availability of an in-depth proteomic landscape of a malaria pathogen model will likely facilitate further molecular-level investigations on pre-erythrocytic stages of malaria.
Affiliation(s)
- Devasahayam Arokia Balaya Rex
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
| | - Arun H. Patil
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
| | - Prashant Kumar Modi
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
| | - Mrudula Kinarulla Kandiyil
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
| | - Sandeep Kasaragod
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
| | - Sneha M. Pinto
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
| | - Nandita Tanneru
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, Telangana, India
| | - Puran Singh Sijwali
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, Telangana, India
- Academy of Scientific and Innovative Research, Ghaziabad 201002, Uttar Pradesh, India
| | | |
|
41
|
A systematic evaluation of yeast sample preparation protocols for spectral identifications, proteome coverage and post-isolation modifications. J Proteomics 2022; 261:104576. [DOI: 10.1016/j.jprot.2022.104576] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/19/2022] [Revised: 03/17/2022] [Accepted: 03/17/2022] [Indexed: 11/20/2022]
|
42
|
The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms. Comput Struct Biotechnol J 2022; 20:1402-1412. [PMID: 35386104 PMCID: PMC8956878 DOI: 10.1016/j.csbj.2022.03.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/17/2021] [Revised: 03/09/2022] [Accepted: 03/09/2022] [Indexed: 01/24/2023] Open
Abstract
Most correct de novo peptides have ⩽1 missing fragmentation cleavages. DeepNovo outperforms Novor for peptide accuracy for both data types. Novor excels at amino acid recall when many fragmentation cleavages are missing. Deep learning allows DeepNovo to predict amino acids without adjacent peaks.
Proteomics aims to characterise system-wide protein expression and typically relies on mass-spectrometry and peptide fragmentation, followed by a database search for protein identification. It has wide ranging applications from clinical to environmental settings and virtually impacts on every area of biology. In that context, de novo peptide sequencing is becoming increasingly popular. Historically its performance lagged behind database search methods but with the integration of machine learning, this field of research is gaining momentum. To enable de novo peptide sequencing to realise its full potential, it is critical to explore the mass spectrometry data underpinning peptide identification. In this research we investigate the characteristics of tandem mass spectra using 8 published datasets. We then evaluate two state of the art de novo peptide sequencing algorithms, Novor and DeepNovo, with a particular focus on their performance with regard to missing fragmentation cleavage sites and noise. DeepNovo was found to perform better than Novor overall. However, Novor recalled more correct amino acids when 6 or more cleavage sites were missing. Furthermore, less than 11% of each algorithms’ correct peptide predictions emanate from data with more than one missing cleavage site, highlighting the issues missing cleavages pose. We further investigate how the algorithms manage to correctly identify peptides with many of these missing fragmentation cleavages. We show how noise negatively impacts the performance of both algorithms, when high intensity peaks are considered. Finally, we provide recommendations regarding further algorithms’ improvements and offer potential avenues to overcome current inherent data limitations.
|
43
|
Brady MM, Meyer AS. Cataloguing the proteome: Current developments in single-molecule protein sequencing. BIOPHYSICS REVIEWS 2022; 3:011304. [PMID: 38505228 PMCID: PMC10903494 DOI: 10.1063/5.0065509] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/02/2021] [Accepted: 01/13/2022] [Indexed: 03/21/2024]
Abstract
The cellular proteome is complex and dynamic, with proteins playing a critical role in cell-level biological processes that contribute to homeostasis, stimuli response, and disease pathology, among others. As such, protein analysis and characterization are of extreme importance in both research and clinical settings. In the last few decades, most proteomics analysis has relied on mass spectrometry, affinity reagents, or some combination thereof. However, these techniques are limited by their requirements for large sample amounts, low resolution, and insufficient dynamic range, making them largely insufficient for the characterization of proteins in low-abundance or single-cell proteomic analysis. Despite unique technical challenges, several single-molecule protein sequencing (SMPS) technologies have been proposed in recent years to address these issues. In this review, we outline several approaches to SMPS technologies and discuss their advantages, limitations, and potential contributions toward an accurate, sensitive, and high-throughput platform.
Affiliation(s)
- Morgan M. Brady
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
| | - Anne S. Meyer
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
| |
|
44
|
Perez-Riverol Y, Bai J, Bandla C, García-Seisdedos D, Hewapathirana S, Kamatchinathan S, Kundu D, Prakash A, Frericks-Zipper A, Eisenacher M, Walzer M, Wang S, Brazma A, Vizcaíno J. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res 2022; 50:D543-D552. [PMID: 34723319 PMCID: PMC8728295 DOI: 10.1093/nar/gkab1038] [Citation(s) in RCA: 3825] [Impact Index Per Article: 1275.0] [Received: 09/11/2021] [Revised: 10/12/2021] [Accepted: 10/14/2021] [Indexed: 12/12/2022] Open
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we will describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jingwen Bai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - David García-Seisdedos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Selvakumar Kamatchinathan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Ananth Prakash
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Anika Frericks-Zipper
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany
| | - Martin Eisenacher
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany
| | - Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
45
|
To PKP, Wu L, Chan CM, Hoque A, Lam H. ClusterSheep: A Graphics Processing Unit-Accelerated Software Tool for Large-Scale Clustering of Tandem Mass Spectra from Shotgun Proteomics. J Proteome Res 2021; 20:5359-5367. [PMID: 34734728 DOI: 10.1021/acs.jproteome.1c00485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Modern shotgun proteomics experiments generate gigabytes of spectra every hour, only a fraction of which were utilized to form biological conclusions. Instead of being stored as flat files in public data repositories, this large amount of data can be better organized to facilitate data reuse. Clustering these spectra by similarity can be helpful in building high-quality spectral libraries, correcting identification errors, and highlighting frequently observed but unidentified spectra. However, large-scale clustering is time-consuming. Here, we present ClusterSheep, a method utilizing Graphics Processing Units (GPUs) to accelerate the process. Unlike previously proposed algorithms for this purpose, our method performs true pairwise comparison of all spectra within a precursor mass-to-charge ratio tolerance, thereby preserving the full cluster structures. ClusterSheep was benchmarked against previously reported clustering tools, MS-Cluster, MaRaCluster, and msCRUSH. The software tool also functions as an interactive visualization tool with a persistent state, enabling the user to explore the resulting clusters visually and retrieve the clustering results as desired.
Affiliation(s)
- Paul Ka Po To
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Long Wu
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Chak Ming Chan
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Ayman Hoque
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| |
Collapse
|
46
|
Hruska M, Holub D. Evaluation of an integrative Bayesian peptide detection approach on a combinatorial peptide library. EUROPEAN JOURNAL OF MASS SPECTROMETRY (CHICHESTER, ENGLAND) 2021; 27:217-234. [PMID: 34989269 DOI: 10.1177/14690667211066725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Detection of peptides lies at the core of bottom-up proteomics analyses. We examined a Bayesian approach to peptide detection, integrating match-based models (fragments, retention time, isotopic distribution, and precursor mass) and peptide prior probability models under a unified probabilistic framework. To assess the relevance of these models and their various combinations, we employed a complete- and a tail-complete search of a low-precursor-mass synthetic peptide library based on oncogenic KRAS peptides. The fragment match was by far the most informative match-based model, while the retention time match was the only remaining such model with an appreciable impact, increasing correct detections by around 8%. A peptide prior probability model built from a reference proteome greatly improved the detection over a uniform prior, essentially transforming de novo sequencing into a reference-guided search. The knowledge of a correct sequence tag in advance to peptide-spectrum matching had only a moderate impact on peptide detection unless the tag was long and of high certainty. The approach also derived more precise error rates on the analyzed combinatorial peptide library than those estimated using PeptideProphet and Percolator, showing its potential applicability for the detection of homologous peptides. Although the approach requires further computational developments for routine data analysis, it illustrates the value of peptide prior probabilities and presents a Bayesian approach for their incorporation into peptide detection.
Affiliation(s)
- Miroslav Hruska
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
- Department of Computer Science, Faculty of Science, Palacky University, Olomouc, Czech Republic
| | - Dusan Holub
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
| |
|
47
|
Stieglitz F, Gerhard R, Pich A. The Binary Toxin of Clostridioides difficile Alters the Proteome and Phosphoproteome of HEp-2 Cells. Front Microbiol 2021; 12:725612. [PMID: 34594315 PMCID: PMC8477661 DOI: 10.3389/fmicb.2021.725612] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/16/2021] [Accepted: 08/09/2021] [Indexed: 12/03/2022] Open
Abstract
Clostridioides difficile is a major cause of nosocomial infection worldwide causing antibiotic-associated diarrhea and some cases are leading to pseudomembranous colitis. The main virulence factors are toxin A and toxin B. Hypervirulent strains of C. difficile are linked to higher mortality rates and most of these strains produce additionally the C. difficile binary toxin (CDT) that possesses two subunits, CDTa and CDTb. The latter is responsible for binding and transfer of CDTa into the cytoplasm of target cells; CDTa is an ADP ribosyltransferase catalyzing the modification of actin fibers that disturbs the actin vs microtubule balance and induces microtubule-based protrusions of the cell membrane increasing the adherence of C. difficile. The underlying mechanisms remain elusive. Thus, we performed a screening experiment using MS-based proteomics and phosphoproteomics techniques. Epithelial Hep-2 cells were treated with CDTa and CDTb in a multiplexed study for 4 and 8 h. Phosphopeptide enrichment was performed using affinity chromatography with TiO2 and Fe-NTA; for quantification, a TMT-based approach and DDA measurements were used. More than 4,300 proteins and 5,600 phosphosites were identified and quantified at all time points. Although only moderate changes were observed on proteome level, the phosphorylation level of nearly 1,100 phosphosites responded to toxin treatment. The data suggested that CSNK2A1 might act as an effector kinase after treatment with CDT. Additionally, we confirmed ADP-ribosylation on Arg-177 of actin and the kinetic of this modification for the first time.
Affiliation(s)
- Florian Stieglitz: Institute of Toxicology, Hannover Medical School, Hanover, Germany; Core Facility Proteomics, Hannover Medical School, Hanover, Germany
- Ralf Gerhard: Institute of Toxicology, Hannover Medical School, Hanover, Germany
- Andreas Pich: Institute of Toxicology, Hannover Medical School, Hanover, Germany; Core Facility Proteomics, Hannover Medical School, Hanover, Germany
|
48
|
Remoroza CA, Burke MC, Liu Y, Mirokhin YA, Tchekhovskoi DV, Yang X, Stein SE. Representing and Comparing Site-Specific Glycan Abundance Distributions of Glycoproteins. J Proteome Res 2021; 20:4475-4486. [PMID: 34327998 PMCID: PMC9830564 DOI: 10.1021/acs.jproteome.1c00442] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/13/2023]
Abstract
A method for representing and comparing distributions of N-linked glycans located at specific sites on proteins is presented. The representation takes the form of a simple mass spectrum for a given peptide sequence, with each peak corresponding to a different glycopeptide. The mass (in place of m/z) of each peak is that of the glycan mass, and its abundance corresponds to its relative abundance in the electrospray MS1 spectrum. This provides a facile means of representing all identifiable glycopeptides arising from a single protein "sequon" on a specific sequence, thereby enabling the comparison and searching of these distributions as routinely done for mass spectra. Likewise, these reference glycopeptide abundance distribution spectra (GADS) can be stored in searchable libraries. A set of such libraries created from available data is provided along with an adapted version of the widely used NIST-MS library-search software. Since GADS contain only MS1 abundances and identifications, they are equally suitable for expressing collision-induced fragmentation and electron-transfer dissociation determinations of glycopeptide identity. Comparisons of GADS for N-glycosylated sites on several proteins, especially the SARS-CoV-2 spike protein, demonstrate the potential reproducibility of GADS and their utility for comparing site-specific distributions.
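As a rough illustration of the representation described in this abstract, the following Python sketch builds a spectrum-like distribution for one site (glycan mass as the peak position, relative MS1 abundance as the peak height) and compares two such distributions with a cosine score. The function names, the example abundances, the base-peak normalization to 100, and the choice of cosine similarity are all assumptions made for illustration; the abstract does not specify the scoring used by the adapted NIST library-search software. Glycan masses are computed here from standard monoisotopic residue masses, which may differ from the mass convention used by the authors.

```python
import math

# Standard monoisotopic residue masses (Da) of common monosaccharides.
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "Fuc": 146.0579, "NeuAc": 291.0954}


def glycan_mass(composition):
    """Sum residue masses for a composition such as {"HexNAc": 2, "Hex": 5}."""
    return sum(RESIDUE_MASS[residue] * count for residue, count in composition.items())


def build_gads(observations):
    """Turn (composition, MS1 abundance) pairs into {glycan mass: relative abundance}.

    Abundances are scaled so that the most abundant glycan is 100 (an assumed convention).
    """
    peaks = {}
    for composition, abundance in observations:
        mass = round(glycan_mass(composition), 2)
        peaks[mass] = peaks.get(mass, 0.0) + abundance
    top = max(peaks.values())
    return {mass: 100.0 * abundance / top for mass, abundance in peaks.items()}


def cosine(gads_1, gads_2):
    """Cosine similarity between two distributions, matching peaks by glycan mass."""
    shared = set(gads_1) & set(gads_2)
    dot = sum(gads_1[m] * gads_2[m] for m in shared)
    norm_1 = math.sqrt(sum(v * v for v in gads_1.values()))
    norm_2 = math.sqrt(sum(v * v for v in gads_2.values()))
    return dot / (norm_1 * norm_2)


# Made-up abundances for the same hypothetical site measured in two samples.
sample_a = [({"HexNAc": 2, "Hex": 5}, 5.0e6),
            ({"HexNAc": 2, "Hex": 6}, 2.0e6),
            ({"HexNAc": 4, "Hex": 5, "Fuc": 1}, 1.0e6)]
sample_b = [({"HexNAc": 2, "Hex": 5}, 4.0e6),
            ({"HexNAc": 2, "Hex": 6}, 3.0e6),
            ({"HexNAc": 4, "Hex": 5, "Fuc": 1}, 0.5e6)]

gads_a = build_gads(sample_a)
gads_b = build_gads(sample_b)
score = cosine(gads_a, gads_b)
print(f"similarity between samples: {score:.3f}")
```

Because each distribution is just a list of (mass, abundance) peaks, any spectrum-comparison score could be substituted for the cosine used in this sketch.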
|
49
|
Abstract
The volume of proteomics and mass spectrometry data available in public repositories continues to grow at a rapid pace as more researchers embrace open science practices. Open access to the data behind scientific discoveries has become critical to validate published findings and develop new computational tools. Here, we present ppx, a Python package that provides easy, programmatic access to the data stored in ProteomeXchange repositories, such as PRIDE and MassIVE. The ppx package can be used as either a command line tool or a Python package to retrieve the files and metadata associated with a project when provided its identifier. To demonstrate how ppx enhances reproducible research, we used ppx within a Snakemake workflow to reanalyze a published data set with the open modification search tool ANN-SoLo and compared our reanalysis to the original results. We show that ppx readily integrates into workflows, and our reanalysis produced results consistent with the original analysis. We envision that ppx will be a valuable tool for creating reproducible analyses, providing tool developers easy access to data for development, testing, and benchmarking, and enabling the use of mass spectrometry data in data-intensive analyses. The ppx package is freely available and open source under the MIT license at https://github.com/wfondrie/ppx.
Affiliation(s)
- William E Fondrie: Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Wout Bittremieux: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA; Department of Computer Science, University of Antwerp, Antwerp, Belgium
- William S Noble: Department of Genome Sciences, University of Washington, Seattle, WA, USA; Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
|
50
|
Bittremieux W, Laukens K, Noble WS, Dorrestein PC. Large-scale tandem mass spectrum clustering using fast nearest neighbor searching. Rapid Commun Mass Spectrom 2021:e9153. [PMID: 34169593 PMCID: PMC8709870 DOI: 10.1002/rcm.9153] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/05/2021] [Revised: 06/21/2021] [Accepted: 06/21/2021] [Indexed: 05/27/2023]
Abstract
RATIONALE: Advanced algorithmic solutions are necessary to process the ever-increasing amounts of mass spectrometry data that are being generated. In this study, we describe the falcon spectrum clustering tool for efficient clustering of millions of MS/MS spectra. METHODS: falcon succeeds in efficiently clustering large amounts of mass spectral data using advanced techniques for fast spectrum similarity searching. First, high-resolution spectra are binned and converted to low-dimensional vectors using feature hashing. Next, the spectrum vectors are used to construct nearest neighbor indexes for fast similarity searching. The nearest neighbor indexes are used to efficiently compute a sparse pairwise distance matrix without having to exhaustively perform all pairwise spectrum comparisons within the relevant precursor mass tolerance. Finally, density-based clustering is performed to group similar spectra into clusters. RESULTS: Several state-of-the-art spectrum clustering tools were evaluated using a large draft human proteome data set consisting of 25 million spectra, indicating that alternative tools produce clustering results with different characteristics. Notably, falcon generates larger highly pure clusters than alternative tools, leading to a larger reduction in data volume without the loss of relevant information for more efficient downstream processing. CONCLUSIONS: falcon is a highly efficient spectrum clustering tool, which is publicly available as open source under the permissive BSD license at https://github.com/bittremieux/falcon.
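The first step of the method (binning high-resolution spectra and folding the bins into low-dimensional vectors by feature hashing) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not falcon's implementation: the function names, the bin width of 0.05 m/z, the vector length of 64, the use of `zlib.crc32` as the hash function, and the example peak lists are all chosen here for illustration. The nearest neighbor indexing and density-based clustering steps are omitted.

```python
import math
import zlib


def hash_spectrum(peaks, bin_width=0.05, dim=64):
    """Bin (m/z, intensity) peaks and fold the bins into a unit-length vector of size dim."""
    vector = [0.0] * dim
    for mz, intensity in peaks:
        bin_index = int(mz / bin_width)  # index of the high-resolution m/z bin
        # Feature hashing: a deterministic hash maps the bin index to one of dim slots.
        slot = zlib.crc32(str(bin_index).encode("ascii")) % dim
        vector[slot] += intensity
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector


def cosine(u, v):
    """Dot product of two unit-length vectors, which equals their cosine similarity."""
    return sum(a * b for a, b in zip(u, v))


# Made-up peak lists: spectrum_b repeats spectrum_a with slightly different intensities,
# spectrum_c has peaks at unrelated m/z values.
spectrum_a = [(175.119, 40.0), (262.151, 100.0), (375.235, 65.0)]
spectrum_b = [(175.119, 38.0), (262.151, 95.0), (375.235, 70.0)]
spectrum_c = [(120.081, 80.0), (301.187, 100.0), (488.270, 30.0)]

vec_a = hash_spectrum(spectrum_a)
vec_b = hash_spectrum(spectrum_b)
vec_c = hash_spectrum(spectrum_c)

sim_ab = cosine(vec_a, vec_b)
sim_ac = cosine(vec_a, vec_c)
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

Distinct bins can collide in the same slot, so similarities computed on hashed vectors only approximate those of the full binned spectra; a longer vector reduces collisions at the cost of memory.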
Affiliation(s)
- Wout Bittremieux: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States; Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kris Laukens: Department of Computer Science, University of Antwerp, Antwerp, Belgium
- William Stafford Noble: Department of Genome Sciences, University of Washington, Seattle, Washington, United States
- Pieter C Dorrestein: Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States
|