1
|
Perez-Riverol Y, Bittremieux W, Noble WS, Martens L, Bilbao A, Lazear MR, Grüning B, Katz DS, MacCoss MJ, Dai C, Eng JK, Bouwmeester R, Shortreed MR, Audain E, Sachsenberg T, Van Goey J, Wallmann G, Wen B, Käll L, Fondrie WE. Open-Source and FAIR Research Software for Proteomics. J Proteome Res 2025; 24:2222-2234. [PMID: 40267229 PMCID: PMC12053954 DOI: 10.1021/acs.jproteome.4c01079] [Received: 12/04/2024] [Revised: 03/14/2025] [Accepted: 04/11/2025] [Indexed: 04/25/2025]
Abstract
Scientific discovery relies on innovative software as much as experimental methods, especially in proteomics, where computational tools are essential for mass spectrometer setup, data analysis, and interpretation. Since the introduction of SEQUEST, proteomics software has grown into a complex ecosystem of algorithms, predictive models, and workflows, but the field faces challenges, including the increasing complexity of mass spectrometry data, limited reproducibility due to proprietary software, and difficulties integrating with other omics disciplines. Closed-source, platform-specific tools exacerbate these issues by restricting innovation, creating inefficiencies, and imposing hidden costs on the community. Open-source software (OSS), aligned with the FAIR Principles (Findable, Accessible, Interoperable, Reusable), offers a solution by promoting transparency, reproducibility, and community-driven development, which fosters collaboration and continuous improvement. In this manuscript, we explore the role of OSS in computational proteomics, its alignment with FAIR principles, and its potential to address challenges related to licensing, distribution, and standardization. Drawing on lessons from other omics fields, we present a vision for a future where OSS and FAIR principles underpin a transparent, accessible, and innovative proteomics community.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SD, U.K.
| | - Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
| | - William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
| | - Aivett Bilbao
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- US Department of Energy Agile BioFoundry, Emeryville, California 94608, United States
| | - Michael R. Lazear
- Belharra Therapeutics, 3985 Sorrento Valley Boulevard Suite C, San Diego, California 92121, United States
| | - Bjorn Grüning
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs University Freiburg, Freiburg 79110, Germany
| | - Daniel S. Katz
- National Center for Supercomputing Applications & Siebel School of Computing and Data Science & School of Information Sciences, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Michael J. MacCoss
- Department of Genome Sciences, University of Washington, 3720 15th St. NE, Seattle, Washington 98195, United States
| | - Chengxin Dai
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
| | - Jimmy K. Eng
- Proteomics Resource, University of Washington, Seattle, Washington 98195, United States
| | - Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
| | - Michael R. Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Enrique Audain
- Institute of Medical Genetics, University Medicine Oldenburg, Carl von Ossietzky University, Oldenburg 26129, Germany
| | - Timo Sachsenberg
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany
| | | | - Georg Wallmann
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried 82152, Germany
| | - Bo Wen
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm 17165, Sweden
| | | |
|
2
|
Deutsch EW, Mendoza L, Moritz RL. Quetzal: Comprehensive Peptide Fragmentation Annotation and Visualization. J Proteome Res 2025; 24:2196-2204. [PMID: 40111914 DOI: 10.1021/acs.jproteome.5c00092] [Indexed: 03/22/2025]
Abstract
Proteomics data-dependent acquisition data sets collected with high-resolution mass-spectrometry (MS) can achieve very high-quality results, but nearly every analysis yields results that are thresholded at some accepted false discovery rate, meaning that a substantial number of results are incorrect. For study conclusions that rely on a small number of peptide-spectrum matches being correct, it is thus important to examine at least some crucial spectra to ensure that they are not one of the incorrect identifications. We present Quetzal, a peptide fragment ion spectrum annotation tool to assist researchers in annotating and examining such spectra to ensure that they correctly support study conclusions. We describe how Quetzal annotates spectra using the new Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) mzPAF standard for fragment ion peak annotation, including the Python-based code, a web-service end point that provides annotation services, and a web-based application for annotating spectra and producing publication-quality figures. We illustrate its functionality with several annotated spectra of varying complexity. Quetzal provides easily accessible functionality that can assist in the effort to ensure and demonstrate that crucial spectra support study conclusions. Quetzal is publicly available at https://proteomecentral.proteomexchange.org/quetzal/.
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| |
|
3
|
Petrova B, Guler AT. Recent Developments in Single-Cell Metabolomics by Mass Spectrometry─A Perspective. J Proteome Res 2025; 24:1493-1518. [PMID: 39437423 PMCID: PMC11976873 DOI: 10.1021/acs.jproteome.4c00646] [Received: 07/28/2024] [Revised: 10/07/2024] [Accepted: 10/15/2024] [Indexed: 10/25/2024]
Abstract
Recent advancements in single-cell (sc) resolution analyses, particularly in sc transcriptomics and sc proteomics, have revolutionized our ability to probe and understand cellular heterogeneity. The study of metabolism through small molecules, metabolomics, provides an additional level of information otherwise unattainable by transcriptomics or proteomics by shedding light on the metabolic pathways that translate gene expression into functional outcomes. Metabolic heterogeneity, critical in health and disease, impacts developmental outcomes, disease progression, and treatment responses. However, dedicated approaches probing the sc metabolome have not reached the maturity of other sc omics technologies. Over the past decade, innovations in sc metabolomics have addressed some of the practical limitations, including cell isolation, signal sensitivity, and throughput. To fully exploit their potential in biological research, however, remaining challenges must be thoroughly addressed. Additionally, integrating sc metabolomics with orthogonal sc techniques will be required to validate relevant results and gain systems-level understanding. This perspective offers a broad-stroke overview of recent mass spectrometry (MS)-based sc metabolomics advancements, focusing on ongoing challenges from a biologist's viewpoint, aimed at addressing pertinent and innovative biological questions. Additionally, we emphasize the use of orthogonal approaches and showcase biological systems that these sophisticated methodologies are apt to explore.
Affiliation(s)
- Boryana Petrova
- Medical University of Vienna, Vienna 1090, Austria
- Department of Pathology, Boston Children’s Hospital, Boston, Massachusetts 02115, United States
| | - Arzu Tugce Guler
- Department of Pathology, Boston Children’s Hospital, Boston, Massachusetts 02115, United States
- Institute for Experiential AI, Northeastern University, Boston, Massachusetts 02115, United States
| |
|
4
|
Zhang W, Vesser M, Edwards N. GNOme, an ontology for glycan naming and subsumption. Anal Bioanal Chem 2025; 417:1961-1973. [PMID: 39921684 PMCID: PMC11961537 DOI: 10.1007/s00216-025-05757-8] [Received: 08/16/2024] [Revised: 01/16/2025] [Accepted: 01/20/2025] [Indexed: 02/10/2025]
Abstract
While GlyTouCan provides stable identifiers for referencing glycan structures, they are not organized semantically. GNOme, a glycan naming and subsumption ontology and a member of the OBOFoundry, organizes GlyTouCan accessions for automated reasoning and interactive browsing of glycan structures by subsumption. GNOme makes it quick and easy to discover glycans with a specific degree of characterization; provides a text-based table of common synonyms for specific structures and compositions; enumerates glycan subsumption relationships for automated reasoning; and assigns each glycan to well-defined categories based on their degree of characterization. As an OBOFoundry ontology, GNOme can be readily integrated with other OBOFoundry ontologies and standards initiatives that need to refer to glycans with various degrees of characterization. GNOme is integrated with GlyGen, a glycoinformatics knowledge base, providing navigation to "related glycans," and expanding the utility of species and glycan classification annotations. GNOme is available at https://gnome.glyomics.org/ and via GlyGen, the OBO Foundry, and GitHub.
Affiliation(s)
- Wenjin Zhang
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, USA
| | - Michelle Vesser
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, USA
| | - Nathan Edwards
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, USA
| |
|
5
|
Orchard SE. What have Data Standards ever done for us? Mol Cell Proteomics 2025:100933. [PMID: 40024375 DOI: 10.1016/j.mcpro.2025.100933] [Received: 09/17/2024] [Revised: 02/21/2025] [Accepted: 02/24/2025] [Indexed: 03/04/2025]
Abstract
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies for both the field of molecular interaction and that of mass spectrometry for more than 20 years. This review explores some of the ways that the proteomics community has benefitted from the development of community standards and takes a look at some of the tools and resources that have been improved or developed as a result of the work of the HUPO-PSI.
Affiliation(s)
- S E Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| |
|
6
|
Perez-Riverol Y, Bandla C, Kundu D, Kamatchinathan S, Bai J, Hewapathirana S, John N, Prakash A, Walzer M, Wang S, Vizcaíno J. The PRIDE database at 20 years: 2025 update. Nucleic Acids Res 2025; 53:D543-D553. [PMID: 39494541 PMCID: PMC11701690 DOI: 10.1093/nar/gkae1011] [Received: 09/15/2024] [Revised: 10/11/2024] [Accepted: 10/16/2024] [Indexed: 11/05/2024]
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's leading mass spectrometry (MS)-based proteomics data repository and one of the founding members of the ProteomeXchange consortium. This manuscript summarizes the developments in PRIDE resources and related tools for the last three years. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 534 datasets per month. This has been possible thanks to continuous improvements in infrastructure such as a new file transfer protocol for very large datasets (Globus), a new data resubmission pipeline and an automatic dataset validation process. Additionally, we will highlight novel activities such as the availability of the PRIDE chatbot (based on the use of open-source Large Language Models), and our work to improve support for MS crosslinking datasets. Furthermore, we will describe how we have increased our efforts to reuse, reanalyze and disseminate high-quality proteomics data into added-value resources such as UniProt, Ensembl and Expression Atlas.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Selvakumar Kamatchinathan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Jingwen Bai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Nithu Sara John
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Ananth Prakash
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| |
|
7
|
Marzano V, Levi Mortera S, Putignani L. Insights on Wet and Dry Workflows for Human Gut Metaproteomics. Proteomics 2024:e202400242. [PMID: 39740098 DOI: 10.1002/pmic.202400242] [Received: 07/16/2024] [Revised: 12/10/2024] [Accepted: 12/11/2024] [Indexed: 01/02/2025]
Abstract
The human gut microbiota (GM) is a community of microorganisms that resides in the gastrointestinal (GI) tract. Recognized as a critical element of human health, the functions of the GM extend beyond GI well-being to influence overall systemic health and susceptibility to disease. Among the other omic sciences, metaproteomics highlights additional facets that make it a highly valuable discipline in the study of GM. Indeed, it allows the protein inventory of complex microbial communities. Proteins with associated taxonomic membership and function are identified and quantified from their constituent peptides by liquid chromatography coupled to mass spectrometry analyses and by querying specific databases (DBs). The aim of this review was to compile comprehensive information on metaproteomic studies of the human GM, with a focus on the bacterial component, to assist newcomers in understanding the methods and types of research conducted in this field. The review outlines key steps in a metaproteomic-based study, such as protein extraction, DB selection, and bioinformatic workflow. The importance of standardization is emphasized. In addition, a list of previously published studies is provided as hints for researchers interested in investigating the role of GM in health and disease states.
Affiliation(s)
- Valeria Marzano
- Research Unit of Microbiome, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | - Stefano Levi Mortera
- Research Unit of Microbiome, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | - Lorenza Putignani
- Unit of Microbiomics and Research Unit of Microbiome, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| |
|
8
|
He F, Aebersold R, Baker MS, Bian X, Bo X, Chan DW, Chang C, Chen L, Chen X, Chen YJ, Cheng H, Collins BC, Corrales F, Cox J, E W, Van Eyk JE, Fan J, Faridi P, Figeys D, Gao GF, Gao W, Gao ZH, Goda K, Goh WWB, Gu D, Guo C, Guo T, He Y, Heck AJR, Hermjakob H, Hunter T, Iyer NG, Jiang Y, Jimenez CR, Joshi L, Kelleher NL, Li M, Li Y, Lin Q, Liu CH, Liu F, Liu GH, Liu Y, Liu Z, Low TY, Lu B, Mann M, Meng A, Moritz RL, Nice E, Ning G, Omenn GS, Overall CM, Palmisano G, Peng Y, Pineau C, Poon TCW, Purcell AW, Qiao J, Reddel RR, Robinson PJ, Roncada P, Sander C, Sha J, Song E, Srivastava S, Sun A, Sze SK, Tang C, Tang L, Tian R, Vizcaíno JA, Wang C, Wang C, Wang X, Wang X, Wang Y, Weiss T, Wilhelm M, Winkler R, Wollscheid B, Wong L, Xie L, Xie W, Xu T, Xu T, Yan L, Yang J, Yang X, Yates J, Yun T, Zhai Q, Zhang B, Zhang H, Zhang L, Zhang L, Zhang P, Zhang Y, Zheng YZ, Zhong Q, Zhu Y. π-HuB: the proteomic navigator of the human body. Nature 2024; 636:322-331. [PMID: 39663494 DOI: 10.1038/s41586-024-08280-5] [Received: 10/19/2023] [Accepted: 10/23/2024] [Indexed: 12/13/2024]
Abstract
The human body contains trillions of cells, classified into specific cell types, with diverse morphologies and functions. In addition, cells of the same type can assume different states within an individual's body during their lifetime. Understanding the complexities of the proteome in the context of a human organism and its many potential states is a necessary requirement to understanding human biology, but these complexities can neither be predicted from the genome, nor have they been systematically measurable with available technologies. Recent advances in proteomic technology and computational sciences now provide opportunities to investigate the intricate biology of the human body at unprecedented resolution and scale. Here we introduce a big-science endeavour called π-HuB (proteomic navigator of the human body). The aim of the π-HuB project is to (1) generate and harness multimodality proteomic datasets to enhance our understanding of human biology; (2) facilitate disease risk assessment and diagnosis; (3) uncover new drug targets; (4) optimize appropriate therapeutic strategies; and (5) enable intelligent healthcare, thereby ushering in a new era of proteomics-driven phronesis medicine. This ambitious mission will be implemented by an international collaborative force of multidisciplinary research teams worldwide across academic, industrial and government sectors.
Affiliation(s)
- Fuchu He
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- International Academy of Phronesis Medicine (Guangdong), Guangdong, China
| | - Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
| | - Mark S Baker
- Macquarie Medical School, Macquarie University, Sydney, New South Wales, Australia
| | - Xiuwu Bian
- Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Third Military Medical University (Army Medical University) and Key Laboratory of Tumor Immunopathology, Ministry of Education of China, Chongqing, China
| | - Xiaochen Bo
- Institute of Health Service and Transfusion Medicine, Beijing, China
| | - Daniel W Chan
- Department of Pathology and The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA
| | - Cheng Chang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Luonan Chen
- Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
| | - Xiangmei Chen
- Department of Nephrology, First Medical Center of Chinese PLA General Hospital, Nephrology Institute of the Chinese People's Liberation Army, State Key Laboratory of Kidney Diseases, National Clinical Research Center for Kidney Diseases, Beijing Key Laboratory of Kidney Disease Research, Beijing, China
| | - Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei, China
| | - Heping Cheng
- National Biomedical Imaging Center, State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking-Tsinghua Center for Life Sciences, College of Future Technology, Peking University, Beijing, China
| | - Ben C Collins
- School of Biological Sciences, Queen's University of Belfast, Belfast, UK
| | - Fernando Corrales
- Functional Proteomics Laboratory, Centro Nacional de Biotecnología-CSIC, Madrid, Spain
| | - Jürgen Cox
- Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany
| | - Weinan E
- AI for Science Institute, Beijing, China
- Center for Machine Learning Research, Peking University, Beijing, China
| | - Jennifer E Van Eyk
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA, USA
| | - Jia Fan
- Department of Liver Surgery and Transplantation, Key Laboratory of Carcinogenesis and Cancer Invasion (Ministry of Education), Liver Cancer Institute, Zhongshan Hospital, Fudan University, Shanghai, China
| | - Pouya Faridi
- Centre for Cancer Research, Hudson Institute of Medical Research, Clayton, Victoria, Australia
- Monash Proteomics and Metabolomics Platform, Department of Medicine, School of Clinical Sciences, Monash University, Clayton, Victoria, Australia
| | - Daniel Figeys
- School of Pharmaceutical Sciences and Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
| | - George Fu Gao
- The D. H. Chen School of Universal Health, Zhejiang University, Hangzhou, China
| | - Wen Gao
- Pengcheng Laboratory, Shenzhen, China
- School of Electronic Engineering and Computer Science, Peking University, Beijing, China
| | - Zu-Hua Gao
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
| | - Keisuke Goda
- Department of Chemistry, University of Tokyo, Tokyo, Japan
- Department of Bioengineering, University of California, Los Angeles, California, USA
- Institute of Technological Sciences, Wuhan University, Wuhan, Hubei, China
| | - Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
| | - Dongfeng Gu
- School of Medicine, Southern University of Science and Technology, Shenzhen, China
| | - Changjiang Guo
- Department of Nutrition, Tianjin Institute of Environmental and Operational Medicine, Tianjin, China
| | - Tiannan Guo
- School of Medicine, Westlake University, Hangzhou, China
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
- Research Center for Industries of the Future, School of Life Sciences, Westlake University, Hangzhou, China
| | - Yuezhong He
- International Academy of Phronesis Medicine (Guangdong), Guangdong, China
| | - Albert J R Heck
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, the Netherlands
- Netherlands Proteomics Center, Utrecht, the Netherlands
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Tony Hunter
- Molecular and Cell Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Narayanan Gopalakrishna Iyer
- Department of Head & Neck Surgery, Division of Surgery & Surgical Oncology, Division of Medical Sciences, National Cancer Centre Singapore, Singapore, Singapore
| | - Ying Jiang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Connie R Jimenez
- OncoProteomics Laboratory, Department of Medical Oncology, Cancer Center Amsterdam, Amsterdam UMC, Amsterdam, the Netherlands
| | - Lokesh Joshi
- Advanced Glycoscience Research Cluster, School of Biological and Chemical Sciences, University of Galway, Galway, Ireland
| | - Neil L Kelleher
- Departments of Molecular Biosciences, Departments of Chemistry, Northwestern University, Evanston, IL, USA
| | - Ming Li
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
- Central China Institute of Artificial Intelligence, Henan, China
| | - Yang Li
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Qingsong Lin
- Department of Biological Sciences, Faculty of Science, National University of Singapore, Singapore, Singapore
| | - Cui Hua Liu
- CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Fan Liu
- Department of Structural Biology, Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP), Berlin, Germany
| | - Guang-Hui Liu
- State Key Laboratory of Membrane Biology, Key Laboratory of Organ Regeneration and Reconstruction, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Yansheng Liu
- Cancer Biology Institute, Yale University School of Medicine, West Haven, CT, USA
| | - Zhihua Liu
- State Key Laboratory of Molecular Oncology, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Teck Yew Low
- UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
| | - Ben Lu
- Department of Critical Care Medicine and Hematology, The Third Xiangya Hospital, Central South University; Department of Hematology and Critical Care Medicine, The Third Xiangya Hospital, Central South University, Changsha, China
| | - Matthias Mann
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Anming Meng
- School of Life Sciences, Tsinghua University, Tsinghua-Peking Center for Life Sciences, Beijing, China
| | | | - Edouard Nice
- Clinical Biomarker Discovery and Validation, Monash University, Clayton, Victoria, Australia
| | - Guang Ning
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shanghai National Clinical Research Center for Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai, China
- Shanghai Key Laboratory for Endocrine Tumor, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Gilbert S Omenn
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Christopher M Overall
- Department of Oral Biological and Medical Sciences, University of British Columbia, Vancouver, British Columbia, Canada
- Yonsei Frontier Lab, Yonsei University, Seoul, Republic of Korea
| | - Giuseppe Palmisano
- Glycoproteomics Laboratory, Department of Parasitology, University of São Paulo, Sao Paulo, Brazil
| | - Yaojin Peng
- Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Beijing Institute for Stem Cell and Regenerative Medicine, Beijing, China
- University of the Chinese Academy of Sciences, Beijing, China
| | - Charles Pineau
- Institut de Recherche en Santé Environnement et Travail, Univ. Rennes, Inserm, EHESP, Irset, Rennes, France
| | - Terence Chuen Wai Poon
- Pilot Laboratory, MOE Frontier Science Centre for Precision Oncology, Centre for Precision Medicine Research and Training, Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau, China
| | - Anthony W Purcell
- Infection and Immunity Program and Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Clayton, Victoria, Australia
| | - Jie Qiao
- State Key Laboratory of Female Fertility Promotion, Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
| | - Roger R Reddel
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, New South Wales, Australia
| | - Phillip J Robinson
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, New South Wales, Australia
| | - Paola Roncada
- Department of Health Sciences, University Magna Græcia of Catanzaro, Catanzaro, Italy
| | - Chris Sander
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jiahao Sha
- State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
| | - Erwei Song
- Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Medical Research Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China
- Breast Tumor Center, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China
| | | | - Aihua Sun
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Siu Kwan Sze
- Department of Health Sciences, Faculty of Applied Health Sciences, Brock University, St. Catharines, Ontario, Canada
| | - Chao Tang
- Center for Quantitative Biology, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
| | - Liujun Tang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Ruijun Tian
- Department of Chemistry, Southern University of Science and Technology, Shenzhen, China
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Chanjuan Wang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Chen Wang
- State Key Laboratory of Respiratory Health and Multimorbidity, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing, China
- Department of Pulmonary and Critical Care Medicine, Center of Respiratory Medicine, National Clinical Research Center for Respiratory Diseases, China-Japan Friendship Hospital, Beijing, China
| | - Xiaowen Wang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Xinxing Wang
- Department of Nutrition, Tianjin Institute of Environmental and Operational Medicine, Tianjin, China
| | - Yan Wang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Tobias Weiss
- Department of Neurology, Clinical Neuroscience Center, University Hospital Zurich and University of Zurich, Zurich, Switzerland
| | | | - Robert Winkler
- Advanced Genomics Unit, Center for Research and Advanced Studies, Irapuato, Mexico
| | - Bernd Wollscheid
- Institute of Translational Medicine, Department of Health Sciences and Technology, ETH Zurich, Zurich, Switzerland
| | - Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore, Singapore
- Department of Pathology, National University of Singapore, Singapore, Singapore
| | - Linhai Xie
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Wei Xie
- School of Life Sciences, Tsinghua University, Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Tao Xu
- Guangzhou National Laboratory, Guangzhou, China
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
| | - Tianhao Xu
- International Academy of Phronesis Medicine (Guangdong), Guangdong, China
| | - Liying Yan
- State Key Laboratory of Female Fertility Promotion, Center for Reproductive Medicine, Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing, China
| | - Jing Yang
- Guangzhou National Laboratory, Guangzhou, China
| | - Xiao Yang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - John Yates
- The Scripps Research Institute, La Jolla, CA, USA
| | - Tao Yun
- China Science and Technology Exchange Center, Beijing, China
| | - Qiwei Zhai
- CAS Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Hui Zhang
- Department of Pathology, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Lihua Zhang
- State Key Laboratory of Medical Proteomics, National Chromatography R. & A. Center, CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China
| | - Lingqiang Zhang
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Pingwen Zhang
- School of Mathematical Sciences, Peking University, Beijing, China
- Wuhan University, Wuhan, China
| | - Yukui Zhang
- State Key Laboratory of Medical Proteomics, National Chromatography R. & A. Center, CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, China
| | - Yu Zi Zheng
- International Academy of Phronesis Medicine (Guangdong), Guangdong, China
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
| | - Qing Zhong
- ProCan®, Children's Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, New South Wales, Australia
| | - Yunping Zhu
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
|
9
|
Klein J, Lam H, Mak TD, Bittremieux W, Perez-Riverol Y, Gabriels R, Shofstahl J, Hecht H, Binz PA, Kawano S, Van Den Bossche T, Carver J, Neely BA, Mendoza L, Suomi T, Claeys T, Payne T, Schulte D, Sun Z, Hoffmann N, Zhu Y, Neumann S, Jones AR, Bandeira N, Vizcaíno JA, Deutsch EW. The Proteomics Standards Initiative Standardized Formats for Spectral Libraries and Fragment Ion Peak Annotations: mzSpecLib and mzPAF. Anal Chem 2024; 96:18491-18501. [PMID: 39514576 PMCID: PMC11579979 DOI: 10.1021/acs.analchem.4c04091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2024] [Revised: 10/16/2024] [Accepted: 11/01/2024] [Indexed: 11/16/2024]
Abstract
Mass spectral libraries are collections of reference spectra, usually associated with specific analytes from which the spectra were generated, that are used for further downstream analysis of new spectra. There are many different formats used for encoding spectral libraries, but none have undergone a standardization process to ensure broad applicability to many applications. As part of the Human Proteome Organization Proteomics Standards Initiative (PSI), we have developed a standardized format for encoding spectral libraries, called mzSpecLib (https://psidev.info/mzSpecLib). It is primarily a data model that flexibly encodes metadata about the library entries using the extensible PSI-MS controlled vocabulary and can be encoded in and converted between different serialization formats. We have also developed a standardized data model and serialization for fragment ion peak annotations, called mzPAF (https://psidev.info/mzPAF). It is defined as a separate standard, since it may be used for other applications besides spectral libraries. The mzSpecLib and mzPAF standards are compatible with existing PSI standards such as ProForma 2.0 and the Universal Spectrum Identifier. The mzSpecLib and mzPAF standards have been primarily defined for peptides in proteomics applications with basic small molecule support. They could be extended in the future to other fields that need to encode spectral libraries for nonpeptidic analytes.
Affiliation(s)
- Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts 02215, United States
| | - Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, 999077 Hong Kong, P. R. China
| | - Tytus D. Mak
- Mass Spectrometry Data Center, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, Maryland 20899, United States
| | - Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerpen, Belgium
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
| | - Jim Shofstahl
- Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
| | - Helge Hecht
- RECETOX, Faculty of Science, Masaryk University, Kotlářská 2, 60200 Brno, Czech Republic
| | | | - Shin Kawano
- Database Center for Life Science, Joint Support Center for Data Science Research, Research Organization of Information and Systems, Chiba 277-0871, Japan
- School of Frontier Engineering, Kitasato University, Sagamihara 252-0373, Japan
| | - Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
| | - Jeremy Carver
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, United States
| | - Benjamin A. Neely
- National Institute of Standards and Technology (NIST) Charleston, Charleston, South Carolina 29412, United States
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Tomi Suomi
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
| | - Tine Claeys
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, 9052 Ghent, Belgium
| | - Thomas Payne
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Douwe Schulte
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
| | - Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Nils Hoffmann
- Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
| | - Yunping Zhu
- National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, #38, Life Science Park, Changping District, Beijing 102206, China
| | - Steffen Neumann
- Computational Plant Biochemistry, Leibniz Institute of Plant Biochemistry, 06120 Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
| | - Andrew R. Jones
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 3BX, United Kingdom
| | - Nuno Bandeira
- Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, University of California, San Diego, California 92093-0404, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
|
10
|
Price E, Feyertag F, Evans T, Miskin J, Mitrophanous K, Dikicioglu D. What is the real value of omics data? Enhancing research outcomes and securing long-term data excellence. Nucleic Acids Res 2024; 52:12130-12140. [PMID: 39417504 PMCID: PMC11551742 DOI: 10.1093/nar/gkae901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 09/24/2024] [Accepted: 10/01/2024] [Indexed: 10/19/2024]
Abstract
A wealth of high-throughput biological data, of which omics constitute a significant fraction, has been made publicly available in repositories over the past decades. These data come in various formats and cover a range of species and research areas providing insights into the complexities of biological systems; the public repositories hosting these data serve as multifaceted resources. The potentially greater value of these data lies in their secondary utilization as the deployment of data science and artificial intelligence in biology advances. Here, we critically evaluate challenges in secondary data use, focusing on omics data of human embryonic kidney cell lines available in public repositories. The emerging issues are obstacles faced by secondary data users across diverse domains as they concern platforms and repositories, which accept deposition of data irrespective of their species type. The evolving landscape of data-driven research in biology prompts re-evaluation of open access data curation and submission procedures to ensure that these challenges do not impede novel research opportunities through data exploitation. This paper aims to draw attention to widespread issues with data reporting and encourages data owners to meticulously curate submissions to maximize not only their immediate research impact but also the long-term legacy of datasets.
Affiliation(s)
- Eva Price
- Department of Biochemical Engineering, University College London, Gower Street, London WC1E 6BT, UK
| | - Felix Feyertag
- Oxford Biomedica (UK) Ltd, Windrush Court, Transport Way, Oxford OX4 6LT, UK
| | - Thomas Evans
- Oxford Biomedica (UK) Ltd, Windrush Court, Transport Way, Oxford OX4 6LT, UK
| | - James Miskin
- Oxford Biomedica (UK) Ltd, Windrush Court, Transport Way, Oxford OX4 6LT, UK
| | | | - Duygu Dikicioglu
- Department of Biochemical Engineering, University College London, Gower Street, London WC1E 6BT, UK
|
11
|
Bai J, Kamatchinathan S, Kundu DJ, Bandla C, Vizcaíno JA, Perez-Riverol Y. Open-source large language models in action: A bioinformatics chatbot for PRIDE database. Proteomics 2024; 24:e2400005. [PMID: 38556628 DOI: 10.1002/pmic.202400005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 03/08/2024] [Accepted: 03/20/2024] [Indexed: 04/02/2024]
Abstract
We here present a chatbot assistant infrastructure (https://www.ebi.ac.uk/pride/chatbot/) that simplifies user interactions with the PRIDE database's documentation and dataset search functionality. The framework utilizes multiple Large Language Models (LLM): llama2, chatglm, mixtral (mistral), and openhermes. It also includes a web service API (Application Programming Interface), web interface, and components for indexing and managing vector databases. An Elo-ranking system-based benchmark component is included in the framework as well, which allows for evaluating the performance of each LLM and for improving PRIDE documentation. The chatbot not only allows users to interact with PRIDE documentation but can also be used to search and find PRIDE datasets using an LLM-based recommendation system, enabling dataset discoverability. Importantly, while our infrastructure is exemplified through its application in the PRIDE database context, the modular and adaptable nature of our approach positions it as a valuable tool for improving user experiences across a spectrum of bioinformatics and proteomics tools and resources, among other domains. The integration of advanced LLMs, innovative vector-based construction, the benchmarking framework, and optimized documentation collectively form a robust and transferable chatbot assistant infrastructure. The framework is open-source (https://github.com/PRIDE-Archive/pride-chatbot).
Affiliation(s)
- Jingwen Bai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Selvakumar Kamatchinathan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
12
|
Moxon JV, Pretorius C, Trollope AF, Mittal P, Klingler-Hoffmann M, Hoffmann P, Golledge J. A systematic review and in silico analysis of studies investigating the ischemic penumbra proteome in animal models of experimental stroke. J Cereb Blood Flow Metab 2024; 44:1709-1722. [PMID: 38639008 PMCID: PMC11504113 DOI: 10.1177/0271678x241248502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Revised: 03/13/2024] [Accepted: 03/19/2024] [Indexed: 04/20/2024]
Abstract
Ischaemic stroke results in the formation of a cerebral infarction bordered by an ischaemic penumbra. Characterising the proteins within the ischaemic penumbra may identify neuro-protective targets and novel circulating markers to improve patient care. This review assessed data from studies using proteomic platforms to compare ischaemic penumbra tissues to controls following experimental stroke in animal models. Proteins reported to differ significantly between penumbra and control tissues were analysed in silico to identify protein-protein interactions and over-represented pathways. Sixteen studies using rat (n = 12), mouse (n = 2) or primate (n = 2) models were included. Heterogeneity in the design of the studies and in the definition of the penumbra was observed. Analyses showed a high abundance of p53 in the penumbra within 24 hours of permanent ischaemic stroke; p53 was implicated in driving apoptosis, cell cycle progression, and ATM, MAPK and p53 signalling. Between 1 and 7 days after stroke there were changes in the abundance of proteins involved in the complement and coagulation pathways. Favourable recovery 1 month after stroke was associated with an increase in the abundance of proteins involved in wound healing. Poor recovery was associated with increases in prostaglandin signalling. Findings suggest that p53 may be a target for novel therapeutics for ischaemic stroke.
Affiliation(s)
- Joseph V Moxon
- Queensland Research Centre for Peripheral Vascular Disease, College of Medicine and Dentistry, James Cook University, Townsville, Australia
- Australian Institute of Tropical Health and Medicine, James Cook University, Townsville, Australia
- Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, Australia
- College of Medicine and Dentistry, James Cook University, Townsville, Australia
| | - Cornea Pretorius
- Townsville University Hospital, Angus Smith Drive, Douglas, Townsville, Australia
| | - Alexandra F Trollope
- Australian Institute of Tropical Health and Medicine, James Cook University, Townsville, Australia
- College of Medicine and Dentistry, James Cook University, Townsville, Australia
| | - Parul Mittal
- Mass Spectrometry and Proteomics Group, UniSA Clinical and Health Sciences, University of South Australia, Adelaide, Australia
| | - Manuela Klingler-Hoffmann
- Mass Spectrometry and Proteomics Group, UniSA Clinical and Health Sciences, University of South Australia, Adelaide, Australia
| | - Peter Hoffmann
- Mass Spectrometry and Proteomics Group, UniSA Clinical and Health Sciences, University of South Australia, Adelaide, Australia
| | - Jonathan Golledge
- Queensland Research Centre for Peripheral Vascular Disease, College of Medicine and Dentistry, James Cook University, Townsville, Australia
- Australian Institute of Tropical Health and Medicine, James Cook University, Townsville, Australia
- Department of Vascular and Endovascular Surgery, Townsville University Hospital, Townsville, Australia
|
13
|
Dens C, Adams C, Laukens K, Bittremieux W. Machine Learning Strategies to Tackle Data Challenges in Mass Spectrometry-Based Proteomics. J Am Soc Mass Spectrom 2024; 35:2143-2155. [PMID: 39074335 DOI: 10.1021/jasms.4c00180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/31/2024]
Abstract
In computational proteomics, machine learning (ML) has emerged as a vital tool for enhancing data analysis. Despite significant advancements, the diversity of ML model architectures and the complexity of proteomics data present substantial challenges in the effective development and evaluation of these tools. Here, we highlight the necessity for high-quality, comprehensive data sets to train ML models and advocate for the standardization of data to support robust model development. We emphasize the instrumental role of key data sets like ProteomeTools and MassIVE-KB in advancing ML applications in proteomics and discuss the implications of data set size on model performance, highlighting that larger data sets typically yield more accurate models. To address data scarcity, we explore algorithmic strategies such as self-supervised pretraining and multitask learning. Ultimately, we hope that this discussion can serve as a call to action for the proteomics community to collaborate on data standardization and collection efforts, which are crucial for the sustainable advancement and refinement of ML methodologies in the field.
Affiliation(s)
- Ceder Dens
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Middelheimlaan 1, 2020 Antwerpen, Belgium
| | - Charlotte Adams
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Middelheimlaan 1, 2020 Antwerpen, Belgium
| | - Kris Laukens
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Middelheimlaan 1, 2020 Antwerpen, Belgium
| | - Wout Bittremieux
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Middelheimlaan 1, 2020 Antwerpen, Belgium
|
14
|
Combe CW, Kolbowski L, Fischer L, Koskinen V, Klein J, Leitner A, Jones AR, Vizcaíno JA, Rappsilber J. mzIdentML 1.3.0 - Essential progress on the support of crosslinking and other identifications based on multiple spectra. Proteomics 2024; 24:e2300385. [PMID: 39001627 DOI: 10.1002/pmic.202300385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 02/07/2024] [Accepted: 02/09/2024] [Indexed: 10/10/2024]
Abstract
The mzIdentML data format, originally developed by the Proteomics Standards Initiative in 2011, is the open XML data standard for peptide and protein identification results coming from mass spectrometry. We present mzIdentML version 1.3.0, which introduces new functionality and support for additional use cases. First of all, a new mechanism for encoding identifications based on multiple spectra has been introduced. Furthermore, the main mzIdentML specification document can now be supplemented by extension documents which provide further guidance for encoding specific use cases for different proteomics subfields. One extension document has been added, covering additional use cases for the encoding of crosslinked peptide identifications. The ability to add extension documents facilitates keeping the mzIdentML standard up to date with advances in the proteomics field, without having to change the main specification document. The crosslinking extension document provides further explanation of the crosslinking use cases already supported in mzIdentML version 1.2.0, and provides support for encoding additional scenarios that are critical to reflect developments in the crosslinking field and facilitate its integration in structural biology. These are: (i) support for cleavable crosslinkers, (ii) support for internally linked peptides, (iii) support for noncovalently associated peptides, and (iv) improved support for encoding scores and the corresponding thresholds.
Affiliation(s)
- Colin W Combe
- Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
| | - Lars Kolbowski
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
| | - Lutz Fischer
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
| | | | - Joshua Klein
- Program for Bioinformatics, Boston University, Boston, Massachusetts, USA
| | - Alexander Leitner
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zurich, Switzerland
| | - Andrew R Jones
- Department of Biochemistry & Systems Biology, University of Liverpool, Liverpool, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute, (EMBL-EBI), Hinxton, Cambridge, UK
| | - Juri Rappsilber
- Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
|
15
|
Combe CW, Graham M, Kolbowski L, Fischer L, Rappsilber J. xiVIEW: Visualisation of Crosslinking Mass Spectrometry Data. J Mol Biol 2024; 436:168656. [PMID: 39237202 DOI: 10.1016/j.jmb.2024.168656] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/31/2024] [Revised: 05/17/2024] [Accepted: 06/07/2024] [Indexed: 09/07/2024]
Abstract
Crosslinking mass spectrometry (MS) has emerged as an important technique for elucidating the in-solution structures of protein complexes and the topology of protein-protein interaction networks. However, the expanding user community lacked an integrated visualisation tool that helped them make use of the crosslinking data for investigating biological mechanisms. We addressed this need by developing xiVIEW, a web-based application designed to streamline crosslinking MS data analysis, which we present here. xiVIEW provides a user-friendly interface for accessing coordinated views of mass spectrometric data, network visualisation, annotations extracted from trusted repositories like UniProtKB, and available 3D structures. In accordance with recent recommendations from the crosslinking MS community, xiVIEW (i) provides a standards compliant parser to improve data integration and (ii) offers accessible visualisation tools. By promoting the adoption of standard file formats and providing a comprehensive visualisation platform, xiVIEW empowers both experimentalists and modellers alike to pursue their respective research interests. We anticipate that xiVIEW will advance crosslinking MS-inspired research, and facilitate broader and more effective investigations into complex biological systems.
Affiliation(s)
- Colin W Combe
- University of Edinburgh, School of Biological Sciences, Edinburgh EH9 3JR, UK
| | - Martin Graham
- University of Edinburgh, School of Biological Sciences, Edinburgh EH9 3JR, UK
| | - Lars Kolbowski
- University of Edinburgh, School of Biological Sciences, Edinburgh EH9 3JR, UK; Technische Universität Berlin, 10623 Berlin, Germany
| | - Lutz Fischer
- Technische Universität Berlin, 10623 Berlin, Germany
| | - Juri Rappsilber
- University of Edinburgh, School of Biological Sciences, Edinburgh EH9 3JR, UK; Technische Universität Berlin, 10623 Berlin, Germany
|
16
|
Kopczynski D, Ejsing CS, McDonald JG, Bamba T, Baker ES, Bertrand-Michel J, Brügger B, Coman C, Ellis SR, Garrett TJ, Griffiths WJ, Guan XL, Han X, Höring M, Holčapek M, Hoffmann N, Huynh K, Lehmann R, Jones JW, Kaddurah-Daouk R, Köfeler HC, Meikle PJ, Metz TO, O'Donnell VB, Saigusa D, Schwudke D, Shevchenko A, Torta F, Vizcaíno JA, Welti R, Wenk MR, Wolrab D, Xia Y, Ekroos K, Ahrends R, Liebisch G. The lipidomics reporting checklist a framework for transparency of lipidomic experiments and repurposing resource data. J Lipid Res 2024; 65:100621. [PMID: 39151590 PMCID: PMC11417233 DOI: 10.1016/j.jlr.2024.100621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 07/30/2024] [Accepted: 08/09/2024] [Indexed: 08/19/2024]
Abstract
The rapid increase in lipidomic studies has led to a collaborative effort within the community to establish standards and criteria for producing, documenting, and disseminating data. Creating a dynamic easy-to-use checklist that condenses key information about lipidomic experiments into common terminology will enhance the field's consistency, comparability, and repeatability. Here, we describe the structure and rationale of the established Lipidomics Minimal Reporting Checklist to increase transparency in lipidomics research.
Affiliation(s)
- Dominik Kopczynski
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, Austria
- Christer S Ejsing
  - Department of Biochemistry and Molecular Biology, VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Odense, Denmark; Cell Biology and Biophysics Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Jeffrey G McDonald
  - Center for Human Nutrition and Department of Molecular Genetics, UT Southwestern Medical Center, Dallas, Texas, USA
- Takeshi Bamba
  - Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Erin S Baker
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Justine Bertrand-Michel
  - MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Inserm I2MC, Toulouse, France
- Britta Brügger
  - Heidelberg University Biochemistry Center (BZH), University of Heidelberg, Heidelberg, Germany
- Cristina Coman
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, Austria
- Shane R Ellis
  - Molecular Horizons and School of Chemistry and Molecular Bioscience, University of Wollongong, Wollongong, NSW, Australia
- Timothy J Garrett
  - Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, Florida, USA
- Xue Li Guan
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Xianlin Han
  - Barshop Institute for Longevity and Aging Studies, University of Texas Health Science Center at San Antonio, San Antonio, Texas, USA
- Marcus Höring
  - Institute of Clinical Chemistry and Laboratory Medicine, University of Regensburg, Regensburg, Germany
- Michal Holčapek
  - Department of Analytical Chemistry, Faculty of Chemical Technology, University of Pardubice, Pardubice, Czech Republic
- Nils Hoffmann
  - Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, Jülich, Germany
- Kevin Huynh
  - Baker Heart and Diabetes Institute, Melbourne, VIC, Australia; Department of Cardiovascular Research Translation and Implementation, La Trobe University, Bundoora, VIC, Australia
- Rainer Lehmann
  - Institute for Clinical Chemistry and Pathobiochemistry, Department for Diagnostic Laboratory Medicine, University Hospital Tuebingen, Tuebingen, Germany
- Jace W Jones
  - Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, Maryland, USA
- Rima Kaddurah-Daouk
  - Department of Psychiatry and Behavioural Sciences, Duke University, Durham, North Carolina, USA; Duke Institute of Brain Sciences, Duke University, Durham, North Carolina, USA; Department of Medicine, Duke University, Durham, North Carolina, USA
- Harald C Köfeler
  - Core Facility Mass Spectrometry and Lipidomics, ZMF, Medical University of Graz, Graz, Austria
- Peter J Meikle
  - Baker Heart and Diabetes Institute, Melbourne, VIC, Australia; Department of Cardiovascular Research Translation and Implementation, La Trobe University, Bundoora, VIC, Australia
- Thomas O Metz
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Valerie B O'Donnell
  - Systems Immunity Research Institute, School of Medicine, Cardiff University, Cardiff, UK
- Daisuke Saigusa
  - Laboratory of Biomedical and Analytical Sciences, Faculty of Pharma-Science, Teikyo University, Tokyo, Japan
- Dominik Schwudke
  - Division of Bioanalytical Chemistry, Research Center Borstel - Leibniz Lung Center, Borstel, Germany; German Center for Infection Research, Thematic Translational Unit Tuberculosis, Partner Site Hamburg-Lübeck-Borstel-Riems, Germany; German Center for Lung Research (DZL), Airway Research Center North (ARCN), Borstel, Germany
- Andrej Shevchenko
  - Max-Planck-Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Federico Torta
  - Singapore Lipidomics Incubator (SLING), Department of Biochemistry, YLL School of Medicine, National University of Singapore, Singapore, Singapore; Signature Research Program in Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, Singapore
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Ruth Welti
  - Kansas Lipidomics Research Center, Division of Biology, Kansas State University, Manhattan, Kansas, USA
- Markus R Wenk
  - Singapore Lipidomics Incubator (SLING), Department of Biochemistry, YLL School of Medicine, National University of Singapore, Singapore, Singapore
- Denise Wolrab
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, Austria
- Yu Xia
  - MOE Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology, Department of Chemistry, Tsinghua University, Beijing, China
- Kim Ekroos
  - Lipidomics Consulting Ltd., Esbo, Finland
- Robert Ahrends
  - Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, Austria
- Gerhard Liebisch
  - Institute of Clinical Chemistry and Laboratory Medicine, University of Regensburg, Regensburg, Germany
17
Oeljeklaus S, Sharma L, Bender J, Warscheid B. Mass spectrometry-based proteomics to study mutants and interactomes of mitochondrial translocation proteins. Methods Enzymol 2024; 707:101-152. [PMID: 39488372 DOI: 10.1016/bs.mie.2024.07.059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2024]
Abstract
The multiple functions of mitochondria are governed by their proteome comprising 1000-1500 proteins depending on the organism. However, only a few proteins are synthesized inside mitochondria, whereas most are "born" outside mitochondria. To reach their destined location, these mitochondrial proteins follow specific import routes established by a mitochondrial translocase network. A detailed understanding of the role and interplay of the different translocases is imperative to understand mitochondrial biology and how mitochondria are integrated into the cellular network. Mass spectrometry (MS) has proved effective for studying the translocase network regarding composition, functions, interplay, and cellular responses evoked by dysfunction. In this chapter, we provide protocols tailored to MS-enabled functional analysis of mutants and interactomes of mitochondrial translocation proteins. In the first part, we exemplify the MS-based proteomics analysis of translocation mutants for delineating the human mitochondrial importome following depletion of the central translocation protein TOMM40. The protocol comprises metabolic stable isotope labeling, TOMM40 knockdown, preparation of mitochondrial fractions, and sample preparation for liquid chromatography (LC)-MS. For deep MS analysis, prefractionation of peptide mixtures by high pH reversed-phase LC is described. In the second part, we outline an affinity purification MS approach to reveal the association of an orphaned protein with the translocase TIM23. The protocol covers FLAG-tag affinity purification of protein complexes from mitochondrial fractions and downstream sample preparation for interactome analysis. In the last unifying part, we describe methods for LC-MS, data processing, statistical analysis and visualization of quantitative MS data, and provide Python code for effective, customizable analysis.
Affiliation(s)
- Silke Oeljeklaus
  - Biochemistry II, Theodor Boveri-Institute, Biocenter, University of Würzburg, Würzburg, Germany
- Lakshita Sharma
  - Biochemistry II, Theodor Boveri-Institute, Biocenter, University of Würzburg, Würzburg, Germany
- Julian Bender
  - Biochemistry II, Theodor Boveri-Institute, Biocenter, University of Würzburg, Würzburg, Germany
- Bettina Warscheid
  - Biochemistry II, Theodor Boveri-Institute, Biocenter, University of Würzburg, Würzburg, Germany
18
Bielow C, Hoffmann N, Jimenez-Morales D, Van Den Bossche T, Vizcaíno JA, Tabb DL, Bittremieux W, Walzer M. Communicating Mass Spectrometry Quality Information in mzQC with Python, R, and Java. J Am Soc Mass Spectrom 2024; 35:1875-1882. [PMID: 38918936 PMCID: PMC11311537 DOI: 10.1021/jasms.4c00174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 06/07/2024] [Accepted: 06/11/2024] [Indexed: 06/27/2024]
Abstract
Mass spectrometry is a powerful technique for analyzing molecules in complex biological samples. However, inter- and intralaboratory variability and bias can affect the data due to various factors, including sample handling and preparation, instrument calibration and performance, and data acquisition and processing. To address this issue, the Quality Control (QC) working group of the Human Proteome Organization's Proteomics Standards Initiative has established the standard mzQC file format for reporting and exchanging information relating to data quality. mzQC is based on the JavaScript Object Notation (JSON) format and provides a lightweight yet versatile file format that can be easily implemented in software. Here, we present open-source software libraries to process mzQC data in three programming languages: Python, using pymzqc; R, using rmzqc; and Java, using jmzqc. The libraries follow a common data model and provide shared functionalities, including the (de)serialization and validation of mzQC files. We demonstrate use of the software libraries in a workflow for extracting, analyzing, and visualizing QC metrics from different sources. Additionally, we show how these libraries can be integrated with each other, with existing software tools, and in automated workflows for the QC of mass spectrometry data. All software libraries are available as open source under the MS-Quality-Hub organization on GitHub (https://github.com/MS-Quality-Hub).
Affiliation(s)
- Chris Bielow
  - Bioinformatics Solution Center, Institut für Mathematik und Informatik, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany
- Nils Hoffmann
  - Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
- David Jimenez-Morales
  - Department of Medicine, Stanford University School of Medicine, Stanford, California 94305, United States
- Tim Van Den Bossche
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
  - VIB-UGent Center for Medical Biotechnology, VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
- David L. Tabb
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, Groningen 9713 AV, The Netherlands
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, Antwerpen 2020, Belgium
- Mathias Walzer
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
19
Yurkovich JT, Evans SJ, Rappaport N, Boore JL, Lovejoy JC, Price ND, Hood LE. The transition from genomics to phenomics in personalized population health. Nat Rev Genet 2024; 25:286-302. [PMID: 38093095 DOI: 10.1038/s41576-023-00674-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Accepted: 11/03/2023] [Indexed: 03/21/2024]
Abstract
Modern health care faces several serious challenges, including an ageing population and its inherent burden of chronic diseases, rising costs and marginal quality metrics. By assessing and optimizing the health trajectory of each individual using a data-driven personalized approach that reflects their genetics, behaviour and environment, we can start to address these challenges. This assessment includes longitudinal phenome measures, such as the blood proteome and metabolome, gut microbiome composition and function, and lifestyle and behaviour through wearables and questionnaires. Here, we review ongoing large-scale genomics and longitudinal phenomics efforts and the powerful insights they provide into wellness. We describe our vision for the transformation of the current health care from disease-oriented to data-driven, wellness-oriented and personalized population health.
Affiliation(s)
- James T Yurkovich
  - Phenome Health, Seattle, WA, USA
  - Center for Phenomic Health, The Buck Institute for Research on Aging, Novato, CA, USA
  - Department of Bioengineering, University of Texas at Dallas, Richardson, TX, USA
- Simon J Evans
  - Phenome Health, Seattle, WA, USA
  - Center for Phenomic Health, The Buck Institute for Research on Aging, Novato, CA, USA
- Noa Rappaport
  - Center for Phenomic Health, The Buck Institute for Research on Aging, Novato, CA, USA
  - Institute for Systems Biology, Seattle, WA, USA
- Jeffrey L Boore
  - Phenome Health, Seattle, WA, USA
  - Center for Phenomic Health, The Buck Institute for Research on Aging, Novato, CA, USA
- Jennifer C Lovejoy
  - Phenome Health, Seattle, WA, USA
  - Center for Phenomic Health, The Buck Institute for Research on Aging, Novato, CA, USA
  - Institute for Systems Biology, Seattle, WA, USA
- Nathan D Price
  - Institute for Systems Biology, Seattle, WA, USA
  - Thorne HealthTech, New York, NY, USA
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Leroy E Hood
  - Phenome Health, Seattle, WA, USA
  - Center for Phenomic Health, The Buck Institute for Research on Aging, Novato, CA, USA
  - Institute for Systems Biology, Seattle, WA, USA
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
  - Department of Immunology, University of Washington, Seattle, WA, USA
20
Gouveia GJ, Head T, Cheng LL, Clendinen CS, Cort JR, Du X, Edison AS, Fleischer CC, Hoch J, Mercaldo N, Pathmasiri W, Raftery D, Schock TB, Sumner LW, Takis PG, Copié V, Eghbalnia HR, Powers R. Perspective: use and reuse of NMR-based metabolomics data: what works and what remains challenging. Metabolomics 2024; 20:41. [PMID: 38480600 DOI: 10.1007/s11306-024-02090-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 01/12/2024] [Indexed: 04/20/2024]
Abstract
BACKGROUND: The National Cancer Institute issued a Request for Information (RFI; NOT-CA-23-007) in October 2022, soliciting input on using and reusing metabolomics data. This RFI aimed to gather input on best practices for metabolomics data storage, management, and use/reuse.
AIM OF REVIEW: The nuclear magnetic resonance (NMR) Interest Group within the Metabolomics Association of North America (MANA) prepared a set of recommendations regarding the deposition, archiving, use, and reuse of NMR-based and, to a lesser extent, mass spectrometry (MS)-based metabolomics datasets. These recommendations were built on the collective experiences of metabolomics researchers within MANA who are generating, handling, and analyzing diverse metabolomics datasets spanning experimental (sample handling and preparation, NMR/MS metabolomics data acquisition, processing, and spectral analyses) to computational (automation of spectral processing, univariate and multivariate statistical analysis, metabolite prediction and identification, multi-omics data integration, etc.) studies.
KEY SCIENTIFIC CONCEPTS OF REVIEW: We provide a synopsis of our collective view regarding the use and reuse of metabolomics data and articulate several recommendations regarding best practices, which are aimed at encouraging researchers to strengthen efforts toward maximizing the utility of metabolomics data, multi-omics data integration, and enhancing the overall scientific impact of metabolomics studies.
Affiliation(s)
- Goncalo Jorge Gouveia
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology, University of Maryland, Gudelsky Drive, Rockville, MD, 20850, USA
- Thomas Head
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - University of British Columbia, Kelowna, BC, V1V 1V7, Canada
- Leo L Cheng
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Department of Pathology and Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Chaevien S Clendinen
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Earth and Biological Sciences Directorate, Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, 99352, USA
- John R Cort
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Earth and Biological Sciences Directorate, Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, 99352, USA
- Xiuxia Du
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9291 University City Blvd, Charlotte, NC, 28223, USA
- Arthur S Edison
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Department of Biochemistry, University of Georgia, Athens, GA, USA
- Candace C Fleischer
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Jeffrey Hoch
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT, 06030-3305, USA
- Nathaniel Mercaldo
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Wimal Pathmasiri
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Department of Nutrition, School of Public Health, Nutrition Research Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Daniel Raftery
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Department of Anesthesia and Pain Medicine, University of Washington, Seattle, WA, 98109, USA
- Tracey B Schock
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Chemical Sciences Division, National Institute of Standards and Technology (NIST), Charleston, SC, 29412, USA
- Lloyd W Sumner
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Department of Biochemistry, MU Metabolomics Center, Bond Life Sciences Center, Interdisciplinary Plant Group, University of Missouri, Columbia, MO, 65211, USA
- Panteleimon G Takis
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Section of Bioanalytical Chemistry, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London, SW7 2AZ, UK
  - Department of Metabolism, Digestion and Reproduction, National Phenome Centre, Imperial College London, London, W12 0NN, UK
- Valérie Copié
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT, 59717-3400, USA
- Hamid R Eghbalnia
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT, 06030-3305, USA
- Robert Powers
  - Metabolomics Association of North America (MANA), NMR Special Interest Group, Edmonton, Canada
  - Department of Chemistry, Nebraska Center for Integrated Biomolecular Communication, University of Nebraska-Lincoln, 722 Hamilton Hall, Lincoln, NE, 68588-0304, USA
21
Shome M, MacKenzie TMG, Subbareddy SR, Snyder MP. The Importance, Challenges, and Possible Solutions for Sharing Proteomics Data While Safeguarding Individuals' Privacy. Mol Cell Proteomics 2024; 23:100731. [PMID: 38331191 PMCID: PMC10915627 DOI: 10.1016/j.mcpro.2024.100731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 01/28/2024] [Accepted: 02/05/2024] [Indexed: 02/10/2024]
Abstract
Proteomics data sharing has profound benefits at the individual level as well as at the community level. While data sharing has increased over the years, mostly due to journal and funding agency requirements, the reluctance of researchers with regard to data sharing is evident as many share only the bare minimum dataset required to publish an article. In many cases, proper metadata is missing, essentially making the dataset useless. This behavior can be explained by a lack of incentives, insufficient awareness, or a lack of clarity surrounding ethical issues. Through adequate training at research institutes, researchers can realize the benefits associated with data sharing and can accelerate the norm of data sharing for the field of proteomics, as has been the standard in genomics for decades. In this article, we have put together various repository options available for proteomics data. We have also added pros and cons of those repositories to help researchers select the repository most suitable for their data submission. It is also important to note that a few types of proteomics data have the potential to re-identify an individual in certain scenarios. In such cases, extra caution should be taken to remove any personal identifiers before sharing on public repositories. Data sets that will be useless without personal identifiers need to be shared in a controlled access repository so that only authorized researchers can access the data and personal identifiers are kept safe.
Affiliation(s)
- Mahasish Shome
  - Department of Genetics, Stanford University, Palo Alto, California, USA
- Tim M G MacKenzie
  - Department of Genetics, Stanford University, Palo Alto, California, USA
- Michael P Snyder
  - Department of Genetics, Stanford University, Palo Alto, California, USA
22
Henke AN, Chilukuri S, Langan LM, Brooks BW. Reporting and reproducibility: Proteomics of fish models in environmental toxicology and ecotoxicology. Sci Total Environ 2024; 912:168455. [PMID: 37979845 DOI: 10.1016/j.scitotenv.2023.168455] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/05/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/20/2023]
Abstract
Environmental toxicology and ecotoxicology research efforts are employing proteomics with fish models as New Approach Methodologies, along with in silico, in vitro and other omics techniques to elucidate hazards of toxicants and toxins. We performed a critical review of toxicology studies with fish models using proteomics and reported fundamental parameters across experimental design, sample preparation, mass spectrometry, and bioinformatics of fish, which represent alternative vertebrate models in environmental toxicology, and routinely studied animals in ecotoxicology. We observed inconsistencies in reporting and methodologies among experimental designs, sample preparations, data acquisitions and bioinformatics, which can affect reproducibility of experimental results. We identified a distinct need to develop reporting guidelines for proteomics use in environmental toxicology and ecotoxicology, increased QA/QC throughout studies, and method optimization with an emphasis on reducing inconsistencies among studies. Several recommendations are offered as logical steps to advance development and application of this emerging research area to understand chemical hazards to public health and the environment.
Affiliation(s)
- Abigail N Henke
  - Department of Biology, Baylor University, Waco, TX, USA; Center for Reservoir and Aquatic Systems Research (CRASR), Baylor University, Waco, TX, USA
- Laura M Langan
  - Department of Environmental Science, Baylor University, Waco, TX, USA; Center for Reservoir and Aquatic Systems Research (CRASR), Baylor University, Waco, TX, USA
- Bryan W Brooks
  - Department of Environmental Science, Baylor University, Waco, TX, USA; Center for Reservoir and Aquatic Systems Research (CRASR), Baylor University, Waco, TX, USA
23
Iacobescu M, Pop C, Uifălean A, Mogoşan C, Cenariu D, Zdrenghea M, Tănase A, Bergthorsson JT, Greiff V, Cenariu M, Iuga CA, Tomuleasa C, Tătaru D. Unlocking protein-based biomarker potential for graft-versus-host disease following allogenic hematopoietic stem cell transplants. Front Immunol 2024; 15:1327035. [PMID: 38433830 PMCID: PMC10904603 DOI: 10.3389/fimmu.2024.1327035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 02/01/2024] [Indexed: 03/05/2024]
Abstract
Despite the numerous advantages of allogeneic hematopoietic stem cell transplants (allo-HSCT), there exists a notable association with risks, particularly during the preconditioning period and predominantly post-intervention, exemplified by the occurrence of graft-versus-host disease (GVHD). Risk stratification prior to symptom manifestation, along with precise diagnosis and prognosis, relies heavily on clinical features. A critical imperative is the development of tools capable of early identification and effective management of patients undergoing allo-HSCT. A promising avenue in this pursuit is the utilization of proteomics-based biomarkers obtained from non-invasive biospecimens. This review comprehensively outlines the application of proteomics and proteomics-based biomarkers in GVHD patients. It delves into both single protein markers and protein panels, offering insights into their relevance in acute and chronic GVHD. Furthermore, the review provides a detailed examination of the site-specific involvement of GVHD. In summary, this article explores the potential of proteomics as a tool for timely and accurate intervention in the context of GVHD following allo-HSCT.
Affiliation(s)
- Maria Iacobescu
  - Department of Proteomics and Metabolomics, MEDFUTURE Research Center for Advanced Medicine, “Iuliu Hatieganu” University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Cristina Pop
  - Department of Pharmacology, Physiology and Pathophysiology, Faculty of Pharmacy, “Iuliu Hatieganu” University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Alina Uifălean
  - Department of Pharmaceutical Analysis, Faculty of Pharmacy, “Iuliu Hatieganu” University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Cristina Mogoşan
  - Department of Pharmacology, Physiology and Pathophysiology, Faculty of Pharmacy, “Iuliu Hatieganu” University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Diana Cenariu
  - Department of Translational Medicine, MEDFUTURE Research Center for Advanced Medicine, “Iuliu Hatieganu” University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Mihnea Zdrenghea
  - Department of Hematology, Faculty of Medicine, “Iuliu Hatieganu” University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Alina Tănase
  - Department of Stem Cell Transplantation, Fundeni Clinical Institute, Bucharest, Romania
- Jon Thor Bergthorsson
  - Department of Laboratory Hematology, Stem Cell Research Unit, Biomedical Center, School of Health Sciences, University of Iceland, Reykjavik, Iceland
- Victor Greiff
  - Department of Immunology, University of Oslo, Oslo, Norway
- Mihai Cenariu
  - Department of Animal Reproduction, University of Agricultural Sciences and Veterinary Medicine, Cluj-Napoca, Romania
- Cristina Adela Iuga
  - Department of Proteomics and Metabolomics, MEDFUTURE Research Center for Advanced Medicine, “Iuliu Hatieganu” University of Medicine and Pharmacy, Cluj-Napoca, Romania
  - Department of Pharmaceutical Analysis, Faculty of Pharmacy, “Iuliu Hatieganu” University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Ciprian Tomuleasa
  - Department of Translational Medicine, MEDFUTURE Research Center for Advanced Medicine, “Iuliu Hatieganu” University of Medicine and Pharmacy, Cluj-Napoca, Romania
  - Department of Hematology, Faculty of Medicine, “Iuliu Hatieganu” University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Dan Tătaru
  - Department of Internal Medicine, Faculty of Medicine, “Iuliu Hatieganu” University of Medicine and Pharmacy, Cluj-Napoca, Romania
24
Omenn GS, Lane L, Overall CM, Lindskog C, Pineau C, Packer NH, Cristea IM, Weintraub ST, Orchard S, Roehrl MHA, Nice E, Guo T, Van Eyk JE, Liu S, Bandeira N, Aebersold R, Moritz RL, Deutsch EW. The 2023 Report on the Proteome from the HUPO Human Proteome Project. J Proteome Res 2024; 23:532-549. [PMID: 38232391 PMCID: PMC11026053 DOI: 10.1021/acs.jproteome.3c00591] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 01/19/2024]
Abstract
Since 2010, the Human Proteome Project (HPP), the flagship initiative of the Human Proteome Organization (HUPO), has pursued two goals: (1) to credibly identify the protein parts list and (2) to make proteomics an integral part of multiomics studies of human health and disease. The HPP relies on international collaboration, data sharing, standardized reanalysis of MS data sets by PeptideAtlas and MassIVE-KB using HPP Guidelines for quality assurance, integration and curation of MS and non-MS protein data by neXtProt, plus extensive use of antibody profiling carried out by the Human Protein Atlas. According to the neXtProt release 2023-04-18, protein expression has now been credibly detected (PE1) for 18,397 of the 19,778 neXtProt predicted proteins coded in the human genome (93%). Of these PE1 proteins, 17,453 were detected with mass spectrometry (MS) in accordance with HPP Guidelines and 944 by a variety of non-MS methods. The number of neXtProt PE2, PE3, and PE4 missing proteins now stands at 1381. Achieving the unambiguous identification of 93% of predicted proteins encoded from across all chromosomes represents remarkable experimental progress on the Human Proteome parts list. Meanwhile, there are several categories of predicted proteins that have proved resistant to detection regardless of protein-based methods used. Additionally there are some PE1-4 proteins that probably should be reclassified to PE5, specifically 21 LINC entries and ∼30 HERV entries; these are being addressed in the present year. Applying proteomics in a wide array of biological and clinical studies ensures integration with other omics platforms as reported by the Biology and Disease-driven HPP teams and the antibody and pathology resource pillars. Current progress has positioned the HPP to transition to its Grand Challenge Project focused on determining the primary function(s) of every protein itself and in networks and pathways within the context of human health and disease.
Affiliation(s)
- Gilbert S. Omenn
- University of Michigan, Ann Arbor, Michigan 48109, United States
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Lydie Lane
- CALIPHO Group, SIB Swiss Institute of Bioinformatics and University of Geneva, 1015 Lausanne, Switzerland
- Christopher M. Overall
- University of British Columbia, Vancouver, BC V6T 1Z4, Canada, Yonsei University Republic of Korea
- Charles Pineau
- University Rennes, Inserm U1085, Irset, 35042 Rennes, France
- Susan T. Weintraub
- University of Texas Health Science Center-San Antonio, San Antonio, Texas 78229-3900, United States
- Michael H. A. Roehrl
- Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, United States
- Tiannan Guo
- Westlake Center for Intelligent Proteomics, Westlake Laboratory, Westlake University, Hangzhou 310024, Zhejiang Province, China
- Jennifer E. Van Eyk
- Advanced Clinical Biosystems Research Institute, Smidt Heart Institute, Cedars-Sinai Medical Center, 127 South San Vicente Boulevard, Pavilion, 9th Floor, Los Angeles, CA, 90048, United States
- Siqi Liu
- BGI Group, Shenzhen 518083, China
- Nuno Bandeira
- University of California, San Diego, La Jolla, CA, 92093, United States
- Ruedi Aebersold
- Institute of Molecular Systems Biology in ETH Zurich, 8092 Zurich, Switzerland
- University of Zurich, 8092 Zurich, Switzerland
- Robert L. Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
25
Lou R, Shui W. Acquisition and Analysis of DIA-Based Proteomic Data: A Comprehensive Survey in 2023. Mol Cell Proteomics 2024; 23:100712. [PMID: 38182042 PMCID: PMC10847697 DOI: 10.1016/j.mcpro.2024.100712] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 10/31/2023] [Revised: 12/27/2023] [Accepted: 01/02/2024] [Indexed: 01/07/2024]
Abstract
Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. This review provides a comprehensive overview of recent advances in both the experimental and computational methods for DIA proteomics, from data acquisition schemes to analysis strategies and software tools. DIA acquisition schemes are categorized based on the design of precursor isolation windows, highlighting wide-window, overlapping-window, narrow-window, scanning quadrupole-based, and parallel accumulation-serial fragmentation-enhanced DIA methods. For DIA data analysis, major strategies are classified into spectrum reconstruction, sequence-based search, library-based search, de novo sequencing, and sequencing-independent approaches. A wide array of software tools implementing these strategies are reviewed, with details on their overall workflows and scoring approaches at different steps. The generation and optimization of spectral libraries, which are critical resources for DIA analysis, are also discussed. Publicly available benchmark datasets covering global proteomics and phosphoproteomics are summarized to facilitate performance evaluation of various software tools and analysis workflows. Continued advances and synergistic developments of versatile components in DIA workflows are expected to further enhance the power of DIA-based proteomics.
Affiliation(s)
- Ronghui Lou
- iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
- Wenqing Shui
- iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
26
Uszkoreit J, Palmblad M, Schwämmle V. Tackling reproducibility: lessons for the proteomics community. Expert Rev Proteomics 2024; 21:9-11. [PMID: 38362700 DOI: 10.1080/14789450.2024.2320166] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/23/2023] [Accepted: 02/03/2024] [Indexed: 02/17/2024]
Affiliation(s)
- Magnus Palmblad
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands
- Veit Schwämmle
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
27
Afridi R, Lee WH, Kim JH, Suk K. Utilizing databases for astrocyte secretome research. Expert Rev Proteomics 2023; 20:371-379. [PMID: 37978891 DOI: 10.1080/14789450.2023.2285311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 11/07/2023] [Indexed: 11/19/2023]
Abstract
INTRODUCTION Astrocytes are the most abundant cell type in the central nervous system (CNS). They play a pivotal role in supporting neuronal function and maintaining homeostasis by releasing a variety of bioactive proteins, collectively known as the astrocyte secretome. Investigating secretome provides insights into the molecular mechanisms underlying astrocyte function and dysfunction, as well as novel strategies to prevent and treat diseases affecting the CNS. AREAS COVERED Proteomics databases are a valuable resource for studying the role of astrocytes in healthy and diseased brain function, as they provide information about gene expression, protein expression, and cellular function. In this review, we discuss existing databases that are useful for astrocyte secretome research. EXPERT OPINION Astrocyte secretomics is a field that is rapidly progressing, yet the availability of dedicated databases is currently limited. To meet the increasing demand for comprehensive omics data in glia research, developing databases specifically focused on astrocyte secretome is crucial. Such databases would allow researchers to investigate the intricate molecular landscape of astrocytes and comprehend their involvement in diverse physiological and pathological processes. Expanding resources through the development of databases dedicated to the astrocyte secretome may facilitate further advancements in this field.
Affiliation(s)
- Ruqayya Afridi
- Department of Pharmacology, BK21 Plus KNU Biomedical Convergence Program, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- Won-Ha Lee
- Brain Science & Engineering Institute, Kyungpook National University, Daegu, Republic of Korea
- School of Life Sciences, BK21 Plus KNU Creative BioResearch Group, Kyungpook National University, Daegu, Republic of Korea
- Jong-Heon Kim
- Brain Science & Engineering Institute, Kyungpook National University, Daegu, Republic of Korea
- Kyoungho Suk
- Department of Pharmacology, BK21 Plus KNU Biomedical Convergence Program, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- Brain Science & Engineering Institute, Kyungpook National University, Daegu, Republic of Korea
28
Deutsch EW, Mendoza L, Shteynberg DD, Hoopmann MR, Sun Z, Eng JK, Moritz RL. Trans-Proteomic Pipeline: Robust Mass Spectrometry-Based Proteomics Data Analysis Suite. J Proteome Res 2023; 22:615-624. [PMID: 36648445 PMCID: PMC10166710 DOI: 10.1021/acs.jproteome.2c00624] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Indexed: 01/18/2023]
Abstract
The Trans-Proteomic Pipeline (TPP) mass spectrometry data analysis suite has been in continual development and refinement since its first tools, PeptideProphet and ProteinProphet, were published 20 years ago. The current release provides a large complement of tools for spectrum processing, spectrum searching, search validation, abundance computation, protein inference, and more. Many of the tools include machine-learning modeling to extract the most information from data sets and build robust statistical models to compute the probabilities that derived information is correct. Here we present the latest information on the many TPP tools, and how TPP can be deployed on various platforms from personal Windows laptops to Linux clusters and expansive cloud computing environments. We describe tutorials on how to use TPP in a variety of ways and describe synergistic projects that leverage TPP. We conclude with plans for continued development of TPP.
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Jimmy K Eng
- Proteomics Resource, University of Washington, Seattle, Washington 98195, United States
- Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States