1
Vašíček J, Kuznetsova KG, Skiadopoulou D, Unger L, Chera S, Ghila LM, Bandeira N, Njølstad PR, Johansson S, Bruckner S, Käll L, Vaudel M. ProHap enables human proteomic database generation accounting for population diversity. Nat Methods 2025; 22:273-277. [PMID: 39653819 DOI: 10.1038/s41592-024-02506-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 10/10/2024] [Indexed: 02/12/2025]
Abstract
Amid the advances in genomics, the availability of large reference panels of human haplotypes is key to account for human diversity within and across populations. However, mass spectrometry-based proteomics does not benefit from this information. To address this gap, we introduce ProHap, a Python-based tool that constructs protein sequence databases from phased genotypes of reference panels. ProHap enables researchers to account for haplotype diversity in proteomic searches.
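To make "constructing protein sequence databases from phased genotypes" concrete, the sketch below applies phased amino-acid substitutions to a reference protein and writes one FASTA record per distinct haplotype sequence. It is a hypothetical simplification, not ProHap's implementation: variants are assumed to be biallelic, diploid and already mapped to protein positions, and the names `Variant`, `haplotype_sequences` and `to_fasta` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical biallelic missense variant already mapped to a protein position."""
    position: int  # 0-based index into the reference protein
    ref: str       # reference amino acid
    alt: str       # alternative amino acid
    genotype: str  # phased diploid genotype, e.g. "0|1" (left allele = haplotype 1)


def haplotype_sequences(reference: str, variants: list[Variant]) -> dict[str, list[int]]:
    """Map each distinct haplotype protein sequence to the (0-based) haplotypes carrying it."""
    result: dict[str, list[int]] = {}
    for hap in (0, 1):
        residues = list(reference)
        for v in variants:
            if reference[v.position] != v.ref:
                raise ValueError(f"reference mismatch at position {v.position}")
            # Only allele "1" is treated as the alternative (biallelic assumption).
            if v.genotype.split("|")[hap] == "1":
                residues[v.position] = v.alt
        # Identical haplotypes collapse into a single database entry.
        result.setdefault("".join(residues), []).append(hap)
    return result


def to_fasta(protein_id: str, sequences: dict[str, list[int]]) -> str:
    """Write one FASTA record per distinct haplotype sequence (1-based haplotype labels)."""
    lines = []
    for i, (seq, haps) in enumerate(sequences.items(), start=1):
        labels = ",".join(str(h + 1) for h in haps)
        lines.append(f">{protein_id}_hap{i} haplotypes={labels}")
        lines.append(seq)
    return "\n".join(lines)


if __name__ == "__main__":
    variants = [Variant(2, "T", "A", "0|1"), Variant(7, "K", "E", "1|1")]
    print(to_fasta("P1", haplotype_sequences("MKTAYIAKQR", variants)))
```

Collapsing identical haplotype sequences keeps the database from growing with every individual in a panel, which matters because search-space size affects identification performance.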
Affiliation(s)
- Jakub Vašíček
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Ksenia G Kuznetsova
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Dafni Skiadopoulou
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Lucas Unger
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
- Simona Chera
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
- Luiza M Ghila
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
- Nuno Bandeira
  - University of California, San Diego, La Jolla, CA, USA
- Pål R Njølstad
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Children and Youth Clinic, Haukeland University Hospital, Bergen, Norway
- Stefan Johansson
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
- Stefan Bruckner
  - Institute for Visual and Analytic Computing, University of Rostock, Rostock, Germany
- Lukas Käll
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
- Marc Vaudel
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
  - Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, Oslo, Norway
2
Perez-Riverol Y, Bandla C, Kundu D, Kamatchinathan S, Bai J, Hewapathirana S, John N, Prakash A, Walzer M, Wang S, Vizcaíno J. The PRIDE database at 20 years: 2025 update. Nucleic Acids Res 2025; 53:D543-D553. [PMID: 39494541 PMCID: PMC11701690 DOI: 10.1093/nar/gkae1011] [Citation(s) in RCA: 115] [Impact Index Per Article: 115.0] [Received: 09/15/2024] [Revised: 10/11/2024] [Accepted: 10/16/2024] [Indexed: 11/05/2024]
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's leading mass spectrometry (MS)-based proteomics data repository and one of the founding members of the ProteomeXchange consortium. This manuscript summarizes the developments in PRIDE resources and related tools over the last three years. Submissions to PRIDE Archive (the archival component of PRIDE) have reached an average of around 534 datasets per month. This has been possible thanks to continuous improvements in infrastructure, such as a new file transfer protocol for very large datasets (Globus), a new data resubmission pipeline and an automatic dataset validation process. Additionally, we highlight novel activities such as the PRIDE chatbot (based on open-source Large Language Models) and our work to improve support for MS crosslinking datasets. Furthermore, we describe how we have increased our efforts to reuse, reanalyze and disseminate high-quality proteomics data into added-value resources such as UniProt, Ensembl and Expression Atlas.
Affiliation(s)
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Chakradhar Bandla
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Deepti J Kundu
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Selvakumar Kamatchinathan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jingwen Bai
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Suresh Hewapathirana
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Nithu Sara John
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Ananth Prakash
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Mathias Walzer
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Shengbo Wang
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
3
Zhu C, Liu LY, Ha A, Yamaguchi TN, Zhu H, Hugh-White R, Livingstone J, Patel Y, Kislinger T, Boutros PC. moPepGen: Rapid and Comprehensive Identification of Non-canonical Peptides. bioRxiv 2024:2024.03.28.587261. [PMID: 38585946 PMCID: PMC10996593 DOI: 10.1101/2024.03.28.587261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Gene expression is a multi-step transformation of biological information from its storage form (DNA) into functional forms (protein and some RNAs). Regulatory activities at each step of this transformation multiply a single gene into a myriad of proteoforms. Proteogenomics is the study of how genomic and transcriptomic variation creates this proteomic diversity, and it is limited by the challenges of modeling the complexities of gene expression. We therefore created moPepGen, a graph-based algorithm that comprehensively generates non-canonical peptides in linear time. moPepGen works with multiple technologies, in multiple species and on all types of genetic and transcriptomic data. In human cancer proteomes, it enumerates previously unobservable non-canonical peptides arising from germline and somatic genomic variants, noncoding open reading frames, RNA fusions and RNA circularization. By enabling efficient detection and quantitation of previously hidden proteins in both existing and new proteomic data, moPepGen facilitates all proteogenomics applications. It is available at: https://github.com/uclahs-cds/package-moPepGen.
Affiliation(s)
- Chenghao Zhu
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
  - Institute for Precision Health, University of California, Los Angeles, CA, USA
  - Department of Urology, University of California, Los Angeles, CA, USA
- Lydia Y. Liu
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Canada
- Annie Ha
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Takafumi N. Yamaguchi
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
  - Institute for Precision Health, University of California, Los Angeles, CA, USA
- Helen Zhu
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Canada
- Rupert Hugh-White
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
  - Institute for Precision Health, University of California, Los Angeles, CA, USA
- Julie Livingstone
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
  - Institute for Precision Health, University of California, Los Angeles, CA, USA
- Yash Patel
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
  - Institute for Precision Health, University of California, Los Angeles, CA, USA
- Thomas Kislinger
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Paul C. Boutros
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
  - Institute for Precision Health, University of California, Los Angeles, CA, USA
  - Department of Urology, University of California, Los Angeles, CA, USA
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
4
Piana D, Iavarone F, De Paolis E, Daniele G, Parisella F, Minucci A, Greco V, Urbani A. Phenotyping Tumor Heterogeneity through Proteogenomics: Study Models and Challenges. Int J Mol Sci 2024; 25:8830. [PMID: 39201516 PMCID: PMC11354793 DOI: 10.3390/ijms25168830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 07/31/2024] [Accepted: 08/06/2024] [Indexed: 09/02/2024]
Abstract
Tumor heterogeneity refers to the diversity observed among tumor cells, both between different tumors (inter-tumor heterogeneity) and within a single tumor (intra-tumor heterogeneity). These cells can display distinct morphological and phenotypic characteristics, including variations in cellular morphology, metastatic potential and treatment response among patients. Therefore, a comprehensive understanding of such heterogeneity is necessary for deciphering tumor-specific mechanisms that may be diagnostically and therapeutically valuable. Innovative and multidisciplinary approaches are needed to understand this complex feature. In this context, proteogenomics has emerged as a significant resource for integrating omics fields such as genomics and proteomics. By combining data obtained from Next-Generation Sequencing (NGS) technologies and mass spectrometry (MS) analyses, proteogenomics aims to provide a comprehensive view of tumor heterogeneity. This approach reveals molecular alterations and phenotypic features related to tumor subtypes, potentially identifying therapeutic biomarkers. Much has been achieved; however, despite continuous advances in proteogenomics-based methodologies, several challenges remain, in particular limited sensitivity and specificity and the lack of optimal study models. This review highlights the impact of proteogenomics on characterizing tumor phenotypes, focusing on the critical challenges and current limitations of its use in different clinical and preclinical models for tumor phenotypic characterization.
Affiliation(s)
- Diletta Piana
  - Department of Basic Biotechnological Sciences, Intensivological and Perioperative Clinics, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
  - Departmental Unit of Chemistry, Biochemistry and Clinical Molecular Biology, Department of Diagnostic and Laboratory Medicine, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Federica Iavarone
  - Department of Basic Biotechnological Sciences, Intensivological and Perioperative Clinics, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
  - Departmental Unit of Chemistry, Biochemistry and Clinical Molecular Biology, Department of Diagnostic and Laboratory Medicine, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Elisa De Paolis
  - Departmental Unit of Chemistry, Biochemistry and Clinical Molecular Biology, Department of Diagnostic and Laboratory Medicine, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
  - Departmental Unit of Molecular and Genomic Diagnostics, Genomics Core Facility, Gemelli Science and Technology Park (G-STeP), Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Gennaro Daniele
  - Phase 1 Unit, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Federico Parisella
  - Department of Basic Biotechnological Sciences, Intensivological and Perioperative Clinics, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
- Angelo Minucci
  - Departmental Unit of Chemistry, Biochemistry and Clinical Molecular Biology, Department of Diagnostic and Laboratory Medicine, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
  - Departmental Unit of Molecular and Genomic Diagnostics, Genomics Core Facility, Gemelli Science and Technology Park (G-STeP), Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Viviana Greco
  - Department of Basic Biotechnological Sciences, Intensivological and Perioperative Clinics, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
  - Departmental Unit of Chemistry, Biochemistry and Clinical Molecular Biology, Department of Diagnostic and Laboratory Medicine, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
- Andrea Urbani
  - Department of Basic Biotechnological Sciences, Intensivological and Perioperative Clinics, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
  - Departmental Unit of Chemistry, Biochemistry and Clinical Molecular Biology, Department of Diagnostic and Laboratory Medicine, Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
5
Kuznetsova KG, Vašíček J, Skiadopoulou D, Molnes J, Udler M, Johansson S, Njølstad PR, Manning A, Vaudel M. Bioinformatics pipeline for the systematic mining genomic and proteomic variation linked to rare diseases: The example of monogenic diabetes. PLoS One 2024; 19:e0300350. [PMID: 38635808 PMCID: PMC11025945 DOI: 10.1371/journal.pone.0300350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 02/23/2024] [Indexed: 04/20/2024]
Abstract
Monogenic diabetes is a group of diseases caused by rare variants in single genes. As for other rare diseases, multiple genes have been linked to monogenic diabetes with different measures of pathogenicity, but the information on these genes and variants is not unified across resources, making it challenging to process computationally. We have developed an automated pipeline for collecting and harmonizing data on genetic variants linked to monogenic diabetes. Furthermore, we have translated variant genetic sequences into protein sequences, accounting for all protein isoforms and their variants. This allows researchers to consolidate information on variant genes and proteins linked to monogenic diabetes and facilitates their study using proteomics or structural biology. Our open and flexible implementation using Jupyter notebooks enables tailoring and modifying the pipeline and applying it to other rare diseases.
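The translation step mentioned above can be illustrated with a small sketch that applies one single-nucleotide substitution to a coding sequence (CDS) and translates it with the standard genetic code. This is a hypothetical simplification, not the published notebooks: it assumes an uppercase CDS that begins at the start codon, handles only SNVs on one isoform, and the names `apply_snv` and `translate` are illustrative.

```python
# Standard genetic code; codons are enumerated with bases in T, C, A, G order
# at the first, second and third position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_snv(cds: str, position: int, ref: str, alt: str) -> str:
    """Return the CDS with one nucleotide substituted (0-based position)."""
    if cds[position] != ref:
        raise ValueError(f"expected {ref} at position {position}, found {cds[position]}")
    return cds[:position] + alt + cds[position + 1:]


def translate(cds: str) -> str:
    """Translate codon by codon, stopping at the first stop codon ('*')."""
    protein = []
    for start in range(0, len(cds) - 2, 3):  # ignores a trailing partial codon
        aa = CODON_TABLE[cds[start:start + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    cds = "ATGGAAGGTTAA"
    print(translate(cds))                           # reference protein
    print(translate(apply_snv(cds, 4, "A", "T")))   # missense: GAA -> GTA
    print(translate(apply_snv(cds, 3, "G", "T")))   # nonsense: GAA -> TAA
```

On this toy CDS the reference translates to `MEG`, the missense change yields `MVG`, and the nonsense change truncates the protein to `M`, which is why variant protein databases need to handle premature stops as well as substitutions.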
Affiliation(s)
- Ksenia G. Kuznetsova
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Jakub Vašíček
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Dafni Skiadopoulou
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Janne Molnes
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
- Miriam Udler
  - Department of Medicine, Massachusetts General Hospital, Boston, MA, United States of America
  - Metabolism Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States of America
  - Department of Medicine, Harvard Medical School, Boston, MA, United States of America
- Stefan Johansson
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
- Pål Rasmus Njølstad
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Children and Youth Clinic, Haukeland University Hospital, Bergen, Norway
- Alisa Manning
  - Department of Medicine, Massachusetts General Hospital, Boston, MA, United States of America
  - Metabolism Program, Broad Institute of MIT and Harvard, Cambridge, MA, United States of America
  - Department of Medicine, Harvard Medical School, Boston, MA, United States of America
- Marc Vaudel
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
  - Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway
6
Wang F, Zhang Z, Mao M, Yang Y, Xu P, Lu S. COSMIC-based mutation database enhances identification efficiency of HLA-I immunopeptidome. J Transl Med 2024; 22:144. [PMID: 38336780 PMCID: PMC10858511 DOI: 10.1186/s12967-023-04821-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/12/2023] [Accepted: 12/20/2023] [Indexed: 02/12/2024]
Abstract
BACKGROUND Neoantigens have emerged as a promising area of focus in tumor immunotherapy, with several established strategies aiming to enhance their identification. Human leukocyte antigen class I molecules (HLA-I), which present intracellular immunopeptides to T cells, provide an ideal source for identifying neoantigens. However, relying solely on a mutation database generated through commonly used whole exome sequencing (WES) to identify HLA-I immunopeptides may cause potential neoantigens to be missed because of limitations in sequencing depth and sample quality. METHOD In this study, we constructed and evaluated an extended database for neoantigen identification based on the COSMIC mutation database. We used mass spectrometry-based proteogenomic profiling to identify the HLA-I immunopeptidome enriched from HepG2 cells. HepG2 WES-based and COSMIC-based mutation databases were generated and used to identify HepG2-specific mutant immunopeptides. RESULT The COSMIC-based database identified 5 immunopeptides, compared with only 1 mutant peptide identified by the HepG2 WES-based database, indicating its effectiveness in identifying mutant immunopeptides. Furthermore, the HLA-I affinity of the mutant immunopeptides was evaluated through NetMHCpan and peptide-docking modeling to validate their binding to HLA-I molecules, demonstrating the potential of mutant peptides identified by the COSMIC-based database as neoantigens. CONCLUSION Using the COSMIC-based mutation database is a more efficient strategy for identifying mutant peptides from the HLA-I immunopeptidome without significantly increasing the false positive rate. A HepG2-specific WES-based database may exclude certain mutant peptides because of WES sequencing depth or sample heterogeneity. The COSMIC-based database can effectively uncover potential neoantigens within HLA-I immunopeptidomes.
Affiliation(s)
- Fangzhou Wang
  - Medical School of Chinese People's Liberation Army (PLA), Faculty of Hepato-Pancreato-Biliary Surgery, Chinese PLA General Hospital, Institute of Hepatobiliary Surgery of Chinese PLA, Key Laboratory of Digital Hepatobiliary Surgery PLA, 28 Fuxing Road, Haidian District, Beijing, 100853, China
- Zhenpeng Zhang
  - State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Beijing Proteome Research Center, Institute of Lifeomics, 38 Life Science Park Road, Changping District, Beijing, 102206, China
- Mingsong Mao
  - State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Beijing Proteome Research Center, Institute of Lifeomics, 38 Life Science Park Road, Changping District, Beijing, 102206, China
  - School of Basic Medical Sciences, Anhui Medical University, Hefei, China
- Yudai Yang
  - State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Beijing Proteome Research Center, Institute of Lifeomics, 38 Life Science Park Road, Changping District, Beijing, 102206, China
  - Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Ping Xu
  - State Key Laboratory of Proteomics, National Center for Protein Sciences (Beijing), Research Unit of Proteomics and Research and Development of New Drug of Chinese Academy of Medical Sciences, Beijing Proteome Research Center, Institute of Lifeomics, 38 Life Science Park Road, Changping District, Beijing, 102206, China
  - School of Basic Medical Sciences, Anhui Medical University, Hefei, China
  - Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
  - School of Medicine, Guizhou University, Guiyang, China
- Shichun Lu
  - Medical School of Chinese People's Liberation Army (PLA), Faculty of Hepato-Pancreato-Biliary Surgery, Chinese PLA General Hospital, Institute of Hepatobiliary Surgery of Chinese PLA, Key Laboratory of Digital Hepatobiliary Surgery PLA, 28 Fuxing Road, Haidian District, Beijing, 100853, China
7
Skiadopoulou D, Vašíček J, Kuznetsova K, Bouyssié D, Käll L, Vaudel M. Retention Time and Fragmentation Predictors Increase Confidence in Identification of Common Variant Peptides. J Proteome Res 2023; 22:3190-3199. [PMID: 37656829 PMCID: PMC10563157 DOI: 10.1021/acs.jproteome.3c00243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Indexed: 09/03/2023]
Abstract
Precision medicine focuses on adapting care to the individual profile of patients, for example, accounting for their unique genetic makeup. Being able to account for the effect of genetic variation on the proteome holds great promise toward this goal. However, identifying the protein products of genetic variation using mass spectrometry has proven very challenging. Here we show that the identification of variant peptides can be improved by the integration of retention time and fragmentation predictors into a unified proteogenomic pipeline. By combining these intrinsic peptide characteristics using the search-engine post-processor Percolator, we demonstrate improved discrimination power between correct and incorrect peptide-spectrum matches. Our results demonstrate that the drop in performance that is induced when expanding a protein sequence database can be compensated, hence enabling efficient identification of genetic variation products in proteomics data. We anticipate that this enhancement of proteogenomic pipelines can provide a more refined picture of the unique proteome of patients and thereby contribute to improving patient care.
Affiliation(s)
- Dafni Skiadopoulou
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- Jakub Vašíček
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- Ksenia Kuznetsova
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- David Bouyssié
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), 31000 Toulouse, France
- Lukas Käll
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden
- Marc Vaudel
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
  - Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, N-0213 Oslo, Norway
8
Zhang B, Bassani-Sternberg M. Current perspectives on mass spectrometry-based immunopeptidomics: the computational angle to tumor antigen discovery. J Immunother Cancer 2023; 11:e007073. [PMID: 37899131 PMCID: PMC10619091 DOI: 10.1136/jitc-2023-007073] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Accepted: 07/21/2023] [Indexed: 10/31/2023]
Abstract
Identification of tumor antigens presented by the human leucocyte antigen (HLA) molecules is essential for the design of effective and safe cancer immunotherapies that rely on T cell recognition and killing of tumor cells. Mass spectrometry (MS)-based immunopeptidomics enables high-throughput, direct identification of HLA-bound peptides from a variety of cell lines, tumor tissues, and healthy tissues. It involves immunoaffinity purification of HLA complexes followed by MS profiling of the extracted peptides using data-dependent acquisition, data-independent acquisition, or targeted approaches. By incorporating DNA, RNA, and ribosome sequencing data into immunopeptidomics data analysis, the proteogenomic approach provides a powerful means for identifying tumor antigens encoded within the canonical open reading frames of annotated coding genes and non-canonical tumor antigens derived from presumably non-coding regions of our genome. We discuss emerging computational challenges in immunopeptidomics data analysis and tumor antigen identification, highlighting key considerations in the proteogenomics-based approach, including accurate DNA, RNA and ribosomal sequencing data analysis, careful incorporation of predicted novel protein sequences into the reference protein database, special quality control in MS data analysis due to the expanded and heterogeneous search space, cancer-specificity determination, and immunogenicity prediction. Advancements in technology and computation are continually enabling us to identify tumor antigens with higher sensitivity and accuracy, paving the way toward the development of more effective cancer immunotherapies.
Affiliation(s)
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Michal Bassani-Sternberg
  - Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
  - Department of Oncology, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
  - Agora Cancer Research Centre, Lausanne, Switzerland
9
Wang H, Dai C, Pfeuffer J, Sachsenberg T, Sanchez A, Bai M, Perez-Riverol Y. Tissue-based absolute quantification using large-scale TMT and LFQ experiments. Proteomics 2023; 23:e2300188. [PMID: 37488995 DOI: 10.1002/pmic.202300188] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/16/2023] [Revised: 07/04/2023] [Accepted: 07/05/2023] [Indexed: 07/26/2023]
Abstract
Relative and absolute intensity-based protein quantification across cell lines, tissue atlases and tumour datasets is increasingly available in public datasets. These atlases enable researchers to explore fundamental biological questions, such as protein existence, expression location, quantity and correlation with RNA expression. Most studies provide MS1 feature-based label-free quantitative (LFQ) datasets; however, a growing number of isobaric tandem mass tag (TMT) datasets remain unexplored. Here, we compare traditional intensity-based absolute quantification (iBAQ) proteome abundance ranking to an analogous method using reporter ion proteome abundance ranking, with data from an experiment in which LFQ and TMT were measured on the same samples. This new TMT method substitutes reporter ion intensities for MS1 feature intensities in the iBAQ framework. Additionally, we compared LFQ-iBAQ values to TMT-iBAQ values from two independent large-scale tissue atlas datasets (one LFQ and one TMT) using robust bottom-up proteomic identification, normalisation and quantitation workflows.
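In the conventional iBAQ framework, a protein's summed peptide intensity is divided by its number of theoretically observable peptides. The sketch below computes that ratio under stated assumptions: fully tryptic cleavage (after K or R, but not before P, with no missed cleavages) and "observable" meaning a peptide length of 6 to 30 residues. These cutoffs and the names `tryptic_peptides`, `observable_count` and `ibaq` are illustrative choices, not necessarily the parameters used in this study.

```python
import re


def tryptic_peptides(protein: str) -> list[str]:
    """In-silico trypsin digest: cut after K or R unless the next residue is P."""
    # Zero-width split points (Python 3.7+); drop the empty tail after a C-terminal K/R.
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def observable_count(protein: str, min_len: int = 6, max_len: int = 30) -> int:
    """Number of tryptic peptides whose length falls in the assumed observable range."""
    return sum(min_len <= len(p) <= max_len for p in tryptic_peptides(protein))


def ibaq(peptide_intensities: list[float], protein: str) -> float:
    """Summed peptide intensity divided by the number of observable peptides."""
    n = observable_count(protein)
    if n == 0:
        raise ValueError("protein has no observable peptides under these rules")
    return sum(peptide_intensities) / n


if __name__ == "__main__":
    protein = "AAAAAAK" + "GGGGGGR" + "CCK" + "DDDDDDDDK"  # toy sequence
    print(tryptic_peptides(protein))
    print(ibaq([1.0e6, 2.0e6, 3.0e6], protein))
```

For the TMT variant described in the abstract, `peptide_intensities` would, in this sketch, hold the reporter ion intensities of one channel instead of MS1 feature intensities; the denominator is unchanged.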
Affiliation(s)
- Hong Wang
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- Chengxin Dai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Julianus Pfeuffer
- Algorithmic Bioinformatics, Freie Universität Berlin, Berlin, Germany
- Timo Sachsenberg
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Institute for Biological and Medical Informatics, University of Tübingen, Tübingen, Germany
- Aniel Sanchez
- Section for Clinical Chemistry, Department of Translational Medicine, Lund University, Skåne University Hospital Malmö, Malmö, Sweden
- Mingze Bai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing, China
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
10
Kazdal D, Menzel M, Budczies J, Stenzinger A. [Molecular tumor diagnostics as the driving force behind precision oncology]. Dtsch Med Wochenschr 2023; 148:1157-1165. [PMID: 37657453 DOI: 10.1055/a-1937-0347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/03/2023]
Abstract
Molecular pathological diagnostics plays a central role in personalized oncology and requires multidisciplinary teamwork. It is as relevant for individual patients treated with an approved therapy or an individual treatment attempt as it is for prospective clinical studies that require the identification of specific therapeutic targets or complex biomarkers for study inclusion. It is also crucial for generating real-world data, which is becoming increasingly important for drug development. Future developments will be shaped significantly by improvements in scalable molecular diagnostics, in which increasingly complex, multi-layered data sets must be converted quickly into clinically useful information. One focus will be the development of adaptive diagnostic strategies that can capture the enormous plasticity of cancer over time.
11
Bai M, Deng J, Dai C, Pfeuffer J, Sachsenberg T, Perez-Riverol Y. LFQ-Based Peptide and Protein Intensity Differential Expression Analysis. J Proteome Res 2023. [PMID: 37220883 DOI: 10.1021/acs.jproteome.2c00812] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 05/25/2023]
Abstract
Testing for significant differences in quantities at the protein level is a common goal of many LFQ-based mass spectrometry proteomics experiments. Starting from a table of protein and/or peptide quantities produced by a given proteomics quantification tool, many tools and R packages exist to perform the final tasks of imputation, summarization, normalization, and statistical testing. To evaluate how packages, and the settings of their substeps, affect the final list of significant proteins, we studied several packages on three public data sets with known expected protein fold changes. We found that results can vary significantly between packages and even across different parameter settings of the same package. In addition to usability aspects and feature/compatibility lists of different packages, this paper highlights the sensitivity and specificity trade-offs that come with specific packages and settings.
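To make the normalization and testing substeps concrete, here is a deliberately minimal Python sketch: log2 transformation, per-sample median centring, then a per-protein log2 fold change and Welch's t statistic. It is not one of the packages evaluated in the paper; imputation and peptide-to-protein summarization are omitted, the nested-dictionary input layout and function names are hypothetical, and a real analysis would also derive p-values and correct for multiple testing.

```python
import math
import statistics


def median_normalize(samples):
    """samples: {sample: {protein: raw intensity}} -> log2 values centred on each sample's median."""
    normalized = {}
    for name, values in samples.items():
        # Drop missing/zero values instead of imputing them (imputation is out of scope here).
        logs = {prot: math.log2(v) for prot, v in values.items() if v > 0}
        centre = statistics.median(logs.values())
        normalized[name] = {prot: x - centre for prot, x in logs.items()}
    return normalized


def welch_t(a, b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        return math.nan  # both groups constant: statistic undefined
    return (statistics.mean(a) - statistics.mean(b)) / se


def compare_protein(normalized, group_a, group_b, protein):
    """Return (log2 fold change A vs B, Welch t) for one protein; needs >= 2 values per group."""
    a = [normalized[s][protein] for s in group_a if protein in normalized[s]]
    b = [normalized[s][protein] for s in group_b if protein in normalized[s]]
    return statistics.mean(a) - statistics.mean(b), welch_t(a, b)
```

Median centring on the log scale removes per-sample loading differences under the assumption that most proteins do not change; each package in such comparisons makes its own choice here, which is one source of the variability the abstract reports.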
Affiliation(s)
- Mingze Bai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Jingwen Deng
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Chengxin Dai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Julianus Pfeuffer
- Algorithmic Bioinformatics, Freie Universität Berlin, Berlin 14195, Germany
- Visualization and Data Analysis, Zuse Institute Berlin, Berlin 14195, Germany
- Timo Sachsenberg
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen 72076, Germany
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
12
Babačić H, Galardi S, Umer HM, Hellström M, Uhrbom L, Maturi N, Cardinali D, Pellegatta S, Michienzi A, Trevisi G, Mangiola A, Lehtiö J, Ciafrè SA, Pernemalm M. Glioblastoma stem cells express non-canonical proteins and exclusive mesenchymal-like or non-mesenchymal-like protein signatures. Mol Oncol 2023; 17:238-260. [PMID: 36495079 PMCID: PMC9892829 DOI: 10.1002/1878-0261.13355] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/10/2022] [Accepted: 12/08/2022] [Indexed: 12/14/2022]
Abstract
Glioblastoma (GBM) cancer stem cells (GSCs) contribute to GBM's origin, recurrence, and resistance to treatment. However, the understanding of how mRNA expression patterns of GBM subtypes are reflected at the global proteome level in GSCs is limited. To characterize protein expression in GSCs, we performed in-depth proteogenomic analysis of patient-derived GSCs by RNA-sequencing and mass-spectrometry. We quantified more than 10 000 proteins in two independent GSC panels and propose a GSC-associated proteomic signature characterizing two distinct phenotypic conditions: one defined by proteins upregulated in proneural and classical GSCs (GPC-like), and another by proteins upregulated in mesenchymal GSCs (GM-like). The GM-like protein set in GBM tissue was associated with necrosis, recurrence, and worse overall survival. Through proteogenomics, we discovered 252 non-canonical peptides in the GSCs, i.e., protein sequences that are variant or derive from genome regions previously considered non-protein-coding, including variants of the heterogeneous ribonucleoproteins implicated in RNA splicing. In summary, GSCs express two protein sets that have an inverse association with clinical outcomes in GBM. The discovery of non-canonical protein sequences questions existing gene models and pinpoints new protein targets for research in GBM.
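One simple way to flag candidate non-canonical peptides, sketched below in Python, is to keep identified peptides that do not occur in any canonical protein sequence, treating isoleucine and leucine as equivalent because they have identical masses. This is a hypothetical filter for illustration, not the authors' proteogenomic pipeline, which the abstract does not detail; a plain substring scan is only adequate for small reference sets.

```python
def to_il_equivalent(sequence):
    """Collapse I and L, which share a mass and are typically not distinguished by MS/MS."""
    return sequence.upper().replace("I", "L")


def non_canonical_peptides(identified, canonical_proteins):
    """Return identified peptides absent from every canonical protein (I/L-insensitive)."""
    reference = [to_il_equivalent(p) for p in canonical_proteins]
    return [
        pep for pep in identified
        if not any(to_il_equivalent(pep) in prot for prot in reference)
    ]
```

For example, with a canonical protein `MPEPTIDEKAAAAAAK`, the peptide `PEPTLDEK` is still considered canonical (I/L swap), whereas `MPEPTWDEK` is flagged.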
Affiliation(s)
- Haris Babačić
- Department of Oncology and Pathology, Karolinska Institute, Science for Life Laboratory, Stockholm, Sweden
- Silvia Galardi
- Department of Biomedicine and Prevention, University of Rome Tor Vergata, Italy
- Husen M. Umer
- Department of Oncology and Pathology, Karolinska Institute, Science for Life Laboratory, Stockholm, Sweden
- Mats Hellström
- Department of Immunology, Genetics and Pathology, Uppsala University, Sweden
- Lene Uhrbom
- Department of Immunology, Genetics and Pathology, Uppsala University, Sweden
- Deborah Cardinali
- Department of Biomedicine and Prevention, University of Rome Tor Vergata, Italy
- Serena Pellegatta
- Unit of Immunotherapy of Brain Tumors, Department of Molecular Neuro-Oncology, Foundation IRCCS Institute for Neurology Carlo Besta, Milan, Italy
- Gianluca Trevisi
- Neurosurgical Unit, Hospital Spirito Santo, Pescara, “G. D'Annunzio” University, Chieti, Italy
- Annunziato Mangiola
- Neurosurgical Unit, Hospital Spirito Santo, Pescara, “G. D'Annunzio” University, Chieti, Italy
- Janne Lehtiö
- Department of Oncology and Pathology, Karolinska Institute, Science for Life Laboratory, Stockholm, Sweden
- Silvia Anna Ciafrè
- Department of Biomedicine and Prevention, University of Rome Tor Vergata, Italy
- Maria Pernemalm
- Department of Oncology and Pathology, Karolinska Institute, Science for Life Laboratory, Stockholm, Sweden
13
Vašíček J, Skiadopoulou D, Kuznetsova KG, Wen B, Johansson S, Njølstad PR, Bruckner S, Käll L, Vaudel M. Finding haplotypic signatures in proteins. Gigascience 2022; 12:giad093. [PMID: 37919975 PMCID: PMC10622322 DOI: 10.1093/gigascience/giad093] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/13/2023] [Revised: 09/24/2023] [Accepted: 10/08/2023] [Indexed: 11/04/2023]
Abstract
BACKGROUND The nonrandom distribution of alleles of common genomic variants produces haplotypes, which are fundamental in medical and population genetic studies. Consequently, protein-coding genes with different co-occurring sets of alleles can encode different amino acid sequences: protein haplotypes. These protein haplotypes are present in biological samples and detectable by mass spectrometry, but they are not accounted for in proteomic searches. Consequently, the impact of haplotypic variation on the results of proteomic searches and the discoverability of peptides specific to haplotypes remain unknown. FINDINGS Here, we study how common genetic haplotypes influence the proteomic search space and investigate the possibility of matching peptides containing multiple amino acid substitutions to a publicly available data set of mass spectra. We found that for 12.42% of the discoverable amino acid substitutions encoded by common haplotypes, 2 or more substitutions may co-occur in the same peptide after tryptic digestion of the protein haplotypes. We identified 352 spectra that matched such multivariant peptides, and out of the 4,582 amino acid substitutions identified, 6.37% were covered by multivariant peptides. However, evaluating the reliability of these matches remains challenging, suggesting that refined error rate estimation procedures are needed for such complex proteomic searches. CONCLUSIONS As these procedures become available and the ability to analyze protein haplotypes increases, we anticipate that proteomics will provide new information on the consequences of common variation, across tissues and time.
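The notion of a multivariant peptide can be illustrated with a small Python sketch: apply a haplotype's amino acid substitutions to a reference protein, digest the resulting sequence in silico, and report the peptides that carry two or more substitutions. The substitution format (0-based position, reference residue, alternative residue), the trypsin rule and the length window are assumptions chosen for illustration, the function names are hypothetical, and indels, which would shift positions, are deliberately not handled.

```python
import re

TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")  # cleave after K/R, not before P


def apply_haplotype(reference, substitutions):
    """Apply single-residue substitutions given as (0-based position, ref_aa, alt_aa)."""
    residues = list(reference)
    for pos, ref, alt in substitutions:
        if residues[pos] != ref:
            raise ValueError(f"expected {ref} at position {pos}, found {residues[pos]}")
        residues[pos] = alt
    return "".join(residues)


def multivariant_peptides(reference, substitutions, min_len=6, max_len=30):
    """Return (peptide, substituted positions) for tryptic peptides carrying >= 2 substitutions."""
    # Digest the haplotype sequence itself, so substitutions that create or remove
    # K/R cleavage sites change the resulting peptides accordingly.
    haplotype = apply_haplotype(reference, substitutions)
    positions = sorted(pos for pos, _, _ in substitutions)
    hits, start = [], 0
    for peptide in TRYPSIN_SITE.split(haplotype):
        end = start + len(peptide)  # zero-width split: pieces tile the sequence exactly
        covered = [p for p in positions if start <= p < end]
        if len(covered) >= 2 and min_len <= len(peptide) <= max_len:
            hits.append((peptide, covered))
        start = end
    return hits
```

With the reference `MPEPTIDEKAAAAAAKPLLLLLLRGG` and substitutions at positions 1, 5 and 12, only the first tryptic peptide (`MSEPTVDEK`) carries two substitutions; the third substitution falls alone in the next peptide.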
Affiliation(s)
- Jakub Vašíček
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Dafni Skiadopoulou
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Ksenia G Kuznetsova
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Bo Wen
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Stefan Johansson
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Department of Medical Genetics, Haukeland University Hospital, Bergen 5021, Norway
- Pål R Njølstad
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Children and Youth Clinic, Haukeland University Hospital, Bergen 5021, Norway
- Stefan Bruckner
- Chair of Visual Analytics, Institute for Visual and Analytic Computing, University of Rostock, Rostock 18051, Germany
- Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH–Royal Institute of Technology, Solna 17121, Sweden
- Marc Vaudel
- Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen 5008, Norway
- Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, Oslo 0473, Norway
14
Agüero-Chapin G, Galpert-Cañizares D, Domínguez-Pérez D, Marrero-Ponce Y, Pérez-Machado G, Teijeira M, Antunes A. Emerging Computational Approaches for Antimicrobial Peptide Discovery. Antibiotics (Basel) 2022; 11:antibiotics11070936. [PMID: 35884190 PMCID: PMC9311958 DOI: 10.3390/antibiotics11070936] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 06/14/2022] [Revised: 07/01/2022] [Accepted: 07/08/2022] [Indexed: 02/05/2023]
Abstract
In the last two decades, many reports have addressed the application of artificial intelligence (AI) to the search for and design of antimicrobial peptides (AMPs). AI has been represented by machine learning (ML) algorithms that use sequence-based features to discover new peptidic scaffolds with promising biological activity. From an AI perspective, evolutionary algorithms have also been applied to the rational generation of peptide libraries aimed at the optimization/design of AMPs. However, the literature has paid little attention to other emerging, non-conventional in silico approaches for the search/design of such bioactive peptides. Thus, the first motivation here is to bring up some non-standard peptide features that have been used to build classical ML predictive models. Secondly, it is valuable to highlight emerging ML algorithms and alternative computational tools to predict/design AMPs as well as to explore their chemical space. Another point worth mentioning is the recent application of evolutionary algorithms that actually simulate sequence evolution to both the generation of diversity-oriented peptide libraries and the optimization of hit peptides. Last but not least, we include some new considerations on the proteogenomic analyses currently incorporated into the computational workflow for unravelling AMPs in natural sources.
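For readers unfamiliar with sequence-based features, the Python sketch below computes two of the simplest descriptors used as ML inputs for peptides: amino acid composition (the fraction of each of the 20 standard residues) and a crude net charge. These classical features are shown only as a baseline; they are not the non-standard features the review discusses, the function names are hypothetical, and the charge estimate (K and R as +1, D and E as -1, ignoring histidine, termini and pH) is a deliberate simplification.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed feature order


def aa_composition(peptide):
    """Fraction of each standard amino acid in the peptide, as a 20-dimensional vector."""
    peptide = peptide.upper()
    unknown = set(peptide) - set(AMINO_ACIDS)
    if not peptide or unknown:
        raise ValueError(f"empty peptide or non-standard residues: {sorted(unknown)}")
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def crude_net_charge(peptide):
    """Rough charge: +1 per K/R, -1 per D/E (ignores His, termini and pH)."""
    peptide = peptide.upper()
    return sum(peptide.count(aa) for aa in "KR") - sum(peptide.count(aa) for aa in "DE")


def feature_vector(peptide):
    """Concatenate composition and charge into one input row for a classifier."""
    return aa_composition(peptide) + [crude_net_charge(peptide)]
```

Fixed-length vectors like these let peptides of different lengths feed the same classifier, which is why composition-style descriptors are a common starting point before richer features are added.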
Affiliation(s)
- Guillermin Agüero-Chapin
- CIIMAR—Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Correspondence: (G.A.-C.); (A.A.); Tel.: +351-22-340-1813 (G.A.-C. & A.A.)
- Deborah Galpert-Cañizares
- Departamento de Ciencia de la Computación, Universidad Central Marta Abreu de Las Villas (UCLV), Santa Clara 54830, Cuba
- Dany Domínguez-Pérez
- CIIMAR—Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal
- Proquinorte, Unipessoal, Lda, Avenida 5 de Outubro, 124, 7º Piso, Avenidas Novas, 1050-061 Lisboa, Portugal
- Yovani Marrero-Ponce
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Translacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas and Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y vía Interoceánica, Quito 170157, Ecuador
- Gisselle Pérez-Machado
- EpiDisease S.L—Spin-Off of Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), 46980 Valencia, Spain
- Marta Teijeira
- Departamento de Química Orgánica, Facultade de Química, Universidade de Vigo, 36310 Vigo, Spain
- Instituto de Investigación Sanitaria Galicia Sur, Hospital Álvaro Cunqueiro, 36213 Vigo, Spain
- Agostinho Antunes
- CIIMAR—Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Correspondence: (G.A.-C.); (A.A.); Tel.: +351-22-340-1813 (G.A.-C. & A.A.)
15
Perez-Riverol Y, Bai J, Bandla C, García-Seisdedos D, Hewapathirana S, Kamatchinathan S, Kundu D, Prakash A, Frericks-Zipper A, Eisenacher M, Walzer M, Wang S, Brazma A, Vizcaíno J. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res 2022; 50:D543-D552. [PMID: 34723319 PMCID: PMC8728295 DOI: 10.1093/nar/gkab1038] [Citation(s) in RCA: 3952] [Impact Index Per Article: 1317.3] [Received: 09/11/2021] [Revised: 10/12/2021] [Accepted: 10/14/2021] [Indexed: 12/12/2022]
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of datasets submitted to PRIDE Archive (the archival component of PRIDE) reached an average of around 500 per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. Notably, the MAGE-TAB file format for proteomics has been developed to improve the annotation of sample metadata. Additionally, the PRIDE Peptidome resource provides access to aggregated peptide/protein evidence across PRIDE Archive. Furthermore, we describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.
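The Universal Spectrum Identifiers mentioned above are colon-delimited strings of the form `mzspec:<collection>:<msRun>:<indexType>:<index>[:<interpretation>]`, as defined by the HUPO-PSI USI specification. The Python sketch below splits such a string into its fields. It assumes that colons can occur only in the optional interpretation (the last field), which is a simplification, so the specification should be checked before relying on it; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

INDEX_TYPES = {"scan", "index", "nativeId"}  # index flavours named in the USI format


@dataclass(frozen=True)
class SpectrumId:
    collection: str                       # e.g. a ProteomeXchange accession (PXD...)
    ms_run: str                           # name of the MS run within the collection
    index_type: str                       # scan, index or nativeId
    index: str                            # kept as text: nativeId values are not plain integers
    interpretation: Optional[str] = None  # optional peptidoform/charge, e.g. PEPTIDEK/2


def parse_usi(usi):
    """Split a USI into fields, assuming colons occur only in the interpretation."""
    parts = usi.split(":", 5)  # at most 6 fields; extra colons stay in the interpretation
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    _, collection, ms_run, index_type, index, *rest = parts
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index type: {index_type!r}")
    return SpectrumId(collection, ms_run, index_type, index, rest[0] if rest else None)
```

Limiting the split to five cuts keeps modification annotations such as `[UNIMOD:35]` intact inside the interpretation field.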
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jingwen Bai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David García-Seisdedos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Suresh Hewapathirana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Selvakumar Kamatchinathan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Deepti J Kundu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ananth Prakash
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anika Frericks-Zipper
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany
- Martin Eisenacher
- Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany
- Mathias Walzer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Shengbo Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK