1. Chouhan U, Sahu RK, Bhatt S, Kurmi S, Choudhari JK. Emerging Trends in Big Data Analysis in Computational Biology and Bioinformatics in Health Informatics: A Case Study on Epilepsy and Seizures. Methods Mol Biol 2024; 2719:99-119. [PMID: 37803114; DOI: 10.1007/978-1-0716-3461-5_6]
Abstract
Advanced technological innovations allow cost-effective, high-throughput profiling of biological systems. Technologies such as next-generation sequencing, microarrays, and mass spectrometry now make it possible to sequence genomes in days and to profile other molecular layers at scale. As these technologies have matured, massive amounts of biological data (e.g., genomics, proteomics) have been produced cheaply, and the resulting "big data" era creates new opportunities to solve medical and biological problems in many disciplines: preventive medicine, biology, personalized medicine, gene sequencing, healthcare, and industry. Computational biology and bioinformatics are interdisciplinary fields that develop and apply computational methods (e.g., analytical methods, mathematical modeling, and simulation) to analyze large collections of biological data, such as genetic sequences, cell populations, or protein samples, to make new predictions or discover new biology. Storing, mining, and analyzing biological data is challenging because the data are highly heterogeneous. In this study, big data resources for genomics, proteomics, and metabolomics were explored to solve biological problems using big data analysis approaches. The goal is to build a relationship-based network of gene-disease associations to prioritize phenotypes common to epilepsy and seizure disease. Through network analysis, 10 seed genes, 22 associated genes, 132 microRNAs, and 38 transcription factors were identified that have a direct effect on all forms of epilepsy and seizures. According to a functional analysis of the seed genes, they are involved in the acetylcholine-gated channel complex (10%) and the heterotrimeric G-protein complex (10%) among cellular-component pathways, and in the regulation of action potential (20%) and positive regulation of vascular endothelial growth factor production (20%) among biological-process pathways related to epilepsy and seizures.
This study may provide insight into the mechanisms of the disease and highlights the importance of continued research into epilepsy and other conditions that can trigger seizure activity.
Affiliation(s)
- Usha Chouhan
  - Department of Mathematics, Bioinformatics & Computer Applications, Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India
- Rakesh Kumar Sahu
  - Department of Biotechnology, Government V.Y.T. Post Graduate Autonomous College, Durg, Chhattisgarh, India
- Shaifali Bhatt
  - Department of Mathematics, Bioinformatics & Computer Applications, Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India
- Sonu Kurmi
  - Department of Mathematics, Bioinformatics & Computer Applications, Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India
- Jyoti Kant Choudhari
  - Department of Mathematics, Bioinformatics & Computer Applications, Maulana Azad National Institute of Technology, Bhopal, Madhya Pradesh, India
2. Wu L, Hoque A, Lam H. Spectroscape enables real-time query and visualization of a spectral archive in proteomics. Nat Commun 2023; 14:6267. [PMID: 37805652; PMCID: PMC10560257; DOI: 10.1038/s41467-023-42006-x] [Received: 05/26/2023] [Accepted: 09/26/2023]
Abstract
In proteomics, spectral archives organize the enormous amounts of publicly available peptide tandem mass spectra by similarity, offering opportunities for error correction and novel discoveries. Here we adapt an indexing algorithm developed by Facebook for organizing online multimedia resources to tandem mass spectra and achieve practically instantaneous retrieval and clustering of approximate nearest neighbors in a large spectral archive. An interactive web-based graphical user interface enables the user to view a query spectrum in its clustered neighborhood, which facilitates contextual validation of peptide identifications and exploration of the dark proteome.
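The abstract centers on retrieving the nearest neighbors of a query spectrum in a large archive. As a hedged illustration of that core operation only, the Python sketch below bins peaks into fixed-width m/z bins and ranks archive spectra by cosine similarity. It performs exact brute-force search, not the approximate index that Spectroscape adapts, and the bin width, the function names (`bin_spectrum`, `cosine`, `nearest_neighbors`), and the toy spectra are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# A peak is (m/z, intensity); a binned spectrum maps bin index -> summed intensity.
Peak = Tuple[float, float]
Binned = Dict[int, float]


def bin_spectrum(peaks: List[Peak], bin_width: float = 1.0) -> Binned:
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    binned: Binned = {}
    for mz, intensity in peaks:
        index = int(mz // bin_width)
        binned[index] = binned.get(index, 0.0) + intensity
    return binned


def cosine(a: Binned, b: Binned) -> float:
    """Cosine similarity between two sparse binned spectra (0.0 if either is empty)."""
    dot = sum(value * b.get(index, 0.0) for index, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: List[Peak], archive: Dict[str, List[Peak]], k: int = 2, bin_width: float = 1.0
) -> List[Tuple[str, float]]:
    """Exact (brute-force) k-nearest-neighbor search by cosine similarity.

    An approximate index avoids scoring every archive spectrum;
    this loop scores all of them for clarity.
    """
    q = bin_spectrum(query, bin_width)
    scored = [
        (spectrum_id, cosine(q, bin_spectrum(peaks, bin_width)))
        for spectrum_id, peaks in archive.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy spectra (hypothetical values, not real data).
    archive = {
        "spec_A": [(175.1, 10.0), (262.1, 50.0), (375.2, 100.0)],
        "spec_B": [(175.1, 12.0), (262.1, 45.0), (375.2, 90.0)],
        "spec_C": [(500.3, 80.0), (613.4, 20.0)],
    }
    query = [(175.1, 11.0), (262.1, 48.0), (375.2, 95.0)]
    for spectrum_id, score in nearest_neighbors(query, archive, k=2):
        print(spectrum_id, round(score, 4))
```

With these toy spectra, `spec_A` and `spec_B` share every peak bin with the query and both score close to 1, while `spec_C` shares none and scores 0. Replacing the brute-force loop with an approximate nearest-neighbor index is what makes retrieval fast at archive scale.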
Affiliation(s)
- Long Wu
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
  - Department of Electrical and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Ayman Hoque
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Henry Lam
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
3. Mohammed Y, Goodlett D, Borchers CH. Bioinformatics Tools and Knowledgebases to Assist Generating Targeted Assays for Plasma Proteomics. Methods Mol Biol 2023; 2628:557-577. [PMID: 36781806; DOI: 10.1007/978-1-0716-2978-9_32]
Abstract
In targeted proteomics experiments, selecting appropriate proteotypic peptides as surrogates for the target protein is a crucial pre-acquisition step. This step is largely a bioinformatics exercise that involves integrating information on the peptides and proteins using various software tools and knowledgebases. Here we present several resources that automate and greatly simplify the selection process. These tools and knowledgebases were developed primarily to streamline targeted proteomics assay development and include PeptidePicker, PeptidePickerDB, MRMAssayDB, MouseQuaPro, and PeptideTracker. We have used these tools to develop and document thousands of targeted proteomics assays, many of them for plasma proteins, with a focus on human and mouse. An important aspect of all these resources is the integrative approach on which they are based. Using these tools in the first steps of designing a singleplexed or multiplexed targeted proteomics experiment can substantially reduce the number of experimental steps required. All the tools and knowledgebases described here are Web-based and freely accessible, so scientists can query the information conveniently from the browser. This chapter provides an overview of these software tools and knowledgebases, their content, and how to use them for targeted plasma proteomics. We further demonstrate how to use them with the results of the HUPO Human Plasma Proteome Project to produce a new database of 3.8k targeted assays for known human plasma proteins. Once experimentally validated, these assays should help advance the quantitative characterization of the plasma proteome.
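The abstract does not spell out the selection rules these tools apply. As a hedged illustration of the kind of first-pass filtering they automate, the Python sketch below performs an in silico tryptic digest (the common rule: cleave after K or R unless the next residue is P, with no missed cleavages) and keeps peptides that are unique to one protein in the supplied set, fall within a length window, and avoid easily modified residues. The 7 to 25 residue window, the exclusion of Met and Cys, the toy sequences, and the function names `tryptic_digest` and `candidate_peptides` are illustrative assumptions, not the actual criteria of PeptidePicker or the other tools named above.

```python
import re
from typing import Dict, List, Set

# Trypsin rule used here: cut after K or R unless the next residue is P.
# Splitting on this zero-width pattern requires Python 3.7+.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence: str) -> List[str]:
    """Fully cleaved tryptic peptides (no missed cleavages)."""
    return [piece for piece in TRYPSIN_SITE.split(sequence) if piece]


def candidate_peptides(
    proteins: Dict[str, str],
    min_len: int = 7,
    max_len: int = 25,
    excluded_residues: str = "MC",
) -> Dict[str, List[str]]:
    """Return, per protein, peptides that pass simple proteotypic-style filters.

    Uniqueness is checked only against the proteins supplied, which stand in
    for a full background proteome in this sketch.
    """
    # Map each peptide to the set of proteins that produce it.
    occurrences: Dict[str, Set[str]] = {}
    for accession, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            occurrences.setdefault(peptide, set()).add(accession)

    candidates: Dict[str, List[str]] = {accession: [] for accession in proteins}
    for peptide, accessions in occurrences.items():
        if len(accessions) != 1:
            continue  # shared between proteins, so not a unique surrogate
        if not min_len <= len(peptide) <= max_len:
            continue  # outside the (assumed) measurable length window
        if any(residue in peptide for residue in excluded_residues):
            continue  # Met oxidation / Cys modification complicate quantification
        (accession,) = accessions
        candidates[accession].append(peptide)
    return candidates


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins.
    proteins = {
        "PROT1": "ELVISLIVESKSHAEDPEPTIDEGLNRAAAAAAAK",
        "PROT2": "SHAEDPEPTIDEGLNRGGGGMGGGKTAK",
    }
    print(candidate_peptides(proteins))
```

Run on the toy input, the shared peptide `SHAEDPEPTIDEGLNR` is dropped from both proteins, `GGGGMGGGK` is dropped for containing Met, and `TAK` is too short, leaving `ELVISLIVESK` and `AAAAAAAK` for `PROT1` and nothing for `PROT2`. Real tools layer further evidence (observability in repositories, known modifications and variants) on top of such basic filters.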
Affiliation(s)
- Yassene Mohammed
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands
  - University of Victoria - Genome BC Proteomics Centre, Victoria, BC, Canada
  - Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
- David Goodlett
  - University of Victoria - Genome BC Proteomics Centre, Victoria, BC, Canada
  - Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
  - University of Gdansk, International Centre for Cancer Vaccine Science, Gdansk, Poland
- Christoph H Borchers
  - Proteomics Centre, Segal Cancer Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, QC, Canada
  - Gerald Bronfman Department of Oncology, Jewish General Hospital, Montreal, QC, Canada
  - Division of Experimental Medicine, McGill University, Montreal, QC, Canada
  - Department of Pathology, McGill University, Montreal, QC, Canada
4. Srivastava N, Sarethy IP, Jeevanandam J, Danquah M. Emerging strategies for microbial screening of novel chemotherapeutics. J Mol Struct 2022. [DOI: 10.1016/j.molstruc.2022.132419]
5. Network Biology and Artificial Intelligence Drive the Understanding of the Multidrug Resistance Phenotype in Cancer. Drug Resist Updat 2022; 60:100811. [DOI: 10.1016/j.drup.2022.100811] [Received: 12/29/2021] [Revised: 01/22/2022] [Accepted: 01/24/2022]
6. Halder A, Verma A, Biswas D, Srivastava S. Recent advances in mass-spectrometry based proteomics software, tools and databases. Drug Discov Today Technol 2021; 39:69-79. [PMID: 34906327; DOI: 10.1016/j.ddtec.2021.06.007] [Received: 02/10/2021] [Revised: 05/08/2021] [Accepted: 06/21/2021]
Abstract
The field of proteomics depends heavily on data generation and data analysis, both of which are supported by software and databases. Mass spectrometry-based proteomics has advanced massively over the last 10 years, which has compelled the scientific community to upgrade or develop algorithms, tools, and repository databases. Several standalone software packages and comprehensive databases have aided the establishment of integrated omics pipelines and meta-analysis workflows, which have contributed to understanding disease pathobiology, to biomarker discovery, and to predicting new therapeutic modalities. For shotgun proteomics using data-dependent acquisition (DDA), several user-friendly software tools have been developed that analyse pre-processed data to provide mechanistic insights into disease. Likewise, for data-independent acquisition (DIA), pipelines have emerged that cover the whole workflow, from building the spectral library to identifying therapeutic targets. Furthermore, in the age of big data analysis, machine learning and cloud computing are making proteomics data analysis more robust, faster, and more in-depth. This review discusses recent advances in software, tools, and databases for mass spectrometry-based proteomics.
Affiliation(s)
- Ankit Halder
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Ayushi Verma
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Deeptarup Biswas
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Sanjeeva Srivastava
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
7. Methods for Proteomic Analyses of Mycobacteria. Methods Mol Biol 2021. [PMID: 34235669; DOI: 10.1007/978-1-0716-1460-0_23]
Abstract
The use of proteomic technologies to characterize and study the mycobacterial proteome has provided important information on function, diversity, protein-protein interactions, and host-pathogen interactions in Mycobacterium spp. Many different mass spectrometry methodologies can be applied to proteomic studies of mycobacteria and of microorganisms in general. Sample processing and appropriate study design are critical to generating high-quality data regardless of the mass spectrometry method applied. Appropriate study design relies on statistical rigor and on data curation using bioinformatics approaches that are widely applicable regardless of the organism or system studied. Sample processing, on the other hand, is often specific to the physiology of the organism or system under investigation. Therefore, in this chapter, we provide protocols for processing mycobacterial protein samples for top-down and bottom-up proteomic analyses.
8. Sengupta A, Naresh G, Mishra A, Parashar D, Narad P. Proteome analysis using machine learning approaches and its applications to diseases. Adv Protein Chem Struct Biol 2021; 127:161-216. [PMID: 34340767; DOI: 10.1016/bs.apcsb.2021.02.003]
Abstract
With the tremendous developments in biological and medical technologies, huge amounts of data are generated in the form of genomic data, images in medical databases, protein sequence data, and so on. Analyzing these data with different tools sheds light on the particulars of disease and our body's reactions to it, thus aiding our understanding of human health. Among the most useful of these tools are artificial intelligence and deep learning (DL). The artificial neural networks in DL algorithms help extract useful information from these datasets and recognize patterns in complex data. Therefore, as a part of machine learning, DL helps us face the various challenges that arise in protein prediction, protein identification, and protein quantification. Proteomics is the study of proteins, their structures, features, properties, and so on. As a form of data science, proteomics has helped drive substantial progress in genomics technologies. One of the major techniques used in proteomics studies is mass spectrometry (MS). However, MS can analyze large datasets efficiently only with the added help of informatics approaches for data analysis and interpretation, mainly machine learning and deep learning algorithms. In this chapter, we discuss in detail the applications of deep learning and various machine learning algorithms in proteomics.
Affiliation(s)
- Abhishek Sengupta
  - Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
- G Naresh
  - Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
- Astha Mishra
  - Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
- Diksha Parashar
  - Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
- Priyanka Narad
  - Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
9. Bhowmick P, Roome S, Borchers CH, Goodlett DR, Mohammed Y. An Update on MRMAssayDB: A Comprehensive Resource for Targeted Proteomics Assays in the Community. J Proteome Res 2021; 20:2105-2115. [PMID: 33683131; PMCID: PMC8041396; DOI: 10.1021/acs.jproteome.0c00961]
Abstract
Precise multiplexed quantification of proteins in biological samples can be achieved by targeted proteomics using multiple or parallel reaction monitoring (MRM/PRM). Combined with internal standards, the method achieves very good repeatability and reproducibility, enabling excellent protein quantification and allowing longitudinal and cohort studies. A laborious part of performing such experiments lies in the preparation steps dedicated to the development and validation of individual protein assays. Several public repositories host information on targeted proteomics assays, including NCI's Clinical Proteomic Tumor Analysis Consortium assay portals, PeptideAtlas SRM Experiment Library, SRMAtlas, PanoramaWeb, and PeptideTracker, all offering varying levels of detail. We introduced MRMAssayDB in 2018 as an integrated resource for targeted proteomics assays. The Web-based application maps and links the assays from the repositories, includes comprehensive up-to-date protein and sequence annotations, and provides multiple visualization options at the peptide and protein level. We have extended MRMAssayDB with more assays and extensive annotations. Currently it contains >828 000 assays covering >51 000 proteins from 94 organisms, of which >17 000 proteins are present in >2400 biological pathways and >48 000 map to >21 000 Gene Ontology terms. This is about a fourfold increase in the number of assays since its introduction. We have expanded annotations of interactions, biological pathways, and disease associations. A newly added visualization module for coupled molecular structural annotation browsing allows the user to interactively examine peptide sequences and any known PTMs and disease mutations, and to map all of them to available protein 3D structures. Because of its integrative approach, MRMAssayDB enables a holistic view of suitable proteotypic peptides and commonly used transitions in empirical data. Availability: http://mrmassaydb.proteincentre.com.
Affiliation(s)
- Pallab Bhowmick
  - University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia V8Z 7X8, Canada
  - University of Victoria, Victoria, British Columbia V8P 5C2, Canada
- Simon Roome
  - University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia V8Z 7X8, Canada
  - University of Victoria, Victoria, British Columbia V8P 5C2, Canada
- Christoph H Borchers
  - Proteomics Centre, Segal Cancer Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, Quebec H3T 1E2, Canada
  - Gerald Bronfman Department of Oncology, Jewish General Hospital, Montreal, Quebec H3T 1E2, Canada
  - Department of Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Nobel Street, Moscow 121205, Russia
- David R Goodlett
  - University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia V8Z 7X8, Canada
  - University of Victoria, Victoria, British Columbia V8P 5C2, Canada
  - University of Gdansk, International Centre for Cancer Vaccine Science, 80-309 Gdansk, Poland
- Yassene Mohammed
  - University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia V8Z 7X8, Canada
  - University of Victoria, Victoria, British Columbia V8P 5C2, Canada
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, 2333 ZA Leiden, Netherlands
10. Schmidt T, Samaras P, Frejno M, Gessulat S, Barnert M, Kienegger H, Krcmar H, Schlegl J, Ehrlich HC, Aiche S, Kuster B, Wilhelm M. ProteomicsDB. Nucleic Acids Res 2019; 46:D1271-D1281. [PMID: 29106664; PMCID: PMC5753189; DOI: 10.1093/nar/gkx1029] [Received: 09/15/2017] [Accepted: 10/22/2017]
Abstract
ProteomicsDB (https://www.ProteomicsDB.org) is a protein-centric in-memory database for the exploration of large collections of quantitative mass spectrometry-based proteomics data. ProteomicsDB was first released in 2014 to enable the interactive exploration of the first draft of the human proteome. To date, it contains quantitative data from 78 projects totalling over 19k LC–MS/MS experiments. A standardized analysis pipeline enables comparisons between multiple datasets to facilitate the exploration of protein expression across hundreds of tissues, body fluids and cell lines. We recently extended the data model to enable the storage and integrated visualization of other quantitative omics data. This includes transcriptomics data from e.g. NCBI GEO, protein–protein interaction information from STRING, functional annotations from KEGG, drug-sensitivity/selectivity data from several public sources and reference mass spectra from the ProteomeTools project. The extended functionality transforms ProteomicsDB into a multi-purpose resource connecting quantification and meta-data for each protein. The rich user interface helps researchers to navigate all data sources in either a protein-centric or multi-protein-centric manner. Several options are available to download data manually, while our application programming interface enables accessing quantitative data systematically.
Affiliation(s)
- Tobias Schmidt
  - Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, 85354 Bavaria, Germany
- Patroklos Samaras
  - Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, 85354 Bavaria, Germany
- Martin Frejno
  - Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, 85354 Bavaria, Germany
- Siegfried Gessulat
  - Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, 85354 Bavaria, Germany
  - Innovation Center Network, SAP SE, Potsdam 14469, Germany
- Maximilian Barnert
  - Chair for Information Systems, Technical University of Munich (TUM), Garching 85748, Germany
  - SAP University Competence Center, Technical University of Munich (TUM), Garching 85748, Germany
- Harald Kienegger
  - Chair for Information Systems, Technical University of Munich (TUM), Garching 85748, Germany
  - SAP University Competence Center, Technical University of Munich (TUM), Garching 85748, Germany
- Helmut Krcmar
  - Chair for Information Systems, Technical University of Munich (TUM), Garching 85748, Germany
  - SAP University Competence Center, Technical University of Munich (TUM), Garching 85748, Germany
- Stephan Aiche
  - Innovation Center Network, SAP SE, Potsdam 14469, Germany
- Bernhard Kuster
  - Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, 85354 Bavaria, Germany
  - Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich (TUM), Freising, 85354 Bavaria, Germany
- Mathias Wilhelm
  - Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, 85354 Bavaria, Germany
11. Ammar C, Berchtold E, Csaba G, Schmidt A, Imhof A, Zimmer R. Multi-Reference Spectral Library Yields Almost Complete Coverage of Heterogeneous LC-MS/MS Data Sets. J Proteome Res 2019; 18:1553-1566. [DOI: 10.1021/acs.jproteome.8b00819]
Affiliation(s)
- Constantin Ammar
  - Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
  - Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81337 München, Germany
- Evi Berchtold
  - Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
- Gergely Csaba
  - Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
- Andreas Schmidt
  - Zentrallabor für Proteinanalytik (Protein Analysis Unit), Ludwig-Maximilians-Universität München, Grosshaderner Strasse 9, 82152 Planegg-Martinsried, Germany
- Axel Imhof
  - Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81337 München, Germany
  - Zentrallabor für Proteinanalytik (Protein Analysis Unit), Ludwig-Maximilians-Universität München, Grosshaderner Strasse 9, 82152 Planegg-Martinsried, Germany
- Ralf Zimmer
  - Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
  - Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81337 München, Germany
12. Ghosh D, Bernstein JA, Khurana Hershey GK, Rothenberg ME, Mersha TB. Leveraging Multilayered "Omics" Data for Atopic Dermatitis: A Road Map to Precision Medicine. Front Immunol 2018; 9:2727. [PMID: 30631320; PMCID: PMC6315155; DOI: 10.3389/fimmu.2018.02727] [Received: 06/26/2018] [Accepted: 11/05/2018]
Abstract
Atopic dermatitis (AD) is a complex multifactorial inflammatory skin disease that affects ~280 million people worldwide. About 85% of AD cases begin in childhood, and a significant portion of these can persist into adulthood. Moreover, a typical progression from AD in children to food allergy, asthma, or allergic rhinitis has been reported ("allergic march" or "atopic march"). AD comprises highly heterogeneous sub-phenotypes/endotypes resulting from a complex interplay between intrinsic and extrinsic factors, such as environmental stimuli and genetic factors regulating cutaneous functions (impaired barrier function, epidermal lipid, and protease abnormalities), immune functions, and the microbiome. Although the role of high-throughput "omics" integration in defining endotypes is recognized, current analyses are primarily based on individual omics data and binary clinical outcomes. Although individual omics analyses, such as genome-wide association studies (GWAS), can effectively map variants correlated with AD, most of the heritability is not explained by the identified variants, and the functional relevance of the discovered variants is largely unknown. The limited success of singular approaches underscores the need for holistic, integrated approaches that investigate complex phenotypes using trans-omics data integration strategies. Integrating omics layers (e.g., genome, epigenome, transcriptome, proteome, metabolome, lipidome, exposome, microbiome), which often have complementary and synergistic effects, may make it possible to capture the flow of information underlying AD disease manifestation. Overlapping genes/candidates derived from multiple omics types in AD pathogenesis include FLG, SPINK5, S100A8, and SERPINB3. Overlapping pathways include macrophage, endothelial cell, and fibroblast activation pathways, in addition to the well-known Th1/Th2 and NFkB activation pathways.
Interestingly, there was more multi-omics overlap at the pathway level than at the gene level. Further analysis of multi-omics overlap at the tissue level showed that, among 30 tissue types from the GTEx database, skin and esophagus were significantly enriched, indicating a biological interconnection between AD and food allergy. The present work explores multi-omics integration and provides new biological insights to better define the biological basis of AD etiology and to confirm previously reported AD genes/pathways. In this context, we also discuss the opportunities and challenges introduced by "big omics data" and their integration.
Affiliation(s)
- Debajyoti Ghosh
  - Division of Immunology, Allergy & Rheumatology, Department of Internal Medicine, University of Cincinnati, Cincinnati, OH, United States
- Jonathan A Bernstein
  - Division of Immunology, Allergy & Rheumatology, Department of Internal Medicine, University of Cincinnati, Cincinnati, OH, United States
- Gurjit K Khurana Hershey
  - Division of Asthma Research, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, United States
- Marc E Rothenberg
  - Division of Allergy and Immunology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, United States
- Tesfaye B Mersha
  - Division of Asthma Research, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, United States
13. Li C, Chen T, He Q, Zhu Y, Li K. MRUniNovo: an efficient tool for de novo peptide sequencing utilizing the Hadoop distributed computing framework. Bioinformatics 2016; 33:944-946. [DOI: 10.1093/bioinformatics/btw721] [Received: 08/15/2016] [Accepted: 11/12/2016]
14. Aslam B, Basit M, Nisar MA, Khurshid M, Rasool MH. Proteomics: Technologies and Their Applications. J Chromatogr Sci 2016; 55:182-196. [PMID: 28087761; DOI: 10.1093/chromsci/bmw167] [Received: 11/22/2015] [Revised: 07/25/2016] [Accepted: 09/08/2016]
Abstract
Proteomics involves the application of technologies for the identification and quantification of the overall protein content of a cell, tissue, or organism. It supplements other "omics" technologies, such as genomics and transcriptomics, to elucidate the identity of an organism's proteins and to understand the structure and function of particular proteins. Proteomics-based technologies are used in various capacities across research settings, such as detecting diagnostic markers, identifying candidates for vaccine production, understanding pathogenicity mechanisms, characterizing changes in expression patterns in response to different signals, and interpreting functional protein pathways in different diseases. Proteomics is technically intricate because it involves analyzing and categorizing the overall protein signatures of a genome. Mass spectrometry, with LC-MS/MS and MALDI-TOF/TOF as widely used instruments, is central to current proteomics. However, proteomics facilities, including instrument software and databases, together with the need for skilled personnel, substantially increase costs and therefore limit wider use, especially in the developing world. Furthermore, the proteome is highly dynamic because of complex regulatory systems that control protein expression levels. This review describes the various proteomics approaches, recent developments, and their applications in research and analysis.
Affiliation(s)
- Bilal Aslam
  - Department of Microbiology, Government College University, Faisalabad, Pakistan
- Madiha Basit
  - Department of Microbiology, Government College University, Faisalabad, Pakistan
- Muhammad Atif Nisar
  - Department of Microbiology, Government College University, Faisalabad, Pakistan
- Mohsin Khurshid
  - Department of Microbiology, Government College University, Faisalabad, Pakistan
  - College of Allied Health Professionals, Directorate of Medical Sciences, Government College University, Faisalabad, Pakistan
15. May JC, McLean JA. Advanced Multidimensional Separations in Mass Spectrometry: Navigating the Big Data Deluge. Annu Rev Anal Chem (Palo Alto Calif) 2016; 9:387-409. [PMID: 27306312; PMCID: PMC5763907; DOI: 10.1146/annurev-anchem-071015-041734]
Abstract
Hybrid analytical instrumentation constructed around mass spectrometry (MS) is becoming the preferred technique for addressing many grand challenges in science and medicine. From the omics sciences to drug discovery and synthetic biology, multidimensional separations based on MS provide the high peak capacity and high measurement throughput necessary to obtain large-scale measurements used to infer systems-level information. In this article, we describe multidimensional MS configurations as technologies that are big data drivers and review some new and emerging strategies for mining information from large-scale datasets. We discuss the information content that can be obtained from individual dimensions, as well as the unique information that can be derived by comparing different levels of data. Finally, we summarize some emerging data visualization strategies that seek to make highly dimensional datasets both accessible and comprehensible.
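Why multidimensional separations raise peak capacity can be illustrated with the product rule commonly used for such systems: if the separation dimensions were perfectly orthogonal, the total peak capacity would be the product of the per-dimension capacities. The sketch below is an illustration under that idealized assumption, not a method taken from May and McLean; the function name and the example capacities are hypothetical, and real, partially correlated dimensions fall short of this upper bound.

```python
from math import prod


def ideal_peak_capacity(dimension_capacities):
    """Upper-bound peak capacity of a multidimensional separation.

    Assumes (hypothetically) fully orthogonal dimensions, so the total
    capacity is the product of the per-dimension peak capacities.
    Correlated dimensions in practice yield less than this bound.
    """
    if not dimension_capacities:
        raise ValueError("need at least one separation dimension")
    if any(n < 1 for n in dimension_capacities):
        raise ValueError("each peak capacity must be >= 1")
    return prod(dimension_capacities)


# Hypothetical example: an LC dimension (n = 200) coupled to ion mobility (n = 30).
print(ideal_peak_capacity([200, 30]))  # 6000
```

Adding a dimension multiplies rather than adds capacity in this idealized model, which is one way to see why hybrid MS configurations generate such large datasets.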
Affiliation(s)
- Jody C May
- Department of Chemistry, Center for Innovative Technology, Vanderbilt Institute for Chemical Biology, Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, Tennessee 37235
- John A McLean
- Department of Chemistry, Center for Innovative Technology, Vanderbilt Institute for Chemical Biology, Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, Tennessee 37235
16
Perez-Riverol Y, Alpi E, Wang R, Hermjakob H, Vizcaíno JA. Making proteomics data accessible and reusable: current state of proteomics databases and repositories. Proteomics 2015; 15:930-49. [PMID: 25158685 PMCID: PMC4409848 DOI: 10.1002/pmic.201400302] [Citation(s) in RCA: 141] [Impact Index Per Article: 14.1] [Received: 07/01/2014] [Revised: 08/06/2014] [Accepted: 08/22/2014] [Indexed: 01/10/2023]
Abstract
Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. To address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has recently been developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we review each of the major proteomics resources independently, as well as some tools that enable the integration, mining and reuse of the data. We also discuss some of the major challenges and current pitfalls in the integration and sharing of the data.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
17
ProCon — PROteomics CONversion tool. J Proteomics 2015; 129:56-62. [DOI: 10.1016/j.jprot.2015.06.015] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 02/27/2015] [Revised: 05/19/2015] [Accepted: 06/28/2015] [Indexed: 11/22/2022]
18
Toprak UH, Gillet LC, Maiolica A, Navarro P, Leitner A, Aebersold R. Conserved peptide fragmentation as a benchmarking tool for mass spectrometers and a discriminating feature for targeted proteomics. Mol Cell Proteomics 2014; 13:2056-71. [PMID: 24623587 PMCID: PMC4125737 DOI: 10.1074/mcp.o113.036475] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Received: 11/20/2013] [Revised: 02/26/2014] [Indexed: 12/21/2022]
Abstract
Quantifying the similarity of spectra is an important task in various areas of spectroscopy, for example, to identify a compound by comparing sample spectra to those of reference standards. In mass spectrometry based discovery proteomics, spectral comparisons are used to infer the amino acid sequence of peptides. In targeted proteomics by selected reaction monitoring (SRM) or SWATH MS, predetermined sets of fragment ion signals integrated over chromatographic time are used to identify target peptides in complex samples. In both cases, confidence in peptide identification is directly related to the quality of spectral matches. In this study, we used sets of simulated spectra of well-controlled dissimilarity to benchmark different spectral comparison measures and to develop a robust scoring scheme that quantifies the similarity of fragment ion spectra. We applied the normalized spectral contrast angle score to quantify the similarity of spectra to objectively assess fragment ion variability of tandem mass spectrometric datasets, to evaluate portability of peptide fragment ion spectra for targeted mass spectrometry across different types of mass spectrometers and to discriminate target assays from decoys in targeted proteomics. Altogether, this study validates the use of the normalized spectral contrast angle as a sensitive spectral similarity measure for targeted proteomics, and more generally provides a methodology to assess the performance of spectral comparisons and to support the rational selection of the most appropriate similarity measure. The algorithms used in this study are made publicly available as an open source toolset with a graphical user interface.
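The normalized spectral contrast angle can be sketched as follows, assuming two intensity vectors already aligned to the same fragment ions. As we understand the usual definition, the cosine of the angle θ between the vectors is their normalized dot product, and the score is SA = 1 − 2θ/π, which is 1 for identical relative intensities and 0 for orthogonal spectra. This is an illustrative sketch, not the authors' published toolset; the function name is hypothetical, and any intensity transformation the study may apply before scoring (for example a square-root transform) is not shown.

```python
import math


def normalized_spectral_contrast_angle(a, b):
    """Normalized spectral contrast angle between two aligned intensity vectors.

    a[i] and b[i] must be intensities of the same fragment ion.
    Returns 1 - 2*theta/pi, which lies in [0, 1] for non-negative intensities.
    """
    if len(a) != len(b):
        raise ValueError("spectra must be aligned to the same fragment ions")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each spectrum needs at least one non-zero intensity")
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    return 1.0 - 2.0 * theta / math.pi


# Scaling a spectrum does not change its score, because only relative intensities matter.
print(round(normalized_spectral_contrast_angle([1, 2, 3], [2, 4, 6]), 3))  # 1.0
print(round(normalized_spectral_contrast_angle([1, 0], [1, 1]), 3))        # 0.5
```

Because the score depends only on the angle between the vectors, it is insensitive to overall signal intensity, which is what makes it usable for comparing fragment patterns across instruments.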
Affiliation(s)
- Umut H Toprak
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Ludovic C Gillet
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Alessio Maiolica
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Pedro Navarro
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Alexander Leitner
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland; Faculty of Science, University of Zurich, Zurich, 8093 Zurich, Switzerland
19
Scalbert A, Brennan L, Manach C, Andres-Lacueva C, Dragsted LO, Draper J, Rappaport SM, van der Hooft JJJ, Wishart DS. The food metabolome: a window over dietary exposure. Am J Clin Nutr 2014; 99:1286-308. [PMID: 24760973 DOI: 10.3945/ajcn.113.076133] [Citation(s) in RCA: 350] [Impact Index Per Article: 31.8] [Indexed: 12/26/2022]
Abstract
The food metabolome is defined as the part of the human metabolome directly derived from the digestion and biotransformation of foods and their constituents. With >25,000 compounds known in various foods, the food metabolome is extremely complex, with a composition varying widely according to the diet. By its very nature it represents a considerable and still largely unexploited source of novel dietary biomarkers that could be used to measure dietary exposures with a high level of detail and precision. Most dietary biomarkers currently have been identified on the basis of our knowledge of food compositions by using hypothesis-driven approaches. However, the rapid development of metabolomics resulting from the development of highly sensitive modern analytic instruments, the availability of metabolite databases, and progress in (bio)informatics has made agnostic approaches more attractive as shown by the recent identification of novel biomarkers of intakes for fruit, vegetables, beverages, meats, or complex diets. Moreover, examples also show how the scrutiny of the food metabolome can lead to the discovery of bioactive molecules and dietary factors associated with diseases. However, researchers still face hurdles, which slow progress and need to be resolved to bring this emerging field of research to maturity. These limits were discussed during the First International Workshop on the Food Metabolome held in Glasgow. Key recommendations made during the workshop included more coordination of efforts; development of new databases, software tools, and chemical libraries for the food metabolome; and shared repositories of metabolomic data. Once achieved, major progress can be expected toward a better understanding of the complex interactions between diet and human health.
Affiliation(s)
- Augustin Scalbert
- International Agency for Research on Cancer, Lyon, France
- Lorraine Brennan
- University College Dublin, Dublin, Ireland
- Claudine Manach
- Institut National de la Recherche Agronomique, Clermont-Ferrand, France; Clermont University, Clermont-Ferrand, France
- Cristina Andres-Lacueva
- University of Barcelona, Barcelona, Spain
- Lars O Dragsted
- University of Copenhagen, Frederiksberg, Denmark
- John Draper
- Aberystwyth University, Aberystwyth, United Kingdom
- Stephen M Rappaport
- University of California, Berkeley, CA
- Justin J J van der Hooft
- University of Glasgow, Glasgow, United Kingdom
- David S Wishart
- University of Alberta, Edmonton, Canada
20
Abstract
Most biochemical reactions in a cell are regulated by highly specialized proteins, which are the prime mediators of the cellular phenotype. Therefore the identification, quantitation and characterization of all proteins in a cell are of utmost importance for understanding the molecular processes that mediate cellular physiology. With the advent of robust and reliable mass spectrometers that are able to analyze complex protein mixtures within a reasonable timeframe, the systematic analysis of all proteins in a cell becomes feasible. Besides the ongoing improvements of analytical hardware, standardized methods to analyze and study all proteins have to be developed that allow the generation of new testable hypotheses based on the enormous amount of pre-existing biological information. Here we discuss current strategies on how to gather, filter and analyze proteomic data sets using available software packages.
21
Muth T, Benndorf D, Reichl U, Rapp E, Martens L. Searching for a needle in a stack of needles: challenges in metaproteomics data analysis. Mol Biosyst 2013; 9:578-85. [PMID: 23238088 DOI: 10.1039/c2mb25415h] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.2] [Indexed: 12/18/2022]
Abstract
In recent years, the integral study of microbial communities of varying complexity has gained increasing research interest. Mass spectrometry-driven metaproteomics enables the analysis of such communities on the functional level, but this fledgling field still faces various technical and semantic challenges regarding experimental data analysis and interpretation. In the present review, we outline the hurdles involved and attempt to cover the most valuable methods and software implementations available to researchers in the field today. Beyond merely focusing on protein identification, we provide an overview of different data pre- and post-processing steps, such as metabolic pathway analysis, that can be useful in a typical metaproteomics workflow. Finally, we briefly discuss directions for future work.
Affiliation(s)
- Thilo Muth
- Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, Magdeburg, Germany
22
A large, consistent plasma proteomics data set from prospectively collected breast cancer patient and healthy volunteer samples. J Transl Med 2011; 9:80. [PMID: 21619653 PMCID: PMC3120690 DOI: 10.1186/1479-5876-9-80] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 12/03/2010] [Accepted: 05/27/2011] [Indexed: 01/22/2023]
Abstract
Background: Variability of plasma sample collection and of proteomics technology platforms has been detrimental to the generation of large proteomic profile datasets from human biospecimens.
Methods: We carried out a clinical trial-like protocol to standardize collection of plasma from 204 healthy and 216 breast cancer patient volunteers. The breast cancer patients provided follow-up samples at 3-month intervals. We generated proteomics profiles from these samples with a stable and reproducible platform for differential proteomics that employs a highly consistent nanofabricated ChipCube™ chromatography system for peptide detection and quantification with fast, single-dimension mass spectrometry (LC-MS). Protein identification is achieved with subsequent LC-MS/MS analysis employing the same ChipCube™ chromatography system.
Results: With this consistent platform, over 800 LC-MS plasma proteomic profiles from prospectively collected samples of 420 individuals were obtained. Using a web-based data analysis pipeline for LC-MS profiling data, analyses of all peptide peaks from these plasma LC-MS profiles reveal an average coefficient of variability of less than 15%. Protein identification of peptide peaks of interest has been achieved with subsequent LC-MS/MS analyses and by referring to a spectral library created from about 150 discrete LC-MS/MS runs. Verification of peptide quantity and identity is demonstrated with several Multiple Reaction Monitoring analyses. These plasma proteomic profiles are publicly available through ProteomeCommons.
Conclusion: From a large prospective cohort of healthy and breast cancer patient volunteers and using a nanofabricated chromatography system, a consistent LC-MS proteomics dataset has been generated that includes more than 800 discrete human plasma profiles. This large proteomics dataset provides an important resource in support of breast cancer biomarker discovery and validation efforts.
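The reproducibility figure in this abstract is an average coefficient of variation (CV) over peptide peaks. A minimal sketch of that calculation follows, assuming the common definition CV = sample standard deviation / mean × 100 computed per peak across repeated profiles and then averaged over peaks. It does not reproduce the study's web-based pipeline; the function names and intensity values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Percent CV of one peptide peak's intensities across repeated LC-MS runs."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return statistics.stdev(values) / mean * 100.0


def average_cv(peak_table):
    """Average percent CV over all peaks; peak_table maps peak id -> intensities."""
    return statistics.fmean(coefficient_of_variation(v) for v in peak_table.values())


# Hypothetical intensities for two peptide peaks measured in three runs.
peaks = {"peak_1": [100, 110, 90], "peak_2": [50, 50, 50]}
print(average_cv(peaks))        # 5.0
print(average_cv(peaks) < 15)   # True: below the 15% level reported in the abstract
```

Averaging per-peak CVs, rather than computing one CV over pooled intensities, keeps abundant and low-abundance peaks on the same relative scale.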
23
Vizcaíno JA, Foster JM, Martens L. Proteomics data repositories: providing a safe haven for your data and acting as a springboard for further research. J Proteomics 2010; 73:2136-46. [PMID: 20615486 PMCID: PMC2958306 DOI: 10.1016/j.jprot.2010.06.008] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Received: 02/25/2010] [Revised: 06/15/2010] [Accepted: 06/18/2010] [Indexed: 01/19/2023]
Abstract
Although data deposition is not yet common practice in proteomics, several mass spectrometry (MS)-based proteomics repositories are publicly available to the scientific community. The main existing resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, the PRoteomics IDEntifications database (PRIDE), Tranche, and NCBI Peptidome. In this review the capabilities of each of these will be described, paying special attention to four key properties: data types stored, applicable data submission strategies, supported formats, and available data mining and visualization tools. Additionally, the data contents from model organisms will be enumerated for each resource. There are other valuable smaller and/or more specialized repositories, but they will not be covered in this review. Finally, the concept behind the ProteomeXchange consortium, a collaborative effort among the main resources in the field, will be introduced.
Key Words
- CV, controlled vocabulary
- HGNC, HUGO Gene Nomenclature Committee
- MCP, Molecular and Cellular Proteomics
- MRM, multiple reaction monitoring
- NIH, National Institutes of Health
- OLS, Ontology Lookup Service
- PICR, Protein Identifier Cross-Referencing
- PSI, Proteomics Standards Initiative
- QC, quality control
- SRM, selected reaction monitoring
- SBEAMS, Systems Biology Experiment Analysis Management System
- TPP, Trans-Proteomic Pipeline
- proteomics
- databases
- bioinformatics
- data standards
- repositories
Affiliation(s)
- Juan Antonio Vizcaíno
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Joseph M. Foster
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Lennart Martens
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
24
Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 2010; 73:2092-123. [PMID: 20816881 DOI: 10.1016/j.jprot.2010.08.009] [Citation(s) in RCA: 372] [Impact Index Per Article: 24.8] [Received: 07/02/2010] [Revised: 08/25/2010] [Accepted: 08/25/2010] [Indexed: 12/18/2022]
Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide-to-spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from the peptide to the protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
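The global error rate estimation mentioned here is often done with a target-decoy search. The sketch below assumes each peptide-spectrum match (PSM) carries a score (higher is better) and a flag saying whether it hit a decoy sequence; the FDR at a score threshold is estimated as decoys accepted divided by targets accepted, and each PSM's q-value is the smallest FDR at which it would still be accepted. This simple decoy/target ratio is only one of several estimators (some protocols, for example, use twice the decoy count over all hits for concatenated searches), it ignores score ties, and it does not model the peptide-to-protein error amplification the review discusses. Function names are hypothetical.

```python
def target_decoy_qvalues(psms):
    """Assign q-values to PSMs using a simple target-decoy FDR estimate.

    psms: iterable of (score, is_decoy) pairs; a higher score is a better match.
    Returns (score, is_decoy, q_value) tuples sorted from best to worst score.
    Score ties are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # Estimated FDR if the threshold is placed at each rank:
    # decoys accepted so far divided by targets accepted so far (capped at 1).
    fdrs = []
    n_decoy = n_target = 0
    for _, is_decoy in ranked:
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        fdrs.append(min(1.0, n_decoy / n_target) if n_target else 1.0)

    # q-value: running minimum of the FDR from the worst-scoring PSM upwards.
    qvalues = [0.0] * len(fdrs)
    running_min = 1.0
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


def accepted_targets(results, max_q):
    """Target PSMs whose q-value is within the threshold; decoys are discarded."""
    return [r for r in results if not r[1] and r[2] <= max_q]


# Hypothetical search results: (score, is_decoy).
results = target_decoy_qvalues([(10, False), (9, False), (8, True), (7, False), (6, True)])
print([round(q, 3) for _, _, q in results])  # [0.0, 0.0, 0.333, 0.333, 0.667]
print(len(accepted_targets(results, 0.01)))  # 2
```

Taking the running minimum makes q-values monotone in score, so accepting all PSMs with q ≤ 0.01 corresponds to a single score cutoff.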
25
Bendixen E, Danielsen M, Larsen K, Bendixen C. Advances in porcine genomics and proteomics--a toolbox for developing the pig as a model organism for molecular biomedical research. Brief Funct Genomics 2010; 9:208-19. [DOI: 10.1093/bfgp/elq004] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.1] [Indexed: 01/12/2023]
26
Current awareness on yeast. Yeast 2010. [DOI: 10.1002/yea.1716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]