1
ProInfer: An interpretable protein inference tool leveraging on biological networks. PLoS Comput Biol 2023; 19:e1010961. [PMID: 36930671 PMCID: PMC10057851 DOI: 10.1371/journal.pcbi.1010961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 03/29/2023] [Accepted: 02/20/2023] [Indexed: 03/18/2023]
Abstract
In mass spectrometry (MS)-based proteomics, protein inference from identified peptides (protein fragments) is a critical step. We present ProInfer (Protein Inference), a novel protein assembly method that takes advantage of information in biological networks. ProInfer assists recovery of proteins supported only by ambiguous peptides (peptides that map to more than one candidate protein) and enhances the statistical confidence for proteins supported by both unique and ambiguous peptides. Consequently, ProInfer rescues weakly supported proteins, thereby improving proteome coverage. Evaluated across THP1 cell line, lung cancer and RAW264.7 datasets, ProInfer always infers the most true positives in comparison with the mainstream protein inference tools Fido, EPIFANY and PIA. ProInfer is also adept at retrieving differentially expressed proteins, signifying its usefulness for functional analysis and phenotype profiling. Source code for ProInfer is available at https://github.com/PennHui2016/ProInfer.
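The distinction between unique and ambiguous peptides that ProInfer builds on can be illustrated with a minimal Python sketch. It assumes a hypothetical input format (a dict mapping each peptide sequence to the set of candidate protein accessions it matches) and only labels proteins by the kind of peptide evidence they have; it does not reproduce ProInfer's network-based scoring.

```python
from collections import defaultdict


def classify_peptides(peptide_to_proteins):
    """Split peptides into unique (exactly one candidate protein) and ambiguous (several)."""
    unique, ambiguous = {}, {}
    for peptide, proteins in peptide_to_proteins.items():
        if len(proteins) == 1:
            unique[peptide] = next(iter(proteins))
        elif len(proteins) > 1:
            ambiguous[peptide] = set(proteins)
    return unique, ambiguous


def protein_support(peptide_to_proteins):
    """Label each candidate protein by its peptide evidence:
    'unique', 'ambiguous' (ambiguous peptides only) or 'both'."""
    unique, ambiguous = classify_peptides(peptide_to_proteins)
    kinds = defaultdict(set)
    for protein in unique.values():
        kinds[protein].add("unique")
    for proteins in ambiguous.values():
        for protein in proteins:
            kinds[protein].add("ambiguous")
    return {
        protein: "both" if len(evidence) == 2 else next(iter(evidence))
        for protein, evidence in kinds.items()
    }


# Hypothetical toy input: peptide sequence -> set of candidate protein accessions.
mapping = {
    "PEPTIDEK": {"P1"},
    "SHAREDR": {"P1", "P2"},
    "OTHERK": {"P2", "P3"},
}
support = protein_support(mapping)
# P1 has unique and ambiguous evidence; P2 and P3 rely on ambiguous peptides only.
```

Proteins labeled `ambiguous` here correspond to the case the abstract describes as "supported only by ambiguous peptides", where extra information (such as a biological network) is needed to decide which candidates are present.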
2
A Critical Review of Bottom-Up Proteomics: The Good, the Bad, and the Future of this Field. Proteomes 2020; 8:proteomes8030014. [PMID: 32640657 PMCID: PMC7564415 DOI: 10.3390/proteomes8030014] [Citation(s) in RCA: 121] [Impact Index Per Article: 30.3] [Received: 06/01/2020] [Revised: 06/25/2020] [Accepted: 07/01/2020] [Indexed: 02/07/2023]
Abstract
Proteomics is the field of study that includes the analysis of proteins, from either a basic science perspective or a clinical one. Proteins can be investigated for their abundance, their variety of proteoforms arising from post-translational modifications (PTMs), and their stable or transient protein–protein interactions. This can be especially beneficial in the clinical setting when studying proteins involved in different diseases and conditions. Here, we aim to describe a bottom-up proteomics workflow from sample preparation to data analysis, including all of its benefits and pitfalls. We also describe potential future improvements to this type of proteomics workflow.
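In a bottom-up workflow, proteins are enzymatically digested (most often with trypsin) and the resulting peptides are later matched back to protein sequences. A minimal Python sketch of an in silico tryptic digest, assuming the common simplified rule that trypsin cleaves after K or R unless the next residue is P; the function name and parameters are illustrative, not taken from the review.

```python
def tryptic_digest(protein, missed_cleavages=0, min_length=1):
    """Return peptides from an in silico trypsin digest of `protein`.

    Simplified rule (an assumption): cleave after K or R, but not before P.
    """
    # Cleavage positions: the index just after each cleavable K/R, plus both ends.
    sites = [0]
    for i, residue in enumerate(protein[:-1]):
        if residue in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))

    peptides = []
    # Join up to `missed_cleavages` + 1 consecutive fragments.
    for start in range(len(sites) - 1):
        for k in range(1, missed_cleavages + 2):
            end = start + k
            if end >= len(sites):
                break
            peptide = protein[sites[start]:sites[end]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides
```

For example, `tryptic_digest("MKAPKRGLK")` yields `["MK", "APK", "R", "GLK"]`; real workflows usually also apply length and mass filters before searching.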
3
Software-aided detection and structural characterization of cyclic peptide metabolites in biological matrix by high-resolution mass spectrometry. J Pharm Anal 2020; 10:240-246. [PMID: 32612870 PMCID: PMC7322757 DOI: 10.1016/j.jpha.2020.05.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/28/2020] [Revised: 05/25/2020] [Accepted: 05/25/2020] [Indexed: 11/21/2022]
Abstract
Compared with their linear counterparts, cyclic peptides show better biological activities (such as antibacterial, immunosuppressive and anti-tumor activities) and pharmaceutical properties owing to their conformational rigidity. However, cyclic peptides can form numerous putative metabolites through potential hydrolytic cleavages, and their fragments are very difficult to interpret. These characteristics pose a great challenge when analyzing metabolites of cyclic peptides by mass spectrometry. This study aimed to assess and apply a software-aided analytical workflow for the detection and structural characterization of cyclic peptide metabolites. Insulin and atrial natriuretic peptide (ANP), used as model cyclic peptides, were incubated with trypsin/chymotrypsin and/or rat liver S9, followed by data acquisition on a TripleTOF® 5600. The resulting full-scan MS and MS/MS datasets were automatically processed through a combination of targeted and untargeted peak-finding strategies. MS/MS spectra of predicted metabolites were interrogated against putative metabolite sequences in light of the a, b, y and internal fragment series. The resulting fragment assignments led to the confirmation and ranking of the metabolite sequences and the identification of metabolic modifications. As a result, 29 metabolites with linear or cyclic structures were detected in the insulin incubation with the hydrolytic enzymes. The sequences of 20 insulin metabolites were further determined and were consistent with the hydrolytic sites of these enzymes. In the same manner, multiple metabolites of insulin and ANP formed in rat liver S9 incubation were detected and structurally characterized, some of which have not been previously reported. The results demonstrate the utility of a software-aided data processing tool in the detection and identification of cyclic peptide metabolites.
Highlights
- A software-aided workflow enabling detection and characterization of cyclic peptide metabolites by LC/HRMS.
- Automatic data processing through a combination of targeted and untargeted peak-finding strategies.
- MS/MS spectra of predicted metabolites interrogated against putative metabolite sequences.
- Rapid determination of metabolite profiles of insulin and atrial natriuretic peptide in rat liver S9.
- Potentially applicable to metabolic soft-spot analysis and in vitro metabolism across species in drug discovery.
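Interrogating MS/MS spectra against putative sequences depends on predicting fragment-ion masses. A minimal Python sketch that computes singly charged a, b and y ion m/z values for a linear, unmodified peptide from standard monoisotopic residue masses. It leaves out internal fragments, cyclic structures, modifications and higher charge states, and the function name is illustrative rather than part of the software used in the study.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
CO = 27.994915


def fragment_ions(peptide):
    """Singly charged a, b and y ion m/z values; list index i holds ion number i + 1."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    # b ions: N-terminal prefixes (residue masses + proton).
    b_ions, prefix = [], 0.0
    for m in masses[:-1]:
        prefix += m
        b_ions.append(prefix + PROTON)
    # y ions: C-terminal suffixes (residue masses + water + proton).
    y_ions, suffix = [], 0.0
    for m in reversed(masses[1:]):
        suffix += m
        y_ions.append(suffix + WATER + PROTON)
    # a ions: b ions minus carbon monoxide.
    a_ions = [b - CO for b in b_ions]
    return {"a": a_ions, "b": b_ions, "y": y_ions}
```

For the dipeptide `GK`, this gives b1 ≈ 58.0287 and y1 ≈ 147.1128; matching such predicted values against observed peaks within a tolerance is the basic operation behind fragment assignment.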
4
Developing Well-Annotated Species-Specific Protein Databases Using Comparative Proteogenomics. Adv Exp Med Biol 2019; 1140:389-400. [PMID: 31347060 DOI: 10.1007/978-3-030-15950-4_22] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
Proteomics is a mass spectrometry-based discipline that aims to analyze proteomes and their functions. Many proteomic studies require well-developed protein databases for reference. However, apart from those of model organisms, most proteomes are not well annotated. Techniques such as six-frame translation, ab initio gene prediction and EST databases can help maximize the number of proteins identified in proteomics experiments; however, each of these has its drawbacks. Proteogenomics describes the union of proteomics, genomics and transcriptomics to assist in the identification of peptides that help build better-annotated proteome databases. Here, current proteomic and proteogenomic methods are reviewed, and an example of a comparative proteomics method using lake trout liver samples is described.
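Six-frame translation, one of the database-building techniques mentioned above, translates a nucleotide sequence in all three reading frames of both strands. A minimal Python sketch using the standard genetic code (NCBI table 1); real proteogenomic pipelines also split the output at stop codons and filter short open reading frames, which is omitted here, and the function names are illustrative.

```python
# NCBI standard genetic code (table 1); codons enumerated with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate codon by codon; '*' marks stops, 'X' marks codons with ambiguous bases."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translations of the three forward frames (+1..+3) and three reverse-complement frames (-1..-3)."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

For `ATGGCC`, frame +1 gives `MA` and frame -1 (reverse complement `GGCCAT`) gives `GH`; only a fraction of the resulting sequences will correspond to real proteins, which is one reason the abstract notes that each technique has drawbacks.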
5
Elpa DP, Prabhu GRD, Wu SP, Tay KS, Urban PL. Automation of mass spectrometric detection of analytes and related workflows: A review. Talanta 2019; 208:120304. [PMID: 31816721 DOI: 10.1016/j.talanta.2019.120304] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/24/2019] [Revised: 08/26/2019] [Accepted: 08/28/2019] [Indexed: 12/13/2022]
Abstract
The developments in mass spectrometry (MS) over the past few decades reveal the power and versatility of this technology. MS methods are used in routine analyses as well as in research involving a broad range of analytes (elements and molecules) and countless matrices. However, manual MS analysis is gradually becoming a thing of the past. In this article, the available MS automation strategies are critically evaluated. Automation of analytical workflows culminating in MS detection can involve automated operations in any of the steps related to sample handling/treatment before MS detection, sample introduction, MS data acquisition, and MS data processing. Automated MS workflows help to overcome the intrinsic limitations of MS methodology regarding reproducibility, throughput, and the expertise required to operate MS instruments. Such workflows often comprise automated off-line and on-line steps such as sampling, extraction, derivatization, and separation. The most common instrumental tools include autosamplers, multi-axis robots, flow injection systems, and lab-on-a-chip devices. Prototyping customized automated MS systems is a way to introduce non-standard automated features into MS workflows. The review highlights the enabling role of automated MS procedures in various sectors of academic research and industry. Examples include applications of automated MS workflows in bioscience, environmental studies, and the exploration of outer space.
Affiliation(s)
- Decibel P Elpa
- Department of Applied Chemistry, National Chiao Tung University, 1001 University Rd., Hsinchu, 300, Taiwan; Department of Chemistry, National Tsing Hua University, 101, Section 2, Kuang-Fu Rd., Hsinchu, 30013, Taiwan
- Gurpur Rakesh D Prabhu
- Department of Applied Chemistry, National Chiao Tung University, 1001 University Rd., Hsinchu, 300, Taiwan; Department of Chemistry, National Tsing Hua University, 101, Section 2, Kuang-Fu Rd., Hsinchu, 30013, Taiwan
- Shu-Pao Wu
- Department of Applied Chemistry, National Chiao Tung University, 1001 University Rd., Hsinchu, 300, Taiwan
- Kheng Soo Tay
- Department of Chemistry, Faculty of Science, University of Malaya, 50603 Kuala Lumpur, Malaysia
- Pawel L Urban
- Department of Chemistry, National Tsing Hua University, 101, Section 2, Kuang-Fu Rd., Hsinchu, 30013, Taiwan; Frontier Research Center on Fundamental and Applied Sciences of Matters, National Tsing Hua University, 101, Section 2, Kuang-Fu Rd., Hsinchu, 30013, Taiwan
6
Rangel-Zúñiga OA, Camargo A, Marin C, Peña-Orihuela P, Pérez-Martínez P, Delgado-Lista J, González-Guardia L, Yubero-Serrano EM, Tinahones FJ, Malagón MM, Pérez-Jiménez F, Roche HM, López-Miranda J. Proteome from patients with metabolic syndrome is regulated by quantity and quality of dietary lipids. BMC Genomics 2015; 16:509. [PMID: 26152126 PMCID: PMC4493955 DOI: 10.1186/s12864-015-1725-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/13/2014] [Accepted: 06/26/2015] [Indexed: 01/22/2023]
Abstract
Background Metabolic syndrome (MetS) is a multi-component disorder associated with a high risk of cardiovascular disease. Its etiology is the result of a complex interaction between genetic and environmental factors, including dietary habits. We aimed to identify, in the whole proteome of peripheral blood mononuclear cells (PBMC), the target proteins modulated by long-term consumption of four diets differing in the quality and quantity of lipids. Results A randomized, controlled trial conducted within the LIPGENE study assigned each of 24 MetS patients to 1 of 4 diets for 12 weeks: a) high-saturated fatty acid (HSFA), b) high-monounsaturated fatty acid (HMUFA), c) low-fat, high-complex carbohydrate diet supplemented with placebo (LFHCC) and d) low-fat, high-complex carbohydrate diet supplemented with long-chain (LC) n-3 polyunsaturated fatty acids (PUFA) (LFHCC n-3). We analyzed the changes induced in the proteome of both nuclear and cytoplasmic fractions of PBMC using 2-D proteomic analysis. Sixty-seven proteins were differentially expressed after long-term consumption of the four diets. The HSFA diet induced the expression of proteins responding to oxidative stress, degradation of ubiquitinated proteins and DNA repair. In contrast, the HMUFA, LFHCC and LFHCC n-3 diets down-regulated pro-inflammatory and oxidative stress-related proteins and DNA repair proteins. Conclusion Long-term consumption of HSFA, compared with HMUFA, LFHCC and LFHCC n-3, seems to increase the cardiovascular disease (CVD) risk factors associated with metabolic syndrome, such as inflammation and oxidative stress, and seems to lead to DNA damage as a consequence of high oxidative stress. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1725-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Oriol Alberto Rangel-Zúñiga
- Lipids and Atherosclerosis Research Unit, IMIBIC/Reina Sofia University Hospital, University of Cordoba, Av. Menendez Pidal s/n, 14004 Córdoba, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Antonio Camargo
- Lipids and Atherosclerosis Research Unit, IMIBIC/Reina Sofia University Hospital, University of Cordoba, Av. Menendez Pidal s/n, 14004 Córdoba, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Carmen Marin
- Lipids and Atherosclerosis Research Unit, IMIBIC/Reina Sofia University Hospital, University of Cordoba, Av. Menendez Pidal s/n, 14004 Córdoba, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Patricia Peña-Orihuela
- Lipids and Atherosclerosis Research Unit, IMIBIC/Reina Sofia University Hospital, University of Cordoba, Av. Menendez Pidal s/n, 14004 Córdoba, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Pablo Pérez-Martínez
- Lipids and Atherosclerosis Research Unit, IMIBIC/Reina Sofia University Hospital, University of Cordoba, Av. Menendez Pidal s/n, 14004 Córdoba, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Javier Delgado-Lista
- Lipids and Atherosclerosis Research Unit, IMIBIC/Reina Sofia University Hospital, University of Cordoba, Av. Menendez Pidal s/n, 14004 Córdoba, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Lorena González-Guardia
- Lipids and Atherosclerosis Research Unit, IMIBIC/Reina Sofia University Hospital, University of Cordoba, Av. Menendez Pidal s/n, 14004 Córdoba, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Elena M Yubero-Serrano
- Lipids and Atherosclerosis Research Unit, IMIBIC/Reina Sofia University Hospital, University of Cordoba, Av. Menendez Pidal s/n, 14004 Córdoba, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Francisco J Tinahones
- CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain; Endocrinology and Nutrition Service, Hospital Virgen de la Victoria, Málaga, Spain
- María M Malagón
- CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain; Department of Cell Biology, Physiology, and Immunology, IMIBIC/Reina Sofia University Hospital/University of Córdoba, Cordoba, Spain
- Francisco Pérez-Jiménez
- Lipids and Atherosclerosis Research Unit, IMIBIC/Reina Sofia University Hospital, University of Cordoba, Av. Menendez Pidal s/n, 14004 Córdoba, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
- Helen M Roche
- UCD Institute of Food & Health/UCD Conway Institute, School of Public Health and Population Sciences, University College Dublin, Dublin, Ireland
- José López-Miranda
- Lipids and Atherosclerosis Research Unit, IMIBIC/Reina Sofia University Hospital, University of Cordoba, Av. Menendez Pidal s/n, 14004 Córdoba, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
7
Becher D, Bernhardt J, Fuchs S, Riedel K. Metaproteomics to unravel major microbial players in leaf litter and soil environments: challenges and perspectives. Proteomics 2013; 13:2895-909. [PMID: 23894095 DOI: 10.1002/pmic.201300095] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 03/06/2013] [Revised: 05/03/2013] [Accepted: 05/13/2013] [Indexed: 11/06/2022]
Abstract
Soil- and litter-borne microorganisms contribute vitally to biogeochemical cycles. However, changes in environmental parameters, as well as human interference, may alter species composition and elicit alterations in microbial activities. Soil and litter metaproteomics, which involves assigning soil and litter proteins to specific phylogenetic and functional groups, has great potential to provide essential new insights into the impact of microbial diversity on soil ecosystem functioning. This article illuminates challenges and perspectives of current soil and litter metaproteomics research, starting with an introduction to appropriate experimental design and state-of-the-art proteomics methodologies. This is followed by a summary of important studies aimed at (i) the discovery of the major biotic drivers of leaf litter decomposition, (ii) metaproteomics analyses of rhizosphere-inhabiting microbes, and (iii) global approaches to study bioremediation processes. The review closes with a brief outlook on future developments and some concluding remarks, which should help the reader develop successful concepts for soil and litter metaproteomics studies.
Affiliation(s)
- Dörte Becher
- Ernst-Moritz-Arndt-University of Greifswald, Institute of Microbiology, Greifswald, Germany
8
Kertész-Farkas A, Reiz B, Vera R, Myers MP, Pongor S. PTMTreeSearch: a novel two-stage tree-search algorithm with pruning rules for the identification of post-translational modification of proteins in MS/MS spectra. Bioinformatics 2014; 30:234-41. [PMID: 24215026 DOI: 10.1093/bioinformatics/btt642] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
MOTIVATION Tandem mass spectrometry has become a standard tool for identifying post-translational modifications (PTMs) of proteins. Algorithmic searches for PTMs in tandem mass spectrometry (MS/MS) data tend to be hampered by noisy data as well as by a combinatorial explosion of the search space. This leads to high uncertainty and long search-execution times. RESULTS To address this issue, we present PTMTreeSearch, a new algorithm that uses a large database of known PTMs to identify PTMs from MS/MS data. For a given peptide sequence, PTMTreeSearch builds a computational tree wherein each path from the root to the leaves is labeled with the amino acids of a peptide sequence. Branches then represent PTMs. Various empirical tree pruning rules have been designed to decrease the search-execution time by eliminating biologically unlikely solutions. PTMTreeSearch first identifies a relatively small set of high-confidence PTM types and, in a second stage, performs a more exhaustive search on this restricted set using relaxed search parameter settings. An analysis of experimental data shows that, using the same criteria for false discovery, PTMTreeSearch annotates more peptides than current state-of-the-art methods and PTM identification algorithms, and achieves this at roughly the same execution time. PTMTreeSearch is implemented as a pluggable scoring function in the X!Tandem search engine. AVAILABILITY The source code of PTMTreeSearch and a demo server application can be found at http://net.icgeb.org/ptmtreesearch
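The general idea of a tree whose levels follow the peptide's residues, with branches for candidate PTMs and pruning of unpromising paths, can be illustrated with a simplified depth-first search. This is a hypothetical sketch, not PTMTreeSearch itself: the PTM table, the precursor mass tolerance and the three pruning rules used here (a cap on the number of modifications, overshooting the target mass shift, and being unable to reach it) are assumptions chosen for illustration, and all mass shifts are assumed to be positive.

```python
def enumerate_ptm_assignments(peptide, ptms, target_shift, tolerance=0.02, max_mods=2):
    """Depth-first search over per-residue PTM choices (one tree level per residue).

    ptms         -- hypothetical table: residue -> list of (ptm_name, mass_shift) options
    target_shift -- observed precursor mass minus unmodified peptide mass (Da)
    Returns assignments as tuples of (position, ptm_name).
    """
    if any(shift <= 0 for options in ptms.values() for _, shift in options):
        raise ValueError("this sketch assumes strictly positive mass shifts")

    # max_gain[i]: upper bound on the shift that residues i.. could still add.
    max_gain = [0.0] * (len(peptide) + 1)
    for i in range(len(peptide) - 1, -1, -1):
        options = ptms.get(peptide[i], [])
        max_gain[i] = max_gain[i + 1] + max((s for _, s in options), default=0.0)

    results = []

    def search(pos, shift, chosen):
        # Pruning: overshot the target (shifts can only grow).
        if shift > target_shift + tolerance:
            return
        # Pruning: target unreachable even if every remaining residue is modified.
        if shift + max_gain[pos] < target_shift - tolerance:
            return
        if pos == len(peptide):
            results.append(tuple(chosen))  # both checks passed: within tolerance
            return
        # Branch: leave this residue unmodified.
        search(pos + 1, shift, chosen)
        # Branches: each PTM allowed on this residue, within the modification cap.
        if len(chosen) < max_mods:
            for name, delta in ptms.get(peptide[pos], []):
                chosen.append((pos, name))
                search(pos + 1, shift + delta, chosen)
                chosen.pop()

    search(0, 0.0, [])
    return results
```

For instance, with a phosphorylation option on S and T and a +79.966 Da precursor shift, the peptide `STK` yields two candidate placements (on S or on T); distinguishing them is then the job of fragment-level scoring, which this sketch does not attempt.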
Affiliation(s)
- Attila Kertész-Farkas
- Protein Structure and Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, AREA Research Park, 99 Padriciano, Trieste, Italy, 34149, Institute of Biophysics, Biological Research Centre, Temesvari krt. 62, H-6727 Szeged, Hungary, Protein Networks Group, International Centre for Genetic Engineering and Biotechnology, AREA Research Park, Padriciano 99, 34149 Trieste, Italy and Faculty of Information Technology, Pázmány Péter Catholic University, Práter u. 50/a, H-1083 Budapest, Hungary
9
Guingab-Cagmat JD, Cagmat EB, Hayes RL, Anagli J. Integration of proteomics, bioinformatics, and systems biology in traumatic brain injury biomarker discovery. Front Neurol 2013; 4:61. [PMID: 23750150 PMCID: PMC3668328 DOI: 10.3389/fneur.2013.00061] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 01/10/2013] [Accepted: 05/12/2013] [Indexed: 01/18/2023]
Abstract
Traumatic brain injury (TBI) is a major medical crisis without any FDA-approved pharmacological therapies that have been demonstrated to improve functional outcomes. It has been argued that the discovery of disease-relevant biomarkers might help to guide successful clinical trials for TBI. Major advances in mass spectrometry (MS) have revolutionized the field of proteomic biomarker discovery and facilitated the identification of several candidate markers that are being further evaluated for their efficacy as TBI biomarkers. However, several hurdles have to be overcome even during the discovery phase, which is only the first step in the long process of biomarker development. The high-throughput nature of MS-based proteomic experiments generates a massive amount of mass spectral data, presenting great challenges for downstream interpretation. Currently, different bioinformatics platforms are available for functional analysis and data mining of MS-generated proteomic data. These tools provide a way to convert data sets into biologically interpretable results and functional outcomes. A strategy that shows promise in advancing biomarker development involves the triad of proteomics, bioinformatics, and systems biology. This review gives a brief overview of how bioinformatics and systems biology tools analyze, transform, and interpret complex MS datasets into biologically relevant results. In addition, challenges and limitations of proteomics, bioinformatics, and systems biology in TBI biomarker discovery are presented. A brief survey of studies that used these three overlapping disciplines in TBI biomarker discovery is also presented. Finally, examples of TBI biomarkers and their applications are discussed.
10
Alves G, Yu YK. Improving peptide identification sensitivity in shotgun proteomics by stratification of search space. J Proteome Res 2013; 12:2571-81. [PMID: 23668635 DOI: 10.1021/pr301139y] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/16/2022]
Abstract
Because of its high specificity, trypsin is the enzyme of choice in shotgun proteomics. Nonetheless, several publications do report the identification of semitryptic and nontryptic peptides. Many of these peptides are thought to be signaling peptides or to have formed during sample preparation. It is known that only a small fraction of tandem mass spectra from a trypsin-digested protein mixture can be confidently matched to tryptic peptides. If other possibilities such as post-translational modifications and single-amino acid polymorphisms are ignored, this suggests that many unidentified spectra originate from semitryptic and nontryptic peptides. Including them in database searches, however, may not improve overall peptide identification because of the possible loss of sensitivity caused by search space expansion. To circumvent this issue for E-value-based search methods, we have designed a scheme that categorizes qualified peptides (i.e., peptides whose molecular weights differ from that of the parent ion by no more than a specified error tolerance) into three tiers: tryptic, semitryptic, and nontryptic. This classification allows peptides that belong to different tiers to have different Bonferroni correction factors. Our results show that this scheme can significantly improve retrieval performance compared with search strategies that assign equal Bonferroni correction factors to all qualified peptides.
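The tiering idea can be sketched in Python. The abstract does not give the exact correction factors, so the scheme below is an assumption for illustration only: each tier is Bonferroni-corrected by the number of qualified peptides in that tier plus all higher-priority tiers, so tryptic candidates are never penalized by the much larger nontryptic search space. Tryptic termini follow the simplified "after K/R, not before P" rule, and all names are hypothetical.

```python
TIER_ORDER = ("tryptic", "semitryptic", "nontryptic")


def cleavage_tier(protein, start, end):
    """Classify protein[start:end] as 'tryptic', 'semitryptic' or 'nontryptic'.

    A terminus counts as tryptic if it is a protein terminus or follows the
    simplified trypsin rule (cleavage after K/R, not before P).
    """
    n_ok = start == 0 or (protein[start - 1] in "KR" and protein[start] != "P")
    c_ok = end == len(protein) or (protein[end - 1] in "KR" and protein[end] != "P")
    return {2: "tryptic", 1: "semitryptic", 0: "nontryptic"}[n_ok + c_ok]


def tier_correction_factors(tier_counts):
    """Hypothetical cumulative Bonferroni factors: a tier is corrected for the
    qualified peptides in its own tier and in all higher-priority tiers."""
    factors, running = {}, 0
    for tier in TIER_ORDER:
        running += tier_counts.get(tier, 0)
        factors[tier] = running
    return factors


def e_value(p_value, tier, factors):
    """E-value = p-value multiplied by the Bonferroni factor of the peptide's tier."""
    return p_value * factors[tier]
```

With 100 tryptic, 1,000 semitryptic and 10,000 nontryptic qualified peptides, a tryptic match with p = 1e-4 gets E = 0.01 under this scheme, whereas a single undifferentiated correction over all 11,100 candidates would give E = 1.11.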
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
11
Gonzalez-Galarza FF, Qi D, Fan J, Bessant C, Jones AR. A tutorial for software development in quantitative proteomics using PSI standard formats. Biochim Biophys Acta 2013; 1844:88-97. [PMID: 23584085 PMCID: PMC4008935 DOI: 10.1016/j.bbapap.2013.04.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/07/2012] [Revised: 02/22/2013] [Accepted: 04/05/2013] [Indexed: 01/21/2023]
Abstract
The Human Proteome Organisation Proteomics Standards Initiative (HUPO-PSI) has been working for ten years on the development of standardised formats that facilitate data sharing and public database deposition. In this article, we review three HUPO-PSI data standards (mzML, mzIdentML and mzQuantML) that can be used to design a complete quantitative analysis pipeline in mass spectrometry (MS)-based proteomics. In this tutorial, we briefly describe the content of each data model, in sufficient detail for bioinformaticians to devise proteomics software. We also provide guidance on the use of recently released application programming interfaces (APIs), developed in Java for each of these standards, which make it straightforward to read and write files of any size. We have produced a set of example Java classes and a basic graphical user interface to demonstrate how to use the most important parts of the PSI standards, available from http://code.google.com/p/psi-standard-formats-tutorial. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
Highlights
- A tutorial to help software developers use PSI standard formats.
- A description of the programming interfaces and tools available.
- Code snippets and a basic graphical interface to assist understanding.
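To show what reading one of these formats involves at the XML level, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`; it is not the Java API described in the tutorial. It counts spectra per MS level by reading each spectrum's PSI-MS `cvParam` with accession `MS:1000511` ("ms level"). The embedded document is a tiny hand-written mzML-like fragment, not a complete, schema-valid mzML file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MZML_NS = {"mz": "http://psi.hupo.org/ms/mzml"}
MS_LEVEL = "MS:1000511"  # PSI-MS controlled vocabulary term "ms level"

# Hand-written mzML-like fragment for illustration (not a complete mzML file).
EXAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="run1">
    <spectrumList count="3">
      <spectrum index="0" id="scan=1" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
      <spectrum index="1" id="scan=2" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
      <spectrum index="2" id="scan=3" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""


def count_ms_levels(mzml_text):
    """Count spectra per MS level using each spectrum's 'ms level' cvParam."""
    root = ET.fromstring(mzml_text)
    levels = Counter()
    for spectrum in root.iterfind(".//mz:spectrum", MZML_NS):
        for param in spectrum.iterfind("mz:cvParam", MZML_NS):
            if param.get("accession") == MS_LEVEL:
                levels[int(param.get("value"))] += 1
    return dict(levels)
```

Here `count_ms_levels(EXAMPLE)` returns `{1: 1, 2: 2}`. Because real files can be very large, a streaming parser (for example `ET.iterparse`) or a dedicated library is usually preferable to loading the whole document.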
12
Reiz B, Kertész-Farkas A, Pongor S, Myers MP. Chemical rule-based filtering of MS/MS spectra. Bioinformatics 2013; 29:925-32. [DOI: 10.1093/bioinformatics/btt061] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/13/2023]
13
Niedermeyer THJ, Strohalm M. mMass as a software tool for the annotation of cyclic peptide tandem mass spectra. PLoS One 2012; 7:e44913. [PMID: 23028676 PMCID: PMC3441486 DOI: 10.1371/journal.pone.0044913] [Citation(s) in RCA: 202] [Impact Index Per Article: 16.8] [Received: 05/15/2012] [Accepted: 08/09/2012] [Indexed: 11/19/2022]
Abstract
Natural or synthetic cyclic peptides often possess pronounced bioactivity. Their mass spectrometric characterization is difficult due to the predominant occurrence of non-proteinogenic monomers and the complex fragmentation patterns observed. Even though several software tools for cyclic peptide tandem mass spectra annotation have been published, these tools are still unable to annotate a majority of the signals observed in experimentally obtained mass spectra. They are thus not suitable for extensive mass spectrometric characterization of these compounds. This lack of advanced and user-friendly software tools has motivated us to extend the fragmentation module of a freely available open-source software, mMass (http://www.mmass.org), to allow for cyclic peptide tandem mass spectra annotation and interpretation. The resulting software has been tested on several cyanobacterial and other naturally occurring peptides. It has been found to be superior to other currently available tools concerning both usability and annotation extensiveness. Thus it is highly useful for accelerating the structure confirmation and elucidation of cyclic as well as linear peptides and depsipeptides.
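Part of what makes cyclic peptide spectra hard to annotate is that the ring can open at any of its peptide bonds, so a head-to-tail cyclic peptide of n residues can give up to n different linear sequences, each with its own fragment series. A minimal Python sketch of that bookkeeping, assuming only standard residues, singly charged ions, and the simplified picture in which each ring-opened sequence yields b-type fragments whose mass is the summed residue masses plus a proton. It does not model non-proteinogenic monomers or the other fragmentation pathways mMass handles, and the names are illustrative.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def ring_openings(cyclic_sequence):
    """Linear sequences obtained by opening a head-to-tail ring at each peptide bond."""
    return [cyclic_sequence[i:] + cyclic_sequence[:i] for i in range(len(cyclic_sequence))]


def cyclic_precursor_mz(cyclic_sequence):
    """[M+H]+ of the cyclic peptide: summed residue masses plus a proton (no terminal water)."""
    return sum(RESIDUE_MASS[r] for r in cyclic_sequence) + PROTON


def cyclic_b_series(cyclic_sequence):
    """For each distinct ring-opened sequence, m/z values of its b-type fragments b1..b(n-1)."""
    series = {}
    for linear in ring_openings(cyclic_sequence):
        prefix, ions = 0.0, []
        for residue in linear[:-1]:
            prefix += RESIDUE_MASS[residue]
            ions.append(prefix + PROTON)
        series[linear] = ions
    return series
```

Even for a short ring, every rotation contributes its own fragment ladder, and many ladders share m/z values, which illustrates why extensive, tool-assisted annotation is needed.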
14
Wright P, Noirel J, Ow SY, Fazeli A. A review of current proteomics technologies with a survey on their widespread use in reproductive biology investigations. Theriogenology 2012; 77:738-765.e52. [DOI: 10.1016/j.theriogenology.2011.11.012] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 09/02/2011] [Revised: 11/08/2011] [Accepted: 11/11/2011] [Indexed: 12/27/2022]
15
Cannon WR, Rawlins MM, Baxter DJ, Callister SJ, Lipton MS, Bryant DA. Large improvements in MS/MS-based peptide identification rates using a hybrid analysis. J Proteome Res 2011; 10:2306-17. [PMID: 21391700 DOI: 10.1021/pr101130b] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 01/16/2023]
Abstract
We report a hybrid search method combining database and spectral library searches that allows for a straightforward approach to characterizing the error rates from the combined data. Using these methods, we demonstrate significantly increased sensitivity and specificity in matching peptides to tandem mass spectra. The hybrid search method increased the number of spectra that can be assigned to a peptide in a global proteomics study by 57-147% at an estimated false discovery rate of 5%, with clear room for even greater improvements. The approach combines the general utility of using consensus model spectra typical of database search methods with the accuracy of the intensity information contained in spectral libraries. A common scoring metric based on recent developments linking data analysis and statistical thermodynamics is used, which allows the use of a conservative estimate of error rates for the combined data. We applied this approach to proteomics analysis of Synechococcus sp. PCC 7002, a cyanobacterium that is a model organism for studies of photosynthetic carbon fixation and biofuels development. The increased specificity and sensitivity of this approach allowed us to identify many more peptides involved in the processes important for photoautotrophic growth.
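The abstract reports gains at an estimated false discovery rate (FDR) of 5% but does not describe the estimation procedure. A widely used generic way to estimate FDR in peptide identification is target-decoy searching; the Python sketch below shows that technique and is not necessarily the conservative estimate used in this paper. Matches are ranked by score, the FDR at each threshold is estimated as decoys divided by targets, and the q-value of a match is the lowest estimated FDR at which it would still be accepted. Ties in score are broken arbitrarily.

```python
def target_decoy_qvalues(matches):
    """matches: iterable of (score, is_decoy) pairs; higher scores are better.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # estimated FDR at this threshold
    # q-value: minimum estimated FDR over this and all lower score thresholds.
    qvalues = fdrs[:]
    for i in range(len(qvalues) - 2, -1, -1):
        qvalues[i] = min(qvalues[i], qvalues[i + 1])
    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_targets(matches, fdr_threshold=0.05):
    """Scores of target matches whose q-value is at or below the threshold."""
    return [s for s, d, q in target_decoy_qvalues(matches) if not d and q <= fdr_threshold]
```

A hybrid approach that searches both a sequence database and a spectral library would need a common score across both sources before a single threshold like this can be applied, which is the role the abstract assigns to its shared scoring metric.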
Affiliation(s)
- William R Cannon
- Computational Biology and Bioinformatics Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States.
16
Herr MM, Fries KM, Upton LG, Edsberg LE. Potential biomarkers of temporomandibular joint disorders. J Oral Maxillofac Surg 2011; 69:41-7. [PMID: 21163381 DOI: 10.1016/j.joms.2010.05.013] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 09/17/2009] [Revised: 03/09/2010] [Accepted: 05/18/2010] [Indexed: 12/27/2022]
Abstract
PURPOSE The purpose of this study was to identify protein markers present in subjects with temporomandibular joint disorders (TMDs) and clicking, compared with the levels in controls. MATERIALS AND METHODS This was a pilot case-control study, and we report the preliminary results. Samples of joint aspirate collected from patients with TMDs and from controls who had undergone surgery for a problem other than TMDs were analyzed using isobaric tags for relative and absolute quantitation (iTRAQ) and biotin-label-based protein arrays. The data obtained from these techniques were used to identify the proteins of interest, which were then quantitated using enzyme-linked immunosorbent assay (ELISA). The patient samples studied included joint aspirate collected clinically from the controls and patients and included samples from both the right and the left sides of each patient with a TMD. RESULTS The 8 TMJ aspirate samples from 6 subjects included 5 aspirate samples from 4 patients and 3 from 2 controls. The highest standardized protein concentrations of endocrine gland-derived vascular endothelial growth factor/prokineticin-1 (EG-VEGF/PK1) and D6 were found in both joints of the controls compared with the levels in the joints of the patients. With 1 exception, the standardized protein concentration was significantly lower in the patients than in the controls. The lower levels of EG-VEGF/PK1 and D6 in the patients compared with the controls suggest that these cytokines might be possible biomarkers for TMDs. CONCLUSION In the present pilot study, greater levels of EG-VEGF/PK1 and D6 were found in the controls than in the patients with TMDs. Proteomic analysis of the proteins present in the diseased joints compared with those in the controls might help to identify proteins present when pain or degeneration of the joint occurs. The proteomic information might be useful in the development of future therapies.
Affiliation(s)
- Megan M Herr
- Natural Sciences Department, Daemen College, Amherst, NY, USA.
17
Webb-Robertson BJM, Cannon WR, Oehmen CS, Shah AR, Gurumoorthi V, Lipton MS, Waters KM. A support vector machine model for the prediction of proteotypic peptides for accurate mass and time proteomics. Bioinformatics 2010; 26:1677-83. [PMID: 20568665 DOI: 10.1093/bioinformatics/btq251] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION The standard approach to identifying peptides based on accurate mass and elution time (AMT) compares profiles obtained from a high-resolution mass spectrometer with a database of peptides previously identified in tandem mass spectrometry (MS/MS) studies. It would be advantageous, in terms of both accuracy and cost, to search only for those peptides that are detectable by MS (proteotypic peptides). RESULTS We present a support vector machine (SVM) model that uses a simple descriptor space based on 35 properties of amino acid content, charge, hydrophilicity and polarity for the quantitative prediction of proteotypic peptides. Using three independently derived AMT databases (Shewanella oneidensis, Salmonella typhimurium, Yersinia pestis) for training and validation within and across species, the SVM achieved an average accuracy of approximately 0.83 with an SD of <0.038. Furthermore, we demonstrate that these results are achievable with a small set of 13 variables and that high proteome coverage can be achieved. AVAILABILITY http://omics.pnl.gov/software/STEPP.php CONTACT bj@pnl.gov SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
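The general idea described in this abstract (turn each peptide into a vector of physicochemical descriptors, then train an SVM to separate detectable from undetectable peptides) can be sketched as below. This is a minimal illustration, not the STEPP implementation: the 24 descriptors (composition, length, a crude net charge, mean Kyte-Doolittle hydropathy as a stand-in hydrophilicity index, and the fraction of an assumed polar residue set) are hypothetical choices rather than the paper's 35 properties, and the linear SVM is trained with the Pegasos subgradient method, which is a swapped-in training procedure, not necessarily the one the authors used.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here as an assumed hydrophilicity index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
POLAR = set("STNQCY")  # hypothetical "polar, uncharged" residue set


def featurize(peptide):
    """Map a peptide to a small, illustrative descriptor vector (24 values)."""
    n = len(peptide)
    composition = [peptide.count(aa) / n for aa in AMINO_ACIDS]
    # Crude net charge near neutral pH: K/R minus D/E (termini and His ignored).
    charge = sum(peptide.count(aa) for aa in "KR") - sum(peptide.count(aa) for aa in "DE")
    hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / n
    polar_fraction = sum(aa in POLAR for aa in peptide) / n
    return composition + [float(n), float(charge), hydropathy, polar_fraction]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style training of a linear SVM (hinge loss + L2 penalty).

    X is a list of feature lists, y a list of +1/-1 labels.
    """
    rng = random.Random(seed)
    # Append a constant 1.0 so the last weight acts as a bias term.
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink from the L2 penalty
            if margin < 1:  # hinge loss active: move toward this example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (predicted proteotypic) or -1 (predicted not detectable)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In practice the descriptors would be standardized before training (the raw length feature dominates the others otherwise), and the labels would come from an AMT database of observed versus unobserved peptides; neither step is shown here.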
18
Abstract
The peptide identification problem lies at the heart of modern proteomic methodology, because identified peptides are the basis from which the presence of a particular protein or proteins in a sample is inferred. The challenge is to find the most likely amino acid sequence for each tandem mass spectrum collected, and to produce a score and an associated statistical measure of confidence that the putative identification is correct. This approach assumes that the sequence of the peptide (and its parent protein) is known and present in the database being searched, as opposed to de novo methods, which seek to identify the peptide ab initio. This chapter provides an overview of the methods that popular software tools use to search protein sequence databases, giving the non-expert reader sufficient background to appreciate the choices they can make. It covers the approaches used to compare experimental and theoretical spectra and some of the methods used to validate the assignments and increase confidence in them.
Affiliation(s)
- Simon J Hubbard
- Faculty of Life Sciences, University of Manchester, Michael Smith Building, Manchester, UK.
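The comparison of experimental and theoretical spectra summarized in the abstract above can be illustrated with a toy database search: generate the theoretical singly charged b- and y-ion m/z values for each candidate peptide and count how many are matched by an observed peak. This is a minimal sketch of a shared-peak-count score, assuming unmodified peptides, singly charged fragments and an arbitrary 0.02 m/z tolerance; the function names are hypothetical, candidate generation (in silico digestion and precursor-mass filtering) is omitted, and widely used search engines rely on more elaborate scores such as cross-correlation or probability-based scoring.

```python
# Monoisotopic residue masses (Da) of the 20 standard, unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # proton mass (Da)
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def fragment_ions(peptide):
    """Theoretical singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment b_i
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment y_(n-i)
    return b_ions, y_ions


def shared_peak_count(peptide, observed_mz, tol=0.02):
    """Count theoretical fragment ions matched by some observed peak within tol."""
    b_ions, y_ions = fragment_ions(peptide)
    return sum(
        any(abs(theo - mz) <= tol for mz in observed_mz)
        for theo in b_ions + y_ions
    )


def best_match(candidates, observed_mz, tol=0.02):
    """Return the candidate whose theoretical spectrum shares the most peaks."""
    return max(candidates, key=lambda pep: shared_peak_count(pep, observed_mz, tol))
```

A raw peak count is not yet a statistical measure of confidence; validation methods of the kind the chapter mentions (for example, estimating error rates from searches against decoy sequences) would sit on top of a score like this.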
19
Abstract
Toxoplasma gondii is a ubiquitous apicomplexan parasite that, in humans, can cause several clinical syndromes, including encephalitis, chorioretinitis and congenital infection. T. gondii was first described a little over 100 years ago in the tissues of the gundi (Ctenodactylus gundi). A large number of experimental techniques are available for this pathogen, and it has become a model organism for the study of intracellular pathogens. The completion of the genomes of type I (GT-1), type II (ME49) and type III (VEG) strains has greatly facilitated proteomic studies on this organism. Several subcellular proteomic studies have been completed on this pathogen; these have helped to characterize its specialized invasion organelles and their composition, as well as proteins associated with the cytoskeleton. Global proteomic studies are leading to improved strategies for genome annotation in this organism and an improved understanding of protein regulation in this pathogen. Web-based resources, such as EPIC-DB and ToxoDB, provide proteomic data and support for studies on T. gondii. This review summarizes the current status of proteomic research on T. gondii.
Affiliation(s)
- Louis M Weiss
- Division of Infectious Diseases, Department of Medicine, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Forchheimer 504, Bronx, NY 10461, USA.
20
Nebrich G, Herrmann M, Hartl D, Diedrich M, Kreitler T, Wierling C, Klose J, Giavalisco P, Zabel C, Mao L. PROTEOMER: A workflow-optimized laboratory information management system for 2-D electrophoresis-centered proteomics. Proteomics 2009; 9:1795-808. [DOI: 10.1002/pmic.200800522] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
21
Webb-Robertson BJM, Cannon WR, Oehmen CS, Shah AR, Gurumoorthi V, Lipton MS, Waters KM. A support vector machine model for the prediction of proteotypic peptides for accurate mass and time proteomics. Bioinformatics 2008; 24:1503-9. [PMID: 18453551 DOI: 10.1093/bioinformatics/btn218] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION The standard approach to identifying peptides based on accurate mass and elution time (AMT) compares profiles obtained from a high-resolution mass spectrometer with a database of peptides previously identified in tandem mass spectrometry (MS/MS) studies. It would be advantageous, in terms of both accuracy and cost, to search only for those peptides that are detectable by MS (proteotypic peptides). RESULTS We present a support vector machine (SVM) model that uses a simple descriptor space based on 35 properties of amino acid content, charge, hydrophilicity and polarity for the quantitative prediction of proteotypic peptides. Using three independently derived AMT databases (Shewanella oneidensis, Salmonella typhimurium, Yersinia pestis) for training and validation within and across species, the SVM achieved an average accuracy of 0.8 with an SD of <0.025. Furthermore, we demonstrate that these results are achievable with a small set of 12 variables and that high proteome coverage can be achieved. AVAILABILITY http://omics.pnl.gov/software/STEPP.php. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.