1
Fuchs S, Engelmann S. Small proteins in bacteria - Big challenges in prediction and identification. Proteomics 2023; 23:e2200421. [PMID: 37609810 DOI: 10.1002/pmic.202200421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 08/03/2023] [Accepted: 08/10/2023] [Indexed: 08/24/2023]
Abstract
Proteins with up to 100 amino acids have been largely overlooked due to the challenges associated with predicting and identifying them using traditional methods. Recent advances in bioinformatics and machine learning, DNA sequencing, RNA and Ribo-seq technologies, and mass spectrometry (MS) have greatly facilitated the detection and characterisation of these elusive proteins in recent years. This has revealed their crucial role in various cellular processes including regulation, signalling and transport, as toxins and as folding helpers for protein complexes. Consequently, the systematic identification and characterisation of these proteins in bacteria have emerged as a prominent field of interest within the microbial research community. This review provides an overview of different strategies for predicting and identifying these proteins on a large scale, leveraging the power of these advanced technologies. Furthermore, the review offers insights into the future developments that may be expected in this field.
Affiliation(s)
- Stephan Fuchs
  - Genome Competence Center (MF1), Department MFI, Robert-Koch-Institut, Berlin, Germany
- Susanne Engelmann
  - Institute for Microbiology, Technische Universität Braunschweig, Braunschweig, Germany
  - Microbial Proteomics, Helmholtzzentrum für Infektionsforschung GmbH, Braunschweig, Germany
2
Abdul-Khalek N, Wimmer R, Overgaard MT, Gregersen Echers S. Insight on physicochemical properties governing peptide MS1 response in HPLC-ESI-MS/MS: A deep learning approach. Comput Struct Biotechnol J 2023; 21:3715-3727. [PMID: 37560124 PMCID: PMC10407266 DOI: 10.1016/j.csbj.2023.07.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 08/11/2023]
Abstract
Accurate and absolute quantification of peptides in complex mixtures using quantitative mass spectrometry (MS)-based methods requires foreground knowledge and isotopically labeled standards, thereby increasing analytical expenses, time consumption, and labor, thus limiting the number of peptides that can be accurately quantified. This originates from differential ionization efficiency between peptides and thus, understanding the physicochemical properties that influence the ionization and response in MS analysis is essential for developing less restrictive label-free quantitative methods. Here, we used equimolar peptide pool repository data to develop a deep learning model capable of identifying amino acids influencing the MS1 response. By using an encoder-decoder with an attention mechanism and correlating attention weights with amino acid physicochemical properties, we obtain insight on properties governing the peptide-level MS1 response within the datasets. While the problem cannot be described by one single set of amino acids and properties, distinct patterns were reproducibly obtained. Properties are grouped in three main categories related to peptide hydrophobicity, charge, and structural propensities. Moreover, our model can predict MS1 intensity output under defined conditions based solely on peptide sequence input. Using a refined training dataset, the model predicted log-transformed peptide MS1 intensities with an average error of 9.7 ± 0.5% based on 5-fold cross validation, and outperformed random forest and ridge regression models on both log-transformed and real scale data. This work demonstrates how deep learning can facilitate identification of physicochemical properties influencing peptide MS1 responses, but also illustrates how sequence-based response prediction and label-free peptide-level quantification may impact future workflows within quantitative proteomics.
Affiliation(s)
- Naim Abdul-Khalek
  - Department of Chemistry and Bioscience, Aalborg University, Aalborg 9220, Denmark
- Reinhard Wimmer
  - Department of Chemistry and Bioscience, Aalborg University, Aalborg 9220, Denmark
3
Pauletti BA, Granato DC, Carnielli CM, Câmara GA, Normando AGC, Telles GP, Leme AFP. Typic: A Practical and Robust Tool to Rank Proteotypic Peptides for Targeted Proteomics. J Proteome Res 2023; 22:539-545. [PMID: 36480281 DOI: 10.1021/acs.jproteome.2c00585] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
The selection of a suitable proteotypic peptide remains a challenge for designing a targeted quantitative proteomics assay. Although the criteria are well-established in the literature, the selection of these peptides is often performed in a subjective and time-consuming manner. Here, we have developed a practical and semiautomated workflow implemented in an open-source program named Typic. Typic is designed to run in a command line and a graphical interface to help selecting a list of proteotypic peptides for targeted quantitation. The tool combines the input data and downloads additional data from public repositories to produce a file per protein as output. Each output file includes relevant information to the selection of proteotypic peptides organized in a table, a colored ranking of peptides according to their potential value as targets for quantitation and auxiliary plots to assist users in the task of proteotypic peptides selection. Taken together, Typic leads to a practical and straightforward data extraction from multiple data sets, allowing the identification of most suitable proteotypic peptides based on established criteria, in an unbiased and standardized manner, ultimately leading to a more robust targeted proteomics assay.
Affiliation(s)
- Bianca A Pauletti
  - Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, 13083-970 São Paulo, Brazil
- Daniela C Granato
  - Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, 13083-970 São Paulo, Brazil
- Carolina M Carnielli
  - Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, 13083-970 São Paulo, Brazil
- Guilherme A Câmara
  - Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, 13083-970 São Paulo, Brazil
- Ana Gabriela C Normando
  - Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, 13083-970 São Paulo, Brazil
- Guilherme P Telles
  - Instituto de Computação, Universidade Estadual de Campinas (UNICAMP), Campinas, 13083-852 São Paulo, Brazil
- Adriana F Paes Leme
  - Laboratório de Espectrometria de Massas, Laboratório Nacional de Biociências (LNBio), Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas, 13083-970 São Paulo, Brazil
4
Rusilowicz M, Newman DW, Creamer DR, Johnson J, Adair K, Harman VM, Grant CM, Beynon RJ, Hubbard SJ. AlacatDesigner: Computational Design of Peptide Concatamers for Protein Quantitation. J Proteome Res 2023; 22:594-604. [PMID: 36688735 PMCID: PMC9903321 DOI: 10.1021/acs.jproteome.2c00608] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/24/2023]
Abstract
Protein quantitation via mass spectrometry relies on peptide proxies for the parent protein from which abundances are estimated. Owing to the variability in signal from individual peptides, accurate absolute quantitation usually relies on the addition of an external standard. Typically, this involves stable isotope-labeled peptides, delivered singly or as a concatenated recombinant protein. Consequently, the selection of the most appropriate surrogate peptides and the attendant design in recombinant proteins termed QconCATs are challenges for proteome science. QconCATs can now be built in a "a-la-carte" assembly method using synthetic biology: ALACATs. To assist their design, we present "AlacatDesigner", a tool that supports the peptide selection for recombinant protein standards based on the user's target protein. The user-customizable tool considers existing databases, occurrence in the literature, potential post-translational modifications, predicted miscleavage, predicted divergence of the peptide and protein quantifications, and ionization potential within the mass spectrometer. We show that peptide selections are enriched for good proteotypic and quantotypic candidates compared to empirical data. The software is freely available to use either via a web interface AlacatDesigner, downloaded as a Desktop application or imported as a Python package for the command line interface or in scripts.
Affiliation(s)
- Martin Rusilowicz
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
- David W. Newman
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
- Declan R. Creamer
  - Division of Molecular and Cellular Function, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
- James Johnson
  - GeneMill, Institute of Systems Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
- Kareena Adair
  - Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
- Victoria M. Harman
  - Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
- Chris M. Grant
  - Division of Molecular and Cellular Function, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
- Robert J. Beynon
  - Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
- Simon J. Hubbard
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
5
PD-BertEDL: An Ensemble Deep Learning Method Using BERT and Multivariate Representation to Predict Peptide Detectability. Int J Mol Sci 2022; 23:ijms232012385. [PMID: 36293242 PMCID: PMC9604182 DOI: 10.3390/ijms232012385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 10/11/2022] [Accepted: 10/12/2022] [Indexed: 12/03/2022]
Abstract
Peptide detectability is defined as the probability of identifying a peptide from a mixture of standard samples, which is a key step in protein identification and analysis. Exploring effective methods for predicting peptide detectability is helpful for disease treatment and clinical research. However, most existing computational methods for predicting peptide detectability rely on a single information. With the increasing complexity of feature representation, it is necessary to explore the influence of multivariate information on peptide detectability. Thus, we propose an ensemble deep learning method, PD-BertEDL. Bidirectional encoder representations from transformers (BERT) is introduced to capture the context information of peptides. Context information, sequence information, and physicochemical information of peptides were combined to construct the multivariate feature space of peptides. We use different deep learning methods to capture the high-quality features of different categories of peptides information and use the average fusion strategy to integrate three model prediction results to solve the heterogeneity problem and to enhance the robustness and adaptability of the model. The experimental results show that PD-BertEDL is superior to the existing prediction methods, which can effectively predict peptide detectability and provide strong support for protein identification and quantitative analysis, as well as disease treatment.
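The "average fusion strategy" mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each of the three models outputs one detectability probability per peptide and that fusion is a plain unweighted mean; the function name, the example probabilities, and the 0.5 decision threshold are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def average_fusion(*model_probs):
    """Average per-peptide probabilities across several models (unweighted)."""
    lengths = {len(probs) for probs in model_probs}
    if len(lengths) != 1:
        raise ValueError("need at least one model, all with the same number of peptides")
    # zip(*...) groups the predictions for one peptide across all models.
    return [fmean(per_peptide) for per_peptide in zip(*model_probs)]


# Hypothetical outputs of three models for the same three peptides.
context_probs = [0.90, 0.20, 0.60]    # e.g. a context (BERT-style) model
sequence_probs = [0.80, 0.40, 0.30]   # e.g. a sequence model
physchem_probs = [0.70, 0.30, 0.90]   # e.g. a physicochemical model

fused = average_fusion(context_probs, sequence_probs, physchem_probs)
detectable = [p >= 0.5 for p in fused]  # assumed threshold
print(fused, detectable)
```

Averaging treats the three models as equally reliable; a weighted mean would be the natural variation if one model were known to perform better.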
6
Bovine alpha-lactalbumin particulates for controlled delivery: Impact of dietary fibers on stability, digestibility, and gastro-intestinal release of capsaicin. Food Hydrocoll 2022. [DOI: 10.1016/j.foodhyd.2022.107536] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]
7
Yang Y, Lin L, Qiao L. Deep learning approaches for data-independent acquisition proteomics. Expert Rev Proteomics 2021; 18:1031-1043. [PMID: 34918987 DOI: 10.1080/14789450.2021.2020654] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/30/2023]
Abstract
INTRODUCTION: Data-independent acquisition (DIA) is an emerging technology for large-scale proteomic studies. DIA data analysis methods are evolving rapidly, and deep learning has cut a conspicuous figure in this field. AREAS COVERED: This review discusses and provides an overview of the deep learning methods that are used for DIA data analysis, including spectral library prediction, feature scoring, and statistical control in peptide-centric analysis, as well as de novo peptide sequencing. Literature searches were performed for articles, including preprints, up to December 2021 from PubMed, Scopus, and Web of Science databases. EXPERT OPINION: While spectral library prediction has broken through the limitation on proteome coverage of experimental libraries, the statistical burden due to the large query space is the remaining challenge of utilizing proteome-wide predicted libraries. Analysis of post-translational modifications is another promising direction of deep learning-based DIA methods.
Affiliation(s)
- Yi Yang
  - Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
- Ling Lin
  - Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
- Liang Qiao
  - Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
8
Bielajew BJ, Hu JC, Athanasiou KA. Methodology to Quantify Collagen Subtypes and Crosslinks: Application in Minipig Cartilages. Cartilage 2021; 13:1742S-1754S. [PMID: 34823380 PMCID: PMC8804780 DOI: 10.1177/19476035211060508] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/07/2021] [Revised: 10/26/2021] [Accepted: 10/27/2021] [Indexed: 01/19/2023]
Abstract
INTRODUCTION: This study develops assays to quantify collagen subtypes and crosslinks with liquid chromatography-mass spectrometry (LC-MS) and characterizes the cartilages in the Yucatan minipig. METHODS: For collagen subtyping, liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis was performed on tissues digested in trypsin. For collagen crosslinks, LC-MS analysis was performed on hydrolysates. Samples were also examined histologically and with bottom-up proteomics. Ten cartilages (femoral condyle, femoral head, facet joint, floating rib, true rib, auricular cartilage, annulus fibrosus, 2 meniscus locations, and temporomandibular joint disc) were analyzed. RESULTS: The collagen subtyping assay quantified collagen types I and II. The collagen crosslinks assay quantified mature and immature crosslinks. Collagen subtyping revealed that collagen type I predominates in fibrocartilages and collagen type II in hyaline cartilages, as expected. Elastic cartilage and fibrocartilages had more mature collagen crosslink profiles than hyaline cartilages. Bottom-up proteomics revealed a spectrum of ratios between collagen types I and II, and quantified 42 proteins, including 24 collagen alpha-chains and 12 minor collagen types. DISCUSSION: The novel assays developed in this work are sensitive, inexpensive, and use a low operator time relative to other collagen analysis methods. Unlike the current collagen assays, these assays quantify collagen subtypes and crosslinks without an antibody-based approach or lengthy chromatography. They apply to any collagenous tissue, with broad applications in tissue characterization and tissue engineering. For example, a novel finding of this work was the presence of a large quantity of collagen type III in the white-white knee meniscus and a spectrum of hyaline and fibrous cartilages.
Affiliation(s)
- Benjamin J. Bielajew
  - Department of Biomedical Engineering, University of California, Irvine, Irvine, CA, USA
- Jerry C. Hu
  - Department of Biomedical Engineering, University of California, Irvine, Irvine, CA, USA
9
Ru X, Ye X, Sakurai T, Zou Q. Application of learning to rank in bioinformatics tasks. Brief Bioinform 2021; 22:6102666. [PMID: 33454758 DOI: 10.1093/bib/bbaa394] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/03/2020] [Revised: 11/09/2020] [Accepted: 11/24/2020] [Indexed: 12/17/2022]
Abstract
Over the past decades, learning to rank (LTR) algorithms have been gradually applied to bioinformatics. Such methods have shown significant advantages in multiple research tasks in this field. Therefore, it is necessary to summarize and discuss the application of these algorithms so that these algorithms are convenient and contribute to bioinformatics. In this paper, the characteristics of LTR algorithms and their strengths over other types of algorithms are analyzed based on the application of multiple perspectives in bioinformatics. Finally, the paper further discusses the shortcomings of the LTR algorithms, the methods and means to better use the algorithms and some open problems that currently exist.
Affiliation(s)
- Xiucai Ye
  - Department of Computer Science and Center for Artificial Intelligence Research (C-AIR), University of Tsukuba
- Quan Zou
  - University of Electronic Science and Technology of China
10
Arbeitman CR, Auge G, Blaustein M, Bredeston L, Corapi ES, Craig PO, Cossio LA, Dain L, D’Alessio C, Elias F, Fernández NB, Gándola YB, Gasulla J, Gorojovsky N, Gudesblat GE, Herrera MG, Ibañez LI, Idrovo T, Rando MI, Kamenetzky L, Nadra AD, Noseda DG, Paván CH, Pavan MF, Pignataro MF, Roman E, Ruberto LAM, Rubinstein N, Santos J, Velazquez F, Zelada AM. Structural and functional comparison of SARS-CoV-2-spike receptor binding domain produced in Pichia pastoris and mammalian cells. Sci Rep 2020; 10:21779. [PMID: 33311634 PMCID: PMC7732851 DOI: 10.1038/s41598-020-78711-6] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 10/14/2020] [Accepted: 11/25/2020] [Indexed: 12/13/2022]
Abstract
The yeast Pichia pastoris is a cost-effective and easily scalable system for recombinant protein production. In this work we compared the conformation of the receptor binding domain (RBD) from severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) Spike protein expressed in P. pastoris and in the well established HEK-293T mammalian cell system. RBD obtained from both yeast and mammalian cells was properly folded, as indicated by UV-absorption, circular dichroism and tryptophan fluorescence. They also had similar stability, as indicated by temperature-induced unfolding (observed Tm were 50 °C and 52 °C for RBD produced in P. pastoris and HEK-293T cells, respectively). Moreover, the stability of both variants was similarly reduced when the ionic strength was increased, in agreement with a computational analysis predicting that a set of ionic interactions may stabilize RBD structure. Further characterization by high-performance liquid chromatography, size-exclusion chromatography and mass spectrometry revealed a higher heterogeneity of RBD expressed in P. pastoris relative to that produced in HEK-293T cells, which disappeared after enzymatic removal of glycans. The production of RBD in P. pastoris was scaled-up in a bioreactor, with yields above 45 mg/L of 90% pure protein, thus potentially allowing large scale immunizations to produce neutralizing antibodies, as well as the large scale production of serological tests for SARS-CoV-2.
11
Purple: A Computational Workflow for Strategic Selection of Peptides for Viral Diagnostics Using MS-Based Targeted Proteomics. Viruses 2019; 11:v11060536. [PMID: 31181768 PMCID: PMC6630961 DOI: 10.3390/v11060536] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/19/2019] [Revised: 06/03/2019] [Accepted: 06/04/2019] [Indexed: 01/26/2023]
Abstract
Emerging virus diseases present a global threat to public health. To detect viral pathogens in time-critical scenarios, accurate and fast diagnostic assays are required. Such assays can now be established using mass spectrometry-based targeted proteomics, by which viral proteins can be rapidly detected from complex samples down to the strain-level with high sensitivity and reproducibility. Developing such targeted assays involves tedious steps of peptide candidate selection, peptide synthesis, and assay optimization. Peptide selection requires extensive preprocessing by comparing candidate peptides against a large search space of background proteins. Here we present Purple (Picking unique relevant peptides for viral experiments), a software tool for selecting target-specific peptide candidates directly from given proteome sequence data. It comes with an intuitive graphical user interface, various parameter options and a threshold-based filtering strategy for homologous sequences. Purple enables peptide candidate selection across various taxonomic levels and filtering against backgrounds of varying complexity. Its functionality is demonstrated using data from different virus species and strains. Our software enables to build taxon-specific targeted assays and paves the way to time-efficient and robust viral diagnostics using targeted proteomics.
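The idea of comparing candidate peptides against a background of other proteins can be sketched as follows. This is not Purple's algorithm: it assumes a simple in-silico tryptic digest (cleavage after K or R unless followed by P, no missed cleavages) and exact-match filtering only, and it does not reproduce the threshold-based handling of homologous sequences described in the abstract. All function names and sequences are hypothetical.

```python
import re


def tryptic_digest(sequence, min_len=6):
    """Return fully cleaved tryptic peptides with at least min_len residues."""
    # Zero-width split: cleave after K or R, except when the next residue is P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {piece for piece in pieces if len(piece) >= min_len}


def unique_target_peptides(target_proteins, background_proteins, min_len=6):
    """Keep target peptides that never occur as a peptide in the background."""
    background = set()
    for seq in background_proteins:
        background |= tryptic_digest(seq, min_len)
    candidates = set()
    for seq in target_proteins:
        candidates |= tryptic_digest(seq, min_len)
    return candidates - background


# Made-up sequences: the target shares the peptide AAAGGGK with the background.
target = ["AAAGGGKCCCDDDEEERPEPTIDEK"]
background = ["MRAAAGGGKLLLLLLR"]

unique = unique_target_peptides(target, background)
print(sorted(unique))
```

Exact matching is the strictest and simplest filter; tolerating near-identical peptides would require a similarity measure and a cutoff, which is where a threshold-based strategy comes in.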
12
Gao Z, Chang C, Yang J, Zhu Y, Fu Y. AP3: An Advanced Proteotypic Peptide Predictor for Targeted Proteomics by Incorporating Peptide Digestibility. Anal Chem 2019; 91:8705-8711. [DOI: 10.1021/acs.analchem.9b02520] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 01/03/2023]
Affiliation(s)
- Zhiqiang Gao
  - National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Cheng Chang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Jinghan Yang
  - National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yunping Zhu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
  - Anhui Medical University, Hefei 230032, China
- Yan Fu
  - National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
13
Zimmer D, Schneider K, Sommer F, Schroda M, Mühlhaus T. Artificial Intelligence Understands Peptide Observability and Assists With Absolute Protein Quantification. Front Plant Sci 2018; 9:1559. [PMID: 30483279 PMCID: PMC6242780 DOI: 10.3389/fpls.2018.01559] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/30/2018] [Accepted: 10/04/2018] [Indexed: 05/20/2023]
Abstract
Targeted mass spectrometry has become the method of choice to gain absolute quantification information of high quality, which is essential for a quantitative understanding of biological systems. However, the design of absolute protein quantification assays remains challenging due to variations in peptide observability and incomplete knowledge about factors influencing peptide detectability. Here, we present a deep learning algorithm for peptide detectability prediction, d::pPop, which allows the informed selection of synthetic proteotypic peptides for the successful design of targeted proteomics quantification assays. The deep neural network is able to learn a regression model that relates the physicochemical properties of a peptide to its ion intensity detected by mass spectrometry. The approach makes use of experimentally detected deviations from the assumed equimolar abundance of all peptides derived from a given protein. Trained on extensive proteomics datasets, d::pPop's plant and non-plant specific models can predict the quality of proteotypic peptides for not yet experimentally identified proteins. Interrogating the deep neural network after learning from ~76,000 peptides per model organism allows to investigate the impact of different physicochemical properties on the observability of a peptide, thus providing insights into peptide observability as a multifaceted process. Empirical evaluation with rank accuracy metrics showed that our prediction approach outperforms existing algorithms. We circumvent the delicate step of selecting positive and negative training sets and at the same time also more closely reflect the need for selecting the top most promising peptides for targeting a protein of interest. Further, we used an artificial QconCAT protein to experimentally validate the observability prediction. 
Our proteotypic peptide prediction approach not only facilitates the design of absolute protein quantification assays via a user-friendly web interface but also enables the selection of proteotypic peptides for not yet observed proteins, hence rendering the tool especially useful for plant research.
Affiliation(s)
- David Zimmer
  - Computational Systems Biology, TU Kaiserslautern, Kaiserslautern, Germany
- Kevin Schneider
  - Computational Systems Biology, TU Kaiserslautern, Kaiserslautern, Germany
- Frederik Sommer
  - Molekulare Biotechnologie & Systembiologie, TU Kaiserslautern, Kaiserslautern, Germany
- Michael Schroda
  - Molekulare Biotechnologie & Systembiologie, TU Kaiserslautern, Kaiserslautern, Germany
- Timo Mühlhaus
  - Computational Systems Biology, TU Kaiserslautern, Kaiserslautern, Germany
14
Manes NP, Nita-Lazar A. Application of targeted mass spectrometry in bottom-up proteomics for systems biology research. J Proteomics 2018; 189:75-90. [PMID: 29452276 DOI: 10.1016/j.jprot.2018.02.008] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 11/13/2017] [Revised: 01/25/2018] [Accepted: 02/07/2018] [Indexed: 02/08/2023]
Abstract
The enormous diversity of proteoforms produces tremendous complexity within cellular proteomes, facilitates intricate networks of molecular interactions, and constitutes a formidable analytical challenge for biomedical researchers. Currently, quantitative whole-proteome profiling often relies on non-targeted liquid chromatography-mass spectrometry (LC-MS), which samples proteoforms broadly, but can suffer from lower accuracy, sensitivity, and reproducibility compared with targeted LC-MS. Recent advances in bottom-up proteomics using targeted LC-MS have enabled previously unachievable identification and quantification of target proteins and posttranslational modifications within complex samples. Consequently, targeted LC-MS is rapidly advancing biomedical research, especially systems biology research in diverse areas that include proteogenomics, interactomics, kinomics, and biological pathway modeling. With the recent development of targeted LC-MS assays for nearly the entire human proteome, targeted LC-MS is positioned to enable quantitative proteomic profiling of unprecedented quality and accessibility to support fundamental and clinical research. Here we review recent applications of bottom-up proteomics using targeted LC-MS for systems biology research. SIGNIFICANCE: Advances in targeted proteomics are rapidly advancing systems biology research. Recent applications include systems-level investigations focused on posttranslational modifications (such as phosphoproteomics), protein conformation, protein-protein interaction, kinomics, proteogenomics, and metabolic and signaling pathways. Notably, absolute quantification of metabolic and signaling pathway proteins has enabled accurate pathway modeling and engineering. 
Integration of targeted proteomics with other technologies, such as RNA-seq, has facilitated diverse research such as the identification of hundreds of "missing" human proteins (genes and transcripts that appear to encode proteins but direct experimental evidence was lacking).
Affiliation(s)
- Nathan P Manes
  - Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Aleksandra Nita-Lazar
  - Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
15
Wang JR, Huang WL, Tsai MJ, Hsu KT, Huang HL, Ho SY. ESA-UbiSite: accurate prediction of human ubiquitination sites by identifying a set of effective negatives. Bioinformatics 2017; 33:661-668. [PMID: 28062441 DOI: 10.1093/bioinformatics/btw701] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/23/2015] [Accepted: 11/08/2016] [Indexed: 01/20/2023] Open
Abstract
Motivation: Numerous ubiquitination sites remain undiscovered because of the limitations of mass spectrometry-based methods. Existing prediction methods use randomly selected non-validated sites as non-ubiquitination sites to train ubiquitination site prediction models. Results: We propose an evolutionary screening algorithm (ESA) to select effective negatives among non-validated sites and an ESA-based prediction method, ESA-UbiSite, to identify human ubiquitination sites. The ESA selects non-validated sites least likely to be ubiquitination sites as training negatives. Moreover, the ESA and ESA-UbiSite use a set of well-selected physicochemical properties together with a support vector machine for accurate prediction. Experimental results show that ESA-UbiSite with effective negatives achieved 0.92 test accuracy and a Matthews correlation coefficient of 0.48, better than existing prediction methods. The ESA increased ESA-UbiSite's test accuracy from 0.75 to 0.92 and can improve other post-translational modification site prediction methods. Availability and Implementation: An ESA-UbiSite-based web server has been established at http://iclab.life.nctu.edu.tw/iclab_webtools/ESAUbiSite/. Contact: syho@mail.nctu.edu.tw. Supplementary information: Supplementary data are available at Bioinformatics online.
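A high accuracy alongside a moderate Matthews correlation coefficient (MCC), as reported above, can arise when positives are much rarer than negatives. The following is a minimal sketch of how both metrics are computed from a confusion matrix. The counts are hypothetical, chosen only to illustrate the effect, and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 if any marginal is empty."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0


# Hypothetical imbalanced test set: 100 positives, 900 negatives.
tp, fn, tn, fp = 50, 50, 870, 30

acc_value = accuracy(tp, tn, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
print(f"accuracy = {acc_value:.2f}, MCC = {mcc_value:.2f}")
```

With these illustrative counts, half of the positives are missed, yet accuracy stays high because the negatives dominate. MCC uses all four cells of the confusion matrix and is therefore less forgiving.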
Affiliation(s)
- Jyun-Rong Wang
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Wen-Lin Huang
  - Department and Institute of Industrial Engineering and Management, Minghsin University of Science and Technology, Hsinchu 300, Taiwan
- Ming-Ju Tsai
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Kai-Ti Hsu
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Hui-Ling Huang
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan; Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
- Shinn-Ying Ho
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan; Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan

16
Omasits U, Varadarajan AR, Schmid M, Goetze S, Melidis D, Bourqui M, Nikolayeva O, Québatte M, Patrignani A, Dehio C, Frey JE, Robinson MD, Wollscheid B, Ahrens CH. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics. Genome Res 2017; 27:2083-2095. [PMID: 29141959 PMCID: PMC5741054 DOI: 10.1101/gr.218255.116] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 11/11/2016] [Accepted: 10/25/2017] [Indexed: 12/18/2022]
Abstract
Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.
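The in silico ORF component mentioned above (a six-frame translation that considers alternative start codons) can be illustrated with a short sketch. This is not the iPtgxDB implementation. It is a simplified stand-in that assumes a linear, uppercase A/C/G/T sequence, the standard codon table, and a start-codon set of ATG, GTG and TTG. The function names and the `min_aa` parameter are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial starts
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_orfs(genome, min_aa=10):
    """Yield (strand, frame, protein) for every start-to-stop ORF.

    Each ORF begins at the first start codon after the previous stop, and
    the initiator codon is always translated as M. ORFs that run off the
    end of the sequence without a stop codon are dropped.
    """
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if codon in STOP_CODONS:
                    if start is not None:
                        body = "".join(
                            CODON_TABLE[seq[j:j + 3]]
                            for j in range(start + 3, i, 3)
                        )
                        protein = "M" + body
                        if len(protein) >= min_aa:
                            yield strand, frame, protein
                    start = None
                elif start is None and codon in START_CODONS:
                    start = i


toy = "ATGGCTGCTGCTTAA"  # hypothetical 15-nt sequence
orfs = list(six_frame_orfs(toy, min_aa=1))
print(orfs)
```

Keeping only the most upstream start per stop codon gives the longest candidate for each reading frame. A search database aimed at resolving novel start sites would also need the shorter in-frame variants, which this sketch omits.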
Affiliation(s)
- Ulrich Omasits
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Adithi R Varadarajan
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland; Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
- Michael Schmid
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Sandra Goetze
  - Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
- Damianos Melidis
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Marc Bourqui
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Olga Nikolayeva
  - Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
- Andrea Patrignani
  - Functional Genomics Center Zurich, ETH & UZH Zurich, CH-8057 Zurich, Switzerland
- Juerg E Frey
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland
- Mark D Robinson
  - Institute for Molecular Life Sciences & SIB Swiss Institute of Bioinformatics, University of Zurich, CH-8057 Zurich, Switzerland
- Bernd Wollscheid
  - Department of Health Sciences and Technology, Institute of Molecular Systems Biology, Swiss Federal Institute of Technology Zurich, CH-8093 Zurich, Switzerland
- Christian H Ahrens
  - Agroscope, Research Group Molecular Diagnostics, Genomics and Bioinformatics & SIB Swiss Institute of Bioinformatics, CH-8820 Wädenswil, Switzerland

17
Yuan Q, Gao J, Wu D, Zhang S, Mamitsuka H, Zhu S. DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank. Bioinformatics 2016; 32:i18-i27. [PMID: 27307615 PMCID: PMC4908328 DOI: 10.1093/bioinformatics/btw244] [Citation(s) in RCA: 99] [Impact Index Per Article: 14.1] [Indexed: 02/06/2023] Open
Abstract
Motivation: Identifying drug–target interactions is an important task in drug discovery. To reduce heavy time and financial cost in experimental way, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug–target interactions of new candidate drugs or targets. Methods: Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank (LTR) is the most powerful technique in the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by nicely combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. Results: The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in Area under Prediction Recall curve for FDA approved new drugs and FDA experimental drugs. Availability: http://datamining-iip.fudan.edu.cn/service/DrugE-Rank Contact: zhusf@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qingjun Yuan
  - School of Computer Science, Fudan University, Shanghai, China; Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
- Junning Gao
  - School of Computer Science, Fudan University, Shanghai, China; Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
- Dongliang Wu
  - School of Computer Science, Fudan University, Shanghai, China; Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China
- Shihua Zhang
  - National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- Hiroshi Mamitsuka
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan; Department of Computer Science, Aalto University, Finland
- Shanfeng Zhu
  - School of Computer Science, Fudan University, Shanghai, China; Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China; Centre for Computational System Biology, Fudan University, Shanghai, China

18
Matsuda F, Tomita A, Shimizu H. Prediction of Hopeless Peptides Unlikely to be Selected for Targeted Proteome Analysis. Mass Spectrom (Tokyo) 2017; 6:A0056. [PMID: 28580222 PMCID: PMC5451515 DOI: 10.5702/massspectrometry.a0056] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/17/2017] [Accepted: 04/23/2017] [Indexed: 12/03/2022]
Abstract
In targeted proteomics using liquid chromatography-tandem triple quadrupole mass spectrometry (LC/MS/MS) in the selected reaction monitoring (SRM) mode, selecting the best observable or visible peptides is a key step in the development of SRM assay methods of target proteins. A direct comparison of signal intensities among all candidate peptides by brute-force LC/MS/MS analysis is a concrete approach for peptide selection. However, the analysis requires an SRM method with hundreds of transitions. This study reports on the development of a method for predicting and identifying hopeless peptides to reduce the number of candidate peptides needed for brute-force experiments. Hopeless peptides are proteotypic peptides that are unlikely to be selected for targets in SRM analysis owing to their poor ionization characteristics. Targeted proteomics data from Escherichia coli demonstrated that the relative ionization efficiency between two peptides could be predicted from sequences of two peptides, when a multivariate regression model is used. Validation of the method showed that >20% of the candidate peptides could be successfully eliminated as hopeless peptides with a false positive rate of less than 2%.
Affiliation(s)
- Fumio Matsuda
  - Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University; RIKEN Center for Sustainable Resource Science
- Atsumi Tomita
  - Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University
- Hiroshi Shimizu
  - Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University

19
Casas V, Rodríguez-Asiain A, Pinto-Llorente R, Vadillo S, Carrascal M, Abian J. Brachyspira hyodysenteriae and B. pilosicoli Proteins Recognized by Sera of Challenged Pigs. Front Microbiol 2017; 8:723. [PMID: 28522991 PMCID: PMC5415613 DOI: 10.3389/fmicb.2017.00723] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 01/31/2017] [Accepted: 04/07/2017] [Indexed: 11/13/2022] Open
Abstract
The spirochetes Brachyspira hyodysenteriae and B. pilosicoli are pig intestinal pathogens that are the causative agents of swine dysentery (SD) and porcine intestinal spirochaetosis (PIS), respectively. Although some inactivated bacterin and recombinant vaccines have been explored as prophylactic treatments against these species, no effective vaccine is yet available. Immunoproteomics approaches hold the potential for the identification of new, suitable candidates for subunit vaccines against SD and PIS. These strategies take into account the gene products actually expressed and present in the cells, and thus susceptible of being targets of immune recognition. In this context, we have analyzed the immunogenic pattern of two B. pilosicoli porcine isolates (the Spanish farm isolate OLA9 and the commercial P43/6/78 strain) and one B. hyodysenteriae isolate (the Spanish farm V1). The proteins from the Brachyspira lysates were fractionated by preparative isoelectric focusing, and the fractions were analyzed by Western blot with hyperimmune sera from challenged pigs. Of the 28 challenge-specific immunoreactive bands detected, 21 were identified as single proteins by MS, while the other 7 were shown to contain several major proteins. None of these proteins were detected in the control immunoreactive bands. The proteins identified included 11 from B. hyodysenteriae and 28 from the two B. pilosicoli strains. Eight proteins were common to the B. pilosicoli strains (i.e., elongation factor G, aspartyl-tRNA synthase, biotin lipoyl, TmpB outer membrane protein, flagellar protein FlaA, enolase, PEPCK, and VspD), and enolase and PEPCK were common to both species. Many of the identified proteins were flagellar proteins or predicted to be located on the cell surface and some of them had been previously described as antigenic or as bacterial virulence factors. 
Here we report on the identification and semiquantitative data of these immunoreactive proteins which constitute a unique antigen collection from these bacteria.
Affiliation(s)
- Vanessa Casas
  - CSIC/UAB Proteomics Laboratory, IIBB-CSIC, IDIBAPS, Barcelona, Spain; Faculty of Medicine, Autonomous University of Barcelona, Barcelona, Spain
- Santiago Vadillo
  - Departamento Sanidad Animal, Facultad de Veterinaria, Universidad de Extremadura, Cáceres, Spain
- Joaquin Abian
  - CSIC/UAB Proteomics Laboratory, IIBB-CSIC, IDIBAPS, Barcelona, Spain; Faculty of Medicine, Autonomous University of Barcelona, Barcelona, Spain

20
Otte KA, Schlötterer C. Polymorphism-aware protein databases - a prerequisite for an unbiased proteomic analysis of natural populations. Mol Ecol Resour 2017; 17:1148-1155. [DOI: 10.1111/1755-0998.12656] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/11/2016] [Revised: 01/12/2017] [Accepted: 01/20/2017] [Indexed: 11/30/2022]
Affiliation(s)
- Kathrin A. Otte
  - Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, 1210 Vienna, Austria
- Christian Schlötterer
  - Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, 1210 Vienna, Austria

21
Xian F, Zi J, Wang Q, Lou X, Sun H, Lin L, Hou G, Rao W, Yin C, Wu L, Li S, Liu S. Peptide Biosynthesis with Stable Isotope Labeling from a Cell-free Expression System for Targeted Proteomics with Absolute Quantification. Mol Cell Proteomics 2016; 15:2819-28. [PMID: 27234506 DOI: 10.1074/mcp.o115.056507] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/23/2015] [Indexed: 11/06/2022] Open
Abstract
Because of its specificity and sensitivity, targeted proteomics using mass spectrometry for multiple reaction monitoring is a powerful tool to detect and quantify pre-selected peptides from a complex background and facilitates the absolute quantification of peptides using isotope-labeled forms as internal standards. How to generate isotope-labeled peptides remains an urgent challenge for accurately quantitative targeted proteomics on a large scale. Herein, we propose that isotope-labeled peptides fused with a quantitative tag could be synthesized through an expression system in vitro, and the homemade peptides could be enriched by magnetic beads with tag-affinity and globally quantified based on the corresponding multiple reaction monitoring signals provided by the fused tag. An Escherichia coli cell-free protein expression system, protein synthesis using recombinant elements, was adopted for the synthesis of isotope-labeled peptides fused with Strep-tag. Through a series of optimizations, we enabled efficient expression of the labeled peptides such that, after Strep-Tactin affinity enrichment, the peptide yield was acceptable in scale for quantification, and the peptides could be completely digested by trypsin to release the Strep-tag for quantification. Moreover, these recombinant peptides could be employed in the same way as synthetic peptides for multiple reaction monitoring applications and are likely more economical and useful in a laboratory for the scale of targeted proteomics. As an application, we synthesized four isotope-labeled glutathione S-transferase (GST) peptides and added them to mouse sera pre-treated with GST affinity resin as internal standards. A quantitative assay of the synthesized GST peptides confirmed the absolute GST quantification in mouse sera to be measurable and reproducible.
Affiliation(s)
- Feng Xian
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China; BGI-Shenzhen, Shenzhen, 518083, China; Sino-Danish Center for Education and Research, University of the Chinese Academy of Sciences, Beijing, 100049, China
- Jin Zi
  - BGI-Shenzhen, Shenzhen, 518083, China
- Quanhui Wang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China; BGI-Shenzhen, Shenzhen, 518083, China
- Xiaomin Lou
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Haidan Sun
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Liang Lin
  - BGI-Shenzhen, Shenzhen, 518083, China
- Guixue Hou
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China; BGI-Shenzhen, Shenzhen, 518083, China
- Lin Wu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Shuwei Li
  - Institute for Bioscience and Biotechnology Research, University of Maryland College Park, Rockville, Maryland 20850
- Siqi Liu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China; BGI-Shenzhen, Shenzhen, 518083, China; Sino-Danish Center for Education and Research, University of the Chinese Academy of Sciences, Beijing, 100049, China

22
Liu F, Koval M, Ranganathan S, Fanayan S, Hancock WS, Lundberg EK, Beavis RC, Lane L, Duek P, McQuade L, Kelleher NL, Baker MS. Systems Proteomics View of the Endogenous Human Claudin Protein Family. J Proteome Res 2016; 15:339-59. [PMID: 26680015 DOI: 10.1021/acs.jproteome.5b00769] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 12/12/2022]
Abstract
Claudins are the major transmembrane protein components of tight junctions in human endothelia and epithelia. Tissue-specific expression of claudin members suggests that this protein family is not only essential for sustaining the role of tight junctions in cell permeability control but also vital in organizing cell contact signaling by protein-protein interactions. How this protein family is collectively processed and regulated is key to understanding the role of junctional proteins in preserving cell identity and tissue integrity. The focus of this review is to first provide a brief overview of the functional context, on the basis of the extensive body of claudin biology research that has been thoroughly reviewed, for endogenous human claudin members and then ascertain existing and future proteomics techniques that may be applicable to systematically characterizing the chemical forms and interacting protein partners of this protein family in human. The ability to elucidate claudin-based signaling networks may provide new insight into cell development and differentiation programs that are crucial to tissue stability and manipulation.
Affiliation(s)
- Michael Koval
  - Department of Medicine, Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, and Department of Cell Biology, Emory University School of Medicine, 205 Whitehead Biomedical Research Building, 615 Michael Street, Atlanta, Georgia 30322, United States
- William S Hancock
  - Barnett Institute and Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States
- Emma K Lundberg
  - SciLifeLab, School of Biotechnology, Royal Institute of Technology (KTH), SE-171 21 Solna, Stockholm, Sweden
- Ronald C Beavis
  - Department of Biochemistry and Medical Genetics, University of Manitoba, 744 Bannatyne Avenue, Winnipeg, Manitoba R3E 0W3, Canada
- Lydie Lane
  - SIB-Swiss Institute of Bioinformatics, CMU - Rue Michel-Servet 1, 1211 Geneva, Switzerland
- Paula Duek
  - SIB-Swiss Institute of Bioinformatics, CMU - Rue Michel-Servet 1, 1211 Geneva, Switzerland
- Neil L Kelleher
  - Department of Chemistry, Department of Molecular Biosciences, and Proteomics Center of Excellence, Northwestern University, 2145 North Sheridan Road, Evanston, Illinois 60208, United States

23
Wang H, Shi T, Qian WJ, Liu T, Kagan J, Srivastava S, Smith RD, Rodland KD, Camp DG. The clinical impact of recent advances in LC-MS for cancer biomarker discovery and verification. Expert Rev Proteomics 2015; 13:99-114. [PMID: 26581546 DOI: 10.1586/14789450.2016.1122529] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Indexed: 12/12/2022]
Abstract
Mass spectrometry (MS)-based proteomics has become an indispensable tool with broad applications in systems biology and biomedical research. With recent advances in liquid chromatography (LC) and MS instrumentation, LC-MS is making increasingly significant contributions to clinical applications, especially in the area of cancer biomarker discovery and verification. To overcome challenges associated with analyses of clinical samples (for example, a wide dynamic range of protein concentrations in bodily fluids and the need to perform high throughput and accurate quantification of candidate biomarker proteins), significant efforts have been devoted to improve the overall performance of LC-MS-based clinical proteomics platforms. Reviewed here are the recent advances in LC-MS and its applications in cancer biomarker discovery and quantification, along with the potentials, limitations and future perspectives.
Affiliation(s)
- Hui Wang
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Tujin Shi
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Wei-Jun Qian
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Tao Liu
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Jacob Kagan
  - Division of Cancer Prevention, National Cancer Institute (NCI), Rockville, MD, USA
- Sudhir Srivastava
  - Division of Cancer Prevention, National Cancer Institute (NCI), Rockville, MD, USA
- Richard D Smith
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Karin D Rodland
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- David G Camp
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA

24
Depke M, Michalik S, Rabe A, Surmann K, Brinkmann L, Jehmlich N, Bernhardt J, Hecker M, Wollscheid B, Sun Z, Moritz RL, Völker U, Schmidt F. A peptide resource for the analysis of Staphylococcus aureus in host-pathogen interaction studies. Proteomics 2015. [PMID: 26224020 DOI: 10.1002/pmic.201500091] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 11/10/2022]
Abstract
Staphylococcus aureus is an opportunistic human pathogen, which can cause life-threatening disease. Proteome analyses of the bacterium can provide new insights into its pathophysiology and important facets of metabolic adaptation and, thus, aid the recognition of targets for intervention. However, the value of such proteome studies increases with their comprehensiveness. We present an MS-driven, proteome-wide characterization of the strain S. aureus HG001. Combining 144 high precision proteomic data sets, we identified 19 109 peptides from 2088 distinct S. aureus HG001 proteins, which account for 72% of the predicted ORFs. Peptides were further characterized concerning pI, GRAVY, and detectability scores in order to understand the low peptide coverage of 8.7% (19 109 out of 220 245 theoretical peptides). The high quality peptide-centric spectra have been organized into a comprehensive peptide fragmentation library (SpectraST) and used for identification of S. aureus-typic peptides in highly complex host-pathogen interaction experiments, which significantly improved the number of identified S. aureus proteins compared to a MASCOT search. This effort now allows the elucidation of crucial pathophysiological questions in S. aureus-specific host-pathogen interaction studies through comprehensive proteome analysis. The S. aureus-specific spectra resource developed here also represents an important spectral repository for SRM or for data-independent acquisition MS approaches. All MS data have been deposited in the ProteomeXchange with identifier PXD000702 (http://proteomecentral.proteomexchange.org/dataset/PXD000702).
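The coverage figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only numbers stated above. The ORF total is not given in the abstract and is back-calculated here from the rounded 72%, so it is an approximation.

```python
# Figures quoted in the abstract.
identified_peptides = 19_109
theoretical_peptides = 220_245
identified_proteins = 2_088
fraction_of_predicted_orfs = 0.72  # rounded value from the abstract

# Peptide-level coverage: identified / theoretical.
peptide_coverage = identified_peptides / theoretical_peptides

# Implied number of predicted ORFs (approximate, because 72% is rounded).
implied_orfs = identified_proteins / fraction_of_predicted_orfs

print(f"peptide coverage = {peptide_coverage:.1%}")
print(f"implied predicted ORFs = {implied_orfs:.0f}")
```

The contrast between 72% protein-level and 8.7% peptide-level coverage is why the authors examine peptide properties such as pI, GRAVY and detectability scores.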
Affiliation(s)
- Maren Depke
  - ZIK-FunGene Junior Research Group "Applied Proteomics", Interfaculty Institute for Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- Stephan Michalik
  - ZIK-FunGene Junior Research Group "Applied Proteomics", Interfaculty Institute for Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- Alexander Rabe
  - ZIK-FunGene Junior Research Group "Applied Proteomics", Interfaculty Institute for Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- Kristin Surmann
  - Interfaculty Institute for Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- Lars Brinkmann
  - Interfaculty Institute for Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- Nico Jehmlich
  - Interfaculty Institute for Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- Jörg Bernhardt
  - Institute for Microbiology, Ernst-Moritz-Arndt-University Greifswald, Greifswald, Germany
- Michael Hecker
  - Institute for Microbiology, Ernst-Moritz-Arndt-University Greifswald, Greifswald, Germany
- Bernd Wollscheid
  - Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Zhi Sun
  - Institute for Systems Biology (ISB), Seattle, WA, USA
- Uwe Völker
  - Interfaculty Institute for Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- Frank Schmidt
  - ZIK-FunGene Junior Research Group "Applied Proteomics", Interfaculty Institute for Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany

25
Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat Protoc 2015; 10:426-41. [PMID: 25675208 DOI: 10.1038/nprot.2015.015] [Citation(s) in RCA: 220] [Impact Index Per Article: 24.4] [Indexed: 12/18/2022]
Abstract
Targeted proteomics by selected/multiple reaction monitoring (S/MRM) or, on a larger scale, by SWATH (sequential window acquisition of all theoretical spectra) MS (mass spectrometry) typically relies on spectral reference libraries for peptide identification. Quality and coverage of these libraries are therefore of crucial importance for the performance of the methods. Here we present a detailed protocol that has been successfully used to build high-quality, extensive reference libraries supporting targeted proteomics by SWATH MS. We describe each step of the process, including data acquisition by discovery proteomics, assertion of peptide-spectrum matches (PSMs), generation of consensus spectra and compilation of MS coordinates that uniquely define each targeted peptide. Crucial steps such as false discovery rate (FDR) control, retention time normalization and handling of post-translationally modified peptides are detailed. Finally, we show how to use the library to extract SWATH data with the open-source software Skyline. The protocol takes 2-3 d to complete, depending on the extent of the library and the computational resources available.
|
26
|
Demeure K, Duriez E, Domon B, Niclou SP. PeptideManager: a peptide selection tool for targeted proteomic studies involving mixed samples from different species. Front Genet 2014; 5:305. [PMID: 25228907 PMCID: PMC4151198 DOI: 10.3389/fgene.2014.00305] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/18/2014] [Accepted: 08/16/2014] [Indexed: 02/02/2023] Open
Abstract
The search for clinically useful protein biomarkers using advanced mass spectrometry approaches represents a major focus in cancer research. However, the direct analysis of human samples may be challenging due to limited availability, the absence of appropriate control samples, or the large background variability observed in patient material. As an alternative approach, human tumors orthotopically implanted into a different species (xenografts) are clinically relevant models that have proven their utility in pre-clinical research. Patient-derived xenografts for glioblastoma have been extensively characterized in our laboratory and have been shown to retain the characteristics of the parental tumor at the phenotypic and genetic level. Such models were also found to adequately mimic the behavior and treatment response of human tumors. The reproducibility of such xenograft models and the possibility to identify their host background and perform tumor-host interaction studies are major advantages over the direct analysis of human samples. At the proteome level, the analysis of xenograft samples is challenged by the presence of proteins from two different species which, depending on tumor size, type or location, often appear at variable ratios. Any proteomics approach aimed at quantifying proteins within such samples must consider the identification of species-specific peptides in order to avoid biases introduced by the host proteome. Here, we present an in-house methodology and tool developed to select peptides used as surrogates for protein candidates from a defined proteome (e.g., human) in a host proteome background (e.g., mouse, rat) suited for a mass spectrometry analysis. The tools presented here are applicable to any species-specific proteome, provided a protein database is available. By linking the information from both proteomes, PeptideManager significantly facilitates and expedites the selection of peptides used as surrogates to analyze proteins of interest.
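The idea of selecting species-specific surrogate peptides can be sketched as follows: digest both proteomes in silico and keep only those target peptides that never occur in the host digest. This is a minimal sketch and not the PeptideManager implementation. The function names, the simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages), the length limits, and the toy sequences are all assumptions made for illustration.

```python
def tryptic_peptides(sequence, min_len=6, max_len=30):
    """Return the set of fully tryptic peptides of a protein sequence.

    Simplified rule: cleave after K or R unless followed by P,
    with no missed cleavages.
    """
    peptides = set()
    start = 0
    for i, residue in enumerate(sequence):
        at_end = i == len(sequence) - 1
        cleave = residue in "KR" and not at_end and sequence[i + 1] != "P"
        if cleave or at_end:
            peptide = sequence[start:i + 1]
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
            start = i + 1
    return peptides


def species_specific_peptides(target, background, **digest_kwargs):
    """Map each target protein to its peptides absent from the background.

    target, background: dicts of protein name -> amino acid sequence.
    """
    background_peptides = set()
    for seq in background.values():
        background_peptides |= tryptic_peptides(seq, **digest_kwargs)
    return {
        name: sorted(tryptic_peptides(seq, **digest_kwargs) - background_peptides)
        for name, seq in target.items()
    }


# Toy sequences (made up): the two proteins differ by a single residue.
human = {"P1": "MAGTLLEKSSDEVWRPEPTIDEK"}
mouse = {"Q1": "MAGTLLEKSSDEVWKPEPTIDEK"}
specific = species_specific_peptides(human, mouse)
```

In this toy case the shared peptide MAGTLLEK is discarded and only the peptide carrying the differing residue remains as a candidate surrogate. A practical selection would also need to consider further constraints, such as the fact that leucine and isoleucine have identical masses, which this sketch ignores.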
Affiliation(s)
- Kevin Demeure
- NorLux Neuro-Oncology Laboratory, Department of Oncology, Centre de Recherche Public de la Santé Luxembourg, Luxembourg
| | - Elodie Duriez
- LCP, Luxembourg Clinical Proteomics Center, Centre de Recherche Public de la Santé Strassen, Luxembourg
| | - Bruno Domon
- LCP, Luxembourg Clinical Proteomics Center, Centre de Recherche Public de la Santé Strassen, Luxembourg
| | - Simone P Niclou
- NorLux Neuro-Oncology Laboratory, Department of Oncology, Centre de Recherche Public de la Santé Luxembourg, Luxembourg
| |
|