1
Ng CCA, Zhou Y, Yao ZP. Algorithms for de-novo sequencing of peptides by tandem mass spectrometry: A review. Anal Chim Acta 2023; 1268:341330. [PMID: 37268337 DOI: 10.1016/j.aca.2023.341330] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/22/2022] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 06/04/2023]
Abstract
Peptide sequencing is of great significance to fundamental and applied research in fields such as chemical, biological, medicinal and pharmaceutical sciences. With the rapid development of mass spectrometry and sequencing algorithms, de-novo peptide sequencing using tandem mass spectrometry (MS/MS) has become the main method for determining amino acid sequences of novel and unknown peptides. Advanced algorithms allow the amino acid sequence information to be accurately obtained from MS/MS spectra in a short time. In this review, algorithms from exhaustive search to state-of-the-art machine learning and neural networks for high-throughput and automated de-novo sequencing are introduced and compared. Impacts of datasets on algorithm performance are highlighted. The current limitations and promising directions of de-novo peptide sequencing are also discussed.
Affiliation(s)
- Cheuk Chi A Ng
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
- Yin Zhou
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
- Zhong-Ping Yao
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China.
2
Li Z, Li K, Xu B, Chen J, Zhang Y, Guo L, Xie J. Identification evidence unraveled by strict proteomics rules toward forensic samples. Electrophoresis 2023; 44:337-348. [PMID: 35906925 DOI: 10.1002/elps.202200051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/28/2022] [Revised: 06/18/2022] [Accepted: 07/14/2022] [Indexed: 02/01/2023]
Abstract
Snake venom is a complex mixture of proteins and peptides secreted by venomous snakes from their poison glands. Although proteomics for snake venom composition, interspecific differences, and developmental evolution has been developed for a decade, current diagnosis or identification techniques of snake venom in clinical intoxication and forensic science applications are mainly dependent on morphological and immunoassay methods. Proteomics techniques could be expected to offer direct help. This work applied a bottom-up proteomics method to identify protein types and species attribution in suspected snake venom samples using an ultrahigh-performance liquid chromatography-quadrupole-electrostatic field Orbitrap tandem mass spectrometric technique, and a cytotoxicity assay was added to provide direct evidence of toxicity. For the suspicious samples seized at security control, sample pretreatment (in-sol and in-gel digestion) and data acquisition (nontargeted and targeted screening) modes complemented and validated each other. We implemented two consecutive approaches to identify the species source of proteins in the samples from the viewpoints of venom proteomics and strict forensic identification. First, we completed a workflow consisting of a proteomics database match toward an entire SWISS-PROT (date 2018-11-22) database and a result-directed specific taxonomy database. The latter was a helpful hint to compare master protein kinds and reveal the insufficiency of specific venom proteomics characterization rules. Second, we suggested strict rules for protein identification to meet the requirements of forensic science on improved identification correctness, that is, (1) peptide spectrum match confidence, peptide confidence, and protein confidence were all high (with the false discovery rate less than 1%); (2) the number of unique peptides was more than or equal to two in one protein; and (3) within unique peptides, at least 75% of the ∆m/z of the matched y and b ions were less than 5 ppm. We identified these samples as cobra venom containing 10 highly abundant proteins (P00597, P82463, P60770, Q9YGI4, P62375, P49123, P80245, P60302, P01442, and P60304) from two snake venom protein families (acid phospholipase A2 and three-finger toxins), and the most abundant proteins were cytotoxins.
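The three identification rules listed in this abstract can be read as a simple filter. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class and function names are hypothetical, the confidence level is modelled as a single false discovery rate value per peptide and per protein, and a unique peptide is counted toward rule (2) only if it also satisfies rule (3), which is one possible reading of the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FragmentMatch:
    """One matched b or y ion (hypothetical structure for illustration)."""
    observed_mz: float
    theoretical_mz: float

    def ppm_error(self) -> float:
        # Absolute mass error expressed in parts per million.
        return abs(self.observed_mz - self.theoretical_mz) / self.theoretical_mz * 1e6


@dataclass
class UniquePeptide:
    sequence: str
    fdr: float  # assumed: FDR-style confidence value for this peptide
    fragments: List[FragmentMatch]


def peptide_passes(p: UniquePeptide, max_fdr: float = 0.01,
                   max_ppm: float = 5.0, min_fraction: float = 0.75) -> bool:
    """Rules (1) and (3): FDR below 1% and at least 75% of matched ions within 5 ppm."""
    if p.fdr >= max_fdr or not p.fragments:
        return False
    within = sum(1 for f in p.fragments if f.ppm_error() < max_ppm)
    return within / len(p.fragments) >= min_fraction


def protein_passes(protein_fdr: float, unique_peptides: List[UniquePeptide],
                   max_fdr: float = 0.01, min_unique: int = 2) -> bool:
    """Rules (1) and (2): protein FDR below 1% and at least two passing unique peptides."""
    if protein_fdr >= max_fdr:
        return False
    passing = [p for p in unique_peptides if peptide_passes(p)]
    return len(passing) >= min_unique
```

For example, a peptide with four matched ions, three of them within 5 ppm, sits exactly at the 75% threshold and passes; a protein then needs two such unique peptides to be accepted.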
Affiliation(s)
- Zehua Li
- State Key Laboratory of Toxicology and Medical Countermeasures, and Laboratory of Toxicant Analysis, Institute of Pharmacology and Toxicology, Academy of Military Medical Sciences, Beijing, P. R. China
- Kexin Li
- State Key Laboratory of Toxicology and Medical Countermeasures, and Laboratory of Toxicant Analysis, Institute of Pharmacology and Toxicology, Academy of Military Medical Sciences, Beijing, P. R. China
- Bin Xu
- State Key Laboratory of Toxicology and Medical Countermeasures, and Laboratory of Toxicant Analysis, Institute of Pharmacology and Toxicology, Academy of Military Medical Sciences, Beijing, P. R. China
- Jia Chen
- State Key Laboratory of Toxicology and Medical Countermeasures, and Laboratory of Toxicant Analysis, Institute of Pharmacology and Toxicology, Academy of Military Medical Sciences, Beijing, P. R. China
- Ying Zhang
- Forensic Science Service of Beijing Public Security Bureau, Key Laboratory of Forensic Toxicology, Ministry of Public Security, Beijing, P. R. China
- Lei Guo
- State Key Laboratory of Toxicology and Medical Countermeasures, and Laboratory of Toxicant Analysis, Institute of Pharmacology and Toxicology, Academy of Military Medical Sciences, Beijing, P. R. China
- Jianwei Xie
- State Key Laboratory of Toxicology and Medical Countermeasures, and Laboratory of Toxicant Analysis, Institute of Pharmacology and Toxicology, Academy of Military Medical Sciences, Beijing, P. R. China
3
Network Biology and Artificial Intelligence Drive the Understanding of the Multidrug Resistance Phenotype in Cancer. Drug Resist Updat 2022; 60:100811. [DOI: 10.1016/j.drup.2022.100811] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/29/2021] [Revised: 01/22/2022] [Accepted: 01/24/2022] [Indexed: 02/07/2023]
4
Jennings ME, Silveira JR, Treier KM, Tracy PB, Matthews DE. Total Retention Liquid Chromatography-Mass Spectrometry to Achieve Maximum Protein Sequence Coverage. Anal Chem 2021; 93:5054-5060. [PMID: 33724001 DOI: 10.1021/acs.analchem.0c04292] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
Peptide identification by liquid chromatography-mass spectrometry (LC-MS) requires retention and elution of peptides from the LC column. Although medium and hydrophobic peptides are readily retained by the C18 columns that are commonly used in proteomics, short and hydrophilic peptides are neither retained nor measured by MS due to their elution in the void volume after sample injection. These nonretained peptides can possess important post-translational modifications, such as glycosylation or phosphorylation. We describe a total retention LC-MS method that employs a reverse phase C18 column and porous graphitic carbon (PGC) column to retain both hydrophobic and hydrophilic peptides for LC-MS analysis. Our setup uses a single valve with a trapping column and two LC pumps run at low microliter/minute flow rates to deliver separate gradients to parallel capillary C18 and PGC columns. Our capillary LC system balances the need for high sensitivity with ease of implementation as compared to other 2D LC systems that use nanocolumns with multiple trapping columns and multiport valves. We demonstrate the utility of the method by identifying hydrophilic peptides that went undetected when only a C18 nanocolumn was used. These missed hydrophilic peptides include tripeptides and N-glycosylated species.
5
Sengupta A, Naresh G, Mishra A, Parashar D, Narad P. Proteome analysis using machine learning approaches and its applications to diseases. Advances in Protein Chemistry and Structural Biology 2021; 127:161-216. [PMID: 34340767 DOI: 10.1016/bs.apcsb.2021.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Abstract
With the tremendous developments in the fields of biological and medical technologies, huge amounts of data are generated in the form of genomic data, images in medical databases, data on protein sequences, and so on. Analyzing these data through different tools sheds light on the particulars of a disease and our body's reactions to it, thus aiding our understanding of human health. The most useful of these tools are artificial intelligence and deep learning (DL). The artificial neural networks in DL algorithms help extract viable data from the datasets and, further, recognize patterns in these complex datasets. Therefore, as a part of machine learning, DL helps us face the various challenges that arise during protein prediction, protein identification and quantification. Proteomics is the study of proteins, their structures, features, properties and so on. As a form of data science, proteomics has helped us progress excellently in the field of genomics technologies. One of the major techniques used in proteomics studies is mass spectrometry (MS). However, MS is efficient for the analysis of large datasets only with the added help of informatics approaches for data analysis and interpretation; these mainly include machine learning and deep learning algorithms. In this chapter, we will discuss in detail the applications of deep learning and various algorithms of machine learning in proteomics.
Affiliation(s)
- Abhishek Sengupta
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
- G Naresh
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
- Astha Mishra
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
- Diksha Parashar
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
- Priyanka Narad
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India.
6
Takan S, Allmer J. DNMSO; an ontology for representing de novo sequencing results from Tandem-MS data. PeerJ 2020; 8:e10216. [PMID: 33150092 PMCID: PMC7585381 DOI: 10.7717/peerj.10216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2020] [Accepted: 09/28/2020] [Indexed: 11/20/2022] Open
Abstract
For the identification and sequencing of proteins, mass spectrometry (MS) has become the tool of choice and, as such, drives proteomics. MS/MS spectra need to be assigned a peptide sequence, for which two strategies exist. Either database search or de novo sequencing can be employed to establish peptide spectrum matches. For database search, mzIdentML is the current community standard for data representation. There is no community standard for representing de novo sequencing results, but we previously proposed the de novo markup language (DNML). At the moment, each de novo sequencing solution uses a different data representation, complicating downstream data integration, which is crucial since ensemble predictions may be more useful than predictions of a single tool. We here propose the de novo MS Ontology (DNMSO), which can, for example, provide many-to-many mappings between spectra and peptide predictions. Additionally, an application programming interface (API) has been implemented that supports any file operation necessary for de novo sequencing, from spectra input to reading, writing, and creating the DNMSO format, as well as conversion from many other file formats. This API removes all overhead from the production of de novo sequencing tools and allows developers to concentrate on algorithm development completely. We make the API and formal descriptions of the format freely available at https://github.com/savastakan/dnmso.
Affiliation(s)
- Savaş Takan
- Department of Computer Engineering, Faculty of Engineering, Izmir Institute of Technology, Izmir, Turkey
- Jens Allmer
- Hochschule Ruhr West, University of Applied Sciences, Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Mülheim an der Ruhr, Germany
7
Gerdle B, Ghafouri B. Proteomic studies of common chronic pain conditions - a systematic review and associated network analyses. Expert Rev Proteomics 2020; 17:483-505. [DOI: 10.1080/14789450.2020.1797499] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/18/2022]
Affiliation(s)
- Björn Gerdle
- Pain and Rehabilitation Centre, and Department of Health, Medicine and Caring Sciences, Linköping University, Linköping, Sweden
- Bijar Ghafouri
- Pain and Rehabilitation Centre, and Department of Health, Medicine and Caring Sciences, Linköping University, Linköping, Sweden
8
The Power of Three in Cannabis Shotgun Proteomics: Proteases, Databases and Search Engines. Proteomes 2020; 8:proteomes8020013. [PMID: 32549361 PMCID: PMC7356525 DOI: 10.3390/proteomes8020013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/28/2020] [Revised: 06/12/2020] [Accepted: 06/12/2020] [Indexed: 11/29/2022] Open
Abstract
Cannabis research has taken off since the relaxation of legislation, yet proteomics is still lagging. In 2019, we published three proteomics methods aimed at optimizing protein extraction, protein digestion for bottom-up and middle-down proteomics, as well as the analysis of intact proteins for top-down proteomics. The database of Cannabis sativa proteins used in these studies was retrieved from UniProt, the reference repository for proteins, which is incomplete and therefore underrepresents the genetic diversity of this non-model species. In this fourth study, we remedy this shortcoming by searching larger databases from various sources. We also compare two search engines, the oldest, SEQUEST, and the most popular, Mascot. This shotgun proteomics experiment also utilizes the power of parallel digestions with orthogonal proteases of increasing selectivity, namely chymotrypsin, trypsin/Lys-C and Asp-N. Our results show that the larger the database, the greater the list of accessions identified but the longer the duration of the search. Using orthogonal proteases and different search algorithms increases the total number of proteins identified, most of them common despite differing proteases and algorithms, but many of them unique as well.
9
Ambrosino L, Colantuono C, Diretto G, Fiore A, Chiusano ML. Bioinformatics Resources for Plant Abiotic Stress Responses: State of the Art and Opportunities in the Fast Evolving -Omics Era. Plants 2020; 9:plants9050591. [PMID: 32384671 PMCID: PMC7285221 DOI: 10.3390/plants9050591] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/27/2020] [Revised: 04/24/2020] [Accepted: 04/29/2020] [Indexed: 12/13/2022]
Abstract
Abiotic stresses are among the principal limiting factors for productivity in agriculture. In the current era of continuous climate changes, the understanding of the molecular aspects involved in abiotic stress response in plants is a priority. The rise of -omics approaches provides key strategies to promote effective research in the field, facilitating the investigations from reference models to an increasing number of species, tolerant and sensitive genotypes. Integrated multilevel approaches, based on molecular investigations at genomics, transcriptomics, proteomics and metabolomics levels, are now feasible, expanding the opportunities to clarify key molecular aspects involved in responses to abiotic stresses. To this aim, bioinformatics has become fundamental for data production, mining and integration, and necessary for extracting valuable information and for comparative efforts, paving the way to the modeling of the involved processes. We provide here an overview of bioinformatics resources for research on plant abiotic stresses, describing collections from -omics efforts in the field, ranging from raw data to complete databases or platforms, highlighting opportunities and still open challenges in abiotic stress research based on -omics technologies.
Affiliation(s)
- Luca Ambrosino
- Department of Agricultural Sciences, University of Naples Federico II, 80055 Portici (Na), Italy
- Department of Research Infrastructures for Marine Biological Resources (RIMAR), 80121 Naples, Italy
- Chiara Colantuono
- Department of Agricultural Sciences, University of Naples Federico II, 80055 Portici (Na), Italy
- Department of Research Infrastructures for Marine Biological Resources (RIMAR), 80121 Naples, Italy
- Gianfranco Diretto
- Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), 00123 Rome, Italy
- Alessia Fiore
- Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), 00123 Rome, Italy
- Maria Luisa Chiusano
- Department of Agricultural Sciences, University of Naples Federico II, 80055 Portici (Na), Italy
- Department of Research Infrastructures for Marine Biological Resources (RIMAR), 80121 Naples, Italy
- Correspondence: Tel.: +39-081-253-9492
10
de Anda-Jáuregui G, Hernández-Lemus E. Computational Oncology in the Multi-Omics Era: State of the Art. Front Oncol 2020; 10:423. [PMID: 32318338 PMCID: PMC7154096 DOI: 10.3389/fonc.2020.00423] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 12/01/2019] [Accepted: 03/10/2020] [Indexed: 12/24/2022] Open
Abstract
Cancer is the quintessential complex disease. As technologies evolve faster each day, we are able to quantify the different layers of biological elements that contribute to the emergence and development of malignancies. In this multi-omics context, the use of integrative approaches is mandatory in order to gain further insights into oncological phenomena, and to move forward toward the precision medicine paradigm. In this review, we will focus on computational oncology as an integrative discipline that incorporates knowledge from the mathematical, physical, and computational fields to further the biomedical understanding of cancer. We will discuss the current roles of computation in oncology in the context of multi-omic technologies, which include: data acquisition and processing; data management in the clinical and research settings; classification, diagnosis, and prognosis; and the development of models in the research setting, including their use for therapeutic target identification. We will discuss machine learning and network approaches as two of the most promising emerging paradigms in computational oncology. These approaches provide a foundation on how to integrate different layers of biological description into coherent frameworks that allow advances both in the basic and clinical settings.
Affiliation(s)
- Guillermo de Anda-Jáuregui
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Cátedras Conacyt Para Jóvenes Investigadores, National Council on Science and Technology, Mexico City, Mexico
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
11
Ma Q, Adua E, Boyce MC, Li X, Ji G, Wang W. IMass Time: The Future, in Future! OMICS: A Journal of Integrative Biology 2018; 22:679-695. [PMID: 30457467 DOI: 10.1089/omi.2018.0162] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
Abstract
Joseph John Thomson discovered and proved the existence of electrons through a series of experiments. His work earned him a Nobel Prize in 1906 and initiated the era of mass spectrometry (MS). In the intervening time, other researchers have also been awarded the Nobel Prize for significant advances in MS technology. The development of soft ionization techniques was central to the application of MS to large biological molecules and led to an unprecedented interest in the study of biomolecules such as proteins (proteomics), metabolites (metabolomics), carbohydrates (glycomics), and lipids (lipidomics), allowing a better understanding of the molecular underpinnings of health and disease. The interest in large molecules drove improvements in MS resolution and now the challenge is in data deconvolution, intelligent exploitation of heterogeneous data, and interpretation, all of which can be ameliorated with a proposed IMass technology. We define IMass as a combination of MS and artificial intelligence, with each performing a specific role. IMass will offer advantages such as improving speed, sensitivity, and analyses of large data that are presently not possible with MS alone. In this study, we present an overview of the MS considering historical perspectives and applications, challenges, as well as insightful highlights of IMass.
Affiliation(s)
- Qingwei Ma
- Bioyong (Beijing) Technology Co., Ltd., Beijing, China
- Eric Adua
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, Australia
- Mary C Boyce
- School of Science, Edith Cowan University, Joondalup, Australia
- Xingang Li
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, Australia
- Guang Ji
- China-Canada Centre of Research for Digestive Diseases, University of Ottawa, Ottawa, Canada
- Institute of Digestive Diseases, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Wei Wang
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, Australia
- School of Public Health, Taishan Medical University, Taian, China
12
Martens S, Landuyt A, Espeel P, Devreese B, Dawyndt P, Du Prez F. Multifunctional sequence-defined macromolecules for chemical data storage. Nat Commun 2018; 9:4451. [PMID: 30367037 PMCID: PMC6203848 DOI: 10.1038/s41467-018-06926-3] [Citation(s) in RCA: 123] [Impact Index Per Article: 17.6] [Received: 06/04/2018] [Accepted: 10/03/2018] [Indexed: 12/16/2022] Open
Abstract
Sequence-defined macromolecules consist of a defined chain length (single mass), end-groups, composition and topology and prove promising in application fields such as anti-counterfeiting, biological mimicking and data storage. Here we show the potential use of multifunctional sequence-defined macromolecules as a storage medium. As a proof-of-principle, we describe how short text fragments (human-readable data) and QR codes (machine-readable data) are encoded as a collection of oligomers and how the original data can be reconstructed. The amide-urethane containing oligomers are generated using an automated protecting-group free, two-step iterative protocol based on thiolactone chemistry. Tandem mass spectrometry techniques have been explored to provide detailed analysis of the oligomer sequences. We have developed the generic software tools Chemcoder for encoding/decoding binary data as a collection of multifunctional macromolecules and Chemreader for reconstructing oligomer sequences from mass spectra to automate the process of chemical writing and reading.
Affiliation(s)
- Steven Martens
- Department of Organic and Macromolecular Chemistry, Polymer Chemistry Research Group, Centre of Macromolecular Chemistry (CMaC), Ghent University, Krijgslaan 281 S4bis, 9000, Ghent, Belgium
- Annelies Landuyt
- Department of Organic and Macromolecular Chemistry, Polymer Chemistry Research Group, Centre of Macromolecular Chemistry (CMaC), Ghent University, Krijgslaan 281 S4bis, 9000, Ghent, Belgium
- Pieter Espeel
- Department of Organic and Macromolecular Chemistry, Polymer Chemistry Research Group, Centre of Macromolecular Chemistry (CMaC), Ghent University, Krijgslaan 281 S4bis, 9000, Ghent, Belgium
- Bart Devreese
- Department of Biochemistry and Microbiology, Laboratory for Protein Biochemistry and Biomolecular Engineering, Ghent University, K.L. Ledeganckstraat 35, 9000, Ghent, Belgium
- Peter Dawyndt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Krijgslaan 281 S9, 9000, Ghent, Belgium
- Filip Du Prez
- Department of Organic and Macromolecular Chemistry, Polymer Chemistry Research Group, Centre of Macromolecular Chemistry (CMaC), Ghent University, Krijgslaan 281 S4bis, 9000, Ghent, Belgium.
13
Háda V, Bagdi A, Bihari Z, Timári SB, Fizil Á, Szántay C. Recent advancements, challenges, and practical considerations in the mass spectrometry-based analytics of protein biotherapeutics: A viewpoint from the biosimilar industry. J Pharm Biomed Anal 2018; 161:214-238. [PMID: 30205300 DOI: 10.1016/j.jpba.2018.08.024] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 06/07/2018] [Revised: 08/08/2018] [Accepted: 08/10/2018] [Indexed: 01/22/2023]
Abstract
The extensive analytical characterization of protein biotherapeutics, especially of biosimilars, is a critical part of the product development and registration. High-resolution mass spectrometry became the primary analytical tool used for the structural characterization of biotherapeutics. Its high instrumental sensitivity and methodological versatility made it possible to use this technique to characterize both the primary and higher-order structure of these proteins. However, even by using high-end instrumentation, analysts face several challenges with regard to how to cope with industrial and regulatory requirements, that is, how to obtain accurate and reliable analytical data in a time- and cost-efficient way. New sample preparation approaches, measurement techniques and data evaluation strategies are available to meet those requirements. The practical considerations of these methods are discussed in the present review article focusing on hot topics, such as reliable and efficient sequencing strategies, minimization of artefact formation during sample preparation, quantitative peptide mapping, the potential of multi-attribute methodology, the increasing role of mass spectrometry in higher-order structure characterization and the challenges of MS-based identification of host cell proteins. On the basis of the opportunities in new instrumental techniques, methodological advancements and software-driven data evaluation approaches, for the future one can envision an even wider application area for mass spectrometry in the biopharmaceutical industry.
Affiliation(s)
- Viktor Háda
- Analytical Department of Biotechnology, Gedeon Richter Plc, Hungary.
- Attila Bagdi
- Analytical Department of Biotechnology, Gedeon Richter Plc, Hungary
- Zsolt Bihari
- Analytical Department of Biotechnology, Gedeon Richter Plc, Hungary
- Ádám Fizil
- Analytical Department of Biotechnology, Gedeon Richter Plc, Hungary
- Csaba Szántay
- Spectroscopic Research Department, Gedeon Richter Plc, Hungary.
14
Identification of RNA-binding domains of RNA-binding proteins in cultured cells on a system-wide scale with RBDmap. Nat Protoc 2017; 12:2447-2464. [PMID: 29095441 DOI: 10.1038/nprot.2017.106] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Indexed: 12/22/2022]
Abstract
This protocol is an extension to: Nat. Protoc. 8, 491-500 (2013); doi:10.1038/nprot.2013.020; published online 14 February 2013. RBDmap is a method for identifying, in a proteome-wide manner, the regions of RNA-binding proteins (RBPs) engaged in native interactions with RNA. In brief, cells are irradiated with UV light to induce protein-RNA cross-links. Following stringent denaturing washes, the resulting covalently linked protein-RNA complexes are purified with oligo(dT) magnetic beads. After elution, RBPs are subjected to partial proteolysis, in which the protein regions still bound to the RNA and those released to the supernatant are separated by a second oligo(dT) selection. After sample preparation and mass-spectrometric analysis, peptide intensity ratios between the RNA-bound and released fractions are used to determine the RNA-binding regions. As a Protocol Extension, this article describes an adaptation of an existing Protocol and offers additional applications. The earlier protocol (for the RNA interactome capture method) describes how to identify the active RBPs in cultured cells, whereas this Protocol Extension also enables the identification of the RNA-binding domains of RBPs. The experimental workflow takes 1 week plus 2 additional weeks for proteomics and data analysis. Notably, RBDmap presents numerous advantages over classic methods for determining RNA-binding domains: it produces proteome-wide, high-resolution maps of the protein regions contacting the RNA in a physiological context and can be adapted to different biological systems and conditions. Because RBDmap relies on the isolation of polyadenylated RNA via oligo(dT), it will not provide RNA-binding information on proteins interacting exclusively with nonpolyadenylated transcripts. Applied to HeLa cells, RBDmap uncovered 1,174 RNA-binding sites in 529 proteins, many of which were previously unknown.
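The intensity-ratio step mentioned in this abstract can be illustrated with a small calculation. This is only a sketch under stated assumptions and not the published analysis pipeline: the function names, the pseudocount and the fixed log2 cutoff are hypothetical simplifications, and a real analysis would be expected to use replicates and statistical testing rather than a single threshold.

```python
import math
from typing import Dict, List, Tuple


def log2_bound_over_released(
    intensities: Dict[str, Tuple[float, float]],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Map each peptide to log2(RNA-bound intensity / released intensity).

    `intensities` maps a peptide to (bound, released). The pseudocount
    (an assumption) avoids division by zero for peptides seen in one fraction only.
    """
    return {
        peptide: math.log2((bound + pseudocount) / (released + pseudocount))
        for peptide, (bound, released) in intensities.items()
    }


def candidate_rna_bound(ratios: Dict[str, float], min_log2: float = 1.0) -> List[str]:
    """Return peptides enriched in the RNA-bound fraction (cutoff is illustrative)."""
    return sorted(p for p, value in ratios.items() if value >= min_log2)


# Hypothetical peptide labels and intensities, for illustration only.
example = {"PEPA": (7.0, 1.0), "PEPB": (1.0, 7.0), "PEPC": (3.0, 3.0)}
ratios = log2_bound_over_released(example)
enriched = candidate_rna_bound(ratios)
```

Peptides with a strongly positive ratio are candidates for RNA-bound regions, while those near zero or negative are more likely to come from released regions.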
|
15
|
Hosp F, Mann M. A Primer on Concepts and Applications of Proteomics in Neuroscience. Neuron 2017; 96:558-571. [PMID: 29096073 DOI: 10.1016/j.neuron.2017.09.025] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 07/11/2017] [Revised: 08/29/2017] [Accepted: 09/14/2017] [Indexed: 02/06/2023]
|
16
|
Barbieri R, Guryev V, Brandsma CA, Suits F, Bischoff R, Horvatovich P. Proteogenomics: Key Driver for Clinical Discovery and Personalized Medicine. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2017; 926:21-47. [PMID: 27686804 DOI: 10.1007/978-3-319-42316-6_3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 10/24/2022]
Abstract
Proteogenomics is a multi-omics research field that has the aim to efficiently integrate genomics, transcriptomics and proteomics. With this approach it is possible to identify new patient-specific proteoforms that may have implications in disease development, specifically in cancer. Understanding the impact of a large number of mutations detected at the genomics level is needed to assess the effects at the proteome level. Proteogenomics data integration would help in identifying molecular changes that are persistent across multiple molecular layers and enable better interpretation of molecular mechanisms of disease, such as the causal relationship between single nucleotide polymorphisms (SNPs) and the expression of transcripts and translation of proteins compared to mainstream proteomics approaches. Identifying patient-specific protein forms and getting a better picture of molecular mechanisms of disease opens the avenue for precision and personalized medicine. Proteogenomics is, however, a challenging interdisciplinary science that requires the understanding of sample preparation, data acquisition and processing for genomics, transcriptomics and proteomics. This chapter aims to guide the reader through the technology and bioinformatics aspects of these multi-omics approaches, illustrated with proteogenomics applications having clinical or biological relevance.
Affiliation(s)
- Ruggero Barbieri
- Department of Gastroenterology and Hepatology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Victor Guryev
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Corry-Anke Brandsma
- Department of Pathology & Medical Biology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Frank Suits
- IBM T.J. Watson Research Centre, 1101 Kitchawan Road, Yorktown Heights, New York, 10598, NY, USA
| | - Rainer Bischoff
- Department of Analytical Biochemistry, Research Institute of Pharmacy, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Peter Horvatovich
- Department of Analytical Biochemistry, Research Institute of Pharmacy, University of Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands.
|
17
|
Shao S, Neely BA, Kao TC, Eckhaus J, Bourgeois J, Brooks J, Jones EE, Drake RR, Zhu K. Proteomic Profiling of Serial Prediagnostic Serum Samples for Early Detection of Colon Cancer in the U.S. Military. Cancer Epidemiol Biomarkers Prev 2016; 26:711-718. [PMID: 28003179 DOI: 10.1158/1055-9965.epi-16-0732] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/13/2016] [Revised: 11/23/2016] [Accepted: 12/07/2016] [Indexed: 01/24/2023]
Abstract
Background: Serum proteomic biomarkers offer a promising approach for early detection of cancer. In this study, we aimed to identify proteomic profiles that could distinguish colon cancer cases from controls using serial prediagnostic serum samples. Methods: This was a nested case-control study of active duty military members. Cases consisted of 264 patients diagnosed with colon cancer between 2001 and 2009. Controls were matched to cases on age, gender, race, serum sample count, and collection date. We identified peaks that discriminated cases from controls using random forest data analysis with a 2/3 training and 1/3 validation dataset. We then included epidemiologic data to see whether further improvement of model performance was obtainable. Proteins that corresponded to discriminatory peaks were identified. Results: Peaks with m/z values of 3,119.32, 2,886.67, 2,939.23, and 5,078.81 were found to discriminate cases from controls with a sensitivity of 69% and a specificity of 67% in the year before diagnosis. When smoking status was included, sensitivity increased to 76% while histories of other cancer and tonsillectomy raised specificity to 76%. Peaks at 2,886.67 and 3,119.32 m/z were identified as histone acetyltransferases while 2,939.24 m/z was a transporting ATPase subunit. Conclusions: Proteomic profiles in the year before cancer diagnosis have the potential to discriminate colon cancer patients from controls, and the addition of epidemiologic information may increase the sensitivity and specificity of discrimination. Impact: Our findings indicate the potential value of using serum prediagnostic proteomic biomarkers in combination with epidemiologic data for early detection of colon cancer. Cancer Epidemiol Biomarkers Prev; 26(5); 711-8. ©2016 AACR.
Affiliation(s)
- Stephanie Shao
- Division of Epidemiology and Biostatistics, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland; John P. Murtha Cancer Center, Walter Reed National Military Medical Center, Bethesda, Maryland
| | - Benjamin A Neely
- Department of Cell and Molecular Pharmacology and Experimental Therapeutics and MUSC Proteomics Center, Medical University of South Carolina, Charleston, South Carolina
| | - Tzu-Cheg Kao
- Division of Epidemiology and Biostatistics, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland
| | - Janet Eckhaus
- Division of Epidemiology and Biostatistics, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland
| | - Jolie Bourgeois
- Division of Epidemiology and Biostatistics, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland
| | - Jasmin Brooks
- Department of Cell and Molecular Pharmacology and Experimental Therapeutics and MUSC Proteomics Center, Medical University of South Carolina, Charleston, South Carolina
| | - Elizabeth E Jones
- Department of Cell and Molecular Pharmacology and Experimental Therapeutics and MUSC Proteomics Center, Medical University of South Carolina, Charleston, South Carolina
| | - Richard R Drake
- Department of Cell and Molecular Pharmacology and Experimental Therapeutics and MUSC Proteomics Center, Medical University of South Carolina, Charleston, South Carolina
| | - Kangmin Zhu
- Division of Epidemiology and Biostatistics, Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland; John P. Murtha Cancer Center, Walter Reed National Military Medical Center, Bethesda, Maryland
|
18
|
Giorgianni F, Beranova-Giorgianni S. Phosphoproteome Discovery in Human Biological Fluids. Proteomes 2016; 4:proteomes4040037. [PMID: 28248247 PMCID: PMC5260970 DOI: 10.3390/proteomes4040037] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/09/2016] [Revised: 11/11/2016] [Accepted: 11/23/2016] [Indexed: 01/07/2023]
Abstract
Phosphorylation plays a critical role in regulating protein function and thus influences a vast spectrum of cellular processes. With the advent of modern bioanalytical technologies, examination of protein phosphorylation on a global scale has become one of the major research areas. Phosphoproteins are found in biological fluids and interrogation of the phosphoproteome in biological fluids presents an exciting opportunity for discoveries that hold great potential for novel mechanistic insights into protein function in health and disease, and for translation to improved diagnostic and therapeutic approaches for the clinical setting. This review focuses on phosphoproteome discovery in selected human biological fluids: serum/plasma, urine, cerebrospinal fluid, saliva, and bronchoalveolar lavage fluid. Bioanalytical workflows pertinent to phosphoproteomics of biological fluids are discussed with emphasis on mass spectrometry-based approaches, and summaries of studies on phosphoproteome discovery in major fluids are presented.
Affiliation(s)
- Francesco Giorgianni
- Department of Pharmaceutical Sciences, University of Tennessee Health Science Center, Memphis, TN 38163, USA.
| | - Sarka Beranova-Giorgianni
- Department of Pharmaceutical Sciences, University of Tennessee Health Science Center, Memphis, TN 38163, USA.
|
19
|
Funke S, Markowitsch S, Schmelter C, Perumal N, Mwiiri FK, Gabel-Scheurich S, Pfeiffer N, Grus FH. In-Depth Proteomic Analysis of the Porcine Retina by Use of a four Step Differential Extraction Bottom up LC MS Platform. Mol Neurobiol 2016; 54:7262-7275. [PMID: 27796761 DOI: 10.1007/s12035-016-0172-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/01/2016] [Accepted: 09/27/2016] [Indexed: 01/09/2023]
Abstract
The eye of the domestic pig (Sus scrofa domestica Linnaeus, 1758) represents a promising model for the study of human eye diseases, including neurodegenerative retina disorders that are accompanied by proteomic changes. To provide an in-depth view into the "normal" (untreated & healthy) porcine retina proteome as an important reference, a proteomic strategy has been developed that encompasses stepwise/differential extraction, LC MS and peptide de novo sequencing. Accordingly, pooled porcine retina homogenates were processed by stepwise DDM, CHAPS, ASB14 and ACN/TFA extraction. Retinal proteins were fractionated by 1D-SDS PAGE and further analyzed by LC ESI MS, followed by protein identification based on database searching and de novo sequencing and by functional analyses. In summary, >2000 retinal proteins (FDR < 1 %) could be identified by use of the highly reproducible and selective extraction procedure. Moreover, an identification surplus of 36 % comparing initial one step extraction to the four step method could be documented. Although most proteins were identified in the DDM and CHAPS fractions, all extraction steps contributed exclusive proteins, with nucleus proteins enriched in the final ACN/TFA fraction. Additionally, for the first time new non-annotated de novo peptides could be documented for the porcine retina. The generated porcine retina proteome reference map contributes importantly to the understanding of the pig eye proteome, and the developed workflow has strong translational potential considering retina studies of various species.
Affiliation(s)
- Sebastian Funke
- Experimental Ophthalmology, Department of Ophthalmology, University Medical Center, Mainz, Germany
| | - Sascha Markowitsch
- Experimental Ophthalmology, Department of Ophthalmology, University Medical Center, Mainz, Germany
| | - Carsten Schmelter
- Experimental Ophthalmology, Department of Ophthalmology, University Medical Center, Mainz, Germany
| | - Natarajan Perumal
- Experimental Ophthalmology, Department of Ophthalmology, University Medical Center, Mainz, Germany
| | - Francis Kamau Mwiiri
- Experimental Ophthalmology, Department of Ophthalmology, University Medical Center, Mainz, Germany
| | - Silke Gabel-Scheurich
- Experimental Ophthalmology, Department of Ophthalmology, University Medical Center, Mainz, Germany
| | - Norbert Pfeiffer
- Experimental Ophthalmology, Department of Ophthalmology, University Medical Center, Mainz, Germany
| | - Franz H Grus
- Experimental Ophthalmology, Department of Ophthalmology, University Medical Center, Mainz, Germany.
- Department of Experimental Ophthalmology, University Medical Center (Universitätsmedizin), Johannes Gutenberg University Mainz, Langenbeckstr. 1, 55131, Mainz, Germany.
|
20
|
Parker GJ, Leppert T, Anex DS, Hilmer JK, Matsunami N, Baird L, Stevens J, Parsawar K, Durbin-Johnson BP, Rocke DM, Nelson C, Fairbanks DJ, Wilson AS, Rice RH, Woodward SR, Bothner B, Hart BR, Leppert M. Demonstration of Protein-Based Human Identification Using the Hair Shaft Proteome. PLoS One 2016; 11:e0160653. [PMID: 27603779 PMCID: PMC5014411 DOI: 10.1371/journal.pone.0160653] [Citation(s) in RCA: 79] [Impact Index Per Article: 8.8] [Received: 09/24/2015] [Accepted: 07/21/2016] [Indexed: 12/28/2022]
Abstract
Human identification from biological material is largely dependent on the ability to characterize genetic polymorphisms in DNA. Unfortunately, DNA can degrade in the environment, sometimes below the level at which it can be amplified by PCR. Protein however is chemically more robust than DNA and can persist for longer periods. Protein also contains genetic variation in the form of single amino acid polymorphisms. These can be used to infer the status of non-synonymous single nucleotide polymorphism alleles. To demonstrate this, we used mass spectrometry-based shotgun proteomics to characterize hair shaft proteins in 66 European-American subjects. A total of 596 single nucleotide polymorphism alleles were correctly imputed in 32 loci from 22 genes of subjects' DNA and directly validated using Sanger sequencing. Estimates of the probability of resulting individual non-synonymous single nucleotide polymorphism allelic profiles in the European population, using the product rule, resulted in a maximum power of discrimination of 1 in 12,500. Imputed non-synonymous single nucleotide polymorphism profiles from European-American subjects were considerably less frequent in the African population (maximum likelihood ratio = 11,000). The converse was true for hair shafts collected from an additional 10 subjects with African ancestry, where some profiles were more frequent in the African population. Genetically variant peptides were also identified in hair shaft datasets from six archaeological skeletal remains (up to 260 years old). This study demonstrates that quantifiable measures of identity discrimination and biogeographic background can be obtained from detecting genetically variant peptides in hair shaft protein, including hair from bioarchaeological contexts.
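The product-rule estimate mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the loci are statistically independent and uses made-up genotype frequencies; the function name `profile_probability` and every number in the example are hypothetical and are not taken from the study.

```python
from math import prod


def profile_probability(genotype_freqs):
    """Multiply per-locus genotype frequencies (product rule).

    Assumes the loci are independent; each frequency must lie in (0, 1].
    """
    if not genotype_freqs:
        raise ValueError("at least one locus frequency is required")
    for freq in genotype_freqs:
        if not 0 < freq <= 1:
            raise ValueError(f"frequency out of range: {freq}")
    return prod(genotype_freqs)


# Hypothetical population frequencies of the observed genotype at four loci.
example_freqs = [0.5, 0.25, 0.1, 0.2]
p = profile_probability(example_freqs)
print(f"profile probability: {p:.4f}")
print(f"about 1 in {1 / p:,.0f}")
```

With these illustrative frequencies the profile would be expected in roughly 1 in 400 individuals. A real estimate would need population-specific frequencies and a check that the loci are not linked.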
Affiliation(s)
- Glendon J. Parker
- Department of Biology, Utah Valley University, Orem, Utah, United States of America
- Protein-Based Identification Technologies L.L.C., Orem, Utah, United States of America
- * E-mail: parker64@llnl;
| | - Tami Leppert
- Protein-Based Identification Technologies L.L.C., Orem, Utah, United States of America
- Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
| | - Deon S. Anex
- Forensic Science Center, Lawrence Livermore National Laboratory, Livermore, California, United States of America
| | - Jonathan K. Hilmer
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, Montana, United States of America
| | - Nori Matsunami
- Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
| | - Lisa Baird
- Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
| | - Jeffery Stevens
- Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
| | - Krishna Parsawar
- Mass Spectrometry and Proteomics Core Facility, University of Utah, Salt Lake City, Utah, United States of America
| | - Blythe P. Durbin-Johnson
- Department of Public Health Sciences, University of California, Davis, California, United States of America
| | - David M. Rocke
- Department of Public Health Sciences, University of California, Davis, California, United States of America
| | - Chad Nelson
- Mass Spectrometry and Proteomics Core Facility, University of Utah, Salt Lake City, Utah, United States of America
| | - Daniel J. Fairbanks
- Department of Biology, Utah Valley University, Orem, Utah, United States of America
| | - Andrew S. Wilson
- School of Archaeological Sciences, University of Bradford, Bradford, United Kingdom
| | - Robert H. Rice
- Department of Environmental Toxicology, University of California, Davis, California, United States of America
| | - Scott R. Woodward
- Sorenson Molecular Genealogical Foundation, Salt Lake City, Utah, United States of America
| | - Brian Bothner
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, Montana, United States of America
| | - Bradley R. Hart
- Forensic Science Center, Lawrence Livermore National Laboratory, Livermore, California, United States of America
| | - Mark Leppert
- Department of Human Genetics, University of Utah, Salt Lake City, Utah, United States of America
|
21
|
Griss J. Spectral library searching in proteomics. Proteomics 2016; 16:729-40. [PMID: 26616598 DOI: 10.1002/pmic.201500296] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 07/16/2015] [Revised: 10/15/2015] [Accepted: 10/29/2015] [Indexed: 12/12/2022]
Abstract
Spectral library searching has become a mature method to identify tandem mass spectra in proteomics data analysis. This review provides a comprehensive overview of available spectral library search engines and highlights their distinct features. Additionally, resources providing spectral libraries are summarized and tools presented that extend experimental spectral libraries by simulating spectra. Finally, spectrum clustering algorithms are discussed that utilize the same spectrum-to-spectrum matching algorithms as spectral library search engines and allow novel methods to analyse proteomics data.
Affiliation(s)
- Johannes Griss
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
22
|
Ning Z, Zhang X, Mayne J, Figeys D. Peptide-Centric Approaches Provide an Alternative Perspective To Re-Examine Quantitative Proteomic Data. Anal Chem 2016; 88:1973-8. [DOI: 10.1021/acs.analchem.5b04148] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022]
Affiliation(s)
- Zhibin Ning
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, Canada, K1H 8M5
| | - Xu Zhang
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, Canada, K1H 8M5
| | - Janice Mayne
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, Canada, K1H 8M5
| | - Daniel Figeys
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, Canada, K1H 8M5
|
23
|
Perez-Riverol Y, Alpi E, Wang R, Hermjakob H, Vizcaíno JA. Making proteomics data accessible and reusable: current state of proteomics databases and repositories. Proteomics 2015; 15:930-49. [PMID: 25158685 PMCID: PMC4409848 DOI: 10.1002/pmic.201400302] [Citation(s) in RCA: 141] [Impact Index Per Article: 14.1] [Received: 07/01/2014] [Revised: 08/06/2014] [Accepted: 08/22/2014] [Indexed: 01/10/2023]
Abstract
Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. In order to address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has been recently developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we will review each of the major proteomics resources independently and some tools that enable the integration, mining and reuse of the data. We will also discuss some of the major challenges and current pitfalls in the integration and sharing of the data.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
24
|
Donnelly MR, Ciborowski P. Proteomics, biomarkers, and HIV-1: A current perspective. Proteomics Clin Appl 2015; 10:110-25. [PMID: 26033875 PMCID: PMC4666820 DOI: 10.1002/prca.201500002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 01/07/2015] [Revised: 03/17/2015] [Accepted: 05/27/2015] [Indexed: 01/24/2023]
Abstract
Despite more than three decades of extensive research, HIV‐1 infection, although well controlled with cART, remains incurable. The multifactorial complexity of the viral life‐cycle poses great challenges in understanding the molecular mechanisms underlying this infection and in the development of biomarkers, which we hope will lead us to its eradication. For a more in‐depth understanding of how the virus interacts with host target cells, T cells and macrophages, proteomic profiling techniques that offer strategies to investigate the proteome in its entirety were employed. Here, we review proteomic studies related to HIV‐1 infection and discuss perspectives and limitations of proteomic and systems biology approaches in future studies.
Affiliation(s)
- Maire Rose Donnelly
- Department of Pharmacology and Experimental Neuroscience, University of Nebraska Medical Center, Omaha, NE 68198, USA
| | - Pawel Ciborowski
- Department of Pharmacology and Experimental Neuroscience, University of Nebraska Medical Center, Omaha, NE 68198, USA
|
25
|
Bioinformatics-Aided Venomics. Toxins (Basel) 2015; 7:2159-87. [PMID: 26110505 PMCID: PMC4488696 DOI: 10.3390/toxins7062159] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 05/01/2015] [Revised: 06/03/2015] [Accepted: 06/05/2015] [Indexed: 12/12/2022]
Abstract
Venomics is a modern approach that combines transcriptomics and proteomics to explore the toxin content of venoms. This review will give an overview of computational approaches that have been created to classify and consolidate venomics data, as well as algorithms that have helped discovery and analysis of toxin nucleic acid and protein sequences, toxin three-dimensional structures and toxin functions. Bioinformatics is used to tackle specific challenges associated with the identification and annotations of toxins. Recognizing toxin transcript sequences among second generation sequencing data cannot rely only on basic sequence similarity because toxins are highly divergent. Mass spectrometry sequencing of mature toxins is challenging because toxins can display a large number of post-translational modifications. Identifying the mature toxin region in toxin precursor sequences requires the prediction of the cleavage sites of proprotein convertases, most of which are unknown or not well characterized. Tracing the evolutionary relationships between toxins should consider specific mechanisms of rapid evolution as well as interactions between predatory animals and prey. Rapidly determining the activity of toxins is the main bottleneck in venomics discovery, but some recent bioinformatics and molecular modeling approaches give hope that accurate predictions of toxin specificity could be made in the near future.
|
26
|
Oveland E, Muth T, Rapp E, Martens L, Berven FS, Barsnes H. Viewing the proteome: how to visualize proteomics data? Proteomics 2015; 15:1341-55. [PMID: 25504833 DOI: 10.1002/pmic.201400412] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 08/25/2014] [Revised: 10/23/2014] [Accepted: 12/05/2014] [Indexed: 01/18/2023]
Abstract
Proteomics has become one of the main approaches for analyzing and understanding biological systems. Yet similar to other high-throughput analysis methods, the presentation of the large amounts of obtained data in easily interpretable ways remains challenging. In this review, we present an overview of the different ways in which proteomics software supports the visualization and interpretation of proteomics data. The unique challenges and current solutions for visualizing the different aspects of proteomics data, from acquired spectra via protein identification and quantification to pathway analysis, are discussed, and examples of the most useful visualization approaches are highlighted. Finally, we offer our ideas about future directions for proteomics data visualization.
Affiliation(s)
- Eystein Oveland
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway; KG Jebsen Centre for Multiple Sclerosis Research, Department of Clinical Medicine, University of Bergen, Bergen, Norway; Department of Clinical Medicine, University of Bergen, Bergen, Norway
|
27
|
Samperi R, Capriotti AL, Cavaliere C, Colapicchioni V, Chiozzi RZ, Laganà A. Food Proteins and Peptides. ADVANCED MASS SPECTROMETRY FOR FOOD SAFETY AND QUALITY 2015. [DOI: 10.1016/b978-0-444-63340-8.00006-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 01/24/2023]
|
28
|
Link AJ, Washburn MP. Analysis of protein composition using multidimensional chromatography and mass spectrometry. CURRENT PROTOCOLS IN PROTEIN SCIENCE 2014; 78:23.1.1-23.1.25. [PMID: 25367006 DOI: 10.1002/0471140864.ps2301s78] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/06/2023]
Abstract
Multidimensional liquid chromatography of peptides produced by protease digestion of complex protein mixtures followed by tandem mass spectrometry can be coupled with automated database searching to identify large numbers of proteins in complex samples. These methods avoid the limitations of gel electrophoresis and in-gel digestions by directly identifying protein mixtures in solution. One method used extensively is named Multidimensional Protein Identification Technology (MudPIT), where reversed-phase chromatography and strong cation-exchange chromatography are coupled directly in a microcapillary column. This column is then placed in line between an HPLC and a mass spectrometer for complex mixture analysis. MudPIT remains a powerful approach for analyzing complex mixtures like whole proteomes and protein complexes. MudPIT is used for quantitative proteomic analysis of complex mixtures to generate novel biological insights.
Affiliation(s)
- Andrew J Link
- Vanderbilt University School of Medicine Nashville, Tennessee
|
29
|
Wilhelm T, Jones AME. Identification of related peptides through the analysis of fragment ion mass shifts. J Proteome Res 2014; 13:4002-11. [PMID: 25058668 DOI: 10.1021/pr500347e] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Abstract
Mass spectrometry (MS) has become the method of choice to identify and quantify proteins, typically by fragmenting peptides and inferring protein identification by reference to sequence databases. Well-established programs have largely solved the problem of identifying peptides in complex mixtures. However, to prevent the search space from becoming prohibitively large, most search engines need a list of expected modifications. Therefore, unexpected modifications limit both the identification of proteins and peptide-based quantification. We developed mass spectrometry-peak shift analysis (MS-PSA) to rapidly identify related spectra in large data sets without reference to databases or specified modifications. Peptide identifications from established tools, such as MASCOT or SEQUEST, may be propagated onto MS-PSA results. Modification of a peptide alters the mass of the precursor ion and some of the fragmentation ions. MS-PSA identifies characteristic fragmentation masses from MS/MS spectra. Related spectra are identified by pattern matching of unchanged and mass-shifted fragment ions. We illustrate the use of MS-PSA with simple and complex mixtures with both high and low mass accuracy data sets. MS-PSA is not limited to the analysis of peptides but can be used for the identification of related groups of spectra in any set of fragmentation patterns.
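The idea of matching unchanged and mass-shifted fragment ions, as described in the abstract above, can be sketched as follows. This is only an illustration under stated assumptions, not the MS-PSA implementation: the function name `match_shifted_peaks`, the tolerance, the peak lists, and the assumption of singly charged fragments are all hypothetical.

```python
def match_shifted_peaks(spec_a, spec_b, delta, tol=0.01):
    """Classify fragment m/z values of spec_a against spec_b.

    A peak counts as 'unchanged' if spec_b has a peak within tol of it,
    and as 'shifted' if spec_b has a peak within tol of (peak + delta).
    delta is the precursor mass difference; singly charged fragments
    are assumed, so the shift in m/z equals the shift in mass.
    """
    unchanged, shifted = [], []
    for mz in spec_a:
        if any(abs(mz - other) <= tol for other in spec_b):
            unchanged.append(mz)
        elif any(abs(mz + delta - other) <= tol for other in spec_b):
            shifted.append(mz)
    return unchanged, shifted


# Hypothetical fragment lists: the second spectrum carries a +79.966 Da
# modification that affects only the two larger fragments.
unmodified = [175.119, 262.151, 375.235, 488.319]
modified = [175.119, 262.151, 455.201, 568.285]

unchanged, shifted = match_shifted_peaks(unmodified, modified, delta=79.966)
print("unchanged:", unchanged)
print("shifted:  ", shifted)
```

Two spectra that share many unchanged peaks plus a consistent set of shifted peaks would be candidates for being related. A practical tool would also need intensity handling, charge-state awareness, and a scoring scheme, none of which is attempted here.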
Affiliation(s)
- Thomas Wilhelm
- Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, United Kingdom
|
30
|
Abstract
Most biochemical reactions in a cell are regulated by highly specialized proteins, which are the prime mediators of the cellular phenotype. Therefore the identification, quantitation and characterization of all proteins in a cell are of utmost importance to understand the molecular processes that mediate cellular physiology. With the advent of robust and reliable mass spectrometers that are able to analyze complex protein mixtures within a reasonable timeframe, the systematic analysis of all proteins in a cell becomes feasible. Besides the ongoing improvements of analytical hardware, standardized methods to analyze and study all proteins have to be developed that allow the generation of testable new hypotheses based on the enormous pre-existing amount of biological information. Here we discuss current strategies on how to gather, filter and analyze proteomic data sets using available software packages.
|
31
|
Carapito C, Burel A, Guterl P, Walter A, Varrier F, Bertile F, Van Dorsselaer A. MSDA, a proteomics software suite for in-depth Mass Spectrometry Data Analysis using grid computing. Proteomics 2014; 14:1014-9. [DOI: 10.1002/pmic.201300415] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 09/22/2013] [Revised: 01/15/2014] [Accepted: 01/15/2014] [Indexed: 12/20/2022]
Affiliation(s)
- Christine Carapito
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
- Alexandre Burel
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
- Patrick Guterl
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
- Alexandre Walter
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
- Fabrice Varrier
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
- Fabrice Bertile
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
- Alain Van Dorsselaer
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
|
32
|
Goh WWB, Wong L. Computational proteomics: designing a comprehensive analytical strategy. Drug Discov Today 2014; 19:266-74. [DOI: 10.1016/j.drudis.2013.07.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/04/2013] [Revised: 06/28/2013] [Accepted: 07/11/2013] [Indexed: 02/02/2023]
|
33
|
Kelchtermans P, Bittremieux W, De Grave K, Degroeve S, Ramon J, Laukens K, Valkenborg D, Barsnes H, Martens L. Machine learning applications in proteomics research: how the past can boost the future. Proteomics 2014; 14:353-66. [PMID: 24323524 DOI: 10.1002/pmic.201300289] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 07/12/2013] [Revised: 09/24/2013] [Accepted: 10/14/2013] [Indexed: 01/22/2023]
Abstract
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, provided that enough data are available to train and subsequently evaluate an algorithm. Since MS-based proteomics has no shortage of complex problems, and since public data are becoming available in ever-growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present here an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.
Affiliation(s)
- Pieter Kelchtermans
- Department of Medical Protein Research, VIB, Ghent, Belgium; Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium; Flemish Institute for Technological Research (VITO), Boeretang, Mol, Belgium
|
34
|
Tanca A, Palomba A, Deligios M, Cubeddu T, Fraumene C, Biosa G, Pagnozzi D, Addis MF, Uzzau S. Evaluating the impact of different sequence databases on metaproteome analysis: insights from a lab-assembled microbial mixture. PLoS One 2013; 8:e82981. [PMID: 24349410 PMCID: PMC3857319 DOI: 10.1371/journal.pone.0082981] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Received: 08/06/2013] [Accepted: 10/30/2013] [Indexed: 01/10/2023] Open
Abstract
Metaproteomics enables the investigation of the protein repertoire expressed by complex microbial communities. However, to unleash its full potential, refinements in bioinformatic approaches for data analysis are still needed. In this context, sequence database selection represents a major challenge. This work assessed the impact of different databases in metaproteomic investigations by using a mock microbial mixture including nine diverse bacterial and eukaryotic species, which was subjected to shotgun metaproteomic analysis. Then, both the microbial mixture and the single microorganisms were subjected to next-generation sequencing to obtain experimental metagenomic- and genomic-derived databases, which were used along with public databases (namely, NCBI, UniProtKB/SwissProt and UniProtKB/TrEMBL, parsed at different taxonomic levels) to analyze the metaproteomic dataset. First, a quantitative comparison in terms of number and overlap of peptide identifications was carried out among all databases. As a result, only 35% of peptides were common to all database classes; moreover, genus/species-specific databases provided up to 17% more identifications compared to databases with generic taxonomy, while the metagenomic database enabled a slight increment with respect to public databases. Then, database behavior in terms of false discovery rate and peptide degeneracy was critically evaluated. Public databases with generic taxonomy exhibited a markedly different trend compared to their counterparts. Finally, the reliability of taxonomic attribution according to the lowest common ancestor approach (using MEGAN and Unipept software) was assessed. The level of misassignments varied among the different databases, and specific thresholds based on the number of taxon-specific peptides were established to minimize false positives.
This study confirms that database selection has a significant impact in metaproteomics, and provides critical indications for improving depth and reliability of metaproteomic results. Specifically, the use of iterative searches and of suitable filters for taxonomic assignments is proposed with the aim of increasing coverage and trustworthiness of metaproteomic data.
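The lowest common ancestor approach and the peptide-count threshold mentioned in this abstract can be illustrated with a short sketch. It is not the MEGAN or Unipept implementation: the function names, the simplified three-rank lineages, the example taxa, and the threshold value are all assumptions chosen for illustration and are not taken from the study.

```python
from collections import Counter


def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by every lineage (each listed root first)."""
    if not lineages:
        return None
    lca = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca = ranks[0]
    return lca


def assign_taxa(peptide_lineages, min_peptides=2):
    """Assign each peptide to the LCA of its matching taxa, then keep only
    taxa supported by at least min_peptides peptides (hypothetical threshold)."""
    counts = Counter()
    for lineages in peptide_lineages.values():
        taxon = lowest_common_ancestor(lineages)
        if taxon is not None:
            counts[taxon] += 1
    return {taxon: n for taxon, n in counts.items() if n >= min_peptides}


# Simplified, illustrative lineages (domain, family, genus).
ESCH = ["Bacteria", "Enterobacteriaceae", "Escherichia"]
SALM = ["Bacteria", "Enterobacteriaceae", "Salmonella"]
LACT = ["Bacteria", "Lactobacillaceae", "Lactobacillus"]

# Hypothetical peptides mapped to the lineages of every taxon they match.
peptides = {
    "PEP_A": [ESCH],
    "PEP_B": [ESCH],
    "PEP_C": [ESCH, SALM],  # shared peptide resolves to the family level
    "PEP_D": [LACT],
}

print(assign_taxa(peptides))  # {'Escherichia': 2}
```

In this toy case, raising the threshold removes taxa supported by a single peptide, which is the kind of filter the abstract proposes for limiting false-positive taxonomic assignments.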
Affiliation(s)
- Alessandro Tanca
- Porto Conte Ricerche Srl, Tramariglio, Alghero, Italy
- Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- Antonio Palomba
- Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- Massimo Deligios
- Porto Conte Ricerche Srl, Tramariglio, Alghero, Italy
- Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- Grazia Biosa
- Porto Conte Ricerche Srl, Tramariglio, Alghero, Italy
- Maria Filippa Addis
- Porto Conte Ricerche Srl, Tramariglio, Alghero, Italy
- Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- * E-mail: (MFA); (SU)
- Sergio Uzzau
- Porto Conte Ricerche Srl, Tramariglio, Alghero, Italy
- Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- * E-mail: (MFA); (SU)
|
35
|
Horvatovich P, Franke L, Bischoff R. Proteomic studies related to genetic determinants of variability in protein concentrations. J Proteome Res 2013; 13:5-14. [PMID: 24237071 DOI: 10.1021/pr400765y] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/22/2022]
Abstract
Genetic variation has multiple effects on the proteome. It may influence the expression level of proteins or modify their sequences through single nucleotide polymorphisms, the occurrence of allelic variants, or alternative splicing (ASP) events. This perspective paper summarizes the major effects of genetic variability on protein expression and isoforms and provides an overview of proteomics techniques and methods that allow studying the effects of genetic variability at different levels of the proteome. The paper also reviews recent quantitative trait loci studies performed to explore the effect of genetic variation on protein expression (pQTL). Finally, it gives a perspective on advances in proteomics technology and on the role of the Chromosome-Centric Human Proteome Project (C-HPP) in creating large-scale resources that may facilitate more comprehensive pQTL experiments in the future.
Affiliation(s)
- Péter Horvatovich
- Analytical Biochemistry, Department of Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands
|