1
Taniguchi K, Miyaguchi H. COL1A2 Barcoding: Bone Species Identification via Shotgun Proteomics. J Proteome Res 2024; 23:377-385. [PMID: 38091499 DOI: 10.1021/acs.jproteome.3c00615] [Indexed: 01/06/2024]
Abstract
Species identification of fragmentary bones remains a challenging task in archeology and forensics. A species identification method for such fragmentary bones that has recently attracted interest is the use of bone collagen proteins. Here, we describe a method similar to DNA barcoding that reads collagen protein sequences in bone and automatically determines the species by performing sequence database searches. The method is almost identical to conventional shotgun proteomics analysis of bone samples, except that the database used by the SEQUEST search engine consisted only of entries for collagen type 1 alpha 2 (COL1A2) proteins from various vertebrates. Accordingly, the COL1A2 peptides that differ in sequence among species act as species marker peptides. In SEQUEST-based shotgun proteomics, the protein entries that contain more marker peptide sequences are assigned higher scores; therefore, the highest-scoring protein entry will be the COL1A2 entry for the species from which the analyzed bone was derived. We tested our method using bone samples from 30 vertebrate species and found that all species were correctly identified. In conclusion, COL1A2 can be used as a bone protein barcode and can be read through shotgun proteomics, allowing for automatic bone species identification. Data are available via ProteomeXchange with the identifier PXD045402.
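The selection rule described in this abstract (the entry containing the most marker peptides scores highest) can be illustrated with a minimal sketch. This is not SEQUEST scoring: it assumes peptides have already been identified and simply counts substring matches per entry. The function names, species labels, and sequences are all made up for illustration and are not real COL1A2 data.

```python
def score_entries(identified_peptides, database):
    """Count, for each database entry, how many distinct identified peptides it contains."""
    unique_peptides = set(identified_peptides)
    return {
        species: sum(1 for pep in unique_peptides if pep in sequence)
        for species, sequence in database.items()
    }


def best_species(identified_peptides, database):
    """Return the highest-scoring entry name together with all scores."""
    scores = score_entries(identified_peptides, database)
    return max(scores, key=scores.get), scores


# Hypothetical toy database: one made-up sequence fragment per species.
toy_db = {
    "species_A": "GPAGPRGEVGLPGLSGPVGPPGNPGANGLTGAK",
    "species_B": "GPAGPRGEVGLPGVSGPVGPPGNPGANGLPGAK",
    "species_C": "GPSGPRGDVGLPGLSGPIGPPGNPGANGLTGAK",
}

# Hypothetical identified peptides: one species-specific marker, one shared peptide.
peptides = ["GEVGLPGVSGPVGPPGNPGANGLPGAK", "GPAGPR"]

winner, scores = best_species(peptides, toy_db)
print(winner, scores)
```

In this toy case the shared peptide matches two entries, while the marker peptide matches only one, so that entry ranks first.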
Affiliation(s)
- Kei Taniguchi
- National Research Institute of Police Science, 6-3-1, Kashiwanoha, Kashiwa 277-0882, Chiba, Japan
- Hajime Miyaguchi
- National Research Institute of Police Science, 6-3-1, Kashiwanoha, Kashiwa 277-0882, Chiba, Japan
2
Singh SA, Kuraoka S, Pestana DVS, Nasir W, Delanghe B, Aikawa M. The RiboMaP Spectral Annotation Method Applied to Various ADP-Ribosylome Studies Including INF-γ-Stimulated Human Cells and Mouse Tissues. Front Cardiovasc Med 2022; 9:851351. [PMID: 35419443 PMCID: PMC8996112 DOI: 10.3389/fcvm.2022.851351] [Received: 01/09/2022] [Accepted: 03/02/2022] [Indexed: 11/13/2022]
Abstract
ADP-ribosylation is a post-translational modification that is catalyzed by the ADP-ribosyltransferase enzyme family. The major emphasis to date has been on ADP-ribosylation's role in cancer; however, there is growing interest in its role in inflammation and cardiovascular disease. Despite a recent boom in ADP-ribosylation mass spectrometry-based proteomics, there are limited computational resources to evaluate the quality of reported ADP-ribosylated (ADPr) proteins. We recently developed a novel mass spectral annotation strategy (RiboMaP) that facilitates identification and reporting of ADPr peptides and proteins. This strategy can monitor the fragmentation properties of ADPr peptide-unique fragment ions, termed m-ions and p-ions, that in turn provide spectral quality scores for candidate ADP-ribosyl peptides. In this study, we leveraged publicly available ADP-ribosylome data, acquired on various mass spectrometers, to evaluate the broader applicability of RiboMaP. We observed that fragmentation spectra of ADPr peptides vary considerably across datasets; nonetheless, RiboMaP improves ADPr peptide spectral annotation across all studies. We then reanalyzed our own previously published in vitro ADP-ribosylome data to determine common responses to the pro-inflammatory cytokine, IFN-γ. We conclude that despite these recent advances in the field of ADPr proteomics, studies in the context of inflammation and cardiovascular disease still require further bench-to-informatics workflow development in order to capture ADPr signaling events related to inflammatory pathways.
Affiliation(s)
- Sasha A. Singh
- Department of Medicine, Center for Interdisciplinary Cardiovascular Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Shiori Kuraoka
- Department of Medicine, Center for Interdisciplinary Cardiovascular Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Diego Vinicius Santinelli Pestana
- Department of Medicine, Center for Interdisciplinary Cardiovascular Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Waqas Nasir
- Thermo Fisher Scientific (Bremen) GmbH, Bremen, Germany
- Masanori Aikawa
- Department of Medicine, Center for Interdisciplinary Cardiovascular Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Division of Cardiovascular Medicine, Center for Excellence in Vascular Biology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
3
Kumar S, Saeed F. Communication-avoiding micro-architecture to compute Xcorr scores for peptide identification. Int Conf Field Program Log Appl 2021; 2021:99-103. [PMID: 35440952 PMCID: PMC9015013 DOI: 10.1109/fpl53798.2021.00024] [Indexed: 06/14/2023]
Abstract
Database algorithms play a crucial part in systems biology studies by identifying proteins from mass spectrometry data. Many of these database search algorithms incur huge computational costs by computing similarity scores for each pair of sparse experimental spectrum and candidate theoretical spectrum vectors. Modern MS instrumentation techniques that are capable of generating high-resolution spectrometry data require comparison against an enormous search space, further emphasizing the need for efficient accelerators. Recent research has shown that the overall cost of scoring and deducing peptides is dominated by the communication costs between different hierarchies of memory and processing units. However, these communication costs are seldom considered in accelerator-based architectures, leading to inefficient DRAM accesses and poor data utilization due to irregular memory access patterns. In this paper, we propose a novel communication-avoiding micro-architecture to compute a cross-correlation-based similarity score by utilizing an efficient local cache and peptide pre-fetching to minimize DRAM accesses, and a custom-designed peptide broadcast bus to allow input reuse. An efficient bus arbitration scheme was designed and implemented to minimize synchronization cost and exploit parallelism of processing elements. Our simulation results show that the proposed micro-architecture performs on average 24x better than a CPU implementation running on a 3.6 GHz Intel i7-4970 processor with 16GB memory.
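For readers unfamiliar with the score being accelerated, the following is a minimal software sketch of a cross-correlation (Xcorr-style) similarity between two binned spectrum vectors: the dot product at zero offset minus the mean dot product over a window of nonzero offsets. It says nothing about the hardware design in the paper. The window of 75 bins, the exclusion of the zero offset from the background, and the omission of spectrum preprocessing are simplifying assumptions, and the function name is hypothetical.

```python
def xcorr(experimental, theoretical, max_shift=75):
    """Cross-correlation score: zero-offset dot product minus mean shifted dot product.

    Both inputs are equal-length lists of binned intensities (assumed preprocessed).
    """
    if len(experimental) != len(theoretical):
        raise ValueError("spectra must be binned to the same length")
    n = len(experimental)

    def shifted_dot(shift):
        # Dot product of the experimental vector with the theoretical vector offset by `shift` bins.
        total = 0.0
        for i in range(n):
            j = i + shift
            if 0 <= j < n:
                total += experimental[i] * theoretical[j]
        return total

    # Background: average correlation over all nonzero offsets in the window.
    background = [shifted_dot(s) for s in range(-max_shift, max_shift + 1) if s != 0]
    return shifted_dot(0) - sum(background) / len(background)


observed = [0, 1, 0, 0, 1, 0]
matching = [0, 1, 0, 0, 1, 0]
misaligned = [1, 0, 0, 0, 0, 1]
print(xcorr(observed, matching, max_shift=2))
print(xcorr(observed, misaligned, max_shift=2))
```

Every candidate peptide requires one such pass over the vectors, which is why data movement between memory and processing units becomes the dominant cost at scale.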
Affiliation(s)
- Sumesh Kumar
- Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL USA 33199
- Fahad Saeed
- Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL USA 33199
4
Vincent D, Savin K, Rochfort S, Spangenberg G. The Power of Three in Cannabis Shotgun Proteomics: Proteases, Databases and Search Engines. Proteomes 2020; 8:13. [PMID: 32549361 DOI: 10.3390/proteomes8020013] [Received: 05/28/2020] [Revised: 06/12/2020] [Accepted: 06/12/2020] [Indexed: 11/29/2022]
Abstract
Cannabis research has taken off since the relaxation of legislation, yet proteomics is still lagging. In 2019, we published three proteomics methods aimed at optimizing protein extraction, protein digestion for bottom-up and middle-down proteomics, as well as the analysis of intact proteins for top-down proteomics. The database of Cannabis sativa proteins used in these studies was retrieved from UniProt, the reference repository for proteins, which is incomplete and therefore underrepresents the genetic diversity of this non-model species. In this fourth study, we remedy this shortcoming by searching larger databases from various sources. We also compare two search engines, the oldest, SEQUEST, and the most popular, Mascot. This shotgun proteomics experiment also utilizes the power of parallel digestions with orthogonal proteases of increasing selectivity, namely chymotrypsin, trypsin/Lys-C and Asp-N. Our results show that the larger the database, the greater the list of accessions identified but the longer the duration of the search. Using orthogonal proteases and different search algorithms increases the total number of proteins identified, most of them common despite differing proteases and algorithms, but many of them unique as well.
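Why parallel digestions with different proteases yield complementary peptides can be seen in a small in silico digestion sketch. The cleavage rules below are deliberately simplified assumptions (trypsin/Lys-C cuts after K or R, chymotrypsin after F, W or Y, Asp-N before D), with no proline exception and no missed cleavages; the test sequence and all names are hypothetical.

```python
# Simplified cleavage rules: (residues recognized, side of the residue that is cut).
PROTEASES = {
    "trypsin/Lys-C": ("KR", "C"),   # cut C-terminal to K or R
    "chymotrypsin": ("FWY", "C"),   # cut C-terminal to F, W or Y
    "Asp-N": ("D", "N"),            # cut N-terminal to D
}


def digest(sequence, residues, side):
    """Split a protein sequence at every recognized residue (no missed cleavages)."""
    cut_points = []
    for i, aa in enumerate(sequence):
        if aa in residues:
            cut = i + 1 if side == "C" else i
            # Ignore cuts at the very start or end, which would give empty peptides.
            if 0 < cut < len(sequence):
                cut_points.append(cut)
    bounds = [0] + cut_points + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


protein = "MKDFAYRGDLWK"  # made-up sequence for illustration
for name, (residues, side) in PROTEASES.items():
    print(name, digest(protein, residues, side))
```

Each protease produces a different peptide set from the same protein, so searching the digests in parallel can cover regions that a single enzyme leaves as peptides too short or too long to identify.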
5
Abstract
The discovery of candidate biomarkers within the entire proteome is one of the most important and challenging goals in proteomic research. Mass spectrometry-based proteomics is a modern and promising technology for semiquantitative and qualitative assessment of proteins, enabling protein sequencing and identification with exquisite accuracy and sensitivity. For mass spectrometry analysis, protein extractions from tissues or body fluids and subsequent protein fractionation represent an important and unavoidable step in the workflow for biomarker discovery. Following extraction of proteins, the protein mixture must be digested, reduced, alkylated, and cleaned up prior to mass spectrometry. The aim of our chapter is to provide comprehensible and practical lab procedures for sample digestion, protein fractionation, and subsequent mass spectrometry analysis.
Affiliation(s)
- Weidong Zhou
- Center for Applied Proteomics and Molecular Medicine, George Mason University, 10920 George Mason Circle, MS1A9, Manassas, VA, 20110, USA
- Emanuel F Petricoin
- Center for Applied Proteomics and Molecular Medicine, George Mason University, 10920 George Mason Circle, MS1A9, Manassas, VA, 20110, USA
- Caterina Longo
- Center for Applied Proteomics and Molecular Medicine, George Mason University, 10920 George Mason Circle, MS1A9, Manassas, VA, 20110, USA
- Dermatology and Skin Cancer Unit, Arcispedale S Maria Nuova IRCCS, Reggio Emilia, Italy
6
Pascual J, Alegre S, Nagler M, Escandón M, Annacondia ML, Weckwerth W, Valledor L, Cañal MJ. The variations in the nuclear proteome reveal new transcription factors and mechanisms involved in UV stress response in Pinus radiata. J Proteomics 2016; 143:390-400. [PMID: 26961940 DOI: 10.1016/j.jprot.2016.03.003] [Received: 01/15/2016] [Revised: 02/25/2016] [Accepted: 03/01/2016] [Indexed: 12/31/2022]
Abstract
The importance of UV stress and its side effects on the loss of plant productivity in forest species demands a deeper understanding of how pine trees respond to UV irradiation. Although the response to UV stress has been characterized at the system and cellular levels, the dynamics within the nuclear proteome triggered by UV are still unknown, even though they are essential for gene expression and the regulation of plant physiology. To fill this gap, this work aims to characterize the variations in the nuclear proteome as a response to UV irradiation by using state-of-the-art mass spectrometry-based methods combined with novel bioinformatics workflows. The combination of SEQUEST, de novo sequencing, and novel annotation pipelines allowed coverage of sensing and transduction pathways, endoplasmic reticulum-related mechanisms, and the regulation of chromatin dynamism and gene expression by histones, histone-like NF-Ys, and other transcription factors previously unrelated to this stress source, as well as the role of alternative splicing and other mechanisms involved in RNA translation and protein synthesis. We determined 33 transcription factors, including NF-YB13, Pp005698_3 (NF-YB) and Pr009668_2 (WD-40), which are correlated with stress-responsive mechanisms such as an increased accumulation of photoprotective pigments and reduced photosynthesis, pointing to them as strong candidate biomarkers for breeding programs aimed at improving the UV resistance of pine trees.
Significance: The description of the nuclear proteome of Pinus radiata, combining a classic approach based on SEQUEST with mass accuracy precursor alignment (MAPA), allowed unprecedented protein coverage. This workflow provided the methodological basis for characterizing the changes in the nuclear proteome triggered by UV irradiation, allowing the depiction of the nuclear events involved in stress response and adaptation. The relevance of some of the discovered proteins is expected to represent a major advance in the field of stress biology, and the work also provides a set of transcription factors that can be considered strong biomarker candidates for selecting trees more tolerant to UV radiation in forest upgrade programs.
7
Sadygov RG. Using SEQUEST with theoretically complete sequence databases. J Am Soc Mass Spectrom 2015; 26:1858-1864. [PMID: 26238326 PMCID: PMC4607654 DOI: 10.1007/s13361-015-1228-5] [Received: 02/02/2015] [Revised: 05/08/2015] [Accepted: 06/17/2015] [Indexed: 06/04/2023]
Abstract
SEQUEST has long been used to identify peptides/proteins from their tandem mass spectra and protein sequence databases. The algorithm has proven to be hugely successful for its sensitivity and specificity in identifying peptides/proteins, the sequences of which are present in the protein sequence databases. Here, we report an attempt at a new use for the algorithm: applying it to search a complete list of theoretically possible peptides, a de novo-like sequencing approach. We used freely available mass spectral data and determined a number of unique peptides as identified by SEQUEST. Using the masses of these peptides and a mass accuracy of 0.001 Da, we created a database of all theoretically possible peptide sequences corresponding to the precursor masses. We used our recently developed algorithm for determining all amino acid compositions corresponding to a mass interval, and used a lexicographic ordering to generate theoretical sequences from the compositions. The newly generated theoretical database was many-fold more complex than the original protein sequence database. We used SEQUEST to search and identify the best matches to the spectra from all theoretically possible peptide sequences. We found that the SEQUEST cross-correlation score ranked the correct peptide match among the top sequence matches. The results testify to the high specificity of SEQUEST when combined with high mass accuracy for intact peptides.
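The two steps named in this abstract (find every amino acid composition whose mass falls in a narrow interval, then expand each composition into sequences in lexicographic order) can be sketched as follows. This is a brute-force stand-in, not the author's algorithm, and it assumes a three-letter alphabet to keep the output small; a real search would use all 20 residues. The monoisotopic residue masses are rounded to five decimals, and the function names are hypothetical.

```python
from itertools import permutations

# Monoisotopic residue masses in Da (reduced alphabet, assumed for brevity).
RESIDUE_MASS = {"A": 71.03711, "G": 57.02146, "S": 87.03203}
WATER = 18.01056  # added once per peptide


def compositions(target_mass, tol, alphabet=None):
    """Enumerate residue-count dictionaries whose peptide mass is within tol of target_mass."""
    letters = sorted(alphabet or RESIDUE_MASS)
    results = []

    def extend(start, counts, mass):
        if abs(mass + WATER - target_mass) <= tol:
            results.append(dict(counts))
        # Only add residues at or after `start` so each multiset is visited once.
        for idx in range(start, len(letters)):
            aa = letters[idx]
            new_mass = mass + RESIDUE_MASS[aa]
            if new_mass + WATER > target_mass + tol:
                continue  # too heavy, prune this branch
            counts[aa] = counts.get(aa, 0) + 1
            extend(idx, counts, new_mass)
            counts[aa] -= 1
            if counts[aa] == 0:
                del counts[aa]

    extend(0, {}, 0.0)
    return results


def sequences(composition):
    """List all distinct sequences for a composition in lexicographic order."""
    letters = "".join(aa * n for aa, n in sorted(composition.items()))
    return sorted(set("".join(p) for p in permutations(letters)))


for comp in compositions(233.10116, tol=0.001):
    print(comp, sequences(comp))
```

Even in this tiny example one composition expands to six sequences, which hints at why the theoretical database described above is many-fold more complex than a protein sequence database.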
Affiliation(s)
- Rovshan G Sadygov
- Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, 77555, USA.
- Sealy Center for Molecular Medicine, The University of Texas Medical Branch, Galveston, TX, 77555, USA.
8
Abstract
Since its introduction in 1994, SEQUEST has gained many important new capabilities, and a host of successor algorithms have built upon its successes. This Account and Perspective maps the evolution of this important tool and charts the relationships among contributions to the SEQUEST legacy. Many of the changes represented improvements in computing speed by clusters and graphics cards. Mass spectrometry innovations in mass accuracy and activation methods led to shifts in fragment modeling and scoring strategies. These changes, as well as the movement of laboratories and lab members, have led to great diversity among the members of the SEQUEST family.
Affiliation(s)
- David L Tabb
- School of Medicine, Vanderbilt University, Nashville, TN, 37232-8575, USA.
9
Washburn MP. The H-index of 'an approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database'. J Am Soc Mass Spectrom 2015; 26:1799-1803. [PMID: 26091891 DOI: 10.1007/s13361-015-1181-3] [Received: 02/18/2015] [Revised: 04/28/2015] [Accepted: 04/29/2015] [Indexed: 06/04/2023]
Abstract
Over 20 years ago, a remarkable paper was published in the Journal of the American Society for Mass Spectrometry. This paper from Jimmy Eng, Ashley McCormack, and John Yates described the use of protein databases to drive the interpretation of tandem mass spectra of peptides. This paper now has over 3660 citations and continues to average more than 260 per year over the last decade. This is an amazing scientific achievement. The reason is that the paper was a cutting-edge development at a moment in time when genomes of organisms were being sequenced, protein and peptide mass spectrometry was growing into the field of proteomics, and the power of computing was growing quickly in accordance with Moore's law. This work by the Yates lab grew in importance as genomics, proteomics, and computation all advanced and eventually resulted in the widely used SEQUEST algorithm and platform for the analysis of tandem mass spectrometry data. This commentary provides an analysis of the impact of this paper by analyzing the citations it has generated and the impact of these citing papers.
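As a reminder of how an H-index is computed from a list of citation counts, here is a minimal sketch. Applying it to a single paper, as the title suggests, is assumed here to mean using the citation counts of the papers that cite it; the abstract does not spell out the procedure, and the numbers below are invented.

```python
def h_index(citation_counts):
    """Largest h such that at least h items have h or more citations each."""
    h = 0
    for rank, count in enumerate(sorted(citation_counts, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts of papers citing one article.
print(h_index([10, 8, 5, 4, 3]))
```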
Affiliation(s)
- Michael P Washburn
- Stowers Institute for Medical Research, Kansas City, MO, 64110, USA.
- Departments of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, 66160, USA.
10
Yates JR. Pivotal role of computers and software in mass spectrometry - SEQUEST and 20 years of tandem MS database searching. J Am Soc Mass Spectrom 2015; 26:1804-13. [PMID: 26286455 PMCID: PMC4625908 DOI: 10.1007/s13361-015-1220-0] [Received: 03/27/2015] [Revised: 06/17/2015] [Accepted: 06/20/2015] [Indexed: 05/15/2023]
Abstract
Advances in computer technology and software have driven developments in mass spectrometry over the last 50 years. Computers and software have been impactful in three areas: the automation of difficult calculations to aid interpretation, the collection of data and control of instruments, and data interpretation. As the power of computers has grown, so too have their utility and impact on mass spectrometers and their capabilities. This has been particularly evident in the use of tandem mass spectrometry data to search protein and nucleotide sequence databases to identify peptide and protein sequences. This capability has driven the development of many new approaches to study biological systems, including the use of "bottom-up shotgun proteomics" to directly analyze protein mixtures.
Affiliation(s)
- John R Yates
- Chemical Physiology and Molecular and Cellular Neurobiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR302B, La Jolla, CA, 92037, USA.
11
Romero-Rodríguez MC, Pascual J, Valledor L, Jorrín-Novo J. Improving the quality of protein identification in non-model species. Characterization of Quercus ilex seed and Pinus radiata needle proteomes by using SEQUEST and custom databases. J Proteomics 2014; 105:85-91. [PMID: 24508333 DOI: 10.1016/j.jprot.2014.01.027] [Received: 12/15/2013] [Accepted: 01/27/2014] [Indexed: 01/10/2023]
Abstract
Nowadays, the most widely used pipeline for protein identification consists of comparing MS/MS spectra to reference databases. Search algorithms compare the obtained spectra to an in silico digestion of a sequence database to find exact matches. In this context, the database is of paramount importance and largely determines the number and quality of identifications, which is especially relevant for non-model plant species. Using a single Viridiplantae database (NCBI, UniProt) and TAIR is not the best choice for non-model species, since they are underrepresented in databases, resulting in poor identification rates. We demonstrate how it is possible to improve the rate and quality of identifications in two orphan species, Quercus ilex and Pinus radiata, by using SEQUEST and a combination of public databases (Viridiplantae NCBI, UniProt) and a custom-built specific database, which contained 593,294 and 455,096 peptide sequences (Quercus and Pinus, respectively). These databases were built after gathering and processing (trimming, contiging, 6-frame translation) publicly available RNA sequences, mostly ESTs and NGS reads. A total of 149 and 1533 proteins were identified from Quercus seeds and Pinus needles, representing a 3.1- and 1.5-fold increase, respectively, in the number of protein identifications and scores compared to the use of a single database. Since this approach greatly improves the identification rate, and is not significantly more complicated or time consuming than other approaches, we recommend its routine use when working with non-model species.
Biological significance: In this work we demonstrate how the construction of a custom database (DB) gathering all available RNA sequences and its use in combination with Viridiplantae public DBs (NCBI, UniProt) significantly improves protein identification when working with non-model species. Protein identification rate and quality are higher than those obtained in routine procedures based on using only one database (commonly Viridiplantae from NCBI), as we demonstrated by analyzing Quercus seeds and Pinus needles. The proposed approach based on the building of a custom database is not difficult or time consuming, so we recommend its routine use when working with non-model species. This article is part of a Special Issue entitled: Proteomics of non-model organisms.
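The 6-frame translation step used to turn nucleotide contigs into searchable protein sequences can be sketched in a few lines. This is a generic illustration using the standard genetic code, not the authors' pipeline; it assumes DNA-alphabet input (RNA reads as cDNA), marks stop codons with `*`, and maps any codon containing an ambiguous base to `X`. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return the three forward and three reverse-complement reading frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

In a database-building workflow, each frame would typically be split at stop codons and short fragments discarded before the sequences are written out for searching.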
Affiliation(s)
- M Cristina Romero-Rodríguez
- Agricultural and Plant Biochemistry and Proteomics Research Group, Dept. of Biochemistry and Molecular Biology, University of Córdoba, Spain
- Jesús Pascual
- Plant Physiology, Faculty of Biology, Dept. of Organisms and Systems Biology, University of Oviedo, Spain
- Luis Valledor
- Dept. of Biology & Centre for Environmental and Marine Studies, University of Aveiro, Aveiro, Portugal; GCRC, Adaption Biotechnologies, Academy of Sciences of the Czech Republic, Brno, Czech Republic.
- Jesús Jorrín-Novo
- Agricultural and Plant Biochemistry and Proteomics Research Group, Dept. of Biochemistry and Molecular Biology, University of Córdoba, Spain.