1
Allmer J. Comprehensive Peptide Mapping Is Crucial for Proteogenomics and Proteomics. Methods Mol Biol 2025; 2859:39-51. [PMID: 39436595 DOI: 10.1007/978-1-0716-4152-1_3]
Abstract
Proteogenomics enables the confirmation and refinement of gene models, the detection of new ones, and the proposition of alternative transcripts using support at the protein level. Such evidence is usually generated using mass spectrometry and subsequent result mapping to various sequence databases. This workflow entails several problems: (1) To speed up the analysis, only a small set of expected proteins is searched; (2) database search tools generally do not provide mapping to the genome; and (3) upon new releases of the sequence databases, expensive rerunning of all results would need to be performed. Therefore, fast and accurate peptide mapping is needed as part of proteogenomic pipelines. Unfortunately, some available tools have technical shortcomings. Thus, a set of test cases was developed to allow tool developers to test their implementations comprehensively. The need for comprehensive testing is exemplified by PGx and PGM, two published tools that could only solve a subset of test cases. Lelantos passed all test cases. A set of comprehensive test cases has been developed to overcome these issues. Many unpublished peptide mapping tools are part of proteogenomic workflows, and such tools would also benefit from comprehensive testing. Finally, peptide mapping may also be crucial for proteomics because sequence databases change over time. In response, peptide remapping should be performed to ensure that peptides identifying a protein are proteotypic in a larger sequence context.
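The remapping step recommended above can be illustrated with a minimal sketch. Every identified peptide is searched against all entries of a protein database, which may be a newer release, and a peptide counts as proteotypic only if all of its hits fall in a single entry. This is a naive exact-substring scan written for clarity. It is not the algorithm used by Lelantos, PGx or PGM. The function names and the choice to treat leucine and isoleucine as interchangeable are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize(seq: str) -> str:
    # Leucine and isoleucine are isobaric, so (assumption) they are treated
    # as the same residue when matching peptides to proteins.
    return seq.upper().replace("I", "L")


def map_peptides(peptides: Iterable[str],
                 proteins: Dict[str, str]) -> Dict[str, List[Tuple[str, int]]]:
    """Return {peptide: [(accession, 0-based start), ...]} for every exact hit."""
    norm_proteins = {acc: normalize(seq) for acc, seq in proteins.items()}
    hits: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    peptides = list(peptides)
    for pep in peptides:
        query = normalize(pep)
        for acc, seq in norm_proteins.items():
            start = seq.find(query)
            while start != -1:  # record overlapping occurrences too
                hits[pep].append((acc, start))
                start = seq.find(query, start + 1)
    return {pep: hits.get(pep, []) for pep in peptides}


def proteotypic(mapping: Dict[str, List[Tuple[str, int]]]) -> Dict[str, bool]:
    """Flag a peptide as proteotypic if all of its hits lie in exactly one protein."""
    return {pep: len({acc for acc, _ in locs}) == 1
            for pep, locs in mapping.items()}
```

One simple way to catch peptides that stop being unique is to rerun `map_peptides` after each database update and compare the `proteotypic` flags between releases. A production tool would index the database, for example with a suffix array or an Aho-Corasick automaton, rather than scanning every sequence for each peptide.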
Affiliation(s)
- Jens Allmer
  - Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Hochschule Ruhr West, University of Applied Sciences, Mülheim adR., Germany

2
Reilly L, Seddighi S, Singleton AB, Cookson MR, Ward ME, Qi YA. Variant biomarker discovery using mass spectrometry-based proteogenomics. Front Aging 2023; 4:1191993. [PMID: 37168844 PMCID: PMC10165118 DOI: 10.3389/fragi.2023.1191993] [Received: 03/22/2023] [Accepted: 04/13/2023]
Abstract
Genomic diversity plays critical roles in disease risk, pathogenesis, and diagnosis. While genomic variants, including single nucleotide variants, frameshift variants, and mis-splicing isoforms, are commonly detected at the DNA or RNA level, their translated variant protein or polypeptide products are ultimately the functional units of the associated disease. These products are often released in biofluids and could be leveraged for clinical diagnosis and patient stratification. The recent emergence of integrated analysis of genomics with mass spectrometry-based proteomics for biomarker discovery, also known as proteogenomics, has significantly advanced the understanding of disease risk variants, precision medicine, and biomarker discovery. In this review, we discuss variant proteins in the context of cancers and neurodegenerative diseases, outline current and emerging proteogenomic approaches for biomarker discovery, and provide a comprehensive proteogenomic strategy for detection of putative biomarker candidates in human biospecimens. This strategy can be implemented for proteogenomic studies in any field of enquiry. Our review addresses the timely need for biomarkers of aging-related diseases.
Affiliation(s)
- Luke Reilly
  - Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States
- Sahba Seddighi
  - National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States
- Andrew B. Singleton
  - Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States
  - Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, United States
- Mark R. Cookson
  - Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, United States
- Michael E. Ward
  - National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States
- Yue A. Qi
  - Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States

3
Chen W, Liu X. Proteoform Identification by Combining RNA-Seq and Top-Down Mass Spectrometry. J Proteome Res 2020; 20:261-269. [PMID: 33183009 DOI: 10.1021/acs.jproteome.0c00369]
Abstract
In proteogenomic studies, genomic and transcriptomic variants are incorporated into customized protein databases for the identification of proteoforms, especially proteoforms with sample-specific variants. Most proteogenomic research has been focused on combining genomic or transcriptomic data with bottom-up mass spectrometry data. In the last decade, top-down mass spectrometry has attracted increasing attention because of its capacity to identify various proteoforms with alterations. However, top-down proteogenomics, in which genomic or transcriptomic data are combined with top-down mass spectrometry data, has not been widely adopted, and there is still a lack of software tools for top-down proteogenomic data analysis. In this paper, we introduce TopPG, a proteogenomic tool for generating proteoform sequence databases with genetic alterations and alternative splicing events. Experiments on top-down proteogenomic data of DLD-1 colorectal cancer cells showed that TopPG coupled with database search confidently identified proteoforms with sample-specific alterations.
Affiliation(s)
- Wenrong Chen
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States

4
Schlaffner CN, Pirklbauer GJ, Bender A, Choudhary JS. Fast, Quantitative and Variant Enabled Mapping of Peptides to Genomes. Cell Syst 2017; 5:152-156.e4. [PMID: 28837811 PMCID: PMC5571441 DOI: 10.1016/j.cels.2017.07.007] [Received: 12/05/2016] [Revised: 03/24/2017] [Accepted: 07/26/2017]
Abstract
Current tools for visualization and integration of proteomics with other omics datasets are inadequate for large-scale studies and capture only basic sequence identity information. Furthermore, the frequent reformatting of annotations for reference genomes required by these tools is known to be highly error prone. We developed PoGo for mapping peptides identified through mass spectrometry to overcome these limitations. PoGo reduced runtime and memory usage by 85% and 20%, respectively, and exhibited overall superior performance over other tools on benchmarking with large-scale human tissue and cancer phosphoproteome datasets comprising ∼3 million peptides. In addition, extended functionality enables representation of single-nucleotide variants, post-translational modifications, and quantitative features. PoGo has been integrated into established frameworks such as the PRIDE tool suite and OpenMS, and is also available as a standalone tool with a user-friendly graphical interface. With the rapid increase of quantitative high-resolution datasets capturing proteomes and global modifications to complement orthogonal genomics platforms, PoGo provides a central utility enabling large-scale visualization and interpretation of transomics datasets.
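Mapping a peptide to the genome, the task PoGo addresses, comes down to converting residue offsets in a protein into genomic coordinates through the exon structure of its coding sequence. An intron can split a peptide, or even a single codon, across exons. The sketch below shows that coordinate arithmetic under simplifying assumptions:

- exons are given as 1-based, inclusive, CDS-only intervals in transcription order;
- the CDS starts at the first transcribed base of the first exon;
- variants and post-translational modifications, which PoGo also handles, are ignored.

This is not PoGo's implementation, and `peptide_to_genome` is a hypothetical name.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def peptide_to_genome(cds_exons: List[Interval], strand: str,
                      aa_start: int, aa_len: int) -> List[Interval]:
    """Map a peptide (0-based residue offset in the translated CDS, length in
    residues) to the genomic blocks that encode it, sorted by position.

    Assumption: cds_exons are listed 5' to 3' and contain only coding sequence,
    so CDS nucleotide 0 is the first transcribed base of exon 0.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    nt_start, nt_end = 3 * aa_start, 3 * (aa_start + aa_len)  # half-open CDS range
    blocks: List[Interval] = []
    consumed = 0  # CDS nucleotides contributed by the preceding exons
    for g_start, g_end in cds_exons:
        length = g_end - g_start + 1
        lo = max(nt_start, consumed) - consumed         # first covered base, exon-relative
        hi = min(nt_end, consumed + length) - consumed  # one past the last covered base
        if lo < hi:
            if strand == "+":
                blocks.append((g_start + lo, g_start + hi - 1))
            else:  # minus strand: transcription reads from g_end downwards
                blocks.append((g_end - hi + 1, g_end - lo))
        consumed += length
    if nt_end > consumed:
        raise ValueError("peptide extends beyond the annotated CDS")
    return sorted(blocks)
```

On the minus strand the exons are still listed 5' to 3' of the transcript. The first exon is therefore the one with the highest genomic coordinates, and each covered range is counted down from its `g_end`.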
Affiliation(s)
- Christoph N Schlaffner
  - Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire CB2 1EW, UK
- Georg J Pirklbauer
  - Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, Cambridgeshire CB2 1EW, UK
- Jyoti S Choudhary
  - Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK

5
Low TY, Mohtar MA, Ang MY, Jamal R. Connecting Proteomics to Next-Generation Sequencing: Proteogenomics and Its Current Applications in Biology. Proteomics 2018; 19:e1800235. [DOI: 10.1002/pmic.201800235] [Received: 06/11/2018] [Revised: 10/09/2018]
Affiliation(s)
- Teck Yew Low
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- M. Aiman Mohtar
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Mia Yang Ang
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia
- Rahman Jamal
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia, 56000 Kuala Lumpur, Malaysia

6
Schlaffner CN, Pirklbauer GJ, Bender A, Steen JAJ, Choudhary JS. A Fast and Quantitative Method for Post-translational Modification and Variant Enabled Mapping of Peptides to Genomes. J Vis Exp 2018. [PMID: 29889196 PMCID: PMC6101353 DOI: 10.3791/57633]
Abstract
Cross-talk between genes, transcripts, and proteins is the key to cellular responses; hence, analysis of molecular levels as distinct entities is slowly being extended to integrative studies to enhance the understanding of molecular dynamics within cells. Current tools for the visualization and integration of proteomics with other omics datasets are inadequate for large-scale studies. Furthermore, they only capture basic sequence identity, discarding post-translational modifications and quantitation. To address these issues, we developed PoGo to map peptides with associated post-translational modifications and quantification to reference genome annotation. In addition, the tool was developed to enable the mapping of peptides identified from customized sequence databases incorporating single amino acid variants. While PoGo is a command line tool, the graphical interface PoGoGUI enables non-bioinformatics researchers to easily map peptides to 25 species supported by Ensembl genome annotation. The generated output borrows file formats from the genomics field and, therefore, visualization is supported in most genome browsers. For large-scale studies, PoGo is supported by TrackHubGenerator to create web-accessible repositories of data mapped to genomes that also enable easy sharing of proteogenomics data. With little effort, this tool can map millions of peptides to reference genomes within only a few minutes, outperforming other available sequence-identity based tools. This protocol demonstrates the best approaches for proteogenomics mapping through PoGo with publicly available datasets of quantitative and phosphoproteomics, as well as large-scale studies.
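Because the output borrows genomics file formats, a peptide that spans several exons can be written as a single multi-block BED line. The following sketch converts 1-based, inclusive genomic blocks, as produced by any peptide-to-genome mapping, into a BED12 record that uses BED's 0-based, half-open coordinates. The helper name `to_bed12`, the default score of 0 and the black `itemRgb` are assumptions. PoGo's own output conventions, for example how it encodes uniqueness or quantitation in these fields, are not reproduced here.

```python
from typing import List, Tuple


def to_bed12(chrom: str, name: str, strand: str,
             blocks: List[Tuple[int, int]], score: int = 0,
             rgb: str = "0,0,0") -> str:
    """Format 1-based inclusive genomic blocks of one peptide as a BED12 line.

    score and rgb defaults are illustrative assumptions, not PoGo's encoding.
    """
    if not blocks:
        raise ValueError("at least one block is required")
    blocks = sorted(blocks)
    chrom_start = blocks[0][0] - 1   # BED starts are 0-based
    chrom_end = blocks[-1][1]        # BED ends are exclusive, so no shift needed
    sizes = [end - start + 1 for start, end in blocks]
    # blockStarts are offsets of each 0-based block start from chromStart
    starts = [start - 1 - chrom_start for start, _ in blocks]
    fields = [chrom, chrom_start, chrom_end, name, score, strand,
              chrom_start, chrom_end,            # thickStart, thickEnd
              rgb, len(blocks),
              ",".join(map(str, sizes)), ",".join(map(str, starts))]
    return "\t".join(map(str, fields))
```

For example, a peptide encoded at 106-108 and 200-202 (1-based) on `chr1` becomes a two-block record starting at 105, with block sizes `3,3` and block starts `0,94`. Genome browsers draw such a record as two boxes joined by an intron line.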
Affiliation(s)
- Christoph N Schlaffner
  - Department of Neurobiology, F. M. Kirby Neurobiology Center, Boston Children's Hospital, Harvard Medical School
  - Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Wellcome Genome Campus
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge
- Georg J Pirklbauer
  - Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Wellcome Genome Campus
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge
- Judith A J Steen
  - Department of Neurobiology, F. M. Kirby Neurobiology Center, Boston Children's Hospital, Harvard Medical School
- Jyoti S Choudhary
  - Proteomic Mass Spectrometry, Wellcome Trust Sanger Institute, Wellcome Genome Campus
  - Functional Proteomics Group, Chester Beatty Laboratories, Institute of Cancer Research

7
Mostovenko E, Végvári Á, Rezeli M, Lichti CF, Fenyö D, Wang Q, Lang FF, Sulman EP, Sahlin KB, Marko-Varga G, Nilsson CL. Large Scale Identification of Variant Proteins in Glioma Stem Cells. ACS Chem Neurosci 2018; 9:73-79. [PMID: 29254333 PMCID: PMC6008157 DOI: 10.1021/acschemneuro.7b00362]
Abstract
Glioblastoma (GBM), the most malignant of primary brain tumors, is a devastating and deadly disease, with a median survival of 14 months from diagnosis, despite standard regimens of radical brain tumor surgery, maximal safe radiation, and concomitant chemotherapy. GBM tumors nearly always re-emerge after initial treatment and frequently display resistance to current treatments. One theory that may explain GBM re-emergence is the existence of glioma stem-like cells (GSCs). We sought to identify variant protein features expressed in low passage GSCs derived from patient tumors. To this end, we developed a proteomic database that reflected variant and nonvariant sequences in the human proteome and applied a novel retrograde proteomic workflow to identify and validate the expression of 126 protein variants in 33 glioma stem cell strains. These newly identified proteins may harbor a subset of novel protein targets for future development of GBM therapy.
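A search database that reflects both variant and nonvariant sequences, as described above, can be built by emitting each reference protein followed by copies that carry its single amino acid substitutions. The sketch below assumes the substitutions are already expressed at the protein level as (1-based position, reference residue, alternative residue). It checks the reference residue before applying each one. The helper names, the `ACC_X12Y` header scheme and the 60-residue line width are hypothetical illustration choices, not the workflow used in the study.

```python
from typing import Dict, List, Tuple

Substitution = Tuple[int, str, str]  # (1-based position, ref residue, alt residue)


def apply_substitutions(protein: str, subs: List[Substitution]) -> str:
    """Return a copy of protein with every substitution applied."""
    residues = list(protein)
    for pos, ref, alt in subs:
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} outside protein of length {len(residues)}")
        if residues[pos - 1] != ref:
            # A mismatch usually means the variant was annotated against
            # a different isoform or database release.
            raise ValueError(f"expected {ref} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = alt
    return "".join(residues)


def variant_database(proteins: Dict[str, str],
                     variants: Dict[str, List[Substitution]]) -> List[Tuple[str, str]]:
    """Emit (header, sequence) pairs: each reference, then one entry per variant."""
    entries = []
    for acc, seq in proteins.items():
        entries.append((acc, seq))
        for pos, ref, alt in variants.get(acc, []):
            header = f"{acc}_{ref}{pos}{alt}"  # hypothetical naming scheme
            entries.append((header, apply_substitutions(seq, [(pos, ref, alt)])))
    return entries


def to_fasta(entries: List[Tuple[str, str]], width: int = 60) -> str:
    """Serialize (header, sequence) pairs as FASTA with fixed-width lines."""
    lines = []
    for header, seq in entries:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Emitting one entry per substitution keeps the database size linear in the number of variants. Enumerating all combinations of variants per protein would make it grow exponentially.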
Affiliation(s)
- Ekaterina Mostovenko
  - Department of Anatomy and Neurobiology, Virginia Commonwealth University School of Medicine, 1217 E. Marshall St., Richmond, VA 23284
- Ákos Végvári
  - Clinical Protein Science & Imaging, Biomedical Center, Department of Biomedical Engineering, Lund University, SE-221 84 Lund, Sweden
- Melinda Rezeli
  - Clinical Protein Science & Imaging, Biomedical Center, Department of Biomedical Engineering, Lund University, SE-221 84 Lund, Sweden
- Cheryl F. Lichti
  - Department of Anatomy and Neurobiology, Virginia Commonwealth University School of Medicine, 1217 E. Marshall St., Richmond, VA 23284
  - Department of Pathology and Immunology, Washington University School of Medicine, 660 S. Euclid Ave., St. Louis, Missouri, 63110
- David Fenyö
  - Department of Biochemistry and Molecular Pharmacology and Institute for Systems Genetics, New York University School of Medicine, New York City, New York 10016, United States
- Qianghu Wang
  - Department of Genomic Medicine, The University of Texas M.D. Anderson Cancer Center, Houston, Texas 77030, United States
  - Department of Radiation Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas 77030, United States
- Frederick F. Lang
  - Department of Neurosurgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas 77030, United States
- Erik P. Sulman
  - Department of Genomic Medicine, The University of Texas M.D. Anderson Cancer Center, Houston, Texas 77030, United States
  - Department of Radiation Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas 77030, United States
  - Translational Molecular Pathology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas 77030, United States
- K. Barbara Sahlin
  - Clinical Protein Science & Imaging, Biomedical Center, Department of Biomedical Engineering, Lund University, SE-221 84 Lund, Sweden
- György Marko-Varga
  - Clinical Protein Science & Imaging, Biomedical Center, Department of Biomedical Engineering, Lund University, SE-221 84 Lund, Sweden
- Carol L. Nilsson
  - Center of Excellence in Biological and Medical Mass Spectrometry, Lund University, Klinikgatan 32, Lund, SE-221 84 Sweden

8
Dimitrakopoulos L, Prassas I, Diamandis EP, Charames GS. Onco-proteogenomics: Multi-omics level data integration for accurate phenotype prediction. Crit Rev Clin Lab Sci 2017; 54:414-432. [DOI: 10.1080/10408363.2017.1384446]
Affiliation(s)
- Lampros Dimitrakopoulos
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
  - Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Ioannis Prassas
  - Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
- Eleftherios P. Diamandis
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
  - Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
  - Department of Clinical Biochemistry, University Health Network, Toronto, ON, Canada
- George S. Charames
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
  - Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Joseph and Wolf Lebovic Health Complex, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada

9
Wang X, Mooradian AD, Erdmann-Gilmore P, Zhang Q, Viner R, Davies SR, Huang KL, Bomgarden R, Van Tine BA, Shao J, Ding L, Li S, Ellis MJ, Rogers JC, Townsend RR, Fenyö D, Held JM. Breast tumors educate the proteome of stromal tissue in an individualized but coordinated manner. Sci Signal 2017; 10(491):eaam8065. [PMID: 28790197 DOI: 10.1126/scisignal.aam8065]
Abstract
Cancer forms specialized microenvironmental niches that promote local invasion and colonization. Engrafted patient-derived xenografts (PDXs) locally invade and colonize naïve stroma in mice while enabling unambiguous molecular discrimination of human proteins in the tumor from mouse proteins in the microenvironment. To characterize how patient breast tumors form a niche and educate naïve stroma, subcutaneous breast cancer PDXs were globally profiled by species-specific quantitative proteomics. Regulation of PDX stromal proteins by breast tumors was extensive, with 35% of the stromal proteome altered by tumors consistently across different animals and passages. Differentially regulated proteins in the stroma clustered into six signatures, which included both known and previously unappreciated contributors to tumor invasion and colonization. Stromal proteomes were coordinately regulated; however, the sets of proteins altered by each tumor were highly distinct. Integrated analysis of tumor and stromal proteins, a comparison made possible in these xenograft models, indicated that the known hallmarks of cancer contribute pleiotropically to establishing and maintaining the microenvironmental niche of the tumor. Education of the stroma by the tumor is therefore an intrinsic property of breast tumors that is highly individualized, yet proceeds by consistent, nonrandom, and defined tumor-promoting molecular alterations.
Affiliation(s)
- Xuya Wang
  - Institute for Systems Genetics, New York University School of Medicine, New York, NY 10016, USA
  - Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, NY 10016, USA
- Arshag D Mooradian
  - Department of Medicine, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
- Petra Erdmann-Gilmore
  - Department of Medicine, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
- Qiang Zhang
  - Department of Medicine, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
- Rosa Viner
  - Thermo Fisher Scientific, San Jose, CA 95134, USA
- Sherri R Davies
  - Department of Medicine, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
- Kuan-Lin Huang
  - Department of Medicine, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
- Brian A Van Tine
  - Department of Medicine, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
  - Siteman Cancer Center, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
- Jieya Shao
  - Department of Medicine, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
  - Siteman Cancer Center, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
- Li Ding
  - Department of Medicine, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
  - Siteman Cancer Center, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
  - McDonnell Genome Institute, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
- Shunqiang Li
  - Department of Medicine, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
  - Siteman Cancer Center, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
- Matthew J Ellis
  - Lester and Sue Smith Breast Center, Department of Oncology, and Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- R Reid Townsend
  - Department of Medicine, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
  - Siteman Cancer Center, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
- David Fenyö
  - Institute for Systems Genetics, New York University School of Medicine, New York, NY 10016, USA
  - Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, NY 10016, USA
- Jason M Held
  - Department of Medicine, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
  - Siteman Cancer Center, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA
  - Department of Anesthesiology, Washington University in Saint Louis Medical School, St. Louis, MO 63110, USA

10
Komor MA, Pham TV, Hiemstra AC, Piersma SR, Bolijn AS, Schelfhorst T, Delis-van Diemen PM, Tijssen M, Sebra RP, Ashby M, Meijer GA, Jimenez CR, Fijneman RJA. Identification of Differentially Expressed Splice Variants by the Proteogenomic Pipeline Splicify. Mol Cell Proteomics 2017; 16:1850-1863. [PMID: 28747380 DOI: 10.1074/mcp.tir117.000056] [Received: 05/11/2017]
Abstract
Proteogenomics, i.e. comprehensive integration of genomics and proteomics data, is a powerful approach for identifying novel protein biomarkers. This is especially the case for proteins that differ structurally between disease and control conditions. As tumor development is associated with aberrant splicing, we focus on this rich source of cancer-specific biomarkers. To this end, we developed a proteogenomic pipeline, Splicify, which can detect differentially expressed protein isoforms. Splicify is based on integrating massively parallel RNA sequencing data and tandem mass spectrometry proteomics data to identify protein isoforms resulting from differential splicing between two conditions. Proof of concept was obtained by applying Splicify to RNA sequencing and mass spectrometry data obtained from colorectal cancer cell line SW480, before and after siRNA-mediated downmodulation of the splicing factors SF3B1 and SRSF1. These analyses revealed 2172 and 149 differentially expressed isoforms, respectively, with peptide confirmation upon knock-down of SF3B1 and SRSF1 compared with their controls. Splice variants identified included RAC1, OSBPL3, MKI67, and SYK. One additional sample was analyzed by PacBio Iso-Seq full-length transcript sequencing after SF3B1 downmodulation. This analysis verified the alternative splicing identified by Splicify and in addition identified novel splicing events that were not represented in the human reference genome annotation. Therefore, Splicify offers a validated proteogenomic data analysis pipeline for identification of disease-specific protein biomarkers resulting from mRNA alternative splicing. Splicify is publicly available on GitHub (https://github.com/NKI-TGO/SPLICIFY) and is suitable for addressing basic research questions using pre-clinical model systems as well as translational research questions using patient-derived samples, e.g. allowing the identification of clinically relevant biomarkers.
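Confirming a differentially spliced isoform at the protein level depends on peptides that only that isoform can produce. These are typically peptides lying in an included exon or spanning a new exon junction. The sketch below finds such peptides with an in silico tryptic digest and keeps only those that occur nowhere in the other isoforms' sequences. It uses a simplified trypsin rule: cleave after K or R unless the next residue is P, with no missed cleavages. The digestion rule, the length limits and the function names are simplifying assumptions for illustration, not Splicify's code.

```python
import re
from typing import Dict, Set


def tryptic_digest(seq: str, min_len: int = 6, max_len: int = 30) -> Set[str]:
    """Fully tryptic peptides (simplified rule): split after K/R unless followed by P."""
    # Zero-width split points are supported by re.split from Python 3.7 onwards.
    pieces = re.split(r"(?<=[KR])(?!P)", seq)
    return {p for p in pieces if min_len <= len(p) <= max_len}


def isoform_specific_peptides(isoforms: Dict[str, str], min_len: int = 6,
                              max_len: int = 30) -> Dict[str, Set[str]]:
    """For each isoform, tryptic peptides absent from every other isoform's sequence."""
    specific = {}
    for name, seq in isoforms.items():
        others = [s for other, s in isoforms.items() if other != name]
        specific[name] = {pep for pep in tryptic_digest(seq, min_len, max_len)
                          if not any(pep in other for other in others)}
    return specific
```

Consider a long isoform `LLLLLLKAAAFFFFFFRGGGGGGK` and a short isoform `LLLLLLKAAAGGGGGGK` that skips the `FFFFFFR` segment. The long isoform's specific peptide is `AAAFFFFFFR`, which covers the included segment. The short isoform's specific peptide is `AAAGGGGGGK`, which spans its new junction.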
Affiliation(s)
- Malgorzata A Komor
  - Translational Gastrointestinal Oncology, Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
  - Oncoproteomics Laboratory, Department of Medical Oncology, VU University Medical Center, Amsterdam, the Netherlands
- Thang V Pham
  - Oncoproteomics Laboratory, Department of Medical Oncology, VU University Medical Center, Amsterdam, the Netherlands
- Annemieke C Hiemstra
  - Translational Gastrointestinal Oncology, Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Sander R Piersma
  - Oncoproteomics Laboratory, Department of Medical Oncology, VU University Medical Center, Amsterdam, the Netherlands
- Anne S Bolijn
  - Translational Gastrointestinal Oncology, Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Tim Schelfhorst
  - Oncoproteomics Laboratory, Department of Medical Oncology, VU University Medical Center, Amsterdam, the Netherlands
- Pien M Delis-van Diemen
  - Translational Gastrointestinal Oncology, Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Marianne Tijssen
  - Translational Gastrointestinal Oncology, Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Robert P Sebra
  - School of Medicine at Mount Sinai, Institute for Genomics and Multiscale Biology, New York, New York
- Gerrit A Meijer
  - Translational Gastrointestinal Oncology, Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Connie R Jimenez
  - Oncoproteomics Laboratory, Department of Medical Oncology, VU University Medical Center, Amsterdam, the Netherlands
- Remond J A Fijneman
  - Translational Gastrointestinal Oncology, Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands

11
Choi S, Kim H, Paek E. ACTG: novel peptide mapping onto gene models. Bioinformatics 2017; 33:1218-1220. [PMID: 28031186 DOI: 10.1093/bioinformatics/btw787] [Received: 09/04/2016] [Accepted: 12/07/2016]
Abstract
Summary: In many proteogenomic applications, mapping peptide sequences onto genome sequences can be very useful, because it allows us to understand the origins of the gene products. Existing software tools either take the genomic position of a peptide start site as an input or assume that the peptide sequence exactly matches the coding sequence of a given gene model. In the case of novel peptides resulting from genomic variations, especially structural variations such as alternative splicing, these existing tools cannot be directly applied unless users supply information about the variant, either its genomic position or its transcription model. Mapping potentially novel peptides to genome sequences, while allowing certain genomic variations, requires introducing novel gene models when aligning peptide sequences to gene structures. We have developed a new tool called ACTG (Amino aCids To Genome), which maps peptides to the genome, assuming all possible single exon skipping, junction variation within an edit distance of three from the original splice sites, exon extension and frame shift. In addition, it can also consider SNVs (single nucleotide variations) during the mapping phase if a user provides a VCF (variant call format) file as an input. Availability and implementation: Available at http://prix.hanyang.ac.kr/ACTG/search.jsp. Contact: eunokpaek@hanyang.ac.kr. Supplementary information: Supplementary data are available at Bioinformatics online.
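Single exon skipping, one of the variation types ACTG considers, can be illustrated by enumerating every transcript that omits one internal exon, translating it, and checking whether the peptide appears. This sketch covers only that variation type. It leaves out the junction shifts, exon extension, frame shifts and VCF-supplied SNVs that ACTG also considers. It also works on forward-strand exon sequences rather than genomic coordinates. The standard genetic code is encoded in the usual TCAG ordering, and the function names are hypothetical.

```python
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate codon by codon until the first stop codon (unknown codons -> X)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def exon_skipping_hits(exons: List[str], peptide: str) -> List[int]:
    """Indices of internal exons whose skipping yields a protein containing peptide."""
    hits = []
    for skip in range(1, len(exons) - 1):  # first and last exons are always kept
        cds = "".join(seq for idx, seq in enumerate(exons) if idx != skip)
        if peptide in translate(cds):
            hits.append(skip)
    return hits
```

With exons `ATGAAA`, `CCC` and `GGGTGA`, the reference transcript translates to `MKPG`. Skipping the middle exon gives `MKG`, so the junction peptide `KG` is reported for exon index 1. If the skipped exon's length is not a multiple of three, the downstream reading frame shifts, and `translate` simply reads the shifted frame until it meets a stop codon.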
Affiliation(s)
- Seunghyuk Choi
  - Department of Computer Science, Hanyang University, Seongdong-gu, Seoul, Korea
- Hyunwoo Kim
  - Scientific Data Technology Lab, Korea Institute of Science and Technology Information, Yuseong-gu, Daejeon, 34141, Korea
- Eunok Paek
  - Department of Computer Science, Hanyang University, Seongdong-gu, Seoul, Korea

12
Ruggles KV, Krug K, Wang X, Clauser KR, Wang J, Payne SH, Fenyö D, Zhang B, Mani DR. Methods, Tools and Current Perspectives in Proteogenomics. Mol Cell Proteomics 2017; 16:959-981. [PMID: 28456751 DOI: 10.1074/mcp.mr117.000024] [Received: 04/13/2017]
Abstract
With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies has provided novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches. In this article, we systematically classify published methods and tools into four major categories: (1) sequence-centric proteogenomics; (2) analysis of proteogenomic relationships; (3) integrative modeling of proteogenomic data; and (4) data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications.
Affiliation(s)
- Kelly V Ruggles
  - Department of Medicine, New York University School of Medicine, New York, New York 10016
- Karsten Krug
  - The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
- Xiaojing Wang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- Karl R Clauser
  - The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
- Jing Wang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- Samuel H Payne
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354
- David Fenyö
  - Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, New York 10016
  - Institute for Systems Genetics, New York University School of Medicine, New York, New York 10016
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
- D R Mani
  - The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142

14
Kohlbacher O, Vitek O, Weintraub ST. Challenges in Large-Scale Computational Mass Spectrometry and Multiomics. J Proteome Res 2016; 15:681-2. [DOI: 10.1021/acs.jproteome.6b00067]
Affiliation(s)
- Oliver Kohlbacher
  - Center for Bioinformatics, Quantitative Biology Center, Department of Computer Science and Faculty of Medicine, University of Tübingen and Max Planck Institute for Developmental Biology
- Olga Vitek
  - Sy and Laurie Sternberg Interdisciplinary Associate Professor, College of Science, College of Computer and Information Science, Northeastern University
- Susan T. Weintraub
  - Department of Biochemistry, The University of Texas Health Science Center at San Antonio

15
Next Generation Sequencing Data and Proteogenomics. Adv Exp Med Biol 2016; 926:11-19. [DOI: 10.1007/978-3-319-42316-6_2]

16
Proteogenomic Tools and Approaches to Explore Protein Coding Landscapes of Eukaryotic Genomes. Adv Exp Med Biol 2016; 926:1-10. [DOI: 10.1007/978-3-319-42316-6_1]