1
Zhong J, Luo Y, Yang C, Yuan M, Wang S. ResNeXt-Based Rescoring Model for Proteoform Characterization in Top-Down Mass Spectra. Interdiscip Sci 2025:10.1007/s12539-025-00701-x. [PMID: 40381130 DOI: 10.1007/s12539-025-00701-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2024] [Revised: 03/02/2025] [Accepted: 03/04/2025] [Indexed: 05/19/2025]
Abstract
In top-down proteomics, the accurate identification and characterization of proteoform through mass spectrometry represents a critical objective. As a result, achieving accuracy in identification results is essential. Multiple primary structure alterations in proteins generate a diverse range of proteoforms, resulting in an exponential increase in potential proteoform. Moreover, the absence of a definitive reference set complicates the standardization of results. Therefore, enhancing the accuracy of proteoform characterization continues to be a significant challenge. We introduced a ResNeXt-based deep learning model, PrSMBooster, for rescoring proteoform spectrum matches (PrSM) during proteoform characterization. As an ensemble method, PrSMBooster integrates four machine learning models, logistic regression, XGBoost, decision tree, and support vector machine, as weak learners to obtain PrSM features. The basic and latent features of PrSM are subsequently input into the ResNeXt model for final rescoring. To verify the effect and accuracy of the PrSMBooster model in rescoring proteoform characterization, it was compared with the characterization algorithm TopPIC across 47 independent mass spectrometry datasets from various species. The experimental results indicate that in most mass spectrometry datasets, the number of PrSMs obtained after rescoring with PrSMBooster increases at a false discovery rate (FDR) of 1%. Further analysis of the experimental results confirmed that PrSMBooster improves the accuracy of PrSM scoring, generates more mass spectrometry characterization results, and demonstrates strong generalization ability.
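To make the 1% FDR criterion in the abstract above concrete, here is a minimal sketch of counting accepted PrSMs. It assumes a target-decoy estimate (FDR approximated as decoys divided by targets), which the abstract does not specify. The function name `count_prsms_at_fdr` and the `(score, is_decoy)` tuple layout are hypothetical and are not taken from PrSMBooster or TopPIC.

```python
def count_prsms_at_fdr(prsms, fdr_threshold=0.01):
    """Return the largest number of target PrSMs whose estimated FDR
    stays at or below fdr_threshold.

    prsms: iterable of (score, is_decoy) pairs; a higher score is better.
    Assumption: FDR is estimated as (#decoys / #targets) above a score cut.
    """
    ranked = sorted(prsms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest cut that still satisfies the threshold.
        if targets > 0 and decoys / targets <= fdr_threshold:
            best = targets
    return best


# Hypothetical scores: two targets rank above the first decoy.
example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(count_prsms_at_fdr(example))  # 2
```

Under this assumption, a rescoring model increases the reported count when it ranks more true matches above the decoys, which is one plausible reading of the reported increase in PrSMs at 1% FDR.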
Affiliation(s)
- Jiancheng Zhong
  - College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Yicheng Luo
  - College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Chen Yang
  - College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Maoqi Yuan
  - College of Information Science and Engineering, Hunan Normal University, Changsha, 410081, China
- Shaokai Wang
  - Department of Mathematics, Hong Kong University of Science and Technology, 999077, Hong Kong SAR, China
2
Knott SJ, Tucholski T, Josyer H, Inman D, Friedl A, Zhu Y, Ge Y, Ponik SM. Deciphering Proteoform Landscape of Mammary Carcinoma by Top-Down Proteomics. J Proteome Res 2025; 24:1425-1438. [PMID: 39936522 PMCID: PMC12006981 DOI: 10.1021/acs.jproteome.4c01044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2025]
Abstract
Defining the proteoform landscape of breast cancer can provide unique insights into the signaling pathways driving disease progression. While bottom-up proteomics has been utilized to profile breast cancer, it lacks the ability to capture intact proteoforms that may underpin the disease. Top-down proteomics is ideally suited to characterize intact proteoforms; however, most top-down proteomics studies have been limited to low molecular weight (MW) proteins (<50 kDa). Herein, we employed a two-dimensional (2D) liquid chromatography combining size exclusion chromatography (SEC) with reverse phase chromatography (RPC) followed by high-resolution mass spectrometry (MS) to expand the coverage for high MW proteoforms. Using this 2D-SEC-RPC-MS approach, we observed a 5-fold increase in the detection of high MW proteoforms (>50 kDa) compared to the conventional 1D-RPC-MS. SEC separation significantly enhanced the detection of high MW proteoforms (>104 kDa), including intermediate filament proteins, vimentin and keratins. Based on accurate mass measurements and MS/MS data, we identified 775 proteoforms from both TFA and HEPES extracts and detected PTMs, such as acetylation, glutathionylation, and myristoylation. Pathway analysis uncovered many proteoforms involved in processes dysregulated in cancer progression. Overall, our findings illustrate the power of top-down proteomics in defining the proteoform landscape of breast carcinoma.
Affiliation(s)
- Samantha J. Knott
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave., Madison, Wisconsin 53706, USA
- Trisha Tucholski
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave., Madison, Wisconsin 53706, USA
- Harini Josyer
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, 1111 Highland Ave., Madison, Wisconsin 53705, USA
- David Inman
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, 1111 Highland Ave., Madison, Wisconsin 53705, USA
- Andreas Friedl
  - Department of Pathology and Laboratory Medicine, University of Wisconsin-Madison, 1685 Highland Ave., Madison, Wisconsin 53705, USA
- Yanlong Zhu
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, 1111 Highland Ave., Madison, Wisconsin 53705, USA
  - Human Proteomics Program, University of Wisconsin-Madison, 1111 Highland Ave., Madison, Wisconsin, 53705, USA
- Ying Ge
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave., Madison, Wisconsin 53706, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, 1111 Highland Ave., Madison, Wisconsin 53705, USA
  - Human Proteomics Program, University of Wisconsin-Madison, 1111 Highland Ave., Madison, Wisconsin, 53705, USA
- Suzanne M. Ponik
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, 1111 Highland Ave., Madison, Wisconsin 53705, USA
  - Carbone Cancer Center, University of Wisconsin-Madison, 1111 Highland Ave., Madison, Wisconsin, 53705, USA
3
Palasser M, Breuker K. FAST MS: Software for the Automated Analysis of Top-Down Mass Spectra of Polymeric Molecules Including RNA, DNA, and Proteins. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2025; 36:247-257. [PMID: 39715325 PMCID: PMC11808778 DOI: 10.1021/jasms.4c00236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2024] [Revised: 12/02/2024] [Accepted: 12/10/2024] [Indexed: 12/25/2024]
Abstract
Top-down mass spectrometry (MS) enables comprehensive characterization of modified proteins and nucleic acids and, when native electrospray ionization (ESI) is used, binding site mapping of their complexes with native or therapeutic ligands. However, the high complexity of top-down MS spectra poses a serious challenge to both manual and automated data interpretation, even when the protein, RNA, or DNA sequence and the type of modification or the ligand are known. Here, we introduce FAST MS, a user-friendly software that identifies, assigns and relatively quantifies signals of molecular and fragment ions in MS and MS/MS spectra of biopolymers with known sequence and provides a toolbox for statistical analysis. FAST MS searches mass spectra for ion signals by comparing all signals in the spectrum with isotopic profiles calculated from known sequences, resulting in superior sensitivity and an increased number of assigned fragment ions compared to algorithms that rely on artificial monomer units while maintaining the false positive rate on a moderate level (<5%). FAST MS is an open-source, cross-platform software for the accurate identification, localization and relative quantification of modifications, even in complex mixtures of positional isomers of proteins, oligonucleotides, or any other user-defined linear polymer.
Affiliation(s)
- Kathrin Breuker
  - Institute of Organic Chemistry and Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, 6020 Innsbruck, Austria
4
Robey M, Durbin KR. Improving Top-Down Sequence Coverage with Targeted Fragment Matching. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2024; 35:3296-3300. [PMID: 39437430 PMCID: PMC11623164 DOI: 10.1021/jasms.4c00161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 09/23/2024] [Accepted: 10/10/2024] [Indexed: 10/25/2024]
Abstract
Top-down mass spectrometry (TDMS) of intact proteins and antibodies enables direct determination of truncations, sequence variants, post-translational modifications, and disulfides without the need for any proteolytic cleavage. While mass deconvolution of top-down tandem mass spectra is typically used to identify fragment masses for matching to candidate proteoforms, larger molecules such as monoclonal antibodies can produce many fragment ions, making spectral interpretation challenging. Here, we explore an alternative approach for proteoform spectral matching that is better suited for larger protein analysis. This workflow uses direct matching of theoretical proteoform isotopic distributions to TDMS spectra, avoiding drawbacks of mass deconvolution such as poor sensitivity and problems differentiating overlapping distributions. Using a data set that analyzed an intact NIST monoclonal antibody across different fragmentation modes, we show that this isotope fitting strategy increased the sequence coverage of both light and heavy chain sequences >3-fold. We further found that isotope fitting is particularly amenable to identifying large fragments, including those near the hinge region that have been traditionally difficult to analyze by top-down methods. These advances in proteoform spectral matching can greatly increase the power of top-down analyses for intact biotherapeutics and other large molecules.
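As a rough illustration of what "direct matching of theoretical proteoform isotopic distributions" could involve, the sketch below scores an observed isotopic envelope against a theoretical one using cosine similarity. The scoring function is an assumption chosen for illustration; the abstract does not state which fit metric the authors use, and the function name and intensity values are hypothetical.

```python
from math import sqrt


def cosine_similarity(theoretical, observed):
    """Cosine similarity between two equal-length intensity vectors.

    Returns a value in [0, 1] for non-negative intensities; 1.0 means the
    observed envelope has the same shape as the theoretical one, regardless
    of absolute scale.
    """
    if len(theoretical) != len(observed):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(t * o for t, o in zip(theoretical, observed))
    norm_t = sqrt(sum(t * t for t in theoretical))
    norm_o = sqrt(sum(o * o for o in observed))
    if norm_t == 0 or norm_o == 0:
        return 0.0
    return dot / (norm_t * norm_o)


# Hypothetical relative abundances for five isotopic peaks.
theoretical = [0.10, 0.30, 0.35, 0.20, 0.05]
observed = [210.0, 590.0, 720.0, 380.0, 120.0]  # arbitrary detector units
score = cosine_similarity(theoretical, observed)
print(f"fit score: {score:.3f}")
```

Because the score depends only on envelope shape, a candidate fragment can be accepted when the score exceeds a chosen cutoff without first deconvolving the spectrum to neutral masses.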
5
Roberts DS, Loo JA, Tsybin YO, Liu X, Wu S, Chamot-Rooke J, Agar JN, Paša-Tolić L, Smith LM, Ge Y. Top-down proteomics. NATURE REVIEWS. METHODS PRIMERS 2024; 4:38. [PMID: 39006170 PMCID: PMC11242913 DOI: 10.1038/s43586-024-00318-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 04/24/2024] [Indexed: 07/16/2024]
Abstract
Proteoforms, which arise from post-translational modifications, genetic polymorphisms and RNA splice variants, play a pivotal role as drivers in biology. Understanding proteoforms is essential to unravel the intricacies of biological systems and bridge the gap between genotypes and phenotypes. By analysing whole proteins without digestion, top-down proteomics (TDP) provides a holistic view of the proteome and can decipher protein function, uncover disease mechanisms and advance precision medicine. This Primer explores TDP, including the underlying principles, recent advances and an outlook on the future. The experimental section discusses instrumentation, sample preparation, intact protein separation, tandem mass spectrometry techniques and data collection. The results section looks at how to decipher raw data, visualize intact protein spectra and unravel data analysis. Additionally, proteoform identification, characterization and quantification are summarized, alongside approaches for statistical analysis. Various applications are described, including the human proteoform project and biomedical, biopharmaceutical and clinical sciences. These are complemented by discussions on measurement reproducibility, limitations and a forward-looking perspective that outlines areas where the field can advance, including potential future applications.
Affiliation(s)
- David S Roberts
  - Department of Chemistry, Stanford University, Stanford, CA, USA
  - Sarafan ChEM-H, Stanford University, Stanford, CA, USA
- Joseph A Loo
  - Department of Chemistry and Biochemistry, Department of Biological Chemistry, University of California - Los Angeles, Los Angeles, CA, USA
- Xiaowen Liu
  - Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, USA
- Si Wu
  - Department of Chemistry and Biochemistry, The University of Alabama, Tuscaloosa, AL, USA
- Jeffrey N Agar
  - Departments of Chemistry and Chemical Biology and Pharmaceutical Sciences, Northeastern University, Boston, MA, USA
- Ljiljana Paša-Tolić
  - Environmental and Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin, Madison, WI, USA
- Ying Ge
  - Department of Chemistry, University of Wisconsin, Madison, WI, USA
  - Department of Cell and Regenerative Biology, Human Proteomics Program, University of Wisconsin - Madison, Madison, WI, USA
6
Faizi M, Fellers RT, Lu D, Drown BS, Jambhekar A, Lahav G, Kelleher NL, Gunawardena J. MSModDetector: a tool for detecting mass shifts and post-translational modifications in individual ion mass spectrometry data. Bioinformatics 2024; 40:btae335. [PMID: 38796681 PMCID: PMC11157153 DOI: 10.1093/bioinformatics/btae335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 02/28/2024] [Accepted: 05/24/2024] [Indexed: 05/28/2024] Open
Abstract
MOTIVATION Post-translational modifications (PTMs) on proteins regulate protein structures and functions. A single protein molecule can possess multiple modification sites that can accommodate various PTM types, leading to a variety of different patterns, or combinations of PTMs, on that protein. Different PTM patterns can give rise to distinct biological functions. To facilitate the study of multiple PTMs on the same protein molecule, top-down mass spectrometry (MS) has proven to be a useful tool to measure the mass of intact proteins, thereby enabling even PTMs at distant sites to be assigned to the same protein molecule and allowing determination of how many PTMs are attached to a single protein. RESULTS We developed a Python module called MSModDetector that studies PTM patterns from individual ion mass spectrometry (I2MS) data. I2MS is an intact protein mass spectrometry approach that generates true mass spectra without the need to infer charge states. The algorithm first detects and quantifies mass shifts for a protein of interest and subsequently infers potential PTM patterns using linear programming. The algorithm is evaluated on simulated I2MS data and experimental I2MS data for the tumor suppressor protein p53. We show that MSModDetector is a useful tool for comparing a protein's PTM pattern landscape across different conditions. An improved analysis of PTM patterns will enable a deeper understanding of PTM-regulated cellular processes. AVAILABILITY AND IMPLEMENTATION The source code is available at https://github.com/marjanfaizi/MSModDetector.
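The step of inferring PTM patterns from a detected mass shift can be illustrated with a small sketch. Note that it deliberately swaps in brute-force enumeration for the linear programming described in the abstract, and it is not MSModDetector code. The function name, the three-entry PTM table, the count limit, and the tolerance are all hypothetical choices; the mass values are the commonly cited monoisotopic shifts for these modifications.

```python
from itertools import product

# Commonly cited monoisotopic mass shifts in Da (a small, illustrative subset).
PTM_MASSES = {
    "phospho": 79.96633,
    "acetyl": 42.01056,
    "methyl": 14.01565,
}


def explain_mass_shift(shift, max_count=3, tol=0.02):
    """List PTM combinations whose summed mass is within tol of shift.

    Each PTM may occur 0..max_count times. Results are ordered by the total
    number of modifications, so the simplest explanations come first.
    """
    names = list(PTM_MASSES)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        total = sum(c * PTM_MASSES[n] for c, n in zip(counts, names))
        if abs(total - shift) <= tol:
            hits.append({n: c for n, c in zip(names, counts) if c > 0})
    hits.sort(key=lambda pattern: sum(pattern.values()))
    return hits


# One phosphorylation plus one acetylation: 79.96633 + 42.01056 Da.
print(explain_mass_shift(121.97689))  # [{'phospho': 1, 'acetyl': 1}]
```

Enumeration grows exponentially with the number of PTM types, which is one reason an optimization formulation such as the linear program mentioned in the abstract scales better. The tolerance also matters: three methyl groups (42.04695 Da) differ from one acetyl group by only about 0.036 Da, so a looser tolerance would report both patterns.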
Affiliation(s)
- Marjan Faizi
  - Department of Systems Biology, Blavatnik Institute at Harvard Medical School, Boston, MA 02115, United States
- Ryan T Fellers
  - National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, IL 60208, United States
- Dan Lu
  - Department of Systems Biology, Blavatnik Institute at Harvard Medical School, Boston, MA 02115, United States
- Bryon S Drown
  - National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, IL 60208, United States
- Ashwini Jambhekar
  - Department of Systems Biology, Blavatnik Institute at Harvard Medical School, Boston, MA 02115, United States
- Galit Lahav
  - Department of Systems Biology, Blavatnik Institute at Harvard Medical School, Boston, MA 02115, United States
- Neil L Kelleher
  - National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, IL 60208, United States
- Jeremy Gunawardena
  - Department of Systems Biology, Blavatnik Institute at Harvard Medical School, Boston, MA 02115, United States
7
Carr AV, Bollis NE, Pavek JG, Shortreed MR, Smith LM. Spectral averaging with outlier rejection algorithms to increase identifications in top-down proteomics. Proteomics 2024; 24:e2300234. [PMID: 38487981 PMCID: PMC11216233 DOI: 10.1002/pmic.202300234] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/31/2023] [Revised: 02/15/2024] [Accepted: 02/29/2024] [Indexed: 04/05/2024]
Abstract
The identification of proteoforms by top-down proteomics requires both high quality fragmentation spectra and the neutral mass of the proteoform from which the fragments derive. Intact proteoform spectra can be highly complex and may include multiple overlapping proteoforms, as well as many isotopic peaks and charge states. The resulting lower signal-to-noise ratios for intact proteins complicates downstream analyses such as deconvolution. Averaging multiple scans is a common way to improve signal-to-noise, but mass spectrometry data contains artifacts unique to it that can degrade the quality of an averaged spectra. To overcome these limitations and increase signal-to-noise, we have implemented outlier rejection algorithms to remove outlier measurements efficiently and robustly in a set of MS1 scans prior to averaging. We have implemented averaging with rejection algorithms in the open-source, freely available, proteomics search engine MetaMorpheus. Herein, we report the application of the averaging with rejection algorithms to direct injection and online liquid chromatography mass spectrometry data. Averaging with rejection algorithms demonstrated a 45% increase in the number of proteoforms detected in Jurkat T cell lysate. We show that the increase is due to improved spectral quality, particularly in regions surrounding isotopic envelopes.
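The idea of rejecting outlier measurements before averaging MS1 scans can be sketched with iterative sigma clipping, one common outlier-rejection scheme. This is an assumption for illustration only: the abstract does not say which rejection algorithms are implemented in MetaMorpheus, and the function names, the clipping width, and the requirement that scans are already binned onto a shared m/z grid are all hypothetical.

```python
from statistics import mean, median, pstdev


def sigma_clipped_mean(values, n_sigma=1.5, max_iter=5):
    """Average values after iteratively dropping points that lie more than
    n_sigma standard deviations from the median."""
    kept = list(values)
    for _ in range(max_iter):
        if len(kept) < 3:
            break
        center = median(kept)
        spread = pstdev(kept)
        if spread == 0:
            break
        survivors = [v for v in kept if abs(v - center) <= n_sigma * spread]
        # Stop when nothing was rejected, or when clipping would remove everything.
        if not survivors or len(survivors) == len(kept):
            break
        kept = survivors
    return mean(kept)


def average_scans(scans, n_sigma=1.5):
    """Average several scans bin by bin.

    scans: list of equal-length intensity lists on a shared m/z grid.
    """
    return [sigma_clipped_mean(column, n_sigma) for column in zip(*scans)]


# Six hypothetical intensity readings of one m/z bin; 500 is a spike artifact.
print(sigma_clipped_mean([100, 102, 98, 101, 99, 500]))  # 100
```

A plain mean of the six readings would be about 167, so a single spike inflates the bin; clipping recovers the value supported by the other five scans.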
Affiliation(s)
- Austin V Carr
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Nicholas E Bollis
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
- John G Pavek
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Michael R Shortreed
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
8
Zancolli G, von Reumont BM, Anderluh G, Caliskan F, Chiusano ML, Fröhlich J, Hapeshi E, Hempel BF, Ikonomopoulou MP, Jungo F, Marchot P, de Farias TM, Modica MV, Moran Y, Nalbantsoy A, Procházka J, Tarallo A, Tonello F, Vitorino R, Zammit ML, Antunes A. Web of venom: exploration of big data resources in animal toxin research. Gigascience 2024; 13:giae054. [PMID: 39250076 PMCID: PMC11382406 DOI: 10.1093/gigascience/giae054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/01/2024] [Accepted: 07/13/2024] [Indexed: 09/10/2024] Open
Abstract
Research on animal venoms and their components spans multiple disciplines, including biology, biochemistry, bioinformatics, pharmacology, medicine, and more. Manipulating and analyzing the diverse array of data required for venom research can be challenging, and relevant tools and resources are often dispersed across different online platforms, making them less accessible to nonexperts. In this article, we address the multifaceted needs of the scientific community involved in venom and toxin-related research by identifying and discussing web resources, databases, and tools commonly used in this field. We have compiled these resources into a comprehensive table available on the VenomZone website (https://venomzone.expasy.org/10897). Furthermore, we highlight the challenges currently faced by researchers in accessing and using these resources and emphasize the importance of community-driven interdisciplinary approaches. We conclude by underscoring the significance of enhancing standards, promoting interoperability, and encouraging data and method sharing within the venom research community.
Affiliation(s)
- Giulia Zancolli
  - Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Björn Marcus von Reumont
  - Goethe University Frankfurt, Faculty of Biological Sciences, 60438 Frankfurt, Germany
  - LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Gregor Anderluh
  - Department of Molecular Biology and Nanobiotechnology, National Institute of Chemistry, 1000 Ljubljana, Slovenia
- Figen Caliskan
  - Department of Biology, Faculty of Science, Eskisehir Osmangazi University, 26040 Eskişehir, Turkey
- Maria Luisa Chiusano
  - Department of Agricultural Sciences, University Federico II of Naples, 80055 Portici, Naples, Italy
  - Department of Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy
- Jacob Fröhlich
  - Veterinary Center for Resistance Research (TZR), Freie Universität Berlin, 14163 Berlin, Germany
- Evroula Hapeshi
  - Department of Health Sciences, School of Life and Health Sciences, University of Nicosia, 1700 Nicosia, Cyprus
- Benjamin-Florian Hempel
  - Veterinary Center for Resistance Research (TZR), Freie Universität Berlin, 14163 Berlin, Germany
- Maria P Ikonomopoulou
  - Madrid Institute of Advanced Studies in Food, Precision Nutrition & Aging Program, 28049 Madrid, Spain
- Florence Jungo
  - SIB Swiss Institute of Bioinformatics, Swiss-Prot Group, 1211 Geneva, Switzerland
- Pascale Marchot
  - Laboratory Architecture et Fonction des Macromolécules Biologiques, Aix-Marseille University, Centre National de la Recherche Scientifique, Faculté des Sciences, Campus Luminy, 13288 Marseille, France
- Tarcisio Mendes de Farias
  - Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Maria Vittoria Modica
  - Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, 00198 Rome, Italy
- Yehu Moran
  - Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, 9190401 Jerusalem, Israel
- Ayse Nalbantsoy
  - Engineering Faculty, Bioengineering Department, Ege University, 35100 Bornova-Izmir, Turkey
- Jan Procházka
  - Laboratory of Transgenic Models of Diseases, Institute of Molecular Genetics of the Czech Academy of Sciences, 252 50 Vestec, Czech Republic
- Andrea Tarallo
  - Institute of Research on Terrestrial Ecosystems (IRET), National Research Council (CNR), 73100 Lecce, Italy
- Fiorella Tonello
  - Neuroscience Institute, National Research Council (CNR), 35131 Padua, Italy
- Rui Vitorino
  - Department of Medical Sciences, iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal
- Mark Lawrence Zammit
  - Department of Clinical Pharmacology & Therapeutics, Faculty of Medicine & Surgery, University of Malta, 2090 Msida, Malta
  - Malta National Poisons Centre, Malta Life Sciences Park, 3000 San Ġwann, Malta
- Agostinho Antunes
  - CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Porto, Portugal
  - Department of Biology, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
9
Castel J, Delaux S, Hernandez-Alba O, Cianférani S. Recent advances in structural mass spectrometry methods in the context of biosimilarity assessment: from sequence heterogeneities to higher order structures. J Pharm Biomed Anal 2023; 236:115696. [PMID: 37713983 DOI: 10.1016/j.jpba.2023.115696] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/28/2023] [Revised: 08/31/2023] [Accepted: 09/01/2023] [Indexed: 09/17/2023]
Abstract
Biotherapeutics and their biosimilar versions have been flourishing in the biopharmaceutical market for several years. Structural and functional characterization is needed to achieve analytical biosimilarity through the assessment of critical quality attributes as required by regulatory authorities. The role of analytical strategies, particularly mass spectrometry-based methods, is pivotal to gathering valuable information for the in-depth characterization of biotherapeutics and biosimilarity assessment. Structural mass spectrometry methods (native MS, HDX-MS, top-down MS, etc.) provide information ranging from primary sequence assessment to higher order structure evaluation. This review focuses on recent developments and applications in structural mass spectrometry for biotherapeutic and biosimilar characterization.
Affiliation(s)
- Jérôme Castel
  - Laboratoire de Spectrométrie de Masse Bio-Organique, IPHC UMR 7178, Université de Strasbourg, CNRS, Strasbourg 67087, France; Infrastructure Nationale de Protéomique ProFI, FR2048 CNRS CEA, Strasbourg 67087, France
- Sarah Delaux
  - Laboratoire de Spectrométrie de Masse Bio-Organique, IPHC UMR 7178, Université de Strasbourg, CNRS, Strasbourg 67087, France; Infrastructure Nationale de Protéomique ProFI, FR2048 CNRS CEA, Strasbourg 67087, France
- Oscar Hernandez-Alba
  - Laboratoire de Spectrométrie de Masse Bio-Organique, IPHC UMR 7178, Université de Strasbourg, CNRS, Strasbourg 67087, France; Infrastructure Nationale de Protéomique ProFI, FR2048 CNRS CEA, Strasbourg 67087, France
- Sarah Cianférani
  - Laboratoire de Spectrométrie de Masse Bio-Organique, IPHC UMR 7178, Université de Strasbourg, CNRS, Strasbourg 67087, France; Infrastructure Nationale de Protéomique ProFI, FR2048 CNRS CEA, Strasbourg 67087, France
10
Su T, Hollas MAR, Fellers RT, Kelleher NL. Identification of Splice Variants and Isoforms in Transcriptomics and Proteomics. Annu Rev Biomed Data Sci 2023; 6:357-376. [PMID: 37561601 PMCID: PMC10840079 DOI: 10.1146/annurev-biodatasci-020722-044021] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 08/12/2023]
Abstract
Alternative splicing is pivotal to the regulation of gene expression and protein diversity in eukaryotic cells. The detection of alternative splicing events requires specific omics technologies. Although short-read RNA sequencing has successfully supported a plethora of investigations on alternative splicing, the emerging technologies of long-read RNA sequencing and top-down mass spectrometry open new opportunities to identify alternative splicing and protein isoforms with less ambiguity. Here, we summarize improvements in short-read RNA sequencing for alternative splicing analysis, including percent splicing index estimation and differential analysis. We also review the computational methods used in top-down proteomics analysis regarding proteoform identification, including the construction of databases of protein isoforms and statistical analyses of search results. While many improvements in sequencing and computational methods will result from emerging technologies, there should be future endeavors to increase the effectiveness, integration, and proteome coverage of alternative splicing events.
Affiliation(s)
- Taojunfeng Su
  - Department of Molecular Biosciences, Northwestern University, Evanston, Illinois, USA
- Michael A R Hollas
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois, USA
- Ryan T Fellers
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois, USA
- Neil L Kelleher
  - Department of Molecular Biosciences, Northwestern University, Evanston, Illinois, USA
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois, USA
  - Department of Chemistry, Northwestern University, Evanston, Illinois, USA
11
Faizi M, Fellers RT, Lu D, Drown BS, Jambhekar A, Lahav G, Kelleher NL, Gunawardena J. MSModDetector: A Tool for Detecting Mass Shifts and Post-Translational Modifications in Individual Ion Mass Spectrometry Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.06.543961. [PMID: 37333327 PMCID: PMC10274720 DOI: 10.1101/2023.06.06.543961] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/20/2023]
Abstract
Motivation Post-translational modifications (PTMs) on proteins regulate protein structures and functions. A single protein molecule can possess multiple modification sites that can accommodate various PTM types, leading to a variety of different patterns, or combinations of PTMs, on that protein. Different PTM patterns can give rise to distinct biological functions. To facilitate the study of multiple PTMs, top-down mass spectrometry (MS) has proven to be a useful tool to measure the mass of intact proteins, thereby enabling even widely separated PTMs to be assigned to the same protein molecule and allowing determination of how many PTMs are attached to a single protein. Results We developed a Python module called MSModDetector that studies PTM patterns from individual ion mass spectrometry (I2MS) data. I2MS is an intact protein mass spectrometry approach that generates true mass spectra without the need to infer charge states. The algorithm first detects and quantifies mass shifts for a protein of interest and subsequently infers potential PTM patterns using linear programming. The algorithm is evaluated on simulated I2MS data and experimental I2MS data for the tumor suppressor protein p53. We show that MSModDetector is a useful tool for comparing a protein's PTM pattern landscape across different conditions. An improved analysis of PTM patterns will enable a deeper understanding of PTM-regulated cellular processes. Availability The source code is available at https://github.com/marjanfaizi/MSModDetector together with the scripts used for analyses and to generate the figures presented in this study.
12
Larson EJ, Pergande MR, Moss ME, Rossler KJ, Wenger RK, Krichel B, Josyer H, Melby JA, Roberts DS, Pike K, Shi Z, Chan HJ, Knight B, Rogers HT, Brown KA, Ong IM, Jeong K, Marty MT, McIlwain SJ, Ge Y. MASH Native: a unified solution for native top-down proteomics data processing. Bioinformatics 2023; 39:btad359. [PMID: 37294807 PMCID: PMC10283151 DOI: 10.1093/bioinformatics/btad359] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/12/2023] [Revised: 04/13/2023] [Accepted: 06/07/2023] [Indexed: 06/11/2023] Open
Abstract
MOTIVATION Native top-down proteomics (nTDP) integrates native mass spectrometry (nMS) with top-down proteomics (TDP) to provide comprehensive analysis of protein complexes together with proteoform identification and characterization. Despite significant advances in nMS and TDP software developments, a unified and user-friendly software package for analysis of nTDP data remains lacking. RESULTS We have developed MASH Native to provide a unified solution for nTDP to process complex datasets with database searching capabilities in a user-friendly interface. MASH Native supports various data formats and incorporates multiple options for deconvolution, database searching, and spectral summing to provide a "one-stop shop" for characterizing both native protein complexes and proteoforms. AVAILABILITY AND IMPLEMENTATION The MASH Native app, video tutorials, written tutorials, and additional documentation are freely available for download at https://labs.wisc.edu/gelab/MASH_Explorer/MASHSoftware.php. All data files shown in user tutorials are included with the MASH Native software in the download .zip file.
Affiliation(s)
- Eli J Larson
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Melissa R Pergande
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Michelle E Moss
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kalina J Rossler
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- R Kent Wenger
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
  - Human Proteomics Program, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, WI 53705, United States
- Boris Krichel
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Harini Josyer
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Jake A Melby
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- David S Roberts
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kyndalanne Pike
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Zhuoxin Shi
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Hsin-Ju Chan
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Bridget Knight
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Holden T Rogers
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kyle A Brown
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
- Irene M Ong
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53705, United States
  - University of Wisconsin Carbone Cancer Center, University of Wisconsin–Madison, Madison, WI 53705, United States
  - Department of Obstetrics and Gynecology, University of Wisconsin–Madison, Madison, WI 53705, United States
- Kyowon Jeong
  - Department of Applied Bioinformatics, University of Tübingen, Tübingen 72704, Germany
- Michael T Marty
  - Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85719, United States
- Sean J McIlwain
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53705, United States
  - University of Wisconsin Carbone Cancer Center, University of Wisconsin–Madison, Madison, WI 53705, United States
- Ying Ge
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53705, United States
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705, United States
  - Human Proteomics Program, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, WI 53705, United States
|
13
|
Tabb DL, Jeong K, Druart K, Gant MS, Brown KA, Nicora C, Zhou M, Couvillion S, Nakayasu E, Williams JE, Peterson HK, McGuire MK, McGuire MA, Metz TO, Chamot-Rooke J. Comparing Top-Down Proteoform Identification: Deconvolution, PrSM Overlap, and PTM Detection. J Proteome Res 2023. [PMID: 37235544 DOI: 10.1021/acs.jproteome.2c00673] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Indexed: 05/28/2023]
Abstract
Generating top-down tandem mass spectra (MS/MS) from complex mixtures of proteoforms benefits from improvements in fractionation, separation, fragmentation, and mass analysis. The algorithms to match MS/MS to sequences have undergone a parallel evolution, with both spectral alignment and match-counting approaches producing high-quality proteoform-spectrum matches (PrSMs). This study assesses state-of-the-art algorithms for top-down identification (ProSight PD, TopPIC, MSPathFinderT, and pTop) in their yield of PrSMs while controlling false discovery rate. We evaluated deconvolution engines (ThermoFisher Xtract, Bruker AutoMSn, Matrix Science Mascot Distiller, TopFD, and FLASHDeconv) in both ThermoFisher Orbitrap-class and Bruker maXis Q-TOF data (PXD033208) to produce consistent precursor charges and mass determinations. Finally, we sought post-translational modifications (PTMs) in proteoforms from bovine milk (PXD031744) and human ovarian tissue. Contemporary identification workflows produce excellent PrSM yields, although approximately half of all identified proteoforms from these four pipelines were specific to only one workflow. Deconvolution algorithms disagree on precursor masses and charges, contributing to identification variability. Detection of PTMs is inconsistent among algorithms. In bovine milk, 18% of PrSMs produced by pTop and TopMG were singly phosphorylated, but this percentage fell to 1% for one algorithm. Applying multiple search engines produces more comprehensive assessments of experiments. Top-down algorithms would benefit from greater interoperability.
Affiliation(s)
- David L Tabb
  - Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
- Kyowon Jeong
  - Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen 72076, Germany
- Karen Druart
  - Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
- Megan S Gant
  - Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
- Kyle A Brown
  - School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin 53705, United States
- Carrie Nicora
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Mowei Zhou
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Sneha Couvillion
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Ernesto Nakayasu
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Janet E Williams
  - Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Haley K Peterson
  - Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Michelle K McGuire
  - Margaret Ritchie School of Family and Consumer Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Mark A McGuire
  - Department of Animal, Veterinary, and Food Sciences, University of Idaho, Moscow, Idaho 83844, United States
- Thomas O Metz
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Julia Chamot-Rooke
  - Université Paris Cité, Institut Pasteur, CNRS UAR 2024, Mass Spectrometry for Biology Unit, Paris 75015, France
|
14
|
Maráková K, Renner BJ, Thomas SL, Opetová M, Tomašovský R, Rai AJ, Schug KA. Solid phase extraction as sample pretreatment method for top-down quantitative analysis of low molecular weight proteins from biological samples using liquid chromatography - triple quadrupole mass spectrometry. Anal Chim Acta 2023; 1243:340801. [PMID: 36697174 DOI: 10.1016/j.aca.2023.340801] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/01/2022] [Revised: 01/02/2023] [Accepted: 01/03/2023] [Indexed: 01/05/2023]
Abstract
Targeting and quantifying intact proteins from biological samples is still a very challenging research area. Several crucial steps exist in the analytical workflow, including development of a reliable sample preparation method. Here, we developed and applied for the first time a non-immunoaffinity sample preparation method based on a widely available micro-elution solid phase extraction (μSPE) strategy for the extraction of multiple lower molecular weight intact proteins (<30 kDa) from various biological matrices. Omission of a time-consuming drying and reconstitution step after extraction resulted in a simpler and more rapid sample preparation procedure. A model set of eleven intact proteins (molecular weights: 5.5-29 kDa; isoelectric points: 4.5-11.3) was analyzed in multiple biological fluids using reversed-phase liquid chromatography with a triple quadrupole mass spectrometer operated in multiple reaction monitoring mode. Various sample pre-treatment reagents, sorbent types, and washing and elution solvents were experimentally tested and optimized to obtain the μSPE clean-up condition for a broad mixture of intact proteins having variable physicochemical properties. 1% trifluoroacetic acid and 0.2% Triton X-100 were selected as suitable sample pre-treatment reagents for releasing protein-protein interactions in human serum/plasma and human urine, respectively. Hydrophilic lipophilic balanced μSPE sorbent was selected as a high performing stationary phase. Addition of 1% trifluoroacetic acid to all washing and elution solutions showed the most beneficial effect for the extraction recovery of the proteins. Under the optimized conditions, reproducible extraction recoveries >65% for all targeted proteins (up to 30 kDa) in human urine and >50% for most of the proteins in serum/plasma were achieved. The selected conditions were also applied to the analysis of clinical serum and urine samples to demonstrate the feasibility of the developed method to target intact proteins directly by more affordable μSPE sample preparation and triple quadrupole mass spectrometry, which could be beneficial in many application fields.
Affiliation(s)
- Katarína Maráková
  - Department of Pharmaceutical Analysis and Nuclear Pharmacy, Faculty of Pharmacy, Comenius University in Bratislava, Bratislava, Slovakia
  - Toxicological and Antidoping Center, Comenius University in Bratislava, Bratislava, Slovakia
- Beatriz J Renner
  - Department of Chemistry & Biochemistry, The University of Texas at Arlington, Arlington, TX, USA
- Shannon L Thomas
  - Department of Chemistry & Biochemistry, The University of Texas at Arlington, Arlington, TX, USA
- Martina Opetová
  - Department of Pharmaceutical Analysis and Nuclear Pharmacy, Faculty of Pharmacy, Comenius University in Bratislava, Bratislava, Slovakia
  - Toxicological and Antidoping Center, Comenius University in Bratislava, Bratislava, Slovakia
- Radovan Tomašovský
  - Department of Pharmaceutical Analysis and Nuclear Pharmacy, Faculty of Pharmacy, Comenius University in Bratislava, Bratislava, Slovakia
  - Toxicological and Antidoping Center, Comenius University in Bratislava, Bratislava, Slovakia
- Alex J Rai
  - Department of Pathology and Cell Biology, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, USA
- Kevin A Schug
  - Department of Chemistry & Biochemistry, The University of Texas at Arlington, Arlington, TX, USA
|
15
|
Martin EA, Fulcher JM, Zhou M, Monroe ME, Petyuk VA. TopPICR: A Companion R Package for Top-Down Proteomics Data Analysis. J Proteome Res 2023; 22:399-409. [PMID: 36631391 DOI: 10.1021/acs.jproteome.2c00570] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/13/2023]
Abstract
Top-down proteomics is the analysis of proteins in their intact form without proteolysis, thus preserving valuable information about post-translational modifications, isoforms, and proteolytic processing. However, it is still a developing field due to limitations in the instrumentation, difficulties with the interpretation of complex mass spectra, and a lack of well-established quantification approaches. TopPIC is one of the popular tools for proteoform identification. We extended its capabilities into label-free proteoform quantification by developing a companion R package (TopPICR). Key steps in the TopPICR pipeline include filtering identifications, inferring a minimal set of protein accessions explaining the observed sequences, aligning retention times, recalibrating measured masses, clustering features across data sets, and finally compiling feature intensities using the match-between-runs approach. The output of the pipeline is an MSnSet object which makes downstream data analysis seamlessly compatible with packages from the Bioconductor project. It also provides the capability for visualizing proteoforms within the context of the parent protein sequence. The functionality of TopPICR is demonstrated on top-down LC-MS/MS data sets of 10 human-in-mouse xenografts of luminal and basal breast tumor samples.
Affiliation(s)
- Evan A Martin
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- James M Fulcher
  - Environmental and Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Mowei Zhou
  - Environmental and Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Matthew E Monroe
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Vladislav A Petyuk
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
|
16
|
Yang R, Ma J, Zhang S, Zheng Y, Wang L, Zhu D. mzMD: visualization-oriented MS data storage and retrieval. Bioinformatics 2022; 38:2333-2340. [PMID: 35171986 DOI: 10.1093/bioinformatics/btac098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 01/23/2022] [Accepted: 02/14/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION Drawing peaks in a data window of an MS dataset happens all the time in MS data visualization applications. This requires retrieving from an MS dataset a selection of peaks in a data window whose image in a display window reflects the visual features of all peaks in the data window. If an algorithm for this purpose must output high-quality solutions in real time, it depends most fundamentally on the storage format of the MS dataset. RESULTS We present mzMD, a new storage format for MS datasets, and an algorithm that queries this storage format for a summary (a set of selected representative peaks) of a given data window. We propose a criterion, Q-score, to examine the quality of data window summaries. Experimental statistics on real MS datasets verified the high speed of mzMD in retrieving high-quality data window summaries. mzMD reported data window summaries whose Q-scores exceed those of the summaries reported by mzTree. The query speed of mzMD is the same as that of mzTree, whereas its query speed stability is better. AVAILABILITY AND IMPLEMENTATION The source code is freely available at https://github.com/yrm9837/mzMD-java. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Runmin Yang
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, China
- Jingjing Ma
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, China
- Shu Zhang
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, China
- Yu Zheng
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, China
- Lusheng Wang
  - Department of Computer Science, City University of Hong Kong, Hong Kong, China
  - City University of Hong Kong Shenzhen Research Institute, Shenzhen 518057, China
- Daming Zhu
  - School of Computer Science and Technology, Shandong University, Qingdao 266237, China
|
17
|
Zhan Z, Wang L. Proteoform identification based on top-down tandem mass spectra with peak error corrections. Brief Bioinform 2022; 23:6524012. [PMID: 35136947 DOI: 10.1093/bib/bbab599] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/31/2021] [Revised: 10/18/2021] [Accepted: 12/24/2021] [Indexed: 11/13/2022]
Abstract
In this paper, we study the problem of finding complex proteoforms in protein databases based on top-down tandem mass spectrum data. The main difficulty is handling the combinatorial explosion of possible alterations on a protein. To overcome it, the problem has been formulated as the alignment of a proteoform mass graph (PMG) and a spectrum mass graph (SMG). The other important issue is handling the mass errors of peaks in the input spectrum. In previous methods, an error tolerance value is used to handle the mass differences between matched consecutive nodes/peaks in the PMG and SMG. However, handling mass error this way cannot guarantee that the mass difference between any pair of nodes in the alignment is approximately the same for both PMG and SMG. It may lead to large error accumulation if positive (or negative) errors occur consecutively over a large number of consecutive matched node pairs. The problem is severe enough that some existing software packages include a step to further refine the alignments. In this paper, we propose a new model to handle the mass errors of peaks based on the formulation of the PMG and SMG. Note that the masses of sub-paths on the PMG are theoretical and supposed to be accurate. Our method allows each peak in the input spectrum to have a predefined error range. In the alignment of PMG and SMG, we need to give a correction of the mass for each matched peak within the predefined error range. After the correction, we impose that the mass between any two (not necessarily consecutive) matched nodes in the PMG is identical to that of the corresponding two matched peaks in the SMG. Intuitively, this kind of alignment is more accurate. We design an algorithm to find a maximum number of matched node and peak pairs in the two mass graphs (PMG and SMG) under the new constraint. The obtained alignment shows the matched node and peak pairs as well as the corrected positions of peaks. The algorithm works well for moderate-size input instances but takes a very long time and a huge amount of memory for large input instances. Therefore, we propose an algorithm for diagonal alignment, which can solve large input instances in reasonable time. Experiments show that our new algorithms can report alignments with a much larger number of matched node pairs. The software package and test data sets are available at https://github.com/Zeirdo/TopMGRefine.
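The error-accumulation argument in this abstract can be made concrete with a toy calculation. The sketch below is not the authors' algorithm, and every function name in it is hypothetical. It contrasts a check on consecutive mass differences with one reading of the corrected model: if every matched peak may be shifted by at most `eps`, and all pairwise mass differences must then agree exactly with the theoretical ones, the per-pair offsets (peak mass minus node mass) must fit inside a single window of width `2 * eps`.

```python
def consecutive_ok(nodes, peaks, tol):
    """Tolerance check on consecutive matched pairs only (the earlier approach)."""
    return all(
        abs((peaks[i + 1] - peaks[i]) - (nodes[i + 1] - nodes[i])) <= tol
        for i in range(len(nodes) - 1)
    )


def max_pairwise_drift(nodes, peaks):
    """Largest disagreement in mass difference over all (not only consecutive) pairs."""
    offsets = [p - n for n, p in zip(nodes, peaks)]
    return max(offsets) - min(offsets)


def correctable(nodes, peaks, eps):
    """True if shifting each peak by at most eps can make all pairwise differences exact.

    Exact pairwise agreement means every corrected peak equals its node mass plus
    one shared offset, so the per-peak intervals of allowed offsets must intersect.
    """
    offsets = [p - n for n, p in zip(nodes, peaks)]
    return max(o - eps for o in offsets) <= min(o + eps for o in offsets)


if __name__ == "__main__":
    nodes = [100.0, 200.0, 300.0, 400.0, 500.0]        # theoretical masses (toy values)
    peaks = [n + 0.01 * i for i, n in enumerate(nodes)]  # each step adds +0.01 Da of error
    print(consecutive_ok(nodes, peaks, tol=0.015))
    print(max_pairwise_drift(nodes, peaks))
    print(correctable(nodes, peaks, eps=0.015))
```

In this toy case every consecutive step passes a 0.015 Da tolerance, yet the first and last matched pairs disagree by about 0.04 Da, so no set of corrections bounded by 0.015 Da can reconcile them.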
Affiliation(s)
- Zhaohui Zhan
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Hong Kong, China
- Lusheng Wang
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave, Hong Kong, China
  - City University of Hong Kong Shenzhen Research Institution, China
|
18
|
Qin S, Tian Z. Proteoform Identification and Quantification Using Intact Protein Database Search Engine ProteinGoggle. Methods Mol Biol 2022; 2500:131-144. [PMID: 35657591 DOI: 10.1007/978-1-0716-2325-1_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Proteomics studies the proteome of organisms, especially proteins that are differentially expressed under certain physiological or pathological conditions; qualitative identification of protein sequences and posttranslational modifications (PTMs) and their positions can help us systematically understand the structure and function of proteoforms. With the development and relative popularity of soft ionization technology (such as electrospray ionization technology) and high mass measurement accuracy and high-resolution mass spectrometers (such as orbitrap), the mass spectrometry (MS) characterization of complete proteins (the so-called top-down proteomics) has become possible and has gradually become popular. Corresponding database search engines and protein identification bioinformatics tools have also been greatly developed. This chapter provides a brief overview of intact protein database search algorithm "isotopic mass-to-charge ratio and envelope fingerprinting" and search engine ProteinGoggle.
Affiliation(s)
- Suideng Qin
  - School of Chemical Science & Engineering and Shanghai Key Laboratory of Chemical Assessment and Sustainability, Tongji University, Shanghai, China
- Zhixin Tian
  - School of Chemical Science & Engineering and Shanghai Key Laboratory of Chemical Assessment and Sustainability, Tongji University, Shanghai, China
|
19
|
Sun RX, Wang RM, Luo L, Liu C, Chi H, Zeng WF, He SM. Accurate Proteoform Identification and Quantitation Using pTop 2.0. Methods Mol Biol 2022; 2500:105-129. [PMID: 35657590 DOI: 10.1007/978-1-0716-2325-1_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
The remarkable advancement of top-down proteomics in the past decade is driven by technological development in separation, mass spectrometry (MS) instrumentation, novel fragmentation, and bioinformatics. However, the accurate identification and quantification of proteoforms, all clearly defined molecular forms of protein products from a single gene, remain a challenging computational task. This is in part due to the complicated mass spectra from intact proteoforms when compared to those from digested peptides. Herein, pTop 2.0 is developed to fill the gap between large-scale complex top-down MS data and the shortage of high-accuracy bioinformatic tools. Compared with pTop 1.0, the first version, pTop 2.0 concentrates mainly on the identification of proteoforms with unexpected modifications or a terminal truncation. Quantitation based on isotopic labeling is also a new function, which can be carried out by the convenient and user-friendly "one-key operation," integrated with the qualitative identifications. The accuracy and running speed of pTop 2.0 are significantly improved on the test data sets. This chapter will introduce the main features, step-by-step running operations, and algorithmic developments of pTop 2.0 in order to push the identification and quantitation of intact proteoforms to a higher-accuracy level in top-down proteomics.
Affiliation(s)
- Rui-Xiang Sun
  - National Institute of Biological Sciences, Beijing, China
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Rui-Min Wang
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Lan Luo
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Chao Liu
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Hao Chi
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Wen-Feng Zeng
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Si-Min He
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
|
20
|
Tiambeng TN, Wu Z, Melby JA, Ge Y. Size Exclusion Chromatography Strategies and MASH Explorer for Large Proteoform Characterization. Methods Mol Biol 2022; 2500:15-30. [PMID: 35657584 PMCID: PMC9703982 DOI: 10.1007/978-1-0716-2325-1_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/03/2023]
Abstract
Top-down mass spectrometry (MS)-based analysis of larger proteoforms (>50 kDa) is typically challenging due to an exponential decay in the signal-to-noise ratio with increasing protein molecular weight (MW) and coelution with low-MW proteoforms. Size exclusion chromatography (SEC) fractionates proteins based on their size, separating larger proteoforms from those of smaller size in the proteome. In this protocol, we initially describe the use of SEC to fractionate high-MW proteoforms from low-MW proteoforms. Subsequently, the SEC fractions containing the proteoforms of interest are subjected to reverse-phase liquid chromatography (RPLC) coupled online with high-resolution MS. Finally, proteoforms are characterized using MASH Explorer, a user-friendly software environment for in-depth proteoform characterization.
Affiliation(s)
- Timothy N. Tiambeng
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53706
- Zhijie Wu
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53706
- Jake A. Melby
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53706
- Ying Ge
  - Department of Chemistry, University of Wisconsin–Madison, Madison, WI 53706
  - Department of Cell and Regenerative Biology, University of Wisconsin–Madison, Madison, WI 53705
  - Human Proteomic Program, University of Wisconsin–Madison, Madison, WI 53705
  - To whom correspondence may be addressed: Dr. Ying Ge, 8551 WIMR-II, 1111 Highland Ave., Madison, Wisconsin 53705, USA; Tel: 608-265-4744
|
21
|
Yang X, Zhang L, Xia Y. Photochemical Disulfide-Ene Modification Enhances Protein Sequencing and Disulfide Mapping by Mass Spectrometry. Anal Chem 2021; 93:15231-15235. [PMID: 34751558 DOI: 10.1021/acs.analchem.1c04214] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022]
Abstract
A new photochemical disulfide-ene reaction system capable of alkylating protein disulfide bonds in seconds has been established. The system is simple, containing acetone and isopropanol for disulfide reduction under 254 nm UV irradiation and norbornene as a highly efficient alkylation reagent. Enhanced characterization of disulfide-rich proteins with significantly shortened analysis time is demonstrated by coupling the reaction online with mass spectrometry.
Affiliation(s)
- Xiaoyue Yang
  - Department of Chemistry, Tsinghua University, Beijing 100084, China
- Longfei Zhang
  - Department of Chemistry, Tsinghua University, Beijing 100084, China
- Yu Xia
  - Department of Chemistry, Tsinghua University, Beijing 100084, China
|
22
|
Lui KW, Ngai SM. PrSM-Level Side-by-Side Comparison of Online LC-MS Methods with Intact Histone H3 and H4 Proteoforms. J Proteome Res 2021; 20:4331-4345. [PMID: 34327993 DOI: 10.1021/acs.jproteome.1c00308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
The heterogeneity of histone H3 proteoforms makes histone H3 top-down analysis challenging. To enhance the detection coverage of the proteoforms, performing liquid chromatography (LC) front-end to mass spectrometry (MS) detection is recommended. Here, using optimized electron-transfer/high-energy collision dissociation (EThcD) parameters, we have conducted a proteoform-spectrum match (PrSM)-level side-by-side comparison of reversed-phase LC-MS (RPLC-MS), "dual-gradient" weak cation-exchange/hydrophilic interaction LC-MS (dual-gradient WCX/HILIC-MS), and "organic-rich" WCX/HILIC-MS on the top-down analyses of H3.1, H3.2, and H4 proteins extracted from a HeLa cell culture. While both dual-gradient WCX/HILIC and organic-rich WCX/HILIC could resolve intact H3 and H4 proteoforms by the number of acetylations, the organic-rich method could enhance the separations of different trimethyl/acetyl near-isobaric H3 proteoforms. In comparison with RPLC-MS, both of the WCX/HILIC-MS methods enhanced the qualities of the H3 PrSMs and remarkably improved the range, reproducibility, and confidence in the identifications of H3 proteoforms.
Affiliation(s)
- Kin-Wing Lui
  - School of Life Sciences, The Chinese University of Hong Kong, Sha Tin, Hong Kong 999077, P. R. China
- Sai-Ming Ngai
  - School of Life Sciences, The Chinese University of Hong Kong, Sha Tin, Hong Kong 999077, P. R. China
  - State Key Laboratory of Agrobiotechnology and School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong 999077, P. R. China
|
23
|
Khalid MF, Iman K, Ghafoor A, Saboor M, Ali A, Muaz U, Basharat AR, Tahir T, Abubakar M, Akhter MA, Nabi W, Vanderbauwhede W, Ahmad F, Wajid B, Chaudhary SU. PERCEPTRON: an open-source GPU-accelerated proteoform identification pipeline for top-down proteomics. Nucleic Acids Res 2021; 49:W510-W515. [PMID: 33999207 PMCID: PMC8262694 DOI: 10.1093/nar/gkab368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2021] [Revised: 04/10/2021] [Accepted: 04/25/2021] [Indexed: 11/12/2022]
Abstract
PERCEPTRON is a next-generation, freely available, web-based proteoform identification and characterization platform for top-down proteomics (TDP). The PERCEPTRON search pipeline brings together algorithms for (i) intact protein mass tuning, (ii) de novo sequence tags-based filtering, (iii) characterization of terminal as well as post-translational modifications, (iv) identification of truncated proteoforms, (v) in silico spectral comparison, and (vi) weight-based candidate protein scoring. High-throughput performance is achieved through the execution of optimized code via multiple threads in parallel on graphics processing units (GPUs) using the NVIDIA Compute Unified Device Architecture (CUDA) framework. An intuitive graphical web interface allows for setting up search parameters as well as for visualization of results. The accuracy and performance of the tool have been validated on several TDP datasets and against available TDP software. Specifically, results obtained from searching two published TDP datasets demonstrate that PERCEPTRON outperforms all other tools by up to 135% in terms of reported proteins and 10-fold in terms of runtime. In conclusion, the proposed tool significantly enhances the state-of-the-art in TDP search software and is publicly available at https://perceptron.lums.edu.pk. Users can also create in-house deployments of the tool by building code available on the GitHub repository (http://github.com/BIRL/Perceptron).
Affiliation(s)
- Muhammad Farhan Khalid
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Kanzal Iman
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Amna Ghafoor
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Mujtaba Saboor
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Ahsan Ali
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Urwa Muaz
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Abdul Rehman Basharat
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Taha Tahir
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Muhammad Abubakar
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Momina Amer Akhter
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Waqar Nabi
  - School of Computing Science, University of Glasgow, Glasgow, G12 8QQ, UK
- Wim Vanderbauwhede
  - School of Computing Science, University of Glasgow, Glasgow, G12 8QQ, UK
- Fayyaz Ahmad
  - Department of Statistics, University of Gujrat, Gujrat, Pakistan
- Bilal Wajid
  - Department of Electrical Engineering, University of Engineering and Technology, Lahore, Pakistan
  - Department of Computer Science, University of Management and Technology, Lahore, Pakistan
  - Division of Research and Development, Sabz-Qalam, Lahore, Pakistan
- Safee Ullah Chaudhary
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
|
24
|
Melby JA, Roberts DS, Larson EJ, Brown KA, Bayne EF, Jin S, Ge Y. Novel Strategies to Address the Challenges in Top-Down Proteomics. J Am Soc Mass Spectrom 2021; 32:1278-1294. [PMID: 33983025 PMCID: PMC8310706 DOI: 10.1021/jasms.1c00099] [Citation(s) in RCA: 117] [Impact Index Per Article: 29.3] [Indexed: 05/03/2023]
Abstract
Top-down mass spectrometry (MS)-based proteomics is a powerful technology for comprehensively characterizing proteoforms to decipher post-translational modifications (PTMs) together with genetic variations and alternative splicing isoforms toward a proteome-wide understanding of protein functions. In the past decade, top-down proteomics has experienced rapid growth benefiting from groundbreaking technological advances, which have begun to reveal the potential of top-down proteomics for understanding basic biological functions, unraveling disease mechanisms, and discovering new biomarkers. However, many challenges remain to be comprehensively addressed. In this Account & Perspective, we discuss the major challenges currently facing the top-down proteomics field, particularly in protein solubility, proteome dynamic range, proteome complexity, data analysis, proteoform-function relationship, and analytical throughput for precision medicine. We specifically review the major technology developments addressing these challenges with an emphasis on our research group's efforts, including the development of top-down MS-compatible surfactants for protein solubilization, functionalized nanoparticles for the enrichment of low-abundance proteoforms, strategies for multidimensional chromatography separation of proteins, and a new comprehensive user-friendly software package for top-down proteomics. We have also made efforts to connect proteoforms with biological functions and provide our visions on what the future holds for top-down proteomics.
Affiliation(s)
- Jake A Melby
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- David S Roberts
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Eli J Larson
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Kyle A Brown
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Department of Surgery, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Elizabeth F Bayne
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Song Jin
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Ying Ge
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - Human Proteomics Program, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
|
25
|
A workflow to identify novel proteins based on the direct mapping of peptide-spectrum-matches to genomic locations. BMC Bioinformatics 2021; 22:277. [PMID: 34039272 PMCID: PMC8157683 DOI: 10.1186/s12859-021-04159-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/02/2021] [Accepted: 04/27/2021] [Indexed: 02/06/2023] Open
Abstract
Background: Small proteins have received increasing attention in recent years. They have in particular been implicated as signals contributing to the coordination of bacterial communities. In genome annotations they are often missing or hidden among large numbers of hypothetical proteins because genome annotation pipelines often exclude short open reading frames or over-predict hypothetical proteins based on simple models. The validation of novel proteins, and in particular of small proteins (sProteins), therefore requires additional evidence. Proteogenomics is considered the gold standard for this purpose. It extends beyond established annotations and includes all possible open reading frames (ORFs) as potential sources of peptides, thus allowing the discovery of novel, unannotated proteins. Typically this results in large numbers of putative novel small proteins fraught with large fractions of false-positive predictions.
Results: We observe that the number and quality of the peptide-spectrum matches (PSMs) that map to a candidate ORF can be highly informative for the purpose of distinguishing proteins from spurious ORF annotations. We report here on a workflow that aggregates PSM quality information and local context into simple descriptors and reliably separates likely proteins from the large pool of false positives, i.e., most likely untranslated ORFs. We investigated the artificial gut microbiome model SIHUMIx, comprising eight different species, for which we validate 5114 proteins that have previously been annotated only as hypothetical ORFs. In addition, we identified 37 non-annotated protein candidates for which we found evidence at the proteomic and transcriptomic level. Half (19) of these candidates have close functional homologs in other species. Another 12 candidates have homologs designated as hypothetical proteins in other species. The remaining six candidates are short (< 100 AA) and are most likely bona fide novel proteins.
Conclusions: The aggregation of PSM quality information for predicted ORFs provides a robust and efficient method to identify novel proteins in proteomics data. The workflow is in particular capable of identifying small proteins and frameshift variants. Since PSMs are explicitly mapped to genomic locations, it furthermore facilitates the integration of transcriptomics data and other sources of genome-level information.
|
26
|
Clementy N, Bodin A, Bisson A, Teixeira-Gomes AP, Roger S, Angoulvant D, Labas V, Babuty D. The Defibrillation Conundrum: New Insights into the Mechanisms of Shock-Related Myocardial Injury Sustained from a Life-Saving Therapy. Int J Mol Sci 2021; 22:5003. [PMID: 34066832 PMCID: PMC8125879 DOI: 10.3390/ijms22095003] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/26/2021] [Revised: 05/03/2021] [Accepted: 05/05/2021] [Indexed: 11/16/2022] Open
Abstract
Implantable cardiac defibrillators (ICDs) are recommended to prevent the risk of sudden cardiac death. However, shocks are associated with an increased mortality with a dose response effect, and a strategy of reducing electrical therapy burden improves the prognosis of implanted patients. We review the mechanisms of defibrillation and its consequences, including cell damage, metabolic remodeling, calcium metabolism anomalies, and inflammatory and pro-fibrotic remodeling. Electrical shocks do save lives, but also promote myocardial stunning, heart failure, and pro-arrhythmic effects as seen in electrical storms. Limiting unnecessary implantations and therapies and proposing new methods of defibrillation in the future are recommended.
Affiliation(s)
- Nicolas Clementy
  - Service de Cardiologie, Hôpital Trousseau, Université de Tours, 37044 Tours, France
  - Transplantation, Immunologie et Inflammation T2I-EA 4245, Université de Tours, 37044 Tours, France
- Alexandre Bodin
  - Service de Cardiologie, Hôpital Trousseau, Université de Tours, 37044 Tours, France
- Arnaud Bisson
  - Service de Cardiologie, Hôpital Trousseau, Université de Tours, 37044 Tours, France
  - Transplantation, Immunologie et Inflammation T2I-EA 4245, Université de Tours, 37044 Tours, France
- Ana-Paula Teixeira-Gomes
  - Plate-forme de Chirurgie et d’Imagerie pour la Recherche et l’Enseignement (CIRE), INRA, Université de Tours, CHU de Tours, 37380 Nouzilly, France
- Sebastien Roger
  - Transplantation, Immunologie et Inflammation T2I-EA 4245, Université de Tours, 37044 Tours, France
- Denis Angoulvant
  - Service de Cardiologie, Hôpital Trousseau, Université de Tours, 37044 Tours, France
  - Transplantation, Immunologie et Inflammation T2I-EA 4245, Université de Tours, 37044 Tours, France
- Valérie Labas
  - Plate-forme de Chirurgie et d’Imagerie pour la Recherche et l’Enseignement (CIRE), INRA, Université de Tours, CHU de Tours, 37380 Nouzilly, France
- Dominique Babuty
  - Service de Cardiologie, Hôpital Trousseau, Université de Tours, 37044 Tours, France
  - Transplantation, Immunologie et Inflammation T2I-EA 4245, Université de Tours, 37044 Tours, France
|
27
|
Lantz C, Zenaidee MA, Wei B, Hemminger Z, Ogorzalek Loo RR, Loo JA. ClipsMS: An Algorithm for Analyzing Internal Fragments Resulting from Top-Down Mass Spectrometry. J Proteome Res 2021; 20:1928-1935. [PMID: 33650866 DOI: 10.1021/acs.jproteome.0c00952] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 12/16/2022]
Abstract
Top-down mass spectrometry (TD-MS) of peptides and proteins results in product ions that can be correlated to polypeptide sequence. Fragments can be either terminal fragments, which contain the N- or the C-terminus, or internal fragments, which contain neither terminus. Normally, only terminal fragments are assigned due to the computational difficulties of assigning internal fragments. Here we describe ClipsMS, an algorithm that can assign both terminal and internal fragments generated by top-down MS fragmentation. Further, ClipsMS can be used to locate various modifications on the protein sequence. Using ClipsMS to assign TD-MS generated product ions, we demonstrate that for apo-myoglobin, the inclusion of internal fragments increases the sequence coverage up to 78%. Interestingly, many internal fragments cover regions complementary to those covered by the terminal fragments, which enhances the information that is extracted from a single top-down mass spectrum. Analysis of oxidized apo-myoglobin using terminal and internal fragment matching by ClipsMS confirmed the locations of oxidation sites on the two methionine residues. Internal fragments can be beneficial for top-down protein fragmentation analysis, and ClipsMS can be a valuable tool for assigning both terminal and internal fragments present in a top-down mass spectrum. Data are available via the MassIVE community resource with the identifiers MSV000086788 and MSV000086789.
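The distinction between terminal and internal fragments can be illustrated with a small sketch. This is not the ClipsMS algorithm, only a hypothetical enumeration (the function name `enumerate_fragments` and the toy sequence are assumptions) showing why internal fragments are harder to assign: their number grows quadratically with sequence length, while terminal fragments grow linearly.

```python
def enumerate_fragments(sequence):
    """Split all contiguous sub-sequences into terminal and internal fragments.

    A terminal fragment contains the N- or the C-terminus; an internal
    fragment contains neither. The intact sequence itself is skipped.
    Each fragment is returned as (start, end, residues), end exclusive.
    """
    n = len(sequence)
    terminal, internal = [], []
    for start in range(n):
        for end in range(start + 1, n + 1):
            if start == 0 and end == n:
                continue  # the intact chain is not a fragment
            fragment = (start, end, sequence[start:end])
            if start == 0 or end == n:
                terminal.append(fragment)
            else:
                internal.append(fragment)
    return terminal, internal


# Toy example with a 6-residue sequence (illustrative only).
terminal, internal = enumerate_fragments("PEPTID")
print(len(terminal), len(internal))
```

For a chain of n residues this yields 2(n - 1) terminal candidates and (n - 1)(n - 2)/2 internal candidates, before any fragment ion types or modifications are considered.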
Affiliation(s)
- Carter Lantz
  - Department of Chemistry and Biochemistry, University of California Los Angeles, Los Angeles, California 90095, United States
- Muhammad A Zenaidee
  - Department of Chemistry and Biochemistry, University of California Los Angeles, Los Angeles, California 90095, United States
- Benqian Wei
  - Department of Chemistry and Biochemistry, University of California Los Angeles, Los Angeles, California 90095, United States
- Zachary Hemminger
  - Department of Chemistry and Biochemistry, University of California Los Angeles, Los Angeles, California 90095, United States
- Rachel R Ogorzalek Loo
  - Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California 90095, United States
- Joseph A Loo
  - Department of Chemistry and Biochemistry, University of California Los Angeles, Los Angeles, California 90095, United States
  - Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California 90095, United States
|
28
|
Melby JA, de Lange WJ, Zhang J, Roberts DS, Mitchell SD, Tucholski T, Kim G, Kyrvasilis A, McIlwain SJ, Kamp TJ, Ralphe JC, Ge Y. Functionally Integrated Top-Down Proteomics for Standardized Assessment of Human Induced Pluripotent Stem Cell-Derived Engineered Cardiac Tissues. J Proteome Res 2021; 20:1424-1433. [PMID: 33395532 DOI: 10.1021/acs.jproteome.0c00830] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
Three-dimensional (3D) human induced pluripotent stem cell-derived engineered cardiac tissues (hiPSC-ECTs) have emerged as a promising alternative to two-dimensional hiPSC-cardiomyocyte monolayer systems because hiPSC-ECTs are a closer representation of endogenous cardiac tissues and more faithfully reflect the relevant cardiac pathophysiology. The ability to perform functional and molecular assessments using the same hiPSC-ECT construct would allow for more reliable correlation between observed functional performance and underlying molecular events, and thus is critically needed. Herein, for the first time, we have established an integrated method that permits sequential assessment of functional properties and top-down proteomics from the same single hiPSC-ECT construct. We quantitatively determined the differences in isometric twitch force and the sarcomeric proteoforms between two groups of hiPSC-ECTs that differed in the duration of time of 3D-ECT culture. Importantly, by using this integrated method we discovered a new and strong correlation between the measured contractile parameters and the phosphorylation levels of alpha-tropomyosin between the two groups of hiPSC-ECTs. The integration of functional assessments together with molecular characterization by top-down proteomics in the same hiPSC-ECT construct enables a holistic analysis of hiPSC-ECTs to accelerate their applications in disease modeling, cardiotoxicity, and drug discovery. Data are available via ProteomeXchange with identifier PXD022814.
Affiliation(s)
- Jake A Melby
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Willem J de Lange
  - Department of Pediatrics, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Jianhua Zhang
  - Department of Medicine, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- David S Roberts
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Stanford D Mitchell
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Trisha Tucholski
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Gina Kim
  - Department of Medicine, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Andreas Kyrvasilis
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Sean J McIlwain
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - UW Carbone Cancer Center, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Timothy J Kamp
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - Department of Medicine, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- J Carter Ralphe
  - Department of Pediatrics, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Ying Ge
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - Human Proteomics Program, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
|
29
|
Chen W, Liu X. Proteoform Identification by Combining RNA-Seq and Top-Down Mass Spectrometry. J Proteome Res 2020; 20:261-269. [PMID: 33183009 DOI: 10.1021/acs.jproteome.0c00369] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/01/2023]
Abstract
In proteogenomic studies, genomic and transcriptomic variants are incorporated into customized protein databases for the identification of proteoforms, especially proteoforms with sample-specific variants. Most proteogenomic research has been focused on combining genomic or transcriptomic data with bottom-up mass spectrometry data. In the last decade, top-down mass spectrometry has attracted increasing attention because of its capacity to identify various proteoforms with alterations. However, top-down proteogenomics, in which genomic or transcriptomic data are combined with top-down mass spectrometry data, has not been widely adopted, and there is still a lack of software tools for top-down proteogenomic data analysis. In this paper, we introduce TopPG, a proteogenomic tool for generating proteoform sequence databases with genetic alterations and alternative splicing events. Experiments on top-down proteogenomic data of DLD-1 colorectal cancer cells showed that TopPG coupled with database search confidently identified proteoforms with sample-specific alterations.
Affiliation(s)
- Wenrong Chen
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
|
30
|
Peris-Díaz MD, Guran R, Zitka O, Adam V, Krężel A. Mass Spectrometry-Based Structural Analysis of Cysteine-Rich Metal-Binding Sites in Proteins with MetaOdysseus R Software. J Proteome Res 2020; 20:776-785. [PMID: 32924499 PMCID: PMC7786378 DOI: 10.1021/acs.jproteome.0c00651] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 01/12/2023]
Abstract
Identification of metal-binding sites in proteins and understanding metal-coupled protein folding mechanisms are aspects of high importance for the structure-to-function relationship. Mass spectrometry (MS) has brought a powerful adjunct perspective to structural biology, providing information that ranges from metal-to-protein stoichiometry to quaternary structure. Currently, the different experimental and/or instrumental setups usually require the use of multiple data analysis software packages, and in some cases, they lack some of the main data analysis steps (MS processing, scoring, identification). Here, we present a comprehensive data analysis pipeline that addresses charge-state deconvolution, statistical scoring, and mass assignment for native MS, bottom-up, and native top-down with emphasis on metal–protein complexes. We have evaluated all of the approaches using assemblies of increasing complexity, including free and chemically labeled proteins, from low- to high-resolution MS. In all cases, the results were compared with those of common software, and MetaOdysseus outperformed them.
Affiliation(s)
- Manuel David Peris-Díaz
  - Department of Chemical Biology, Faculty of Biotechnology, University of Wrocław, F. Joliot-Curie 14a, 50-383 Wrocław, Poland
- Roman Guran
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, 613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 123, 612 00 Brno, Czech Republic
- Ondrej Zitka
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, 613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 123, 612 00 Brno, Czech Republic
- Vojtech Adam
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Zemedelska 1, 613 00 Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Purkynova 123, 612 00 Brno, Czech Republic
- Artur Krężel
  - Department of Chemical Biology, Faculty of Biotechnology, University of Wrocław, F. Joliot-Curie 14a, 50-383 Wrocław, Poland
|
31
|
Wu Z, Roberts DS, Melby JA, Wenger K, Wetzel M, Gu Y, Ramanathan SG, Bayne EF, Liu X, Sun R, Ong IM, McIlwain SJ, Ge Y. MASH Explorer: A Universal Software Environment for Top-Down Proteomics. J Proteome Res 2020; 19:3867-3876. [PMID: 32786689 DOI: 10.1021/acs.jproteome.0c00469] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Indexed: 12/18/2022]
Abstract
Top-down mass spectrometry (MS)-based proteomics enables a comprehensive analysis of proteoforms with molecular specificity to achieve a proteome-wide understanding of protein functions. However, the lack of universal software for top-down proteomics is becoming increasingly recognized as a major barrier, especially for newcomers. Here, we have developed MASH Explorer, a universal, comprehensive, and user-friendly software environment for top-down proteomics. For the first time, MASH Explorer integrates multiple spectral deconvolution and database search algorithms into a single, universal platform that can process top-down proteomics data from various vendor formats. It addresses the urgent need in the rapidly growing top-down proteomics community and is freely available to all users worldwide. With the critical need and tremendous support from the community, we envision that this MASH Explorer software package will play an integral role in advancing top-down proteomics to realize its full potential for biomedical research.
Affiliation(s)
- Zhijie Wu
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- David S Roberts
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Jake A Melby
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Kent Wenger
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - Human Proteomics Program, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Molly Wetzel
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Yiwen Gu
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - Human Proteomics Program, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Elizabeth F Bayne
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
  - Center for Computational Biology and Bioinformatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Ruixiang Sun
  - National Institute of Biological Sciences, Beijing, 102206, China
- Irene M Ong
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - University of Wisconsin Carbone Cancer Center, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - Department of Obstetrics and Gynecology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Sean J McIlwain
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - University of Wisconsin Carbone Cancer Center, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Ying Ge
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
  - Human Proteomics Program, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
|
32
|
Basharat AR, Ning X, Liu X. EnvCNN: A Convolutional Neural Network Model for Evaluating Isotopic Envelopes in Top-Down Mass-Spectral Deconvolution. Anal Chem 2020; 92:7778-7785. [PMID: 32356965 PMCID: PMC7341906 DOI: 10.1021/acs.analchem.0c00903] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
Top-down mass spectrometry has become the main method for intact proteoform identification, characterization, and quantitation. Because of the complexity of top-down mass spectrometry data, spectral deconvolution is an indispensable step in spectral data analysis, which groups spectral peaks into isotopic envelopes and extracts monoisotopic masses of precursor or fragment ions. The performance of spectral deconvolution methods relies heavily on their scoring functions, which distinguish correct envelopes from incorrect ones. A good scoring function increases the accuracy of deconvoluted masses reported from mass spectra. In this paper, we present EnvCNN, a convolutional neural network-based model for evaluating isotopic envelopes. We show that the model outperforms other scoring functions in distinguishing correct envelopes from incorrect ones and that it increases the number of identifications and improves the statistical significance of identifications in top-down spectral interpretation.
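The arithmetic behind the deconvolution step described above can be sketched as follows. This is not EnvCNN, which scores candidate envelopes with a convolutional network; it only illustrates, under simplifying assumptions, how a charge state can be guessed from isotopic peak spacing and how a neutral mass follows from m/z and charge. The function names, the tolerance, and the synthetic peak list are hypothetical.

```python
PROTON_MASS = 1.007276      # Da, mass of a proton
ISOTOPE_SPACING = 1.00335   # Da, approximate 13C - 12C mass difference


def neutral_mass(mz, charge):
    """Neutral mass of a positive ion observed at m/z with the given charge."""
    return charge * (mz - PROTON_MASS)


def infer_charge(mz_values, max_charge=30, tol=0.01):
    """Guess the charge of an isotopic envelope from its mean peak spacing.

    Peaks of one envelope are spaced by roughly ISOTOPE_SPACING / charge.
    Returns None if no charge up to max_charge fits within tol (in m/z).
    """
    spacings = [b - a for a, b in zip(mz_values, mz_values[1:])]
    if not spacings:
        return None
    mean_spacing = sum(spacings) / len(spacings)
    best = min(range(1, max_charge + 1),
               key=lambda z: abs(ISOTOPE_SPACING / z - mean_spacing))
    if abs(ISOTOPE_SPACING / best - mean_spacing) > tol:
        return None
    return best


# Synthetic envelope: four peaks of a 5+ ion starting at m/z 1000.
peaks = [1000.0 + i * ISOTOPE_SPACING / 5 for i in range(4)]
charge = infer_charge(peaks)
print(charge, round(neutral_mass(peaks[0], charge), 4))
```

Real spectra contain overlapping envelopes and noise, which is why a scoring function is needed to tell correct envelopes from incorrect ones; the sketch ignores both.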
Affiliation(s)
- Abdul Rehman Basharat
  - School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, 46202, USA
- Xia Ning
  - Department of Biomedical Informatics and Department of Computer Science and Engineering, Ohio State University, Columbus, Ohio, 43210, USA
- Xiaowen Liu
  - School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, 46202, USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, 46202, USA
|
33
|
Nickel Nanoparticles Induce the Synthesis of a Tumor-Related Polypeptide in Human Epidermal Keratinocytes. Nanomaterials 2020; 10:992. [PMID: 32455808 PMCID: PMC7279538 DOI: 10.3390/nano10050992] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/23/2020] [Revised: 05/14/2020] [Accepted: 05/19/2020] [Indexed: 01/29/2023]
Abstract
Although nickel allergy and carcinogenicity are well known, their molecular mechanisms are still uncertain, thus demanding studies at the molecular level. The nickel carcinogenicity is known to be dependent on the chemical form of nickel, since only certain nickel compounds can enter the cell. This study investigates, for the first time, the cytotoxicity, cellular uptake, and molecular targets of nickel nanoparticles (NiNPs) in human skin cells in comparison with other chemical forms of nickel. The dose-response curve that was obtained for NiNPs in the cytotoxicity assays showed a linear behavior typical of genotoxic carcinogens. The exposure of keratinocytes to NiNPs leads to the release of Ni2+ ions and its accumulation in the cytosol. A 6 kDa nickel-binding molecule was found to be synthesized by cells exposed to NiNPs at a dose corresponding to medium mortality. This molecule was identified to be tumor-related p63-regulated gene 1 protein.
|
34
|
Zhong J, Sun Y, Xie M, Peng W, Zhang C, Wu FX, Wang J. Proteoform characterization based on top-down mass spectrometry. Brief Bioinform 2020; 22:1729-1750. [PMID: 32118252 DOI: 10.1093/bib/bbaa015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/01/2019] [Revised: 01/23/2020] [Indexed: 12/16/2022] Open
Abstract
Proteins are dominant executors of living processes. Compared to genetic variations, changes in the molecular structure and state of a protein (i.e. proteoforms) are more directly related to pathological changes in diseases. Characterizing proteoforms involves identifying and locating primary structure alterations (PSAs) in proteoforms, which is of practical importance for the advancement of the medical profession. With the development of mass spectrometry (MS) technology, the characterization of proteoforms based on top-down MS technology has become possible. This type of method is relatively new and faces many challenges. Since the proteoform identification is the most important process in characterizing proteoforms, we comprehensively review the existing proteoform identification methods in this study. Before identifying proteoforms, the spectra need to be preprocessed, and protein sequence databases can be filtered to speed up the identification. Therefore, we also summarize some popular deconvolution algorithms, various filtering algorithms for improving the proteoform identification performance and various scoring methods for localizing proteoforms. Moreover, commonly used methods were evaluated and compared in this review. We believe our review could help researchers better understand the current state of the development in this field and design new efficient algorithms for the proteoform characterization.
Affiliation(s)
- Jiancheng Zhong
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Yusui Sun
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Minzhu Xie
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Wei Peng
  - Kunming University of Science and Technology, Kunming, Yunnan, China
- Chushu Zhang
  - College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Fang-Xiang Wu
  - College of Engineering and the Department of Computer Science at University of Saskatchewan, Saskatoon, Canada
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering at Central South University, Changsha, Hunan, China
|
35
|
Jin Y, Diffee GM, Colman RJ, Anderson RM, Ge Y. Top-down Mass Spectrometry of Sarcomeric Protein Post-translational Modifications from Non-human Primate Skeletal Muscle. J Am Soc Mass Spectrom 2019; 30:2460-2469. [PMID: 30834509 PMCID: PMC6722035 DOI: 10.1007/s13361-019-02139-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 12/04/2018] [Revised: 01/11/2019] [Accepted: 01/12/2019] [Indexed: 05/22/2023]
Abstract
Sarcomeric proteins, including myofilament and Z-disk proteins, play critical roles in regulating muscle contractile properties. A variety of isoforms and post-translational modifications (PTMs) of sarcomeric proteins have been shown to be associated with modulation of muscle functions and the occurrence of muscle diseases. Non-human primates (NHPs) are excellent research models for sarcopenia, a disease associated with alterations in sarcomeric proteins, due to their marked similarities to humans. However, the sarcomeric proteins in NHP skeletal muscle have not been well characterized. To gain a deeper understanding of sarcomeric proteins in NHP skeletal muscle, we employed top-down mass spectrometry (MS) to conduct a comprehensive analysis of isoforms and PTMs of sarcomeric proteins in rhesus macaque skeletal muscle. We identified 23 protein isoforms with 46 proteoforms of sarcomeric proteins, including 6 isoforms with 18 proteoforms from fast skeletal troponin T. Notably, for the first time, a novel PDZ/LIM domain protein isoform, PDLIM7, was characterized with a newly identified protein sequence. Moreover, we also identified multiple PTMs on these proteins, including deamidation, methylation, acetylation, tri-methylation, phosphorylation, and S-glutathionylation. Most PTM sites were localized, including Asn13 deamidation on MLC-2S; His73 methylation on α-actin; N-terminal acetylation on most identified proteins; N-terminal tri-methylation on MLC-1S, MLC-1F, MLC-2S, and MLC-2F; Ser14 phosphorylation on MLC-2S; and Ser15 and Ser16 phosphorylation on MLC-2F. In summary, a comprehensive characterization of sarcomeric proteins including multiple isoforms and PTMs in NHP skeletal muscle was achieved by analyzing intact proteins in the top-down MS approach.
Affiliation(s)
- Yutong Jin
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Gary M Diffee
  - Department of Kinesiology, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Ricki J Colman
  - Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI, 53715, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Rozalyn M Anderson
  - Department of Medicine, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Geriatric Research Education and Clinical Center, William S. Middleton Memorial Veterans Hospital, Madison, WI, 53705, USA
- Ying Ge
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, 53705, USA
  - Human Proteomics Program, University of Wisconsin-Madison, Madison, WI, 53705, USA
|
36
|
Locard-Paulet M, Parra J, Albigot R, Mouton-Barbosa E, Bardi L, Burlet-Schiltz O, Marcoux J. VisioProt-MS: interactive 2D maps from intact protein mass spectrometry. Bioinformatics 2019; 35:679-681. [PMID: 30084957 PMCID: PMC6378940 DOI: 10.1093/bioinformatics/bty680] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/21/2017] [Revised: 07/13/2018] [Accepted: 08/06/2018] [Indexed: 12/21/2022] Open
Abstract
SUMMARY: VisioProt-MS is designed to summarize and analyze intact protein and top-down proteomics data. It plots the molecular weights of eluting proteins as a function of their retention time, thereby allowing inspection of runs from liquid chromatography coupled to mass spectrometry (LC-MS). It also overlays MS/MS identification results. VisioProt-MS is compatible with outputs from many different dedicated top-down software tools. To our knowledge, this is the only open source standalone application that allows the dynamic comparison of several MS files, a prerequisite for comparative analysis of different biological conditions. With its dynamic rendering, this user-friendly web application facilitates inspection, comparison and export of publication-quality 2D maps from deconvoluted LC-MS run(s) and top-down proteomics data. AVAILABILITY AND IMPLEMENTATION: The Shiny-based web application VisioProt-MS is suitable for non-R users. It can be found at https://masstools.ipbs.fr/mstools/visioprot-ms/ and the corresponding scripts are downloadable at https://github.com/mlocardpaulet/VisioProt-MS. It is governed by the CeCILL license (http://www.cecill.info).
Affiliation(s)
- Marie Locard-Paulet
  - Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
- Julien Parra
  - Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
- Renaud Albigot
  - Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
- Emmanuelle Mouton-Barbosa
  - Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
- Laurent Bardi
  - Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
- Odile Burlet-Schiltz
  - Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
- Julien Marcoux
  - Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
|
37
|
Vincent D, Binos S, Rochfort S, Spangenberg G. Top-Down Proteomics of Medicinal Cannabis. Proteomes 2019; 7:proteomes7040033. [PMID: 31554318 PMCID: PMC6958505 DOI: 10.3390/proteomes7040033] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/29/2019] [Revised: 09/06/2019] [Accepted: 09/20/2019] [Indexed: 02/02/2023] Open
Abstract
The revised legislation on medicinal cannabis has triggered a surge of research studies in this space. Yet, cannabis proteomics is lagging. In a previous study, we optimised the protein extraction of mature buds for bottom-up proteomics. In this follow-up study, we developed a top-down mass spectrometry (MS) proteomics strategy to identify intact denatured proteins from cannabis apical buds. After testing different source-induced dissociation (SID), collision-induced dissociation (CID), higher-energy collisional dissociation (HCD), and electron transfer dissociation (ETD) parameters on infused known protein standards, we devised three LC-MS/MS methods for top-down sequencing of cannabis proteins. Different MS/MS modes produced distinct spectra, albeit greatly overlapping between SID, CID, and HCD. The number of fragments increased with the energy applied; however, this did not necessarily translate into greater sequence coverage. Some precursors were more amenable to fragmentation than others. Sequence coverage decreased as the mass of the protein increased. Combining all MS/MS data maximised amino acid (AA) sequence coverage, achieving 73% for myoglobin. In this experiment, most cannabis proteins were smaller than 30 kD. A total of 46 cannabis proteins were identified with 136 proteoforms bearing different post-translational modifications (PTMs), including the excision of N-terminal M, N-terminal acetylation, methylation and acetylation of K residues, and phosphorylation. Most identified proteins are involved in photosynthesis, translation, and ATP production. Only one protein, olivetolic acid cyclase, belongs to the phytocannabinoid biosynthesis pathway.
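The effect of pooling fragmentation modes on sequence coverage can be illustrated with a minimal Python sketch. It assumes coverage is defined as the fraction of backbone bonds with at least one matched fragment, which is one common convention and may differ from the definition used in the study; the function names and the cleavage sites are hypothetical and are not data from the paper.

```python
def sequence_coverage(protein_length, cleavage_sites):
    """Fraction of backbone bonds explained by at least one fragment.

    A protein of length n has n - 1 backbone bonds; site k denotes the
    bond between residue k and residue k + 1 (1-based).
    """
    n_bonds = protein_length - 1
    if n_bonds <= 0:
        return 0.0
    # Ignore sites outside the valid range and count each bond only once.
    observed = {s for s in cleavage_sites if 1 <= s <= n_bonds}
    return len(observed) / n_bonds


def combined_coverage(protein_length, sites_by_mode):
    """Coverage after pooling the cleavage sites from every MS/MS mode."""
    merged = set()
    for sites in sites_by_mode.values():
        merged |= set(sites)
    return sequence_coverage(protein_length, merged)


# Hypothetical cleavage sites for an 11-residue sequence (10 bonds).
runs = {
    "CID": {1, 2, 3},
    "HCD": {2, 3, 4, 5},
    "ETD": {7, 8, 9},
}

for mode, sites in runs.items():
    print(mode, sequence_coverage(11, sites))
print("combined", combined_coverage(11, runs))
```

With these made-up inputs, the individual modes cover 30%, 40%, and 30% of the bonds, while the pooled set covers 80%, because overlapping sites (here between CID and HCD) add nothing and complementary sites (here from ETD) do.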
Affiliation(s)
- Delphine Vincent
  - Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia
- Steve Binos
  - Thermo Fisher Scientific, Bio21 Institute, The University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- Simone Rochfort
  - Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia
- German Spangenberg
  - Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia
|
38
|
Fert-Bober J, Murray CI, Parker SJ, Van Eyk JE. Precision Profiling of the Cardiovascular Post-Translationally Modified Proteome: Where There Is a Will, There Is a Way. Circ Res 2019; 122:1221-1237. [PMID: 29700069 DOI: 10.1161/circresaha.118.310966] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 12/17/2022]
Abstract
There is an exponential increase in biological complexity as initial gene transcripts are spliced, translated into amino acid sequence, and post-translationally modified. Each protein can exist as multiple chemical or sequence-specific proteoforms, and each has the potential to be a critical mediator of a physiological or pathophysiological signaling cascade. Here, we provide an overview of how different proteoforms come about in biological systems and how they are most commonly measured using mass spectrometry-based proteomics and bioinformatics. Our goal is to present this information at a level accessible to every scientist interested in mass spectrometry and its application to proteome profiling. We will specifically discuss recent data linking various protein post-translational modifications to cardiovascular disease and conclude with a discussion for enablement and democratization of proteomics across the cardiovascular and scientific community. The aim is to inform and inspire the readership to explore a larger breadth of proteoforms, particularly post-translational modifications, related to their particular areas of expertise in cardiovascular physiology.
Affiliation(s)
- Justyna Fert-Bober
  - Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
- Christopher I Murray
  - Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
- Sarah J Parker
  - Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
- Jennifer E Van Eyk
  - Advanced Clinical BioSystems Research Institute, Smidt Heart Institute, Department of Medicine, Cedars Sinai Medical Center, Los Angeles, CA
|
39
|
Nagarajan A, Zhou M, Nguyen AY, Liberton M, Kedia K, Shi T, Piehowski P, Shukla A, Fillmore TL, Nicora C, Smith RD, Koppenaal DW, Jacobs JM, Pakrasi HB. Proteomic Insights into Phycobilisome Degradation, A Selective and Tightly Controlled Process in The Fast-Growing Cyanobacterium Synechococcus elongatus UTEX 2973. Biomolecules 2019; 9:biom9080374. [PMID: 31426316 PMCID: PMC6722726 DOI: 10.3390/biom9080374] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/24/2019] [Revised: 08/12/2019] [Accepted: 08/13/2019] [Indexed: 11/16/2022] Open
Abstract
Phycobilisomes (PBSs) are large (3-5 megadalton) pigment-protein complexes in cyanobacteria that associate with thylakoid membranes and harvest light primarily for photosystem II. PBSs consist of highly ordered assemblies of pigmented phycobiliproteins (PBPs) and linker proteins that can account for up to half of the soluble protein in cells. Cyanobacteria adjust to changing environmental conditions by modulating PBS size and number. In response to nutrient depletion such as nitrogen (N) deprivation, PBSs are degraded in an extensive, tightly controlled, and reversible process. In Synechococcus elongatus UTEX 2973, a fast-growing cyanobacterium with a doubling time of two hours, the process of PBS degradation is very rapid, with 80% of PBSs per cell degraded in six hours under optimal light and CO2 conditions. Proteomic analysis during PBS degradation and re-synthesis revealed multiple proteoforms of PBPs with partially degraded phycocyanobilin (PCB) pigments. NblA, a small proteolysis adaptor essential for PBS degradation, was characterized and validated with targeted mass spectrometry. NblA levels rose from essentially 0 to 25,000 copies per cell within 30 min of N depletion, and correlated with the rate of decrease in phycocyanin (PC). Implications of this correlation on the overall mechanism of PBS degradation during N deprivation are discussed.
Affiliation(s)
- Aparna Nagarajan
  - Department of Biology, Washington University, St. Louis, MO 63130, USA
- Mowei Zhou
  - Environmental and Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Amelia Y Nguyen
  - Department of Biology, Washington University, St. Louis, MO 63130, USA
- Michelle Liberton
  - Department of Biology, Washington University, St. Louis, MO 63130, USA
- Komal Kedia
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Tujin Shi
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Paul Piehowski
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Anil Shukla
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Thomas L Fillmore
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Carrie Nicora
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Richard D Smith
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- David W Koppenaal
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Jon M Jacobs
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Himadri B Pakrasi
  - Department of Biology, Washington University, St. Louis, MO 63130, USA
|
40
|
Basharat AR, Iman K, Khalid MF, Anwar Z, Hussain R, Kabir HG, Tahreem M, Shahid A, Humayun M, Hayat HA, Mustafa M, Shoaib MA, Ullah Z, Zarina S, Ahmed S, Uddin E, Hamera S, Ahmad F, Chaudhary SU. SPECTRUM - A MATLAB Toolbox for Proteoform Identification from Top-Down Proteomics Data. Sci Rep 2019; 9:11267. [PMID: 31375721 PMCID: PMC6677810 DOI: 10.1038/s41598-019-47724-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/11/2018] [Accepted: 06/10/2019] [Indexed: 01/07/2023] Open
Abstract
Top-Down Proteomics (TDP) is an emerging proteomics protocol that involves identification, characterization, and quantitation of intact proteins using high-resolution mass spectrometry. TDP has an edge over other proteomics protocols in that it allows for: (i) accurate measurement of intact protein mass, (ii) high sequence coverage, and (iii) enhanced identification of post-translational modifications (PTMs). However, the complexity of TDP spectra poses a significant impediment to protein search and PTM characterization. Furthermore, limited software support is currently available in the form of search algorithms and pipelines. To address this need, we propose 'SPECTRUM', an open-architecture and open-source toolbox for TDP data analysis. Its salient features include: (i) MS2-based intact protein mass tuning, (ii) de novo peptide sequence tag analysis, (iii) propensity-driven PTM characterization, (iv) blind PTM search, (v) spectral comparison, (vi) identification of truncated proteins, (vii) multifactorial coefficient-weighted scoring, and (viii) intuitive graphical user interfaces to access the aforementioned functionalities and visualization of results. We have validated SPECTRUM using published datasets and benchmarked it against salient TDP tools. SPECTRUM provides significantly enhanced protein identification rates (91% to 177%) over its contemporaries. SPECTRUM has been implemented in MATLAB, and is freely available along with its source code and documentation at https://github.com/BIRL/SPECTRUM/.
Affiliation(s)
- Abdul Rehman Basharat
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Kanzal Iman
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Muhammad Farhan Khalid
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Zohra Anwar
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Rashid Hussain
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Humnah Gohar Kabir
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Maria Tahreem
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Anam Shahid
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Maheen Humayun
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Hira Azmat Hayat
  - Department of Computer Science, Lahore University of Management Sciences, Lahore, Pakistan
- Muhammad Mustafa
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Muhammad Ali Shoaib
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Zakir Ullah
  - King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
  - Lahore University of Management Sciences, Lahore, Pakistan
- Shamshad Zarina
  - National Center for Proteomics, University of Karachi, Karachi, Pakistan
- Sameer Ahmed
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
- Emad Uddin
  - Department of Mechanical Engineering, National University of Sciences and Technology, Islamabad, Pakistan
- Sadia Hamera
  - Institute of Life Sciences, University of Rostock, Rostock, Germany
  - Lahore University of Management Sciences, Lahore, Pakistan
- Fayyaz Ahmad
  - Department of Statistics, University of Gujrat, Gujrat, Pakistan
- Safee Ullah Chaudhary
  - Biomedical Informatics Research Laboratory, Department of Biology, Lahore University of Management Sciences, Lahore, Pakistan
|
41
|
Schaffer LV, Millikin RJ, Miller RM, Anderson LC, Fellers RT, Ge Y, Kelleher NL, LeDuc RD, Liu X, Payne SH, Sun L, Thomas PM, Tucholski T, Wang Z, Wu S, Wu Z, Yu D, Shortreed MR, Smith LM. Identification and Quantification of Proteoforms by Mass Spectrometry. Proteomics 2019; 19:e1800361. [PMID: 31050378 PMCID: PMC6602557 DOI: 10.1002/pmic.201800361] [Citation(s) in RCA: 140] [Impact Index Per Article: 23.3] [Received: 01/28/2019] [Revised: 04/07/2019] [Indexed: 12/29/2022]
Abstract
A proteoform is a defined form of a protein derived from a given gene with a specific amino acid sequence and localized post-translational modifications. In top-down proteomic analyses, proteoforms are identified and quantified through mass spectrometric analysis of intact proteins. Recent technological developments have enabled comprehensive proteoform analyses in complex samples, and an increasing number of laboratories are adopting top-down proteomic workflows. In this review, some recent advances are outlined and current challenges and future directions for the field are discussed.
Affiliation(s)
- Leah V Schaffer
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Robert J Millikin
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Rachel M Miller
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Lissa C Anderson
  - Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Tallahassee, FL, 32310, USA
- Ryan T Fellers
  - Proteomics Center of Excellence, Northwestern University, Evanston, IL, 60208, USA
- Ying Ge
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
  - Department of Cell and Regenerative Biology and Human Proteomics Program, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Neil L Kelleher
  - Proteomics Center of Excellence, Northwestern University, Evanston, IL, 60208, USA
  - Department of Chemistry and Molecular Biosciences and the Division of Hematology and Oncology, Northwestern University, Evanston, IL, 60208, USA
- Richard D LeDuc
  - Proteomics Center of Excellence, Northwestern University, Evanston, IL, 60208, USA
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, IN, 46202, USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
- Samuel H Payne
  - Department of Biology, Brigham Young University, Provo, UT, 84602
- Liangliang Sun
  - Department of Chemistry, Michigan State University, East Lansing, MI, 48824, USA
- Paul M Thomas
  - Proteomics Center of Excellence, Northwestern University, Evanston, IL, 60208, USA
- Trisha Tucholski
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Zhe Wang
  - Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, 73019, USA
- Si Wu
  - Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, 73019, USA
- Zhijie Wu
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Dahang Yu
  - Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, 73019, USA
- Michael R Shortreed
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
|
42
|
Ghezellou P, Garikapati V, Kazemi SM, Strupat K, Ghassempour A, Spengler B. A perspective view of top-down proteomics in snake venom research. Rapid Commun Mass Spectrom 2019; 33 Suppl 1:20-27. [PMID: 30076652 DOI: 10.1002/rcm.8255] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 05/04/2018] [Revised: 07/25/2018] [Accepted: 07/29/2018] [Indexed: 06/08/2023]
Abstract
The venom produced by snakes contains complex mixtures of pharmacologically active proteins and peptides which play a crucial role in the pathophysiology of snakebite diseases. The deep understanding of venom proteomes can help to improve the treatment of this "neglected tropical disease" (as expressed by the World Health Organization [WHO]) and to develop new drugs. The most widely used technique for venom analysis is liquid chromatography/tandem mass spectrometry (LC/MS/MS)-based bottom-up (BU) proteomics. Considering the fact that multiple multi-locus gene families encode snake venom proteins, the major challenge for the BU proteomics is the limited sequence coverage and also the "protein inference problem" which result in a loss of information for the identification and characterization of toxin proteoforms (genetic variation, alternative mRNA splicing, single nucleotide polymorphism [SNP] and post-translational modifications [PTMs]). In contrast, intact protein measurements with top-down (TD) MS strategies cover almost complete protein sequences, and prove the ability to identify venom proteoforms and to localize their modifications and sequence variations.
Affiliation(s)
- Parviz Ghezellou
  - Institute of Inorganic and Analytical Chemistry, Justus Liebig University Giessen, Germany
  - Medicinal Plants and Drugs Research Institute, Shahid Beheshti University, Tehran, Iran
- Seyed Mahdi Kazemi
  - Medicinal Plants and Drugs Research Institute, Shahid Beheshti University, Tehran, Iran
- Alireza Ghassempour
  - Medicinal Plants and Drugs Research Institute, Shahid Beheshti University, Tehran, Iran
- Bernhard Spengler
  - Institute of Inorganic and Analytical Chemistry, Justus Liebig University Giessen, Germany
|
43
|
Toby TK, Fornelli L, Srzentić K, DeHart CJ, Levitsky J, Friedewald J, Kelleher NL. A comprehensive pipeline for translational top-down proteomics from a single blood draw. Nat Protoc 2019; 14:119-152. [PMID: 30518910 DOI: 10.1038/s41596-018-0085-7] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Indexed: 01/22/2023]
Abstract
Top-down proteomics (TDP) by mass spectrometry (MS) is a technique by which intact proteins are analyzed. It has become increasingly popular in translational research because of the value of characterizing distinct proteoforms of intact proteins. Compared to bottom-up proteomics (BUP) strategies, which measure digested peptide mixtures, TDP provides highly specific molecular information that avoids the bioinformatic challenge of protein inference. However, the technique has been difficult to implement widely because of inherent limitations of existing sample preparation methods and instrumentation. Recent improvements in proteoform pre-fractionation and the availability of high-resolution benchtop mass spectrometers have made it possible to use high-throughput TDP for the analysis of complex clinical samples. Here, we provide a comprehensive protocol for analysis of a common sample type in translational research: human peripheral blood mononuclear cells (PBMCs). The pipeline comprises multiple workflows that can be treated as modular by the reader and used for various applications. First, sample collection and cell preservation are described for two clinical biorepository storage schemes. Cell lysis and proteoform pre-fractionation by gel-eluted liquid fractionation entrapment electrophoresis are then described. Importantly, instrument setup and liquid chromatography-tandem MS are described for TDP analyses, which rely on high-resolution Fourier-transform MS. Finally, data processing and analysis are described using two different, application-dependent software tools: ProSight Lite for targeted analyses of one or a few proteoforms and TDPortal for high-throughput TDP in discovery mode. For a single sample, the minimum completion time of the entire experiment is 72 h.
Affiliation(s)
- Timothy K Toby
  - Departments of Chemistry and of Molecular Biosciences, Northwestern University, Evanston, IL, USA
- Luca Fornelli
  - Departments of Chemistry and of Molecular Biosciences, Northwestern University, Evanston, IL, USA
- Kristina Srzentić
  - Departments of Chemistry and of Molecular Biosciences, Northwestern University, Evanston, IL, USA
- Caroline J DeHart
  - National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, IL, USA
- Josh Levitsky
  - Comprehensive Transplant Center, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- John Friedewald
  - Comprehensive Transplant Center, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Neil L Kelleher
  - Departments of Chemistry and of Molecular Biosciences, Northwestern University, Evanston, IL, USA
  - National Resource for Translational and Developmental Proteomics, Northwestern University, Evanston, IL, USA
|
44
|
LeDuc RD, Fellers RT, Early BP, Greer JB, Shams DP, Thomas PM, Kelleher NL. Accurate Estimation of Context-Dependent False Discovery Rates in Top-Down Proteomics. Mol Cell Proteomics 2019; 18:796-805. [PMID: 30647073 PMCID: PMC6442365 DOI: 10.1074/mcp.ra118.000993] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/27/2018] [Revised: 01/04/2019] [Indexed: 11/06/2022] Open
Abstract
Within the last several years, top-down proteomics has emerged as a high-throughput technique for protein and proteoform identification. This technique has the potential to identify and characterize thousands of proteoforms within a single study, but the absence of accurate false discovery rate (FDR) estimation could hinder the adoption and consistency of top-down proteomics in the future. In automated identification and characterization of proteoforms, FDR calculation strongly depends on the context of the search. The context includes MS data quality, the database being interrogated, the search engine, and the parameters of the search. Particular to top-down proteomics, there are four molecular levels of study: proteoform spectral match (PrSM), protein, isoform, and proteoform. Here, a context-dependent framework for calculating an accurate FDR at each level was designed, implemented, and validated against a manually curated training set with 546 confirmed proteoforms. We examined several search contexts and found that an FDR calculated at the PrSM level under-reported the true FDR at the protein level by an average of 24-fold. We present a new open-source tool, the TDCD_FDR_Calculator, which provides a scalable, context-dependent FDR calculation that can be applied post-search to enhance the quality of results in top-down proteomics from any search engine.
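The gap between PrSM-level and protein-level FDR can be illustrated with a minimal Python sketch. It uses a plain target-decoy estimate (accepted decoys divided by accepted targets), which is an assumption made for illustration and not the context-dependent method of the TDCD_FDR_Calculator; the function names, the tuple layout, and the toy matches are hypothetical.

```python
def decoy_ratio(n_decoy, n_target):
    """Simple target-decoy FDR estimate: decoys / targets."""
    return n_decoy / n_target if n_target else 0.0


def fdr_by_level(prsms, threshold):
    """Estimate FDR at the PrSM level and at the protein level.

    prsms: iterable of (score, protein_id, is_decoy) tuples.
    Returns (prsm_level_fdr, protein_level_fdr) for matches scoring
    at or above the threshold.
    """
    accepted = [p for p in prsms if p[0] >= threshold]

    # PrSM level: every accepted match counts once.
    prsm_decoys = sum(1 for _, _, is_decoy in accepted if is_decoy)
    prsm_targets = len(accepted) - prsm_decoys

    # Protein level: each protein counts once, however many PrSMs it has.
    proteins = {(protein_id, is_decoy) for _, protein_id, is_decoy in accepted}
    protein_decoys = sum(1 for _, is_decoy in proteins if is_decoy)
    protein_targets = len(proteins) - protein_decoys

    return (decoy_ratio(prsm_decoys, prsm_targets),
            decoy_ratio(protein_decoys, protein_targets))


# Toy input: an abundant target protein with many PrSMs, a second target
# with a single PrSM, and one decoy hit.
toy = [(50.0, "A", False)] * 8 + [(30.0, "B", False), (25.0, "DECOY_1", True)]
prsm_fdr, protein_fdr = fdr_by_level(toy, threshold=20.0)
print(f"PrSM-level: {prsm_fdr:.3f}, protein-level: {protein_fdr:.3f}")
```

In this toy input, one decoy against nine target PrSMs gives a PrSM-level estimate of about 11%, but the same decoy stands against only two target proteins, so the protein-level estimate is 50%. Redundant PrSMs for abundant target proteins inflate the PrSM-level denominator, which is one intuition for why the lower-level figure can under-report the higher-level one.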
Affiliation(s)
- Richard D LeDuc
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois
- Ryan T Fellers
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois
- Bryan P Early
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois; Department of Molecular Biosciences, Northwestern University, Evanston, Illinois
- Joseph B Greer
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois
- Daniel P Shams
  - Interdisciplinary Biological Sciences, Northwestern University, Evanston, Illinois
- Paul M Thomas
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois; Department of Molecular Biosciences, Northwestern University, Evanston, Illinois
- Neil L Kelleher
  - Proteomics Center of Excellence, Northwestern University, Evanston, Illinois; Department of Molecular Biosciences, Northwestern University, Evanston, Illinois; Department of Chemistry and the Feinberg School of Medicine, Northwestern University, Evanston, Illinois
45
Lin Z, Wei L, Cai W, Zhu Y, Tucholski T, Mitchell SD, Guo W, Ford SP, Diffee GM, Ge Y. Simultaneous Quantification of Protein Expression and Modifications by Top-down Targeted Proteomics: A Case of the Sarcomeric Subproteome. Mol Cell Proteomics 2019; 18:594-605. [PMID: 30591534 PMCID: PMC6398208 DOI: 10.1074/mcp.tir118.001086] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 09/16/2018] [Revised: 12/08/2018] [Indexed: 12/14/2022] Open
Abstract
Determining changes in protein expression and post-translational modifications (PTMs) is crucial for elucidating cellular signal transduction and disease mechanisms. Conventional antibody-based approaches have inherent problems such as the limited availability of high-quality antibodies and batch-to-batch variation. Top-down mass spectrometry (MS)-based proteomics has emerged as the most powerful method for characterization and quantification of protein modifications. Nevertheless, robust methods to simultaneously determine changes in protein expression and PTMs remain lacking. Herein, we have developed a straightforward and robust top-down liquid chromatography (LC)/MS-based targeted proteomics platform for simultaneous quantification of protein expression and PTMs with high throughput and high reproducibility. We employed this method to analyze the sarcomeric subproteome from various muscle types of different species, which successfully revealed skeletal muscle heterogeneity and cardiac developmental changes in sarcomeric protein isoform expression and PTMs. As demonstrated, this targeted top-down proteomics platform offers an excellent 'antibody-independent' alternative for the accurate quantification of sarcomeric protein expression and PTMs concurrently in complex mixtures, which is generally applicable to different species and various tissue types.
Affiliation(s)
- Ziqing Lin
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705
  - Human Proteomics Program, University of Wisconsin-Madison, Madison, WI 53705
- Liming Wei
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705
  - Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, P. R. China
- Wenxuan Cai
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705
  - Molecular & Cellular Pharmacology Training Program, University of Wisconsin-Madison, Madison, WI 53705
- Yanlong Zhu
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705
  - Human Proteomics Program, University of Wisconsin-Madison, Madison, WI 53705
- Trisha Tucholski
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706
- Stanford D Mitchell
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705
  - Molecular & Cellular Pharmacology Training Program, University of Wisconsin-Madison, Madison, WI 53705
- Wei Guo
  - Department of Animal Science, Fetal Programming Center, University of Wyoming, Laramie, Wyoming 82071
- Stephen P Ford
  - Department of Animal Science, Fetal Programming Center, University of Wyoming, Laramie, Wyoming 82071
- Gary M Diffee
  - Department of Kinesiology, University of Wisconsin-Madison, Madison, WI 53705
- Ying Ge
  - Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin 53705
  - Human Proteomics Program, University of Wisconsin-Madison, Madison, WI 53705
  - Molecular & Cellular Pharmacology Training Program, University of Wisconsin-Madison, Madison, WI 53705
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706
46
Kou Q, Wang Z, Lubeckyj RA, Wu S, Sun L, Liu X. A Markov Chain Monte Carlo Method for Estimating the Statistical Significance of Proteoform Identifications by Top-Down Mass Spectrometry. J Proteome Res 2019; 18:878-889. [PMID: 30638379 DOI: 10.1021/acs.jproteome.8b00562] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/13/2023]
Abstract
Top-down mass spectrometry is capable of identifying whole proteoform sequences with multiple post-translational modifications because it generates tandem mass spectra directly from intact proteoforms. Many software tools, such as ProSightPC, MSPathFinder, and TopMG, have been proposed for identifying proteoforms with modifications. In these tools, various methods are employed to estimate the statistical significance of identifications. However, most existing methods are designed for proteoform identifications without modifications, and the challenge remains for accurately estimating the statistical significance of proteoform identifications with modifications. Here we propose TopMCMC, a method that combines a Markov chain random walk algorithm and a greedy algorithm for assigning statistical significance to matches between spectra and protein sequences with variable modifications. Experimental results showed that TopMCMC achieved high accuracy in estimating E-values and false discovery rates of identifications in top-down mass spectrometry. Coupled with TopMG, TopMCMC identified more spectra than the generating function method from an MCF-7 top-down mass spectrometry data set.
Affiliation(s)
- Qiang Kou
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Zhe Wang
  - Department of Chemistry and Biochemistry, The University of Oklahoma, Norman, Oklahoma 73019-5251, United States
- Rachele A Lubeckyj
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824-1332, United States
- Si Wu
  - Department of Chemistry and Biochemistry, The University of Oklahoma, Norman, Oklahoma 73019-5251, United States
- Liangliang Sun
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824-1332, United States
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
47
Li Z, He B, Kou Q, Wang Z, Wu S, Liu Y, Feng W, Liu X. Evaluation of top-down mass spectral identification with homologous protein sequences. BMC Bioinformatics 2018; 19:494. [PMID: 30591035 PMCID: PMC6309053 DOI: 10.1186/s12859-018-2462-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Top-down mass spectrometry has unique advantages in identifying proteoforms with multiple post-translational modifications and/or unknown alterations. Most software tools in this area search top-down mass spectra against a protein sequence database for proteoform identification. When the species studied in a mass spectrometry experiment lacks its proteome sequence database, a homologous protein sequence database can be used for proteoform identification. The accuracy of homologous protein sequences affects the sensitivity of proteoform identification and the accuracy of mass shift localization. RESULTS: We tested TopPIC, a commonly used software tool for top-down mass spectral identification, on a top-down mass spectrometry data set of Escherichia coli K12 MG1655, and evaluated its performance using an Escherichia coli K12 MG1655 proteome database and a homologous protein database. The number of identified spectra with the homologous database was about half of that with the Escherichia coli K12 MG1655 database. We also tested TopPIC on a top-down mass spectrometry data set of human MCF-7 cells and obtained similar results. CONCLUSIONS: Experimental results demonstrated that TopPIC is capable of identifying many proteoform spectrum matches and localizing unknown alterations using homologous protein sequences containing no more than 2 mutations.
Affiliation(s)
- Ziwei Li
  - College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001 China
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
- Bo He
  - College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001 China
- Qiang Kou
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN, 46202 USA
- Zhe Wang
  - Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Parkway, Norman, OK, 73019 USA
- Si Wu
  - Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Parkway, Norman, OK, 73019 USA
- Yunlong Liu
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
- Weixing Feng
  - College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001 China
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN, 46202 USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202 USA
48
Yang R, Zhu D. A graph-based filtering method for top-down mass spectral identification. BMC Genomics 2018; 19:666. [PMID: 30255788 PMCID: PMC6157290 DOI: 10.1186/s12864-018-5026-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Database search has been the main approach for proteoform identification by top-down tandem mass spectrometry. However, when the target proteoform that produced the spectrum contains post-translational modifications (PTMs) and/or mutations, it is quite time consuming to align a query spectrum against all protein sequences without any PTMs and mutations in a large database. Consequently, it is essential to develop efficient and sensitive filtering algorithms for speeding up database search. RESULTS: In this paper, we propose a spectrum graph matching (SGM) based protein sequence filtering method for top-down mass spectral identification. It uses the subspectra of a query spectrum to generate spectrum graphs and searches them against a protein database to report the best candidates. Whereas the sequence tag and gapped tag approaches need a preprocessing step to extract and select tags, the SGM filtering method circumvents this preprocessing step, thus simplifying data processing. We evaluated the filtration efficiency of the SGM filtering method with various parameter settings on an Escherichia coli top-down mass spectrometry data set and compared the performances of the SGM filtering method and two tag-based filtering methods on a data set of MCF-7 cells. CONCLUSIONS: Experimental results on the data sets show that the SGM filtering method achieves high sensitivity in protein sequence filtration. When coupled with a spectral alignment algorithm, the SGM filtering method significantly increases the number of identified proteoform spectrum matches compared with the tag-based methods in top-down mass spectrometry data analysis.
Affiliation(s)
- Runmin Yang
  - School of Computer Science and Technology, Shandong University, 1500, Shun Hua Lu, Jinan, 250101, China
- Daming Zhu
  - School of Computer Science and Technology, Shandong University, 1500, Shun Hua Lu, Jinan, 250101, China
49
Zhu K, Liu X. A graph-based approach for proteoform identification and quantification using top-down homogeneous multiplexed tandem mass spectra. BMC Bioinformatics 2018; 19:280. [PMID: 30367573 PMCID: PMC6101081 DOI: 10.1186/s12859-018-2273-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Top-down homogeneous multiplexed tandem mass (HomMTM) spectra are generated from modified proteoforms of the same protein with different post-translational modification patterns. They are frequently observed in the analysis of ultramodified proteins, some proteoforms of which have similar molecular weights and cannot be well separated by liquid chromatography in mass spectrometry analysis. RESULTS: We formulate the top-down HomMTM spectral identification problem as the minimum error k-splittable flow problem on graphs and propose a graph-based algorithm for the identification and quantification of proteoforms using top-down HomMTM spectra. CONCLUSIONS: Experiments on a top-down mass spectrometry data set of the histone H4 protein showed that the proposed method identified many proteoform pairs that better explain the query spectra than single proteoforms.
Affiliation(s)
- Kaiyuan Zhu
  - Department of Computer Science, Indiana University Bloomington, 700 N. Woodlawn Avenue, Bloomington, IN, 47408, USA
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN, 46202, USA; Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 W. 10th Street, Indianapolis, IN, 46202, USA
50
Kocurek KI, Griffiths RL, Cooper HJ. Ambient ionisation mass spectrometry for in situ analysis of intact proteins. J Mass Spectrom 2018; 53:565-578. [PMID: 29607564 PMCID: PMC6001466 DOI: 10.1002/jms.4087] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/23/2018] [Revised: 03/21/2018] [Accepted: 03/22/2018] [Indexed: 05/05/2023]
Abstract
Ambient surface mass spectrometry is an emerging field which shows great promise for the analysis of biomolecules directly from their biological substrate. In this article, we describe ambient ionisation mass spectrometry techniques for the in situ analysis of intact proteins. As a broad approach, the analysis of intact proteins offers unique advantages for the determination of primary sequence variations and posttranslational modifications, as well as interrogation of tertiary and quaternary structure and protein-protein/ligand interactions. In situ analysis of intact proteins offers the potential to couple these advantages with information relating to their biological environment, for example, their spatial distributions within healthy and diseased tissues. Here, we describe the techniques most commonly applied to in situ protein analysis (liquid extraction surface analysis, continuous flow liquid microjunction surface sampling, nano desorption electrospray ionisation, and desorption electrospray ionisation), their advantages and limitations, and their applications to date. We also discuss the incorporation of ion mobility spectrometry techniques (high field asymmetric waveform ion mobility spectrometry and travelling wave ion mobility spectrometry) into ambient workflows. Finally, future directions for the field are discussed.
Affiliation(s)
- Klaudia I. Kocurek
  - School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Rian L. Griffiths
  - School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
- Helen J. Cooper
  - School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK