1. Lutomski CA, Bennett JL, El-Baba TJ, Wu D, Hinkle JD, Burnap SA, Liko I, Mullen C, Syka JEP, Struwe WB, Robinson CV. Defining proteoform-specific interactions for drug targeting in a native cell signalling environment. Nat Chem 2025; 17:204-214. [PMID: 39806141 PMCID: PMC11794133 DOI: 10.1038/s41557-024-01711-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Accepted: 11/29/2024] [Indexed: 01/16/2025]
Abstract
Understanding the dynamics of membrane protein-ligand interactions within a native lipid bilayer is a major goal for drug discovery. Typically, cell-based assays are used; however, they are often blind to the effects of protein modifications. In this study, using the archetypal G protein-coupled receptor rhodopsin, we found that the receptor and its effectors can be released directly from retina rod disc membranes using infrared irradiation in a mass spectrometer. Subsequent isolation and dissociation by infrared multiphoton dissociation enabled the sequencing of individual retina proteoforms. Specifically, we categorized distinct proteoforms of rhodopsin, localized labile palmitoylations, discovered a Gβγ proteoform that abolishes membrane association and defined lipid modifications on G proteins that influence their assembly. Given reports of undesirable side-effects involving vision, we characterized the off-target drug binding of two phosphodiesterase 5 inhibitors, vardenafil and sildenafil, to the retina rod phosphodiesterase 6 (PDE6). The results demonstrate differential off-target reactivity with PDE6 and an interaction preference for lipidated proteoforms of G proteins. In summary, this study highlights the opportunities for probing proteoform-ligand interactions within natural membrane environments.
Affiliation(s)
- Corinne A Lutomski
  - Physical and Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, Oxford, UK
  - Kavli Institute for Nanoscience Discovery, University of Oxford, Oxford, UK
- Jack L Bennett
  - Physical and Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, Oxford, UK
  - Kavli Institute for Nanoscience Discovery, University of Oxford, Oxford, UK
- Tarick J El-Baba
  - Physical and Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, Oxford, UK
  - Kavli Institute for Nanoscience Discovery, University of Oxford, Oxford, UK
- Di Wu
  - Physical and Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, Oxford, UK
  - Kavli Institute for Nanoscience Discovery, University of Oxford, Oxford, UK
- Sean A Burnap
  - Kavli Institute for Nanoscience Discovery, University of Oxford, Oxford, UK
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Weston B Struwe
  - Kavli Institute for Nanoscience Discovery, University of Oxford, Oxford, UK
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Carol V Robinson
  - Physical and Theoretical Chemistry Laboratory, Department of Chemistry, University of Oxford, Oxford, UK
  - Kavli Institute for Nanoscience Discovery, University of Oxford, Oxford, UK
2. Pavek JG, Whitworth IT, Nakayama L, Scalf M, Frey BL, Smith LM. Intact Mass Proteomics Using a Proteoform Atlas. J Proteome Res 2025; 24:323-332. [PMID: 39661499 PMCID: PMC12045104 DOI: 10.1021/acs.jproteome.4c00838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2024]
Abstract
Top-down proteomics, the characterization of intact proteoforms by tandem mass spectrometry, is the principal method for proteoform characterization in complex samples. Top-down proteomics relies on precursor isolation and subsequent gas-phase fragmentation to make proteoform identifications. While this strategy can produce highly detailed molecular information, the reliance on time-intensive tandem MS limits the speed with which proteoforms can be identified. We suggest that once proteoforms have been identified by top-down analysis in a system of interest, and archived in a system-specific Proteoform Atlas, subsequent analyses in that system can utilize the Atlas information to enable simpler and faster MS1-only identifications. We explore this idea here, using the E. coli ribosome as a model system of limited complexity. We used deep top-down analysis to construct an E. coli ribosomal Proteoform Atlas containing 2099 proteoforms from 52 of the 54 proteins that make up the E. coli ribosome. We show that using the Atlas enables confident MS1-only identifications of E. coli ribosomal proteoforms from E. coli that were perturbed by exposure to cold. Furthermore, this Atlas strategy identifies proteoforms up to 77% more rapidly compared to top-down identifications that require acquisition of both MS1 and MS2 spectra.
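The MS1-only idea described in this abstract can be pictured as a tolerance-based mass lookup against a previously built atlas. The following is a minimal sketch under stated assumptions, not the authors' implementation: the atlas entries, the function name `match_ms1`, and the 10 ppm tolerance are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical atlas entries: (monoisotopic mass in Da, proteoform label).
# Masses and labels are invented for illustration, not taken from the paper.
ATLAS = sorted([
    (6254.39, "protein_A, unmodified"),
    (6296.40, "protein_A, acetylated"),
    (7273.95, "protein_B, unmodified"),
    (7287.97, "protein_B, methylated"),
])
ATLAS_MASSES = [mass for mass, _ in ATLAS]


def match_ms1(observed_mass, tolerance_ppm=10.0):
    """Return atlas labels whose mass lies within tolerance_ppm of observed_mass."""
    delta = observed_mass * tolerance_ppm * 1e-6
    # Binary search for the window [observed - delta, observed + delta].
    lo = bisect_left(ATLAS_MASSES, observed_mass - delta)
    hi = bisect_right(ATLAS_MASSES, observed_mass + delta)
    return [label for _, label in ATLAS[lo:hi]]


hits = match_ms1(6296.43)    # about 5 ppm away from the 6296.40 entry
misses = match_ms1(6300.00)  # no atlas entry within 10 ppm
print(hits, misses)
```

A real workflow would presumably add further evidence (for example retention time or charge-state consistency) before accepting a match; the sketch shows only the mass-window step.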
Affiliation(s)
- John G. Pavek
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave. Madison, WI 53706
- Isabella T. Whitworth
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave. Madison, WI 53706
- Lisa Nakayama
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave. Madison, WI 53706
- Mark Scalf
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave. Madison, WI 53706
- Brian L. Frey
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave. Madison, WI 53706
- Lloyd M. Smith
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave. Madison, WI 53706
3. McWhite CD, Sae-Lee W, Yuan Y, Mallam AL, Gort-Freitas NA, Ramundo S, Onishi M, Marcotte EM. Alternative proteoforms and proteoform-dependent assemblies in humans and plants. Mol Syst Biol 2024; 20:933-951. [PMID: 38918600 PMCID: PMC11297038 DOI: 10.1038/s44320-024-00048-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 06/04/2024] [Accepted: 06/06/2024] [Indexed: 06/27/2024] [Open Access]
Abstract
The variability of proteins at the sequence level creates an enormous potential for proteome complexity. Exploring the depths and limits of this complexity is an ongoing goal in biology. Here, we systematically survey human and plant high-throughput bottom-up native proteomics data for protein truncation variants, where substantial regions of the full-length protein are missing from an observed protein product. In humans, Arabidopsis, and the green alga Chlamydomonas, approximately one percent of observed proteins show a short form, which we can assign by comparison to RNA isoforms as either likely deriving from transcript-directed processes or limited proteolysis. While some detected protein fragments align with known splice forms and protein cleavage events, multiple examples are previously undescribed, such as our observation of fibrocystin proteolysis and nuclear translocation in a green alga. We find that truncations occur almost entirely between structured protein domains, even when short forms are derived from transcript variants. Intriguingly, multiple endogenous protein truncations of phase-separating translational proteins resemble cleaved proteoforms produced by enteroviruses during infection. Some truncated proteins are also observed in both humans and plants, suggesting that they date to the last eukaryotic common ancestor. Finally, we describe novel proteoform-specific protein complexes, where the loss of a domain may accompany complex formation.
Affiliation(s)
- Claire D McWhite
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, 08544, USA
- Wisath Sae-Lee
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, 78712, USA
- Yaning Yuan
  - Department of Biology, Duke University, Durham, NC, 27708, USA
- Anna L Mallam
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, 78712, USA
- Silvia Ramundo
  - Gregor Mendel Institute of Molecular Plant Biology, 1030, Wien, Austria
- Masayuki Onishi
  - Department of Biology, Duke University, Durham, NC, 27708, USA
- Edward M Marcotte
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX, 78712, USA
4. Frey C, Arad M, Ku K, Hare R, Balagtas R, Shi Y, Moon KM, Foster LJ, Ghafourifar G. Development of automated proteomic workflows utilizing silicon-based coupling agents. J Proteomics 2024; 303:105215. [PMID: 38843981 DOI: 10.1016/j.jprot.2024.105215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Revised: 05/28/2024] [Accepted: 06/03/2024] [Indexed: 06/16/2024]
Abstract
Automated methods for enzyme immobilization via 4-triethoxysilylbutyraldehyde (TESB) derived silicon-based coupling agents were developed. TESB and its oxidized derivative, 4-triethoxysilylbutanoic acid (TESBA), were determined to be the most effective. The resulting immobilized enzyme particles (IEPs) displayed robustness, rapid digestion, and immobilization efficiency of 51 ± 8%. Furthermore, we automated the IEP procedure, allowing for multiple enzymes and/or coupling agents to be fabricated at once, in a fraction of the time, via an Agilent Bravo. The automated trypsin TESB and TESBA IEPs were shown to rival a classical in-gel digestion method. Moreover, pepsin IEPs favored cleavage at leucine (>50%) over aromatic and methionine residues. The IEP method was then adapted for an in-situ immobilized enzyme microreactor (IMER) fabrication. We determined that TESBA could functionalize the silica capillary's inner wall while simultaneously acting as an enzyme coupler. The IMER digestion of bovine serum albumin (BSA), mirroring IEP digestion conditions, yielded a 33-40% primary sequence coverage per LC-MS/MS analysis in as little as 15 min. Overall, our findings underscore the potential of both IEP and IMER methods, paving the way for automated analysis and a reduction in enzyme waste through reuse, thereby contributing to a more cost-effective and timely study of the proteome.
SIGNIFICANCE: This research introduces 4-triethoxysilylbutyraldehyde (TESB) and its derivatives as silicon-based enzyme coupling agents and an automated liquid handling method for bottom-up proteomics (BUP) while streamlining sample preparation for high-throughput processing. Additionally, immobilized enzyme particle (IEP) fabrication and digestion within the 96-well plate allows for flexibility in protocol where different enzyme-coupler combinations can be employed simultaneously. By enabling the digestion of entire microplates and reducing manual labor, the proposed method enhances reproducibility and offers a more efficient alternative to classical in-gel techniques. Furthermore, pepsin IEPs were noted to favor cleavage at leucine residues, which represents an interesting finding when compared to the literature and warrants further study. The capability of immobilized enzyme microreactors (IMER) for rapid digestion (in as little as 15 min) demonstrated the system's efficiency and potential for rapid proteomic analysis. This advancement in BUP not only improves efficiency, but also opens avenues for a fully automated, mass spectrometry-integrated proteomics workflow, promising to expedite research and discoveries in complex biological studies.
Affiliation(s)
- Connor Frey
  - Department of Chemistry, University of the Fraser Valley, 33844 King Road, Abbotsford, BC V2S 7M8, Canada; Faculty of Medicine, University of British Columbia, 2194 Health Sciences Mall, Vancouver, BC V6T 1Z3, Canada
- Maor Arad
  - Department of Chemistry, University of the Fraser Valley, 33844 King Road, Abbotsford, BC V2S 7M8, Canada; Department of Biochemistry and Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T1Z4, Canada
- Kenneth Ku
  - Department of Chemistry, University of the Fraser Valley, 33844 King Road, Abbotsford, BC V2S 7M8, Canada
- Rhien Hare
  - Department of Chemistry, University of the Fraser Valley, 33844 King Road, Abbotsford, BC V2S 7M8, Canada; Faculty of Health Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
- Ronald Balagtas
  - Department of Chemistry, University of the Fraser Valley, 33844 King Road, Abbotsford, BC V2S 7M8, Canada
- Yuming Shi
  - Department of Biochemistry and Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T1Z4, Canada
- Kyung-Mee Moon
  - Department of Biochemistry and Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T1Z4, Canada
- Leonard J Foster
  - Department of Biochemistry and Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T1Z4, Canada
- Golfam Ghafourifar
  - Department of Chemistry, University of the Fraser Valley, 33844 King Road, Abbotsford, BC V2S 7M8, Canada
5. Peters-Clarke TM, Coon JJ, Riley NM. Instrumentation at the Leading Edge of Proteomics. Anal Chem 2024; 96:7976-8010. [PMID: 38738990 PMCID: PMC11996003 DOI: 10.1021/acs.analchem.3c04497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/14/2024]
Affiliation(s)
- Trenton M. Peters-Clarke
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Joshua J. Coon
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI, USA
  - Morgridge Institute for Research, Madison, WI, USA
6. Carr AV, Bollis NE, Pavek JG, Shortreed MR, Smith LM. Spectral averaging with outlier rejection algorithms to increase identifications in top-down proteomics. Proteomics 2024; 24:e2300234. [PMID: 38487981 PMCID: PMC11216233 DOI: 10.1002/pmic.202300234] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/31/2023] [Revised: 02/15/2024] [Accepted: 02/29/2024] [Indexed: 04/05/2024]
Abstract
The identification of proteoforms by top-down proteomics requires both high quality fragmentation spectra and the neutral mass of the proteoform from which the fragments derive. Intact proteoform spectra can be highly complex and may include multiple overlapping proteoforms, as well as many isotopic peaks and charge states. The resulting lower signal-to-noise ratios for intact proteins complicate downstream analyses such as deconvolution. Averaging multiple scans is a common way to improve signal-to-noise, but mass spectrometry data contain unique artifacts that can degrade the quality of an averaged spectrum. To overcome these limitations and increase signal-to-noise, we have implemented outlier rejection algorithms to remove outlier measurements efficiently and robustly in a set of MS1 scans prior to averaging. We have implemented averaging with rejection algorithms in the open-source, freely available, proteomics search engine MetaMorpheus. Herein, we report the application of the averaging with rejection algorithms to direct injection and online liquid chromatography mass spectrometry data. Averaging with rejection algorithms demonstrated a 45% increase in the number of proteoforms detected in Jurkat T cell lysate. We show that the increase is due to improved spectral quality, particularly in regions surrounding isotopic envelopes.
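To make the general idea of averaging with outlier rejection concrete, here is a minimal sketch that assumes the scans have already been aligned onto a shared m/z grid. It uses a median-absolute-deviation (MAD) clip as a stand-in rejection rule; the abstract does not specify which algorithms MetaMorpheus implements, so the function name `average_with_rejection`, the `n_mad` threshold, and the toy data are all assumptions.

```python
from statistics import mean, median


def average_with_rejection(scans, n_mad=3.0):
    """Average aligned intensity vectors, dropping per-bin outliers first.

    scans[i][j] is the intensity of m/z bin j in scan i (all scans equal length).
    A value is rejected when it lies more than n_mad scaled MADs from the bin median.
    """
    averaged = []
    for values in zip(*scans):  # iterate bin by bin across scans
        med = median(values)
        mad = median(abs(v - med) for v in values)
        # 1.4826 scales the MAD to a standard deviation for Gaussian noise.
        threshold = n_mad * 1.4826 * mad
        kept = [v for v in values if abs(v - med) <= threshold]
        averaged.append(mean(kept))
    return averaged


# Toy data: five scans, three bins; the fourth scan has a spike in the last bin.
scans = [
    [10.0, 100.0, 5.0],
    [12.0, 102.0, 6.0],
    [8.0, 98.0, 4.0],
    [11.0, 101.0, 500.0],
    [9.0, 99.0, 5.0],
]
result = average_with_rejection(scans)
print(result)
```

A plain mean of the last bin would be 104.0 because of the single spike, whereas the clipped average stays at the level of the other four scans. Note that when the MAD of a bin is zero, this simple rule keeps only values equal to the median.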
Affiliation(s)
- Austin V Carr
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Nicholas E Bollis
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
- John G Pavek
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Michael R Shortreed
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Lloyd M Smith
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin, USA
7. Zhan Z, Wang L. Fast peak error correction algorithms for proteoform identification using top-down tandem mass spectra. Bioinformatics 2024; 40:btae149. [PMID: 38498847 PMCID: PMC11212493 DOI: 10.1093/bioinformatics/btae149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Revised: 03/05/2024] [Accepted: 03/15/2024] [Indexed: 03/20/2024] [Open Access]
Abstract
MOTIVATION: Proteoform identification is an important problem in proteomics. The main task is to find a modified protein that best fits the input spectrum. To overcome the combinatorial explosion of possible proteoforms, the proteoform mass graph and spectrum mass graph are used to represent the protein database and the spectrum, respectively. The problem becomes finding an optimal alignment between the proteoform mass graph and the spectrum mass graph. Peak error correction is an important issue for computing an optimal alignment between the two input mass graphs.
RESULTS: We propose a faster algorithm for the error correction alignment of spectrum mass graph and proteoform mass graph problem and produce a program package TopMGFast. The newly designed algorithms require less space and running time so that we are able to compute global optimal alignments for the two input mass graphs in a reasonable time. For the local alignment version, experiments show that the running time of the new algorithm is reduced by 2.5 times. For the global alignment version, experiments show that the maximum mass errors between any pair of matched nodes in the alignments obtained by our method are within a small range as designed, while the alignments produced by the state-of-the-art method, TopMG, have very large maximum mass errors for many cases. The obtained alignment sizes are roughly the same for both TopMG and TopMGFast. Of course, TopMGFast needs more running time than TopMG. Therefore, our new algorithm can obtain more reliable global alignments within a reasonable time. This is the first time that global optimal error correction alignments can be obtained using real datasets.
AVAILABILITY AND IMPLEMENTATION: The source code of the algorithm is available at https://github.com/Zeirdo/TopMGFast.
Affiliation(s)
- Zhaohui Zhan
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China
- Lusheng Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China
  - City University of Hong Kong Shenzhen Research Institution, ShenZhen, 518057, China
8. Pavek JG, Frey BL, Frost DC, Gu TJ, Li L, Smith LM. Cysteine Counting via Isotopic Chemical Labeling for Intact Mass Proteoform Identifications in Tissue. Anal Chem 2023; 95:15245-15253. [PMID: 37791746 PMCID: PMC10637319 DOI: 10.1021/acs.analchem.3c02473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Top-down proteomics, the tandem mass spectrometric analysis of intact proteoforms, is the dominant method for proteoform characterization in complex mixtures. While this strategy produces detailed molecular information, it also requires extensive instrument time per mass spectrum obtained and thus compromises the depth of proteoform coverage that is accessible on liquid chromatography time scales. Such a top-down analysis is necessary for making original proteoform identifications, but once a proteoform has been confidently identified, the extensive characterization it provides may no longer be required for a subsequent identification of the same proteoform. We present a strategy to identify proteoforms in tissue samples on the basis of the combination of an intact mass determination with a measured count of the number of cysteine residues present in each proteoform. We developed and characterized a cysteine tagging chemistry suitable for the efficient and specific labeling of cysteine residues within intact proteoforms and for providing a count of the cysteine amino acids present. On simple protein mixtures, the tagging chemistry yields greater than 98% labeling of all cysteine residues, with a labeling specificity of greater than 95%. Similar results are observed on more complex samples. In a proof-of-principle study, proteoforms present in a human prostate tumor biopsy were characterized. Observed proteoforms, each characterized by an intact mass and a cysteine count, were grouped into proteoform families (groups of proteoforms originating from the same gene). We observed 2190 unique experimental proteoforms, 703 of which were grouped into 275 proteoform families.
Affiliation(s)
- John G. Pavek
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave. Madison, WI 53706
- Brian L. Frey
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave. Madison, WI 53706
- Dustin C. Frost
  - School of Pharmacy, University of Wisconsin-Madison, 777 Highland Ave, Madison, WI 53705
- Ting-Jia Gu
  - School of Pharmacy, University of Wisconsin-Madison, 777 Highland Ave, Madison, WI 53705
- Lingjun Li
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave. Madison, WI 53706
  - School of Pharmacy, University of Wisconsin-Madison, 777 Highland Ave, Madison, WI 53705
- Lloyd M. Smith
  - Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave. Madison, WI 53706
9. Chen W, Ding Z, Zang Y, Liu X. Characterization of Proteoform Post-Translational Modifications by Top-Down and Bottom-Up Mass Spectrometry in Conjunction with Annotations. J Proteome Res 2023; 22:3178-3189. [PMID: 37728997 PMCID: PMC10563160 DOI: 10.1021/acs.jproteome.3c00207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Indexed: 09/22/2023]
Abstract
Many proteoforms can be produced from a gene due to genetic mutations, alternative splicing, post-translational modifications (PTMs), and other variations. PTMs in proteoforms play critical roles in cell signaling, protein degradation, and other biological processes. Mass spectrometry (MS) is the primary technique for investigating PTMs in proteoforms, and two alternative MS approaches, top-down and bottom-up, have complementary strengths. The combination of the two approaches has the potential to increase the sensitivity and accuracy in PTM identification and characterization. In addition, protein and PTM knowledge bases, such as UniProt, provide valuable information for PTM characterization and verification. Here, we present a software pipeline PTM-TBA (PTM characterization by Top-down and Bottom-up MS and Annotations) for identifying and localizing PTMs in proteoforms by integrating top-down and bottom-up MS as well as PTM annotations. We assessed PTM-TBA using a technical triplicate of bottom-up and top-down MS data of SW480 cells. On average, database search of the top-down MS data identified 2000 mass shifts, 814.5 (40.7%) of which were matched to 11 common PTMs and 423 of which were localized. Of the mass shifts identified by top-down MS, PTM-TBA verified 435 mass shifts using the bottom-up MS data and UniProt annotations.
Affiliation(s)
- Wenrong Chen
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Zhengming Ding
  - Department of Computer Science, Tulane School of Science and Engineering, Tulane University, New Orleans, Louisiana 70118, United States
- Yong Zang
  - Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Xiaowen Liu
  - Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, Louisiana 70112, United States
  - Deming Department of Medicine, Tulane University, New Orleans, Louisiana 70112, United States
10. Naryzhny S. Quantitative Aspects of the Human Cell Proteome. Int J Mol Sci 2023; 24:8524. [PMID: 37239870 PMCID: PMC10218018 DOI: 10.3390/ijms24108524] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/06/2023] [Revised: 05/06/2023] [Accepted: 05/08/2023] [Indexed: 05/28/2023] [Open Access]
Abstract
The number and identity of proteins and proteoforms present in a single human cell (a cellular proteome) are fundamental biological questions. The answers can be found with sophisticated and sensitive proteomics methods, including advanced mass spectrometry (MS) coupled with separation by gel electrophoresis and chromatography. So far, bioinformatics and experimental approaches have been applied to quantitate the complexity of the human proteome. This review analyzed the quantitative information obtained from several large-scale panoramic experiments in which high-resolution mass spectrometry-based proteomics in combination with liquid chromatography or two-dimensional gel electrophoresis (2DE) were used to evaluate the cellular proteome. Importantly, even though all these experiments were performed in different labs using different equipment and calculation algorithms, the main conclusion about the distribution of proteome components (proteins or proteoforms) was basically the same for all human tissues or cells. It follows Zipf's law and has a formula N = A/x, where N is the number of proteoforms, A is a coefficient, and x is the limit of proteoform detection in terms of abundance.
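As a small worked illustration of the relation N = A/x quoted in this abstract, the sketch below evaluates it at two detection limits. The coefficient value and the detection limits are hypothetical numbers chosen for easy arithmetic, not values reported in the review, and the function name is likewise an assumption.

```python
def proteoforms_detectable(coefficient_a, detection_limit_x):
    """Evaluate N = A / x for a positive detection limit x."""
    if detection_limit_x <= 0:
        raise ValueError("detection limit must be positive")
    return coefficient_a / detection_limit_x


A = 1_000_000.0  # hypothetical coefficient, for illustration only

n_coarse = proteoforms_detectable(A, 100.0)  # less sensitive method
n_fine = proteoforms_detectable(A, 10.0)     # tenfold lower detection limit
print(n_coarse, n_fine)
```

Under this inverse relation, lowering the detection limit tenfold raises the expected number of detectable proteoforms tenfold, which is the practical reading of the formula.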
Affiliation(s)
- Stanislav Naryzhny
  - Institute of Biomedical Chemistry, Pogodinskaya Str. 10, 119121 Moscow, Russia
  - Petersburg Institute of Nuclear Physics (PNPI) of National Research Center “Kurchatov Institute”, 188300 Gatchina, Russia
11. Sun B, Liu Z, Liu J, Zhao S, Wang L, Wang F. The utility of proteases in proteomics, from sequence profiling to structure and function analysis. Proteomics 2023; 23:e2200132. [PMID: 36382392 DOI: 10.1002/pmic.202200132] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/11/2022] [Revised: 11/08/2022] [Accepted: 11/08/2022] [Indexed: 11/18/2022]
Abstract
In mass spectrometry (MS)-based bottom-up proteomics, protease digestion plays an essential role in profiling both proteome sequences and post-translational modifications (PTMs). Trypsin is the gold standard in digesting intact proteins into small-size peptides, which are more suitable for high-performance liquid chromatography (HPLC) separation and tandem MS (MS/MS) characterization. However, protein sequences lacking Lys and Arg cannot be cleaved by trypsin and may be missed in conventional proteomic analysis. Proteases with cleavage sites complementary to trypsin are widely applied in proteomic analysis to greatly improve the coverage of proteome sequences and PTM sites. In this review, we survey the common and newly emerging proteases used in proteomics analysis mainly in the last 5 years, focusing on their unique cleavage features and specific proteomics applications such as missing protein characterization, new PTM discovery, and de novo sequencing. In addition, we summarize the applications of proteases in structural proteomics and protein function analysis in recent years. Finally, we discuss the future development directions of new proteases and applications in proteomics.
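The point that sequences lacking Lys and Arg escape tryptic cleavage can be shown with a small in-silico digestion. This is a minimal sketch, assuming the commonly used simplified rule that trypsin cleaves C-terminal to K or R except when the next residue is P; the function name and the example sequences are invented for illustration and missed cleavages are ignored.

```python
import re


def trypsin_digest(sequence):
    """Split a protein sequence after K or R, unless the next residue is P."""
    # Zero-width split points: preceded by K/R and not followed by P.
    # Splitting on empty matches requires Python 3.7 or later.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if p]  # drop the empty piece after a terminal K/R


with_sites = trypsin_digest("MKWVTFRPLLKAAG")
without_sites = trypsin_digest("AAGGLLPPEE")
print(with_sites, without_sites)
```

The second sequence contains no K or R, so it comes back as a single undigested peptide, which is the situation where a protease with complementary specificity would be needed.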
Affiliation(s)
- Binwen Sun
  - Engineering Research Center for New Materials and Precision Treatment Technology of Malignant Tumors Therapy, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 463 Zhongshan Road, Dalian, 116023, China
  - Engineering Technology Research Center for Translational Medicine, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Zheyi Liu
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 463 Zhongshan Road, Dalian, 116023, China
- Jin Liu
  - Engineering Research Center for New Materials and Precision Treatment Technology of Malignant Tumors Therapy, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
  - Engineering Technology Research Center for Translational Medicine, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
  - Division of Hepatobiliary and Pancreatic Surgery, Department of General Surgery, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Shan Zhao
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 463 Zhongshan Road, Dalian, 116023, China
- Liming Wang
  - Engineering Research Center for New Materials and Precision Treatment Technology of Malignant Tumors Therapy, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
  - Engineering Technology Research Center for Translational Medicine, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
  - Division of Hepatobiliary and Pancreatic Surgery, Department of General Surgery, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Fangjun Wang
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 463 Zhongshan Road, Dalian, 116023, China
  - University of Chinese Academy of Sciences, 19 Yuquan Road, Beijing, 100049, China
12. Miller RM, Millikin RJ, Rolfs Z, Shortreed MR, Smith LM. Enhanced Proteomic Data Analysis with MetaMorpheus. Methods Mol Biol 2023; 2426:35-66. [PMID: 36308684 PMCID: PMC9623450 DOI: 10.1007/978-1-0716-1967-4_3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 05/29/2023]
Abstract
MetaMorpheus is a free and open-source software program dedicated to the comprehensive analysis of proteomic data. In bottom-up proteomics, protein samples are digested into peptides prior to chromatographic separation and tandem mass spectrometric analysis. The resulting fragmentation spectra are subsequently analyzed with search software programs to obtain peptide identifications and infer the presence of proteins in the samples. MetaMorpheus seeks to maximize the information gleaned from proteomic data through the use of (a) mass calibration, (b) post-translational modification discovery, (c) multiple search algorithms, which aid in the analysis of data from traditional, crosslinking, and glycoproteomic experiments, (d) isotope-based or label-free quantification, (e) multi-protease protein inference, and (f) spectral annotation and data visualization capabilities. This protocol provides detailed descriptions of how to use MetaMorpheus and how to customize data analysis workflows using MetaMorpheus tasks to meet the specific needs of the user.
Affiliation(s)
- Rachel M Miller
- University of Wisconsin-Madison, Department of Chemistry, Madison, WI, USA
| | - Robert J Millikin
- University of Wisconsin-Madison, Department of Chemistry, Madison, WI, USA
| | - Zach Rolfs
- University of Wisconsin-Madison, Department of Chemistry, Madison, WI, USA
| | | | - Lloyd M Smith
- University of Wisconsin-Madison, Department of Chemistry, Madison, WI, USA.
| |
|
13
|
McCool EN, Xu T, Chen W, Beller NC, Nolan SM, Hummon AB, Liu X, Sun L. Deep top-down proteomics revealed significant proteoform-level differences between metastatic and nonmetastatic colorectal cancer cells. Sci Adv 2022; 8:eabq6348. [PMID: 36542699 PMCID: PMC9770947 DOI: 10.1126/sciadv.abq6348] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 04/20/2022] [Accepted: 11/18/2022] [Indexed: 05/23/2023]
Abstract
Understanding cancer metastasis at the proteoform level is crucial for discovering previously unknown protein biomarkers for cancer diagnosis and drug development. We present the first top-down proteomics (TDP) study of a pair of isogenic human nonmetastatic and metastatic colorectal cancer (CRC) cell lines (SW480 and SW620). We identified 23,622 proteoforms of 2332 proteins from the two cell lines, representing nearly fivefold improvement in the number of proteoform identifications (IDs) compared to previous TDP datasets of human cancer cells. We revealed substantial differences between the SW480 and SW620 cell lines regarding proteoform and single amino acid variant (SAAV) profiles. Quantitative TDP unveiled differentially expressed proteoforms between the two cell lines, and the corresponding genes had diversified functions and were closely related to cancer. Our study represents a pivotal advance in TDP toward the characterization of human proteome in a proteoform-specific manner, which will transform basic and translational biomedical research.
Affiliation(s)
- Elijah N. McCool
- Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, MI 48824, USA
| | - Tian Xu
- Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, MI 48824, USA
| | - Wenrong Chen
- Department of BioHealth Informatics, Indiana University–Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN 46202, USA
| | - Nicole C. Beller
- Department of Chemistry and Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, OH 43210, USA
| | - Scott M. Nolan
- Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, MI 48824, USA
| | - Amanda B. Hummon
- Department of Chemistry and Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, OH 43210, USA
- The Comprehensive Cancer Center, The Ohio State University, 500 West 12th Avenue, Columbus, OH 43210, USA
| | - Xiaowen Liu
- Deming Department of Medicine, School of Medicine, Tulane University, 1441 Canal Street, New Orleans, LA 70112, USA
| | - Liangliang Sun
- Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, MI 48824, USA
| |
|
14
|
Yang M, Hu H, Su P, Thomas PM, Camarillo JM, Greer JB, Early BP, Fellers RT, Kelleher NL, Laskin J. Proteoform-Selective Imaging of Tissues Using Mass Spectrometry. Angew Chem Int Ed Engl 2022; 61:e202200721. [PMID: 35446460 PMCID: PMC9276647 DOI: 10.1002/anie.202200721] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 01/17/2022] [Indexed: 01/28/2023]
Abstract
Unraveling the complexity of biological systems relies on the development of new approaches for spatially resolved proteoform‐specific analysis of the proteome. Herein, we employ nanospray desorption electrospray ionization mass spectrometry imaging (nano‐DESI MSI) for the proteoform‐selective imaging of biological tissues. Nano‐DESI generates multiply charged protein ions, which is advantageous for their structural characterization using tandem mass spectrometry (MS/MS) directly on the tissue. Proof‐of‐concept experiments demonstrate that nano‐DESI MSI combined with on‐tissue top‐down proteomics is ideally suited for the proteoform‐selective imaging of tissue sections. Using rat brain tissue as a model system, we provide the first evidence of differential proteoform expression in different regions of the brain.
Affiliation(s)
- Manxi Yang
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, IN 47907, USA
| | - Hang Hu
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, IN 47907, USA
| | - Pei Su
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, IN 47907, USA
- Departments of Chemistry and Molecular Biosciences, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208, USA
| | - Paul M. Thomas
- Departments of Chemistry and Molecular Biosciences, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208, USA
| | - Jeannie M. Camarillo
- Departments of Chemistry and Molecular Biosciences, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208, USA
| | - Joseph B. Greer
- Departments of Chemistry and Molecular Biosciences, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208, USA
| | - Bryan P. Early
- Departments of Chemistry and Molecular Biosciences, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208, USA
| | - Ryan T. Fellers
- Departments of Chemistry and Molecular Biosciences, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208, USA
| | - Neil L. Kelleher
- Departments of Chemistry and Molecular Biosciences, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208, USA
| | - Julia Laskin
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, IN 47907, USA
| |
|
15
|
Yang M, Hu H, Su P, Thomas PM, Camarillo JM, Greer JB, Early BP, Fellers RT, Kelleher NL, Laskin J. Proteoform‐Selective Imaging of Tissues Using Mass Spectrometry. Angew Chem Int Ed Engl 2022. [DOI: 10.1002/ange.202200721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Manxi Yang
- Purdue University Department of Chemistry chemistry 560 Oval Dr. 47906 West Lafayette UNITED STATES
| | - Hang Hu
- Purdue University Chemistry UNITED STATES
| | - Pei Su
- Northwestern University Chemistry and Molecular Biosciences UNITED STATES
| | - Paul M. Thomas
- Northwestern University Chemistry and Molecular Biosciences UNITED STATES
| | | | - Joseph B. Greer
- Northwestern University Chemistry and Molecular Biosciences UNITED STATES
| | - Bryan P. Early
- Northwestern University Chemistry and Molecular Biosciences UNITED STATES
| | - Ryan T. Fellers
- Northwestern University Chemistry and Molecular Biosciences UNITED STATES
| | - Neil L. Kelleher
- Northwestern University Chemistry and Molecular Biosciences UNITED STATES
| | - Julia Laskin
- Purdue University Department of Chemistry 560 Oval Dr. 47907 West Lafayette UNITED STATES
| |
|
16
|
Hollas MAR, Robey M, Fellers R, LeDuc R, Thomas P, Kelleher N. The Human Proteoform Atlas: a FAIR community resource for experimentally derived proteoforms. Nucleic Acids Res 2022; 50:D526-D533. [PMID: 34986596 PMCID: PMC8728143 DOI: 10.1093/nar/gkab1086] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 08/15/2021] [Revised: 10/06/2021] [Accepted: 11/14/2021] [Indexed: 01/01/2023] Open
Abstract
The Human Proteoform Atlas (HPfA) is a web-based repository of experimentally verified human proteoforms on-line at http://human-proteoform-atlas.org and is a direct descendant of the Consortium of Top-Down Proteomics' (CTDP) Proteoform Atlas. Proteoforms are the specific forms of protein molecules expressed by our cells and include the unique combination of post-translational modifications (PTMs), alternative splicing and other sources of variation deriving from a specific gene. The HPfA uses a FAIR system to assign persistent identifiers to proteoforms which allows for redundancy calling and tracking from prior and future studies in the growing community of proteoform biology and measurement. The HPfA is organized around open ontologies and enables flexible classification of proteoforms. To achieve this, a public registry of experimentally verified proteoforms was also created. Submission of new proteoforms can be processed through email via nrtdphelp@northwestern.edu, and future iterations of these proteoform atlases will help to organize and assign function to proteoforms, their PTMs and their complexes in the years ahead.
Affiliation(s)
- Michael A R Hollas
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
| | - Matthew T Robey
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
| | - Ryan T Fellers
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
| | - Richard D LeDuc
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
| | - Paul M Thomas
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
| | - Neil L Kelleher
- Departments of Molecular Biosciences, Chemistry, and the Chemistry of Life Processes Institute, Northwestern University, Evanston, IL 60208, USA
| |
|
17
|
Schaffer LV, Shortreed MR, Smith LM. Proteoform Analysis and Construction of Proteoform Families in Proteoform Suite. Methods Mol Biol 2022; 2500:67-81. [PMID: 35657588 PMCID: PMC9694099 DOI: 10.1007/978-1-0716-2325-1_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
Abstract
Proteoform Suite is an interactive software program for the identification and quantification of intact proteoforms from mass spectrometry data. Proteoform Suite identifies proteoforms observed by intact-mass (MS1) analysis. In intact-mass analysis, unfragmented experimental proteoforms are compared to a database of known proteoform sequences and to one another, searching for mass differences corresponding to well-known post-translational modifications or amino acids. Intact-mass analysis enables proteoforms observed in the MS1 data without MS/MS (MS2) fragmentation to be identified. Proteoform Suite further facilitates the construction and visualization of proteoform families, which are the sets of proteoforms derived from individual genes. Bottom-up peptide identifications and top-down (MS2) proteoform identifications can be integrated into the Proteoform Suite analysis to increase the sensitivity and accuracy of the analysis. Proteoform Suite is open source and freely available at https://github.com/smith-chem-wisc/proteoform-suite .
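As a rough illustration of the intact-mass comparison described in this abstract, the Python sketch below checks whether the difference between an observed mass and a database mass matches a well-known modification. The shift table (standard monoisotopic values), the 0.02 Da tolerance, and the function name `explain_mass_difference` are assumptions made for illustration; this is not Proteoform Suite's implementation.

```python
# Monoisotopic mass shifts (Da) for a few common modifications.
# The selection of modifications is an assumption for this sketch.
PTM_SHIFTS = {
    "acetylation": 42.010565,
    "phosphorylation": 79.966331,
    "methylation": 14.015650,
    "oxidation": 15.994915,
}


def explain_mass_difference(observed_mass, theoretical_mass, tolerance_da=0.02):
    """Return the names of modifications whose mass shift matches
    (observed - theoretical) within the given tolerance in daltons."""
    delta = observed_mass - theoretical_mass
    return [
        name
        for name, shift in PTM_SHIFTS.items()
        if abs(delta - shift) <= tolerance_da
    ]


if __name__ == "__main__":
    # Hypothetical masses: the observed species is about 42.012 Da heavier.
    print(explain_mass_difference(10042.012, 10000.0))
```

A real tool would also compare experimental proteoforms to one another and control the false discovery rate; the sketch covers only the single mass-difference lookup.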
Affiliation(s)
- Leah V Schaffer
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | | | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| |
|
18
|
Rolfs Z, Smith LM. Internal Fragment Ions Disambiguate and Increase Identifications in Top-Down Proteomics. J Proteome Res 2021; 20:5412-5418. [PMID: 34738820 DOI: 10.1021/acs.jproteome.1c00599] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
A large fraction of observed fragment ion intensity remains unidentified in top-down proteomics. The elucidation of these unknown fragment ions could enable researchers to identify additional proteoforms and reduce proteoform ambiguity in their analyses. Internal fragment ions have received considerable attention as a major source of these unidentified fragment ions. Internal fragments are product ions that contain neither protein terminus, in contrast with terminal ions that contain a single terminus. There are many more possible internal fragments than terminal fragments, and the resulting computational complexity has historically limited the application of internal fragment ions to low-complexity samples containing only one or a few proteins of interest. We implemented internal fragment ion functionality in MetaMorpheus to allow the proteome-wide annotation of internal fragment ions. MetaMorpheus first uses terminal fragment ions to identify putative proteoforms and then employs internal fragment ions to disambiguate similar proteoforms. In the analysis of mammalian cell lysates, we found that MetaMorpheus could disambiguate over half of its previously ambiguous proteoforms while also providing up to a 7% increase in proteoform-spectrum matches identified at a 1% false discovery rate.
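The claim that internal fragments far outnumber terminal fragments can be made concrete with a simple count. The Python sketch below counts distinct sequence spans only, under the assumption of a linear chain with one cleavage site between each pair of adjacent residues; it ignores ion types, charge states, and any length filters, and the function name `fragment_counts` is hypothetical rather than part of MetaMorpheus.

```python
from math import comb


def fragment_counts(length):
    """Count terminal and internal fragment spans for a linear
    protein of `length` residues.

    Terminal spans: each of the (length - 1) backbone sites yields one
    N-terminal and one C-terminal piece.
    Internal spans: any two distinct sites bound a piece that contains
    neither terminus.
    """
    if length < 2:
        raise ValueError("need at least two residues to fragment")
    sites = length - 1
    terminal = 2 * sites
    internal = comb(sites, 2)
    return terminal, internal


if __name__ == "__main__":
    for n in (10, 100, 300):
        terminal, internal = fragment_counts(n)
        print(f"{n} residues: {terminal} terminal, {internal} internal")
```

Terminal spans grow linearly with sequence length while internal spans grow quadratically, which is consistent with the computational burden the abstract describes.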
Affiliation(s)
- Zach Rolfs
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| |
|
19
|
Abstract
Proteoform identification is required to fully understand the biological diversity present in a sample. However, these identifications are often ambiguous because of the challenges in analyzing full length proteins by mass spectrometry. A five-level proteoform classification system was recently developed to delineate the ambiguity of proteoform identifications and to allow for comparisons across software platforms and acquisition methods. Widespread adoption of this system requires software tools to provide classification of the proteoform identifications. We describe here an implementation of the five-level classification system in the software program MetaMorpheus, which provides both bottom-up and top-down identifications. Additionally, we developed a stand-alone program called ProteoformClassifier that allows users to classify proteoform results from any search program, provided that the program writes output that includes the information necessary to evaluate proteoform ambiguity. This stand-alone program includes a small test file and database to evaluate if a given program provides sufficient information to evaluate ambiguity. If the program does not, then ProteoformClassifier provides meaningful feedback to assist developers with implementing the classification system. We tested currently available top-down software programs and found that none of them (other than MetaMorpheus) provided sufficient information regarding identification ambiguity to permit classification.
Affiliation(s)
- Zach Rolfs
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| |
|
20
|
Lu L, Scalf M, Shortreed MR, Smith LM. Mesh Fragmentation Improves Dissociation Efficiency in Top-down Proteomics. J Am Soc Mass Spectrom 2021; 32:1319-1325. [PMID: 33754701 PMCID: PMC8783543 DOI: 10.1021/jasms.0c00462] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 05/12/2023]
Abstract
Top-down proteomics is a key mass spectrometry-based technology for comprehensive analysis of proteoforms. Proteoforms exhibit multiple high charge states and isotopic forms in full MS scans. The dissociation behavior of proteoforms in different charge states and subjected to different collision energies is highly variable. The current widely employed data-dependent acquisition (DDA) method selects a narrow m/z range (corresponding to a single proteoform charge state) for dissociation from the most abundant precursors. We describe here Mesh, a novel dissociation strategy, to dissociate multiple charge states of one proteoform with multiple collision energies. We show that the Mesh strategy has the potential to generate fragment ions with improved sequence coverage and improve identification ratios in top-down proteomic analyses of complex samples. The strategy is implemented within an open-source instrument control software program named MetaDrive to perform real time deconvolution and precursor selection.
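One way to picture dissociating several charge states of one proteoform at several collision energies is as a grid of (charge state, energy) pairs. The Python sketch below computes the precursor m/z for each charge state and enumerates such a grid. Treating the scheme as a full cross product is an assumption of this sketch, as are the function names and the example values; it does not reproduce MetaDrive's acquisition logic.

```python
from itertools import product

PROTON_MASS = 1.007276  # mass of a proton in Da


def precursor_mz(neutral_mass, charge):
    """m/z of a positively charged ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mesh_grid(neutral_mass, charges, energies):
    """Enumerate (charge, m/z, collision energy) combinations for one
    proteoform; the full cross product is an illustrative assumption."""
    return [
        (z, round(precursor_mz(neutral_mass, z), 4), energy)
        for z, energy in product(charges, energies)
    ]


if __name__ == "__main__":
    # Hypothetical 10 kDa proteoform, three charge states, two energies.
    for row in mesh_grid(10000.0, [10, 11, 12], [20, 25]):
        print(row)
```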
Affiliation(s)
- Lei Lu
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706, United States
| | - Mark Scalf
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706, United States
| | - Michael R. Shortreed
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706, United States
| | - Lloyd M. Smith
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706, United States
- Corresponding Author Phone: (608) 263-2594. Fax: (608) 265-6780.
| |
|
21
|
Schaffer LV, Anderson LC, Butcher DS, Shortreed MR, Miller RM, Pavelec C, Smith LM. Construction of Human Proteoform Families from 21 Tesla Fourier Transform Ion Cyclotron Resonance Mass Spectrometry Top-Down Proteomic Data. J Proteome Res 2020; 20:317-325. [PMID: 33074679 DOI: 10.1021/acs.jproteome.0c00403] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/22/2022]
Abstract
Identification of proteoforms, the different forms of a protein, is important to understand biological processes. A proteoform family is the set of different proteoforms from the same gene. We previously developed the software program Proteoform Suite, which constructs proteoform families and identifies proteoforms by intact-mass analysis. Here, we have applied this approach to top-down proteomic data acquired at the National High Magnetic Field Laboratory 21 tesla Fourier transform ion cyclotron resonance mass spectrometer (data available on the MassIVE platform with identifier MSV000085978). We explored the ability to construct proteoform families and identify proteoforms from the high mass accuracy data that this instrument provides for a complex cell lysate sample from the MCF-7 human breast cancer cell line. There were 2830 observed experimental proteoforms, of which 932 were identified, 44 were ambiguous, and 1854 were unidentified. Of the 932 unique identified proteoforms, 766 were identified by top-down MS2 analysis at 1% false discovery rate (FDR) using TDPortal, and 166 were additional intact-mass identifications (∼4.7% calculated global FDR) made using Proteoform Suite. We recently published a proteoform level schema to represent ambiguity in proteoform identifications. We implemented this proteoform level classification in Proteoform Suite for intact-mass identifications, which enables users to determine the ambiguity levels and sources of ambiguity for each intact-mass proteoform identification.
Affiliation(s)
- Leah V Schaffer
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lissa C Anderson
- Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Tallahassee, Florida 32310, United States
| | - David S Butcher
- Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Tallahassee, Florida 32310, United States
| | - Michael R Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Rachel M Miller
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Caitlin Pavelec
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| |
|
22
|
Cesnik AJ, Miller RM, Ibrahim K, Lu L, Millikin RJ, Shortreed MR, Frey BL, Smith LM. Spritz: A Proteogenomic Database Engine. J Proteome Res 2020; 20:1826-1834. [PMID: 32967423 DOI: 10.1021/acs.jproteome.0c00407] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 02/07/2023]
Abstract
Proteoforms are the workhorses of the cell, and subtle differences between their amino acid sequences or post-translational modifications (PTMs) can change their biological function. To most effectively identify and quantify proteoforms in genetically diverse samples by mass spectrometry (MS), it is advantageous to search the MS data against a sample-specific protein database that is tailored to the sample being analyzed, in that it contains the correct amino acid sequences and relevant PTMs for that sample. To this end, we have developed Spritz (https://smith-chem-wisc.github.io/Spritz/), an open-source software tool for generating protein databases annotated with sequence variations and PTMs. We provide a simple graphical user interface for Windows and scripts that can be run on any operating system. Spritz automatically sets up and executes approximately 20 tools, which enable the construction of a proteogenomic database from only raw RNA sequencing data. Sequence variations that are discovered in RNA sequencing data upon comparison to the Ensembl reference genome are annotated on proteins in these databases, and PTM annotations are transferred from UniProt. Modifications can also be discovered and added to the database using bottom-up mass spectrometry data and global PTM discovery in MetaMorpheus. We demonstrate that such sample-specific databases allow the identification of variant peptides, modified variant peptides, and variant proteoforms by searching bottom-up and top-down proteomic data from the Jurkat human T lymphocyte cell line and demonstrate the identification of phosphorylated variant sites with phosphoproteomic data from the U2OS human osteosarcoma cell line.
Affiliation(s)
- Anthony J Cesnik
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States; Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm 17121, Sweden; Department of Genetics, Stanford University, Stanford, California 94305, United States; Chan Zuckerberg Biohub, San Francisco, California 94158, United States
| | - Rachel M Miller
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Khairina Ibrahim
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lei Lu
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Robert J Millikin
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Michael R Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Brian L Frey
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| |
|
23
|
Schaffer LV, Millikin RJ, Shortreed MR, Scalf M, Smith LM. Improving Proteoform Identifications in Complex Systems Through Integration of Bottom-Up and Top-Down Data. J Proteome Res 2020; 19:3510-3517. [PMID: 32584579 DOI: 10.1021/acs.jproteome.0c00332] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/18/2022]
Abstract
Cellular functions are performed by a vast and diverse set of proteoforms. Proteoforms are the specific forms of proteins produced as a result of genetic variations, RNA splicing, and post-translational modifications (PTMs). Top-down mass spectrometric analysis of intact proteins enables proteoform identification, including proteoforms derived from sequence cleavage events or harboring multiple PTMs. In contrast, bottom-up proteomics identifies peptides, which necessitates protein inference and does not yield proteoform identifications. We seek here to exploit the synergies between these two data types to improve the quality and depth of the overall proteomic analysis. To this end, we automated the large-scale integration of results from multiprotease bottom-up and top-down analyses in the software program Proteoform Suite and applied it to the analysis of proteoforms from the human Jurkat T lymphocyte cell line. We implemented the recently developed proteoform-level classification scheme for top-down tandem mass spectrometry (MS/MS) identifications in Proteoform Suite, which enables users to observe the level and type of ambiguity for each proteoform identification, including which of the ambiguous proteoform identifications are supported by bottom-up-level evidence. We used Proteoform Suite to find instances where top-down identifications aid in protein inference from bottom-up analysis and conversely where bottom-up peptide identifications aid in proteoform PTM localization. We also show the use of bottom-up data to infer proteoform candidates potentially present in the sample, allowing confirmation of such proteoform candidates by intact-mass analysis of MS1 spectra. The implementation of these capabilities in the freely available software program Proteoform Suite enables users to integrate large-scale top-down and bottom-up data sets and to utilize the synergies between them to improve and extend the proteomic analysis.
Affiliation(s)
- Leah V Schaffer
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Robert J Millikin
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Michael R Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Mark Scalf
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
| |
|