1
Tuttle LM, Klevit RE, Guttman M. A framework for automated multimodal HDX-MS analysis. bioRxiv 2025:2025.03.13.643099. [PMID: 40161831 PMCID: PMC11952558 DOI: 10.1101/2025.03.13.643099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
We present pyHXExpress, a customizable codebase for automated high-throughput multimodal analysis of all spectra generated from HDX-MS experiments. The workflow was validated against a synthetic test dataset to test the fitting algorithms and to confirm the statistical outputs. We further establish a framework for the determination of multimodality throughout a protein system by rigorous evaluation of multimodal fits across all peptide spectra. We demonstrate this approach using entire protein datasets to detect multimodality and conformational heterogeneity and to characterize the dynamics of the small heat shock protein HSPB5 and two disease mutants.
Affiliation(s)
- Lisa M. Tuttle
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Rachel E. Klevit
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Miklos Guttman
  - Department of Medicinal Chemistry, University of Washington, Seattle, WA 98195, USA
2
Grimaud A, Babović M, Holck FH, Jensen ON, Schwämmle V. How to Deal With Internal Fragment Ions? Mol Cell Proteomics 2025; 24:100896. [PMID: 39954811 DOI: 10.1016/j.mcpro.2024.100896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Revised: 10/29/2024] [Accepted: 12/15/2024] [Indexed: 02/17/2025] [Open Access]
Abstract
Tandem mass spectrometry of peptides and proteins generates mass spectra of their gas-phase fragmentation product ions, including N-terminal, C-terminal, and internal fragment ions. While N- and C-terminal ions are routinely assigned and identified using computational methods, internal fragment ions are often difficult to annotate correctly. They become particularly relevant for long peptides and full proteoforms where the peptide backbone is more likely to be fragmented multiple times. Internal fragment ions potentially offer tremendous information regarding amino acid sequences and positions of post-translational modifications of peptides and intact proteins. However, their practical application is challenged by the vast number of theoretical internal fragments that exist for long amino acid sequences, leading to a high risk of false-positive annotations. We analyze the mass spectral contributions of internal fragment ions in spectra from middle-down and top-down experiments and introduce a novel graph-based annotation approach designed to manage the complexity of internal fragments. Our graph-based representation allows us to compare multiple candidate proteoforms in a single graph, and to assess different candidate annotations in a fragment ion spectrum. We demonstrate cases from middle-down and top-down data where internal ions enhance amino acid sequence coverage of polypeptides and proteins and accurate localization of post-translational modifications. We conclude that our graph-based method provides a general approach to process complex tandem mass spectra, enhance annotation of internal fragment ions, and improve proteoform sequencing and characterization by mass spectrometry.
Affiliation(s)
- Arthur Grimaud
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Maša Babović
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Frederik Haugaard Holck
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Ole N Jensen
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
3
Tavis SL, Keller MJ, Stai AJ, Rush TA, Hettich RL. LipoCLEAN: A Machine Learning Filter to Improve Untargeted Lipid Identification Confidence. Anal Chem 2025; 97:255-261. [PMID: 39710937 DOI: 10.1021/acs.analchem.4c04040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2024]
Abstract
In untargeted lipidomics experiments, putative lipid identifications generated by automated analysis software require substantial manual filtering to arrive at usable high-confidence data. However, identification software tools do not make full use of the available data to assess the quality of lipid identifications. Here, we present a machine-learning-based model to provide coherent, holistic quality scores based on multiple lines of evidence. Underutilized metrics such as isotope ratios and chromatographic behavior allow for much higher accuracy of identification confidence. We find that approximately 50% of tandem mass spectrometry-based automated lipid identifications are incorrect but that multidimensional rescoring reduces false discoveries to only 7% while retaining 80% of true positives. Our method works with most chromatography methods and generalizes across a family of MS instruments. LipoCLEAN is available at https://github.com/stavis1/LipoCLEAN.
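As a rough worked example of the figures quoted in this abstract, the sketch below assumes that "false discoveries" means the fraction of retained identifications that are incorrect, and it uses a hypothetical batch of 1,000 identifications. The function name and its parameters are illustrative only and are not part of LipoCLEAN.

```python
def retained_counts(n_ids, frac_incorrect, tp_retention, fdr_after):
    """Estimate how many true and false identifications survive filtering.

    Assumption: fdr_after = kept_false / (kept_false + kept_true).
    """
    true_ids = n_ids * (1 - frac_incorrect)   # correct identifications before filtering
    false_ids = n_ids * frac_incorrect        # incorrect identifications before filtering
    kept_true = true_ids * tp_retention       # correct identifications that are retained
    # Solve fdr_after = kept_false / (kept_false + kept_true) for kept_false.
    kept_false = fdr_after * kept_true / (1 - fdr_after)
    false_pass_rate = kept_false / false_ids  # share of incorrect IDs that slip through
    return kept_true, kept_false, false_pass_rate


# Hypothetical batch: 1,000 identifications, 50% incorrect,
# 80% of true positives retained, 7% false discoveries after rescoring.
kept_true, kept_false, false_pass_rate = retained_counts(1000, 0.5, 0.8, 0.07)
print(f"kept true: {kept_true:.0f}, kept false: {kept_false:.1f}, "
      f"false pass rate: {false_pass_rate:.3f}")
```

Under these assumptions, about 400 correct and roughly 30 incorrect identifications would remain, which means only a small share of the original incorrect identifications pass the filter.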
Affiliation(s)
- Steven L Tavis
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Tennessee 37996, United States
- Matthew J Keller
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Tennessee 37996, United States
- Andrew J Stai
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Tennessee 37996, United States
- Tomás A Rush
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States
- Robert L Hettich
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States
4
Durbin KR, Robey MT, Voong LN, Fellers RT, Lutomski CA, El-Baba TJ, Robinson CV, Kelleher NL. ProSight Native: Defining Protein Complex Composition from Native Top-Down Mass Spectrometry Data. J Proteome Res 2023; 22:2660-2668. [PMID: 37436406 PMCID: PMC10407923 DOI: 10.1021/acs.jproteome.3c00171] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 03/22/2023] [Indexed: 07/13/2023]
Abstract
Native mass spectrometry has recently moved alongside traditional structural biology techniques in its ability to provide clear insights into the composition of protein complexes. However, to date, limited software tools are available for the comprehensive analysis of native mass spectrometry data on protein complexes, particularly for experiments aimed at elucidating the composition of an intact protein complex. Here, we introduce ProSight Native as a start-to-finish informatics platform for analyzing native protein and protein complex data. Combining mass determination via spectral deconvolution with a top-down database search and stoichiometry calculations, ProSight Native can determine the complete composition of protein complexes. To demonstrate its features, we used ProSight Native to successfully determine the composition of the homotetrameric membrane complex Aquaporin Z. We also revisited previously published spectra and were able to decipher the composition of a heterodimer complex bound with two noncovalently associated ligands. In addition to determining complex composition, we developed new tools in the software for validating native mass spectrometry fragment ions and mapping top-down fragmentation data onto three-dimensional protein structures. Taken together, ProSight Native will reduce the informatics burden on the growing field of native mass spectrometry, enabling the technology to further its reach.
Affiliation(s)
- Lilien N. Voong
  - Proteinaceous, Inc., Evanston, Illinois 60201, United States
- Ryan T. Fellers
  - Proteinaceous, Inc., Evanston, Illinois 60201, United States
  - Northwestern University, Evanston, Illinois 60208, United States
- Corinne A. Lutomski
  - Department of Chemistry, University of Oxford, 12 Mansfield Rd., Oxford OX1 3TA, U.K.
  - Kavli Institute for NanoScience Discovery, Dorothy Crowfoot Hodgkin Building, University of Oxford, Oxford OX1 3QU, U.K.
- Tarick J. El-Baba
  - Department of Chemistry, University of Oxford, 12 Mansfield Rd., Oxford OX1 3TA, U.K.
  - Kavli Institute for NanoScience Discovery, Dorothy Crowfoot Hodgkin Building, University of Oxford, Oxford OX1 3QU, U.K.
- Carol V. Robinson
  - Department of Chemistry, University of Oxford, 12 Mansfield Rd., Oxford OX1 3TA, U.K.
  - Kavli Institute for NanoScience Discovery, Dorothy Crowfoot Hodgkin Building, University of Oxford, Oxford OX1 3QU, U.K.
- Neil L. Kelleher
  - Proteinaceous, Inc., Evanston, Illinois 60201, United States
  - Northwestern University, Evanston, Illinois 60208, United States
5
Agten A, Prostko P, Geubbelmans M, Liu Y, De Vijlder T, Valkenborg D. A Compositional Model to Predict the Aggregated Isotope Distribution for Average DNA and RNA Oligonucleotides. Metabolites 2021; 11:400. [PMID: 34207227 PMCID: PMC8234063 DOI: 10.3390/metabo11060400] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/30/2021] [Revised: 06/12/2021] [Accepted: 06/15/2021] [Indexed: 12/23/2022] [Open Access]
Abstract
Structural modifications of DNA and RNA molecules play a pivotal role in epigenetic and posttranscriptional regulation. To characterise these modifications, more and more MS- and MS/MS-based tools for the analysis of nucleic acids are being developed. To identify an oligonucleotide in a mass spectrum, it is useful to compare the obtained isotope pattern of the molecule of interest to the one that is theoretically expected based on its elemental composition. However, this is not straightforward when the identity of the molecule under investigation is unknown. Here, we present a modelling approach for the prediction of the aggregated isotope distribution of an average DNA or RNA molecule when a particular (monoisotopic) mass is available. For this purpose, a theoretical database of all possible DNA/RNA oligonucleotides up to a mass of 25 kDa is created, and the aggregated isotope distribution for the entire database of oligonucleotides is generated using the BRAIN algorithm. Since this isotope information is compositional in nature, the modelling method is based on the additive log-ratio analysis of Aitchison. As a result, a univariate weighted polynomial regression model of order 10 is fitted to predict the first 20 isotope peaks for DNA and RNA molecules. The performance of the prediction model is assessed by using a mean squared error approach and a modified Pearson's χ² goodness-of-fit measure on experimental data. Our analysis has indicated that the variability in spectral accuracy contributed more to the errors than the approximation of the theoretical isotope distribution by our proposed average DNA/RNA model. The prediction model is implemented as an online tool. An R function can be downloaded to incorporate the method in custom analysis workflows to process mass spectral data.
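To make the additive log-ratio (ALR) step mentioned in this abstract concrete, here is a minimal sketch of the transform and its inverse applied to a small isotope pattern. The peak intensities are made up for illustration, and using the last peak as the reference part is an assumption; the abstract does not say which reference the authors chose.

```python
import math


def alr(parts):
    """Additive log-ratio transform: log of each part relative to the last part."""
    total = sum(parts)
    closed = [p / total for p in parts]   # closure: rescale so the parts sum to 1
    ref = closed[-1]                      # reference part (assumed: the last peak)
    return [math.log(p / ref) for p in closed[:-1]]


def alr_inverse(coords):
    """Map ALR coordinates back to a composition that sums to 1."""
    expd = [math.exp(c) for c in coords] + [1.0]  # the reference part maps to exp(0)
    total = sum(expd)
    return [e / total for e in expd]


# Hypothetical relative intensities of four aggregated isotope peaks.
peaks = [0.40, 0.35, 0.18, 0.07]
coords = alr(peaks)               # three unconstrained real-valued coordinates
recovered = alr_inverse(coords)   # round trip back to the composition
print(coords)
print(recovered)
```

Because the transformed coordinates are unconstrained real numbers, an ordinary regression on them (for example, against mass) yields predictions that map back to valid proportions through the inverse transform.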
Affiliation(s)
- Annelies Agten
  - Data Science Institute, UHasselt—Hasselt University, Agoralaan 1, BE 3590 Diepenbeek, Belgium
  - Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Agoralaan 1, BE 3590 Diepenbeek, Belgium
- Piotr Prostko
  - Data Science Institute, UHasselt—Hasselt University, Agoralaan 1, BE 3590 Diepenbeek, Belgium
  - Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Agoralaan 1, BE 3590 Diepenbeek, Belgium
- Melvin Geubbelmans
  - Data Science Institute, UHasselt—Hasselt University, Agoralaan 1, BE 3590 Diepenbeek, Belgium
  - Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Agoralaan 1, BE 3590 Diepenbeek, Belgium
- Youzhong Liu
  - Chemical & Pharmaceutical Development & Supply, Janssen Research & Development, Turnhoutseweg 30, BE 2340 Beerse, Belgium
- Thomas De Vijlder
  - Chemical & Pharmaceutical Development & Supply, Janssen Research & Development, Turnhoutseweg 30, BE 2340 Beerse, Belgium
- Dirk Valkenborg
  - Data Science Institute, UHasselt—Hasselt University, Agoralaan 1, BE 3590 Diepenbeek, Belgium
  - Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Agoralaan 1, BE 3590 Diepenbeek, Belgium
6
Affiliation(s)
- Mateusz K. Łącki
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University Mainz, Mainz 55131, Germany
- Dirk Valkenborg
  - Data Science Institute, Hasselt University, BE3500 Hasselt, Belgium
  - Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, BE3500 Hasselt, Belgium
  - Center for Proteomics, University of Antwerp, 2000 Antwerp, Belgium
  - Applied Bio and Molecular Systems, Flemish Institute for Technological Research (VITO), 2400 Mol, Belgium
- Michał P. Startek
  - Department of Mathematics, Informatics, and Mechanics, University of Warsaw, 02-097 Warsaw, Poland
7
Lermyte F, Dittwald P, Claesen J, Baggerman G, Sobott F, O'Connor PB, Laukens K, Hooyberghs J, Gambin A, Valkenborg D. MIND: A Double-Linear Model To Accurately Determine Monoisotopic Precursor Mass in High-Resolution Top-Down Proteomics. Anal Chem 2019; 91:10310-10319. [PMID: 31283196 DOI: 10.1021/acs.analchem.9b02682] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Top-down proteomics approaches are becoming ever more popular, due to the advantages offered by knowledge of the intact protein mass in correctly identifying the various proteoforms that potentially arise due to point mutation, alternative splicing, post-translational modifications, etc. Usually, the average mass is used in this context; however, it is known that this can fluctuate significantly due to both natural and technical causes. Ideally, one would prefer to use the monoisotopic precursor mass, but this falls below the detection limit for all but the smallest proteins. Methods that predict the monoisotopic mass based on the average mass are potentially affected by imprecisions associated with the average mass. To address this issue, we have developed a framework based on simple, linear models that allows prediction of the monoisotopic mass based on the exact mass of the most-abundant (aggregated) isotope peak, which is a robust measure of mass, insensitive to the aforementioned natural and technical causes. This linear model was tested experimentally, as well as in silico, and typically predicts monoisotopic masses with an accuracy of only a few parts per million. A confidence measure is associated with the predicted monoisotopic mass to handle the off-by-one-Da prediction error. Furthermore, we introduce a correction function to extract the "true" (i.e., theoretically) most-abundant isotope peak from a spectrum, even if the observed isotope distribution is distorted by noise or poor ion statistics. The method is available online as an R shiny app: https://valkenborg-lab.shinyapps.io/mind/.
Affiliation(s)
- Frederik Lermyte
  - Biomolecular and Analytical Mass Spectrometry Group, Department of Chemistry, University of Antwerp, 2000 Antwerp, Belgium
  - UA-VITO Center for Proteomics, University of Antwerp, 2000 Antwerp, Belgium
  - School of Engineering, University of Warwick, Coventry CV4 7AL, United Kingdom
  - Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom
- Piotr Dittwald
  - Institute of Informatics, University of Warsaw, 00-927 Warsaw, Poland
- Jürgen Claesen
  - Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, BE3500 Hasselt, Belgium
- Geert Baggerman
  - UA-VITO Center for Proteomics, University of Antwerp, 2000 Antwerp, Belgium
  - Applied Bio and Molecular Systems, Flemish Institute for Technological Research (VITO), 2400 Mol, Belgium
- Frank Sobott
  - Biomolecular and Analytical Mass Spectrometry Group, Department of Chemistry, University of Antwerp, 2000 Antwerp, Belgium
  - Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds LS2 9JT, United Kingdom
  - School of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9JT, United Kingdom
- Peter B O'Connor
  - Department of Chemistry, University of Warwick, Coventry CV4 7AL, United Kingdom
- Kris Laukens
  - Adrem Data Lab, Department of Mathematics and Computer Science, University of Antwerp, 2000 Antwerp, Belgium
  - Biomedical Informatics Network Antwerp (biomina), University of Antwerp, 2000 Antwerp, Belgium
- Jef Hooyberghs
  - Applied Bio and Molecular Systems, Flemish Institute for Technological Research (VITO), 2400 Mol, Belgium
- Anna Gambin
  - Institute of Informatics, University of Warsaw, 00-927 Warsaw, Poland
- Dirk Valkenborg
  - UA-VITO Center for Proteomics, University of Antwerp, 2000 Antwerp, Belgium
  - Interuniversity Institute of Biostatistics and Statistical Bioinformatics, Hasselt University, BE3500 Hasselt, Belgium
  - Applied Bio and Molecular Systems, Flemish Institute for Technological Research (VITO), 2400 Mol, Belgium
8
Lai CJS, Tan T, Zeng SL, Qi LW, Liu XG, Dong X, Li P, Liu EH. An integrated high resolution mass spectrometric data acquisition method for rapid screening of saponins in Panax notoginseng (Sanqi). J Pharm Biomed Anal 2015; 109:184-91. [PMID: 25778929 DOI: 10.1016/j.jpba.2015.02.028] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Received: 12/23/2014] [Revised: 02/14/2015] [Accepted: 02/16/2015] [Indexed: 01/10/2023]
Abstract
The aim of this study was to develop a convenient method, requiring no pretreatment, for the nontargeted discovery of compounds of interest. A segment and exposure strategy, coupled with two mass spectrometric data acquisition methods, is proposed for the first time for screening the saponins in an extract of Panax notoginseng (Sanqi) by high-performance liquid chromatography tandem quadrupole time-of-flight mass spectrometry (HPLC-QTOF/MS). By gradually removing certain major or moderate interfering compounds, the segment and exposure strategy can significantly improve the detection efficiency for trace compounds. Moreover, the newly developed five-point screening approach, based on a modified mass defect filter strategy and the visual isotopic ion technique, was verified to be efficient and reliable in picking out the precursor ions of interest. In total, 234 ginsenosides, including 67 potentially new ones, were characterized or tentatively identified from the Sanqi extract. In particular, some unusual compounds containing a branched glycosyl group or new substituted acyl groups are reported for the first time. The proposed integrated strategy holds strong promise for the analysis of complex mixtures.
Affiliation(s)
- Chang-Jiang-Sheng Lai
  - State Key Laboratory of Natural Medicines (China Pharmaceutical University), No. 24 Tongjia lane, Nanjing 210009, China
- Ting Tan
  - State Key Laboratory of Natural Medicines (China Pharmaceutical University), No. 24 Tongjia lane, Nanjing 210009, China
- Su-Ling Zeng
  - State Key Laboratory of Natural Medicines (China Pharmaceutical University), No. 24 Tongjia lane, Nanjing 210009, China
- Lian-Wen Qi
  - State Key Laboratory of Natural Medicines (China Pharmaceutical University), No. 24 Tongjia lane, Nanjing 210009, China
- Xin-Guang Liu
  - State Key Laboratory of Natural Medicines (China Pharmaceutical University), No. 24 Tongjia lane, Nanjing 210009, China
- Xin Dong
  - State Key Laboratory of Natural Medicines (China Pharmaceutical University), No. 24 Tongjia lane, Nanjing 210009, China
- Ping Li
  - State Key Laboratory of Natural Medicines (China Pharmaceutical University), No. 24 Tongjia lane, Nanjing 210009, China
- E-Hu Liu
  - State Key Laboratory of Natural Medicines (China Pharmaceutical University), No. 24 Tongjia lane, Nanjing 210009, China