1
Schmid R, Heuckeroth S, Korf A, Smirnov A, Myers O, Dyrlund TS, Bushuiev R, Murray KJ, Hoffmann N, Lu M, Sarvepalli A, Zhang Z, Fleischauer M, Dührkop K, Wesner M, Hoogstra SJ, Rudt E, Mokshyna O, Brungs C, Ponomarov K, Mutabdžija L, Damiani T, Pudney CJ, Earll M, Helmer PO, Fallon TR, Schulze T, Rivas-Ubach A, Bilbao A, Richter H, Nothias LF, Wang M, Orešič M, Weng JK, Böcker S, Jeibmann A, Hayen H, Karst U, Dorrestein PC, Petras D, Du X, Pluskal T. Integrative analysis of multimodal mass spectrometry data in MZmine 3. Nat Biotechnol 2023; 41:447-449. [PMID: 36859716 PMCID: PMC10496610 DOI: 10.1038/s41587-023-01690-2] [Citation(s) in RCA: 91] [Impact Index Per Article: 91.0] [Indexed: 03/03/2023]
Affiliation(s)
- Robin Schmid
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
  - Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Steffen Heuckeroth
  - Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
- Ansgar Korf
  - Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
- Aleksandr Smirnov
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
- Owen Myers
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
- Roman Bushuiev
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Kevin J Murray
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota - Twin Cities, Minneapolis, MN, USA
- Nils Hoffmann
  - Institute for Bio- and Geosciences (IBG-5), Forschungszentrum Jülich GmbH, Jülich, Germany
- Miaoshan Lu
  - School of Engineering, Westlake University, Hangzhou, China
- Abinesh Sarvepalli
  - BlockLab, Center for Large Datasystems Research, San Diego Supercomputer Center, La Jolla, CA, USA
- Zheng Zhang
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Markus Fleischauer
  - Chair for Bioinformatics, Friedrich Schiller University Jena, Jena, Germany
- Kai Dührkop
  - Chair for Bioinformatics, Friedrich Schiller University Jena, Jena, Germany
- Mark Wesner
  - Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
- Shawn J Hoogstra
  - Agriculture and Agri-Food Canada, London Research and Development Centre, London, Ontario, Canada
- Edward Rudt
  - Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
- Olena Mokshyna
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Corinna Brungs
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Kirill Ponomarov
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Lana Mutabdžija
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Tito Damiani
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Chris J Pudney
  - Datacraft Technologies, Mosman Park, Western Australia, Australia
- Mark Earll
  - Analytical Solutions Group, Product Technology and Engineering, Jealott's Hill International Research Centre, Bracknell, UK
- Patrick O Helmer
  - Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
- Timothy R Fallon
  - Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Tobias Schulze
  - Department of Effect-Directed Analysis, Helmholtz Centre for Environmental Research - UFZ, Leipzig, Germany
- Albert Rivas-Ubach
  - Ecology and Forest Genetics, Institute of Forest Sciences (ICIFOR-INIA-CSIC), Madrid, Spain
- Aivett Bilbao
  - Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, USA
- Henning Richter
  - Clinic for Diagnostic Imaging, Diagnostic Imaging Research Unit (DIRU), University of Zurich, Zürich, Switzerland
- Louis-Félix Nothias
  - School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland
- Mingxun Wang
  - Department of Computer Science, University of California Riverside, Riverside, CA, USA
- Matej Orešič
  - School of Medical Sciences, Örebro University, Örebro, Sweden
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
- Jing-Ke Weng
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Sebastian Böcker
  - Chair for Bioinformatics, Friedrich Schiller University Jena, Jena, Germany
- Astrid Jeibmann
  - Institute of Neuropathology, University Hospital Münster, Münster, Germany
- Heiko Hayen
  - Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
- Uwe Karst
  - Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
- Pieter C Dorrestein
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Daniel Petras
  - CMFI Cluster of Excellence, University of Tuebingen, Tuebingen, Germany
- Xiuxia Du
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
- Tomáš Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
2
Dührkop K, Nothias LF, Fleischauer M, Reher R, Ludwig M, Hoffmann MA, Petras D, Gerwick WH, Rousu J, Dorrestein PC, Böcker S. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nat Biotechnol 2021; 39:462-471. [PMID: 33230292 DOI: 10.1038/s41587-020-0740-8] [Citation(s) in RCA: 233] [Impact Index Per Article: 77.7] [Received: 04/15/2020] [Accepted: 10/16/2020] [Indexed: 12/12/2022]
Abstract
Metabolomics using nontargeted tandem mass spectrometry can detect thousands of molecules in a biological sample. However, structural molecule annotation is limited to structures present in libraries or databases, restricting analysis and interpretation of experimental data. Here we describe CANOPUS (class assignment and ontology prediction using mass spectrometry), a computational tool for systematic compound class annotation. CANOPUS uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes. CANOPUS explicitly targets compounds for which neither spectral nor structural reference data are available and predicts classes lacking tandem mass spectrometry training data. In evaluation using reference data, CANOPUS reached very high prediction performance (average accuracy of 99.7% in cross-validation) and outperformed four baseline methods. We demonstrate the broad utility of CANOPUS by investigating the effect of microbial colonization in the mouse digestive system, through analysis of the chemodiversity of different Euphorbia plants and regarding the discovery of a marine natural product, revealing biological insights at the compound class level.
Affiliation(s)
- Kai Dührkop
  - Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
- Louis-Félix Nothias
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Raphael Reher
  - Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
- Marcus Ludwig
  - Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
- Martin A Hoffmann
  - Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
  - International Max Planck Research School 'Exploration of Ecological Interactions with Molecular and Chemical Techniques', Max Planck Institute for Chemical Ecology, Jena, Germany
- Daniel Petras
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
  - Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
- William H Gerwick
  - Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Juho Rousu
  - Helsinki Institute for Information Technology, Department of Computer Science, Aalto University, Espoo, Finland
- Pieter C Dorrestein
  - Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Sebastian Böcker
  - Chair for Bioinformatics, Friedrich-Schiller University, Jena, Germany
6
Abstract
Supertree methods merge a set of overlapping phylogenetic trees into a supertree containing all taxa of the input trees. The challenge in supertree reconstruction is the way of dealing with conflicting information in the input trees. Many different algorithms for different objective functions have been suggested to resolve these conflicts. In particular, there exist methods based on encoding the source trees in a matrix, where the supertree is constructed applying a local search heuristic to optimize the respective objective function. We present a novel heuristic supertree algorithm called Bad Clade Deletion (BCD) supertrees. It uses minimum cuts to delete a locally minimal number of columns from such a matrix representation so that it is compatible. This is the complement problem to Matrix Representation with Compatibility (Maximum Split Fit). Our algorithm has guaranteed polynomial worst-case running time and performs swiftly in practice. Different from local search heuristics, it guarantees to return the directed perfect phylogeny for the input matrix, corresponding to the parent tree of the input trees, if one exists. Comparing supertrees to model trees for simulated data, BCD shows a better accuracy (F1 score) than the state-of-the-art algorithms SuperFine (up to 3%) and Matrix Representation with Parsimony (up to 7%); at the same time, BCD is up to 7 times faster than SuperFine, and up to 600 times faster than Matrix Representation with Parsimony. Finally, using the BCD supertree as a starting tree for a combined Maximum Likelihood analysis using RAxML, we reach significantly improved accuracy (1% higher F1 score) and running time (1.7-fold speedup).
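The compatibility notion in the abstract above can be illustrated with a minimal sketch. It assumes a complete binary matrix (rows are taxa, columns are clades, no missing entries), for which a directed perfect phylogeny exists exactly when every pair of columns is either disjoint or nested. Real matrix representations of source trees contain missing entries for taxa absent from a source tree, and BCD chooses which columns to delete using minimum cuts; this sketch covers neither. The function names are hypothetical and not taken from the BCD software.

```python
from itertools import combinations


def clade_sets(matrix):
    """Return, for each column, the set of row indices (taxa) that carry a 1."""
    if not matrix:
        return []
    n_cols = len(matrix[0])
    return [
        frozenset(r for r, row in enumerate(matrix) if row[c] == 1)
        for c in range(n_cols)
    ]


def conflicting_columns(matrix):
    """List column pairs whose taxon sets overlap without one containing the other."""
    sets = clade_sets(matrix)
    conflicts = []
    for (i, a), (j, b) in combinations(enumerate(sets), 2):
        if a & b and not (a <= b or b <= a):
            conflicts.append((i, j))
    return conflicts


def admits_directed_perfect_phylogeny(matrix):
    """True if a complete binary matrix has no conflicting column pair."""
    return not conflicting_columns(matrix)


# Rows are taxa A, B, C, D; the columns encode clades {A,B,C}, {A,B}, {D}.
compatible = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]

# Columns encode {A,B} and {B,C}: they share B but neither contains the other.
conflicting = [
    [1, 0],
    [1, 1],
    [0, 1],
]

print(admits_directed_perfect_phylogeny(compatible))
print(conflicting_columns(conflicting))
```

In the second matrix, deleting either column removes the conflict, which is the kind of column deletion the abstract refers to.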
Affiliation(s)
- Markus Fleischauer
  - Chair for Bioinformatics, Institute for Computer Science, Friedrich-Schiller-University Jena, Jena, Germany
- Sebastian Böcker
  - Chair for Bioinformatics, Institute for Computer Science, Friedrich-Schiller-University Jena, Jena, Germany
7
Abstract
Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well-known Matrix Representation with Parsimony, while others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the GSCM supertree. We find these modifications to increase the number of true positive clades by 18% compared to the currently used Overlap scoring.
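As a rough illustration of what counting true positive clades means, the sketch below compares the clades of an estimated supertree with those of a reference tree. It assumes each tree is given simply as a collection of clades, each clade being a set of taxon names; the function name and the toy trees are hypothetical and are not taken from the GSCM implementation or its evaluation.

```python
def clade_counts(estimated, reference):
    """Count clades shared by both trees and clades found in only one of them."""
    est = {frozenset(clade) for clade in estimated}
    ref = {frozenset(clade) for clade in reference}
    return {
        "true_positive": len(est & ref),   # clades in both trees
        "false_positive": len(est - ref),  # clades only in the estimated tree
        "false_negative": len(ref - est),  # reference clades that were missed
    }


# Toy reference tree on taxa A..E with three non-trivial clades.
reference_clades = [{"A", "B"}, {"A", "B", "C"}, {"D", "E"}]

# A less resolved supertree that recovers two of them and asserts nothing wrong.
supertree_clades = [{"A", "B"}, {"D", "E"}]

counts = clade_counts(supertree_clades, reference_clades)
print(counts)
```

A conservative merger that leaves the supertree unresolved produces few false positives but also few true positives, which is why finding more clades improves resolution.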
Affiliation(s)
- Markus Fleischauer
  - Lehrstuhl für Bioinformatik, Friedrich-Schiller Universität, Jena, Thüringen, Germany
- Sebastian Böcker
  - Lehrstuhl für Bioinformatik, Friedrich-Schiller Universität, Jena, Thüringen, Germany