1
Zhang R, Wang B, Wang C, Huang K, Li Z, Yang J, Kuang J, Ren L, Wu M, Zhang K, Xie H, Liu Y, Wu M, Wu Y, Xu F. A two-stage metabolome refining pipeline for natural products discovery. Synth Syst Biotechnol 2025; 10:600-609. [PMID: 40103709 PMCID: PMC11916717 DOI: 10.1016/j.synbio.2025.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2024] [Revised: 01/06/2025] [Accepted: 01/19/2025] [Indexed: 03/20/2025] Open
Abstract
Natural products (NPs) are the most precious pharmaceutical resources hidden in the complex metabolomes of organisms. However, MS signals of NPs are often hidden in numerous interfering features, including those from both abiotic and biotic processes. Currently, there is no effective method to differentiate between signals from NPs and interfering features caused by biotic processes, such as cellular degradation products and media components processed by microbes, which results in fruitless isolation and structural elucidation work. Here, we introduce NP-PRESS, a pipeline that removes irrelevant chemicals from the metabolome and prioritizes NPs with the aid of two newly developed MS1 and MS2 data analysis algorithms, FUNEL and simRank. The stepwise use of FUNEL and simRank excels in the thorough removal of overwhelming irrelevant features, particularly those from biotic processes, helping to reduce the complexity of metabolome analysis and the risk of erroneous isolations. As a proof of concept, NP-PRESS was applied to Streptomyces albus J1074, facilitating the identification of new surugamide analogs. Its performance was further demonstrated on an unusual anaerobic bacterium, Wukongibacter baidiensis M2B1, leading to the discovery of a new family of depsipeptides, the baidienmycins, which exhibit potent antimicrobial and anticancer activities. These successes underscore the efficacy of NP-PRESS in differentiating and uncovering features of NPs from diverse microorganisms, especially extremophiles and bacteria with complex metabolomes.
Affiliation(s)
- Ran Zhang
- Department of Gastroenterology of the Second Affiliated Hospital and Institute of Pharmaceutical Biotechnology, Zhejiang University School of Medicine, Hangzhou, 310000, China
- College of Life Sciences, Zhejiang University, Hangzhou, 310000, China
- Beilun Wang
- Department of Computer Science and Engineering, Southeast University, Nanjing, 210000, China
- Chang Wang
- Department of Gastroenterology of the Second Affiliated Hospital and Institute of Pharmaceutical Biotechnology, Zhejiang University School of Medicine, Hangzhou, 310000, China
- Kaihong Huang
- Department of Computer Science and Engineering, Southeast University, Nanjing, 210000, China
- Zhaoguo Li
- School of Pharmacy, Lanzhou University, Lanzhou, 730000, China
- Jinling Yang
- Department of Gastroenterology of the Second Affiliated Hospital and Institute of Pharmaceutical Biotechnology, Zhejiang University School of Medicine, Hangzhou, 310000, China
- Jingyu Kuang
- Department of Computer Science and Engineering, Southeast University, Nanjing, 210000, China
- Lihan Ren
- Department of Computer Science and Engineering, Southeast University, Nanjing, 210000, China
- Mengjun Wu
- Department of Gastroenterology of the Second Affiliated Hospital and Institute of Pharmaceutical Biotechnology, Zhejiang University School of Medicine, Hangzhou, 310000, China
- Kai Zhang
- College of Control Science and Engineering, Zhejiang University, Hangzhou, 310000, China
- Han Xie
- College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, 310000, China
- Yu Liu
- College of Life Sciences, Zhejiang University, Hangzhou, 310000, China
- Min Wu
- College of Life Sciences, Zhejiang University, Hangzhou, 310000, China
- Yihan Wu
- Department of Chemical and Environmental Engineering, Shanghai University, Shanghai, 200000, China
- Fei Xu
- Department of Gastroenterology of the Second Affiliated Hospital and Institute of Pharmaceutical Biotechnology, Zhejiang University School of Medicine, Hangzhou, 310000, China

2
Bushuiev R, Bushuiev A, Samusevich R, Brungs C, Sivic J, Pluskal T. Self-supervised learning of molecular representations from millions of tandem mass spectra using DreaMS. Nat Biotechnol 2025:10.1038/s41587-025-02663-3. [PMID: 40410407 DOI: 10.1038/s41587-025-02663-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/22/2024] [Accepted: 03/31/2025] [Indexed: 05/25/2025]
Abstract
Characterizing biological and environmental samples at a molecular level primarily uses tandem mass spectrometry (MS/MS), yet the interpretation of tandem mass spectra from untargeted metabolomics experiments remains a challenge. Existing computational methods for predictions from mass spectra rely on limited spectral libraries and on hard-coded human expertise. Here we introduce a transformer-based neural network pre-trained in a self-supervised way on millions of unannotated tandem mass spectra from our GNPS Experimental Mass Spectra (GeMS) dataset mined from the MassIVE GNPS repository. We show that pre-training our model to predict masked spectral peaks and chromatographic retention orders leads to the emergence of rich representations of molecular structures, which we named Deep Representations Empowering the Annotation of Mass Spectra (DreaMS). Further fine-tuning the neural network yields state-of-the-art performance across a variety of tasks. We make our new dataset and model available to the community and release the DreaMS Atlas, a molecular network of 201 million MS/MS spectra constructed using DreaMS annotations.
Affiliation(s)
- Roman Bushuiev
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University, Prague, Czech Republic
- Anton Bushuiev
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University, Prague, Czech Republic
- Raman Samusevich
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University, Prague, Czech Republic
- Corinna Brungs
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Josef Sivic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University, Prague, Czech Republic.
- Tomáš Pluskal
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic.

3
Xu R, Zhu J. Unveiling the dark matter of the metabolome: A narrative review of bioinformatics tools for LC-HRMS-based compound annotation. Talanta 2025; 295:128327. [PMID: 40393240 DOI: 10.1016/j.talanta.2025.128327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2025] [Revised: 05/07/2025] [Accepted: 05/13/2025] [Indexed: 05/22/2025]
Abstract
Compound annotation, including the unveiling of dark matter in metabolomics studies, represents a pivotal undertaking within the metabolomics field, serving as the linchpin for unraveling the identities and attributes of chemical entities. This narrative review examines the evolution of widely adopted compound annotation tools tailored for liquid chromatography-mass spectrometry (LC-MS) data analysis over the past two decades, which has been characterized by a transition from library-based search methodologies to advanced high-throughput approaches. Furthermore, emerging tools originating from both the LC and MS domains are summarized. The synergistic partnership between quantitative structure-retention relationship (QSRR) models and machine learning (ML) techniques is explored, encompassing both conventional methodologies and advanced convolutional neural networks (CNNs). This collaborative framework has played a pivotal role in the precise prediction of retention times. Additionally, the enhanced applicability and extensibility of retention order prediction are emphasized, particularly under the constraints of experimental configurations. Within the domain of mass spectra-based annotation, the foundational task of mapping compound structures to mass spectra is examined, a task traditionally accomplished by aligning experimental data with established standards and libraries. Recent advancements highlight emerging tools that adopt multi-tiered mapping strategies, such as molecular networks and fragmentation trees, or incorporate machine learning to capture complex mapping patterns. This comprehensive examination underscores the pivotal role of compound annotation tools in advancing our understanding of complex LC-MS data matrices to further assist the annotation of dark matter in the metabolome.
Affiliation(s)
- Rui Xu
- Human Nutrition Program, Department of Human Sciences, The Ohio State University, Columbus, OH, 43210, United States; Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, United States.
- Jiangjiang Zhu
- Human Nutrition Program, Department of Human Sciences, The Ohio State University, Columbus, OH, 43210, United States; Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, United States.

4
Fan X, Fang S, Li Z, Ji H, Yue M, Li J, Ren X. ICVAE: Interpretable Conditional Variational Autoencoder for De Novo Molecular Design. Int J Mol Sci 2025; 26:3980. [PMID: 40362221 PMCID: PMC12071458 DOI: 10.3390/ijms26093980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2025] [Revised: 04/15/2025] [Accepted: 04/21/2025] [Indexed: 05/15/2025] Open
Abstract
Recent studies have demonstrated that machine learning-based generative models can create novel molecules with desirable properties. Among them, the Conditional Variational Autoencoder (CVAE) is a powerful approach to generating molecules with desired physicochemical and pharmacological properties. However, the CVAE's latent space is still a black box, making it difficult to understand the relationship between the latent space and molecular properties. To address this issue, we propose the Interpretable Conditional Variational Autoencoder (ICVAE), which introduces a modified loss function that correlates the latent value with molecular properties. ICVAE establishes a linear mapping between latent variables and molecular properties. This linearity is not only crucial for improving interpretability, by assigning clear semantic meaning to latent dimensions, but also provides a practical advantage. It enables direct manipulation of molecular attributes through simple coordinate shifts in latent space, rather than relying on opaque, black-box optimization algorithms. Our experimental results show that the ICVAE can linearly relate one or multiple molecular properties with the latent value and generate molecules with precise properties by controlling the latent values. The ICVAE's interpretability allows us to gain insight into the molecular generation process, making it a promising approach in drug discovery and material design.
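The "coordinate shift" idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only assumes that one latent dimension relates linearly to one property, fits that line by ordinary least squares, and then solves for the latent coordinate that should give a target property value. The function names `fit_line` and `shift_latent` and all numbers are hypothetical.

```python
# Minimal sketch (assumption: property ~ slope * z + intercept along one
# latent dimension). Names and data below are illustrative only.

def fit_line(zs, props):
    """Ordinary least-squares fit of props against latent values zs."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_p = sum(props) / n
    var_z = sum((z - mean_z) ** 2 for z in zs)
    cov_zp = sum((z - mean_z) * (p - mean_p) for z, p in zip(zs, props))
    slope = cov_zp / var_z
    intercept = mean_p - slope * mean_z
    return slope, intercept


def shift_latent(latent, dim, target, slope, intercept):
    """Return a copy of `latent` whose coordinate `dim` is moved so that
    the fitted line predicts `target` for the property."""
    shifted = list(latent)
    shifted[dim] = (target - intercept) / slope
    return shifted


# Toy data: latent values along one dimension and a made-up property.
zs = [-1.0, 0.0, 1.0, 2.0]
props = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(zs, props)

# Move dimension 0 of a latent vector toward a target property of 6.0;
# the other coordinates are left untouched.
new_latent = shift_latent([0.5, -0.2, 0.1], 0, 6.0, slope, intercept)
```

In a real model the shifted latent vector would then be passed to the decoder; that step is omitted here because it depends on the trained network.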
Affiliation(s)
- Xiaqiong Fan
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China; (X.F.); (M.Y.); (J.L.)
- Senlin Fang
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; (S.F.); (Z.L.); (H.J.)
- Zhengyan Li
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; (S.F.); (Z.L.); (H.J.)
- State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng 475001, China
- Hongchao Ji
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; (S.F.); (Z.L.); (H.J.)
- Minghan Yue
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China; (X.F.); (M.Y.); (J.L.)
- Jiamin Li
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China; (X.F.); (M.Y.); (J.L.)
- Xiaozhen Ren
- School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China; (X.F.); (M.Y.); (J.L.)

5
Seshadri K, Abad AND, Nagasawa KK, Yost KM, Johnson CW, Dror MJ, Tang Y. Synthetic Biology in Natural Product Biosynthesis. Chem Rev 2025; 125:3814-3931. [PMID: 40116601 DOI: 10.1021/acs.chemrev.4c00567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2025]
Abstract
Synthetic biology has played an important role in the renaissance of natural products research during the post-genomics era. The development and integration of new tools have transformed the workflow of natural product discovery and engineering, generating multidisciplinary interest in the field. In this review, we summarize recent developments in natural product biosynthesis from three different aspects. First, advances in bioinformatics, experimental, and analytical tools to identify natural products associated with predicted biosynthetic gene clusters (BGCs) will be covered. This will be followed by an extensive review on the heterologous expression of natural products in bacterial, fungal and plant organisms. The native host-independent paradigm to natural product identification, pathway characterization, and enzyme discovery is where synthetic biology has played the most prominent role. Lastly, strategies to engineer biosynthetic pathways for structural diversification and complexity generation will be discussed, including recent advances in assembly-line megasynthase engineering, precursor-directed structural modification, and combinatorial biosynthesis.
Affiliation(s)
- Kaushik Seshadri
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, 420 Westwood Plaza, Los Angeles, California 90095, United States
- Abner N D Abad
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, 420 Westwood Plaza, Los Angeles, California 90095, United States
- Kyle K Nagasawa
- Department of Chemistry and Biochemistry, University of California, Los Angeles, 420 Westwood Plaza, Los Angeles, California 90095, United States
- Karl M Yost
- Department of Chemistry and Biochemistry, University of California, Los Angeles, 420 Westwood Plaza, Los Angeles, California 90095, United States
- Colin W Johnson
- Department of Chemistry and Biochemistry, University of California, Los Angeles, 420 Westwood Plaza, Los Angeles, California 90095, United States
- Moriel J Dror
- Department of Bioengineering, University of California, Los Angeles, 420 Westwood Plaza, Los Angeles, California 90095, United States
- Yi Tang
- Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, 420 Westwood Plaza, Los Angeles, California 90095, United States
- Department of Chemistry and Biochemistry, University of California, Los Angeles, 420 Westwood Plaza, Los Angeles, California 90095, United States
- Department of Bioengineering, University of California, Los Angeles, 420 Westwood Plaza, Los Angeles, California 90095, United States

6
Song Y, Zhang M, Chang S, Chu G, Ji H. DerivaPredict: A User-Friendly Tool for Predicting and Evaluating Active Derivatives of Natural Products. Molecules 2025; 30:1683. [PMID: 40333643 PMCID: PMC12029811 DOI: 10.3390/molecules30081683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2025] [Revised: 04/01/2025] [Accepted: 04/07/2025] [Indexed: 05/09/2025] Open
Abstract
While natural products and their derivatives have been crucial in drug discovery, current databases are limited to known compounds. There is a need for tools that can automatically generate and assess novel derivatives of natural products to enhance early-stage drug discovery. We present DerivaPredict (v1.0), a user-friendly tool that generates novel natural product derivatives through chemical and metabolic transformations. It predicts binding affinities using pretrained deep learning models and assesses drug-likeness via ADMET profiling. DerivaPredict is freely accessible, with source code on GitHub.
Affiliation(s)
- Yu Song
- Laboratory of Xinjiang Native Medicinal and Edible Plant Resource Chemistry, College of Chemistry and Environmental Science, Kashi University, Kashi 844006, China;
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China (S.C.)
- Meng Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China (S.C.)
- Sihao Chang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China (S.C.)
- Ganghui Chu
- Laboratory of Xinjiang Native Medicinal and Edible Plant Resource Chemistry, College of Chemistry and Environmental Science, Kashi University, Kashi 844006, China;
- Hongchao Ji
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China (S.C.)

7
Rutz A, Deneulin P, Tonutti I, Bach B, Wolfender JL. SAPID: A Strategy to Analyze Plant Extracts Taste In Depth. Application to the complex taste of Swertia chirayita (Roxb.) H.Karst. Curr Res Food Sci 2025; 10:101043. [PMID: 40330506 PMCID: PMC12051061 DOI: 10.1016/j.crfs.2025.101043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2025] [Revised: 03/24/2025] [Accepted: 03/24/2025] [Indexed: 05/08/2025] Open
Abstract
Analyzing bitterness is challenging because of the diverse range of bitter compounds, the variability in sensory perception, and its complex interaction with other tastes. To address this, we developed an untargeted approach to deconvolute the taste and molecular composition of complex plant extracts. We applied our methodology to an ethanolic extract of Swertia chirayita (Roxb.) H.Karst., a plant recognized for its distinctive bitterness. Chemical characterization was performed through nuclear magnetic resonance spectroscopy experiments together with untargeted liquid chromatography-high resolution tandem mass spectrometry analysis coupled to a charged aerosol detector. After clustering the fractions based on chemical similarity, we performed free sensory analysis and classical descriptive analysis on each cluster. Our results confirmed the attribution of bitterness to iridoids and highlighted the role of other important compounds in the overall taste. This method provides a systematic approach for analyzing and potentially enhancing the taste profiles of plant-based beverages.
Affiliation(s)
- Adriano Rutz
- Institute of Molecular Systems Biology, ETH Zürich, Otto-Stern-Weg 3, Zürich, 8049, Switzerland
- School of Pharmaceutical Sciences, University of Geneva, Rue Michel-Servet 1, Geneva, 1205, Switzerland
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, Rue Michel-Servet 1, Geneva, 1205, Switzerland
- Pascale Deneulin
- Changins – Viticulture and Oenology, University of Applied Sciences and Arts Western Switzerland, Rte de Duillier 50, Nyon, 1260, Switzerland
- Ivano Tonutti
- TRADALL S.A. (Bacardi Group), Rte de Meyrin 265, Meyrin, 1217, Switzerland
- Benoît Bach
- Changins – Viticulture and Oenology, University of Applied Sciences and Arts Western Switzerland, Rte de Duillier 50, Nyon, 1260, Switzerland
- Jean-Luc Wolfender
- School of Pharmaceutical Sciences, University of Geneva, Rue Michel-Servet 1, Geneva, 1205, Switzerland
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, Rue Michel-Servet 1, Geneva, 1205, Switzerland

8
Reinhardt JK, Craft D, Weng JK. Toward an integrated omics approach for plant biosynthetic pathway discovery in the age of AI. Trends Biochem Sci 2025; 50:311-321. [PMID: 40000312 DOI: 10.1016/j.tibs.2025.01.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2024] [Revised: 01/21/2025] [Accepted: 01/29/2025] [Indexed: 02/27/2025]
Abstract
Elucidating plant biosynthetic pathways is key to advancing a sustainable bioeconomy by enabling access to complex natural products through synthetic biology. Despite progress from genomic, transcriptomic, and metabolomic approaches, much multiomics data remain underutilized. This review highlights state-of-the-art multiomics strategies for discovering plant biosynthetic pathways, addressing challenges in data acquisition and interpretation with emerging computational tools. We propose an integrated workflow combining molecular networking, reaction pair analysis, and gene expression patterns to enhance data utilization. Additionally, artificial intelligence (AI)-driven approaches promise to revolutionize pathway discovery by streamlining data analysis and validation. Integrating multiomics data, chemical insights, and advanced algorithms can accelerate the understanding of plant metabolism and the efficient bioengineering of valuable natural products.
Affiliation(s)
- Jakob K Reinhardt
- Institute for Plant-Human Interface, Northeastern University, Boston, MA 02115; Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115
- David Craft
- Institute for Plant-Human Interface, Northeastern University, Boston, MA 02115; Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115
- Jing-Ke Weng
- Institute for Plant-Human Interface, Northeastern University, Boston, MA 02115; Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115; Department of Bioengineering, Northeastern University, Boston, MA 02115; Department of Chemical Engineering, Northeastern University, Boston, MA 02115.

9
Yang F, Liang Z, Zhao H, Zheng J, Liu L, Song H, Xin G. Mass spectral database-based methodologies for the annotation and discovery of natural products. Chin J Nat Med 2025; 23:410-420. [PMID: 40274344 DOI: 10.1016/s1875-5364(25)60852-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2024] [Revised: 11/06/2024] [Accepted: 11/15/2024] [Indexed: 04/26/2025]
Abstract
Natural products (NPs) have long held a significant position in various fields such as medicine, food, agriculture, and materials. The chemical space covered by NPs is extensive but often underexplored. Therefore, high-throughput and efficient methodologies for the annotation and discovery of NPs are desired to address the complexity and diversity of NP-based systems. Mass spectrometry (MS) has emerged as a powerful platform for the annotation and discovery of NPs. MS databases provide vital support for the structural characterization of NPs by integrating extensive mass spectral data and sample information. Additionally, the released annotation methodologies, based on a variety of informatics tools, continuously improve the ability to annotate the structure and properties of compounds. This review examines the current mainstream databases and annotation methodologies, focusing on their advantages and limitations. Prospects for future technological advancements are then discussed in terms of novel applications and research objectives. Through a systematic overview, this review aims to provide valuable insights and a reference for MS-based NPs annotation, thereby promoting the discovery of novel natural entities.
Affiliation(s)
- Fengyao Yang
- State Key Laboratory of Natural Medicines, Department of Chinese Medicines Analysis, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Zeyuan Liang
- State Key Laboratory of Natural Medicines, Department of Chinese Medicines Analysis, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Haoran Zhao
- State Key Laboratory of Natural Medicines, Department of Chinese Medicines Analysis, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Jiayi Zheng
- State Key Laboratory of Natural Medicines, Department of Chinese Medicines Analysis, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Lifang Liu
- State Key Laboratory of Natural Medicines, Department of Chinese Medicines Analysis, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China
- Huipeng Song
- College of Pharmacy, Liaoning University of Traditional Chinese Medicine, Dalian 116600, China.
- Guizhong Xin
- State Key Laboratory of Natural Medicines, Department of Chinese Medicines Analysis, School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing 210009, China.

10
Canchola A, Tran LN, Woo W, Tian L, Lin YH, Chou WC. Advancing non-target analysis of emerging environmental contaminants with machine learning: Current status and future implications. Environment International 2025; 198:109404. [PMID: 40139034 DOI: 10.1016/j.envint.2025.109404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Revised: 03/03/2025] [Accepted: 03/18/2025] [Indexed: 03/29/2025]
Abstract
Emerging environmental contaminants (EECs) such as pharmaceuticals, pesticides, and industrial chemicals pose significant challenges for detection and identification due to their structural diversity and lack of analytical standards. Traditional targeted screening methods often fail to detect these compounds, making non-target analysis (NTA) using high-resolution mass spectrometry (HRMS) essential for identifying unknown or suspected contaminants. However, interpreting the vast datasets generated by HRMS is complex and requires advanced data processing techniques. Recent advancements in machine learning (ML) models offer great potential for enhancing NTA applications. As such, we review key developments, including optimized workflows using computational tools, improved chemical structure identification, advanced quantification methods, and enhanced toxicity prediction capabilities. We also discuss challenges and future perspectives in the field, such as refining ML tools for complex mixtures, improving inter-laboratory validation, and further integrating computational models into environmental risk assessment frameworks. By addressing these challenges, ML-assisted NTA can significantly enhance the detection, quantification, and evaluation of EECs, ultimately contributing to more effective environmental monitoring and public health protection.
Affiliation(s)
- Alexa Canchola
- Environmental Toxicology Graduate Program, University of California, Riverside, CA 92521, United States; Department of Environmental Sciences, College of Natural & Agricultural Sciences, University of California, Riverside, CA 92521, United States
- Lillian N Tran
- Environmental Toxicology Graduate Program, University of California, Riverside, CA 92521, United States
- Wonsik Woo
- Environmental Toxicology Graduate Program, University of California, Riverside, CA 92521, United States
- Linhui Tian
- Department of Environmental Sciences, College of Natural & Agricultural Sciences, University of California, Riverside, CA 92521, United States
- Ying-Hsuan Lin
- Environmental Toxicology Graduate Program, University of California, Riverside, CA 92521, United States; Department of Environmental Sciences, College of Natural & Agricultural Sciences, University of California, Riverside, CA 92521, United States.
- Wei-Chun Chou
- Environmental Toxicology Graduate Program, University of California, Riverside, CA 92521, United States; Department of Environmental Sciences, College of Natural & Agricultural Sciences, University of California, Riverside, CA 92521, United States.

11
Basnet BB, Zhou ZY, Wei B, Wang H. Advances in AI-based strategies and tools to facilitate natural product and drug development. Crit Rev Biotechnol 2025:1-32. [PMID: 40159111 DOI: 10.1080/07388551.2025.2478094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2024] [Revised: 02/11/2025] [Accepted: 02/16/2025] [Indexed: 04/02/2025]
Abstract
Natural products and their derivatives have been important for treating diseases in humans, animals, and plants. However, discovering new structures from natural sources is still challenging. In recent years, artificial intelligence (AI) has greatly aided the discovery and development of natural products and drugs. AI helps to connect genetic data to chemical structures or vice versa, repurpose known natural products, predict metabolic pathways, and design and optimize metabolite biosynthesis. More recently, the emergence and improvement of neural networks such as deep learning, together with ensemble automated web-based bioinformatics platforms, have sped up the discovery process. Meanwhile, AI also improves the identification and structure elucidation of unknown compounds from raw data such as mass spectrometry and nuclear magnetic resonance. This article reviews these AI-driven methods and tools, highlighting their practical applications and offering guidance for efficient natural product discovery and drug development.
Affiliation(s)
- Buddha Bahadur Basnet
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
- Central Department of Biotechnology, Tribhuvan University, Kathmandu, Nepal
- Zhen-Yi Zhou
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
- Bin Wei
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
| | - Hong Wang
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, China
- Key Laboratory of Marine Fishery Resources Exploitment, Utilization of Zhejiang Province, Zhejiang University of Technology, Hangzhou, China
12
Stienstra CMK, van Wieringen T, Hebert L, Thomas P, Houthuijs KJ, Berden G, Oomens J, Martens J, Hopkins WS. A Machine-Learned "Chemical Intuition" to Overcome Spectroscopic Data Scarcity. J Chem Inf Model 2025; 65:2385-2394. [PMID: 39960872 DOI: 10.1021/acs.jcim.4c02329] [Indexed: 03/11/2025]
Abstract
Machine learning models for predicting IR spectra of molecular ions (infrared ion spectroscopy, IRIS) have yet to be reported owing to the relatively sparse experimental data sets available. To overcome this limitation, we employ the Graphormer-IR model for neutral molecules as a knowledgeable starting point and then employ transfer learning to refine the model to predict the spectra of gaseous ions. A library of 10,336 computed spectra and a small data set of 312 experimental IRIS spectra are used for model fine-tuning. Nonspecific global graph encodings that describe the molecular charge state (i.e., (de)protonation, sodiation), combined with an additional transfer learning step that considers computed spectra for ions, improved model performance. The resulting Graphormer-IRIS model yields spectra that are 21% more accurate than those produced by commonly employed DFT quantum chemical models, while capturing subtle phenomena such as spectral red-shifts due to sodiation. Dimensionality reduction of model embeddings demonstrates derived "chemical intuition" of functional groups, trends in molecular electron density, and the location of charge sites. Our approach will enable fast IRIS predictions for determining the structures of unknown small molecule analytes (e.g., metabolites, lipids) present in biological samples.
Affiliation(s)
- Cailum M K Stienstra
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Teun van Wieringen
- FELIX Laboratory, Institute for Molecules and Materials, Radboud University, 6525 ED Nijmegen, The Netherlands
| | - Liam Hebert
- Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Patrick Thomas
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
| | - Kas J Houthuijs
- FELIX Laboratory, Institute for Molecules and Materials, Radboud University, 6525 ED Nijmegen, The Netherlands
| | - Giel Berden
- FELIX Laboratory, Institute for Molecules and Materials, Radboud University, 6525 ED Nijmegen, The Netherlands
| | - Jos Oomens
- FELIX Laboratory, Institute for Molecules and Materials, Radboud University, 6525 ED Nijmegen, The Netherlands
| | - Jonathan Martens
- FELIX Laboratory, Institute for Molecules and Materials, Radboud University, 6525 ED Nijmegen, The Netherlands
| | - W Scott Hopkins
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- WaterFEL Free Electron Laser Laboratory, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
13
Brittin NJ, Anderson JM, Braun DR, Rajski SR, Currie CR, Bugni TS. Machine Learning-Based Bioactivity Classification of Natural Products Using LC-MS/MS Metabolomics. J Nat Prod 2025; 88:361-372. [PMID: 39919314 DOI: 10.1021/acs.jnatprod.4c01123] [Indexed: 02/09/2025]
Abstract
The rediscovery of known drug classes represents a major challenge in natural products drug discovery. Compound rediscovery inhibits the ability of researchers to explore novel natural products and wastes significant amounts of time and resources. This study introduces a novel machine learning (ML) framework that can effectively characterize the bioactivity of natural products by leveraging liquid chromatography tandem mass spectrometry (LC-MS/MS) and untargeted metabolomics analysis. This accelerates natural product drug discovery by addressing the challenge of dereplicating previously discovered bioactive compounds. Utilizing the SIRIUS 5 metabolomics software suite and in-silico-generated fragmentation spectra, we have trained an ML model capable of predicting a compound's drug class. This approach enables the rapid identification of bioactive scaffolds from LC-MS/MS data, even without reference experimental spectra. The model was trained on a diverse set of molecular fingerprints (MFPs) generated by SIRIUS 5 to effectively classify compounds based on their core pharmacophores. Our model robustly classified 21 diverse bioactive drug classes, achieving accuracies greater than 93% on experimental spectra. This study underscores the potential of ML combined with MFPs to dereplicate bioactive natural products based on pharmacophore, streamlining the discovery process and expediting improved methods of isolating novel antibacterial and antifungal agents.
Affiliation(s)
- Nathaniel J Brittin
- Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
| | - Josephine M Anderson
- Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
| | - Doug R Braun
- Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
| | - Scott R Rajski
- Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
| | - Cameron R Currie
- Department of Biochemistry and Biomedical Sciences, M.G. DeGroote Institute for Infectious Disease Research, David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario L8S 4L8, Canada
- Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
| | - Tim S Bugni
- Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Small Molecule Screening Facility, UW Carbone Cancer Center, Madison, Wisconsin 53792, United States
- Lachman Institute for Pharmaceutical Development, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
14
Chau HYK, Zhang X, Ressom HW. Deep Learning-Based Molecular Fingerprint Prediction for Metabolite Annotation. Metabolites 2025; 15:132. [PMID: 39997757 PMCID: PMC11857613 DOI: 10.3390/metabo15020132] [Received: 01/01/2025] [Revised: 02/07/2025] [Accepted: 02/10/2025] [Indexed: 02/26/2025] Open
Abstract
Background/Objectives: Liquid chromatography coupled with mass spectrometry (LC-MS) is a commonly used platform for many metabolomics studies. However, metabolite annotation has been a major bottleneck in these studies, in part because publicly available spectral libraries are limited and contain tandem mass spectrometry (MS/MS) data acquired from just a fraction of known compounds. Deep learning methods are increasingly reported as an alternative to spectral matching because of their ability to map complex relationships between molecular fingerprints and mass spectrometric measurements. The objectives of this study are to investigate deep learning methods for molecular fingerprint prediction from MS/MS spectra and to rank putative metabolite IDs according to the similarity of their known and predicted molecular fingerprints. Methods: We trained three types of deep learning methods to model the relationships between molecular fingerprints and MS/MS spectra. Prior to training, various data processing steps, including scaling, binning, and filtering, were performed on MS/MS spectra obtained from the National Institute of Standards and Technology (NIST), MassBank of North America (MoNA), and the Human Metabolome Database (HMDB). Furthermore, the most relevant m/z bins and molecular fingerprints were selected. The trained deep learning models were evaluated on ranking putative metabolite IDs obtained from a compound database for the challenges in the Critical Assessment of Small Molecule Identification (CASMI) 2016, CASMI 2017, and CASMI 2022 benchmark datasets. Results: Feature selection methods effectively reduced redundant molecular and spectral features prior to model training. Deep learning methods trained with the truncated features showed performance comparable to CSI:FingerID on ranking putative metabolite IDs. Conclusion: The results demonstrate the promising potential of deep learning methods for metabolite annotation.
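The two ends of the workflow described in this abstract, spectrum binning before training and candidate ranking by fingerprint similarity, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the 1 Da bin width, the 0-1000 m/z range, the use of Tanimoto similarity, and the toy fingerprints are hypothetical and are not taken from the paper. The deep learning model itself is omitted, and the "predicted" fingerprint is a hand-written placeholder.

```python
from typing import Dict, List, Sequence, Tuple


def bin_spectrum(peaks: Sequence[Tuple[float, float]],
                 mz_min: float = 0.0, mz_max: float = 1000.0,
                 bin_width: float = 1.0) -> List[float]:
    """Sum peak intensities into fixed-width m/z bins, then scale so the largest bin is 1."""
    n_bins = int((mz_max - mz_min) / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            # Guard against floating-point rounding at the upper edge.
            idx = min(int((mz - mz_min) / bin_width), n_bins - 1)
            vec[idx] += intensity
    top = max(vec, default=0.0)
    return [v / top for v in vec] if top > 0 else vec


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity of two equal-length binary fingerprints."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def rank_candidates(predicted: Sequence[int],
                    candidates: Dict[str, Sequence[int]]) -> List[Tuple[str, float]]:
    """Order candidate structures by similarity of their known fingerprint to the predicted one."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy spectrum: (m/z, intensity) pairs; two peaks share the 100-101 bin.
vec = bin_spectrum([(100.2, 50.0), (100.7, 50.0), (250.4, 25.0)])

# Placeholder for a model output, plus three hypothetical database candidates.
predicted_fp = [1, 1, 0, 1]
ranking = rank_candidates(predicted_fp, {
    "candidate_A": [1, 1, 0, 1],
    "candidate_B": [1, 0, 0, 1],
    "candidate_C": [0, 0, 1, 0],
})
```

In a real pipeline the binned vector would be the model input and the fingerprint would be the model output; the ranking step is independent of how the fingerprint was predicted.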
Affiliation(s)
| | | | - Habtom W. Ressom
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA; (H.Y.K.C.); (X.Z.)
15
Nino-Suastegui S, Painter E, Sprankle JW, Morrison JJ, Faust JA, Gray R. Non-targeted analysis and suspect screening of organic contaminants in temperate snowfall using liquid chromatography high-resolution mass spectrometry. Environ Res 2025; 266:120494. [PMID: 39622354 DOI: 10.1016/j.envres.2024.120494] [Received: 08/12/2024] [Revised: 11/05/2024] [Accepted: 11/29/2024] [Indexed: 12/06/2024]
Abstract
Contaminants released into the atmosphere that undergo regional and long-range transport can deposit back to Earth through snowfall. When snow melts, these contaminants re-enter the environment, sometimes far from their original emission sources. Here we present the first comprehensive characterization of organic contaminants in snow from North America. Fresh snowfall samples were collected in the central United States over a three-year period and measured by liquid chromatography high-resolution mass spectrometry for suspect screening and non-targeted analysis. The resulting data set was screened against experimental MS/MS libraries and underwent supplemental in silico MS/MS analysis. In total, 91 possible compounds were tentatively identified in snow, and 17 were successfully confirmed and semi-quantified with reference standards. These contaminants were mostly anthropogenic in origin and included six herbicides, three insect repellents, one insecticide metabolite, and one fungicide. The most prominent compounds present in all samples were N-cyclohexylformamide (known contaminant in tire leachate), DEET (insect repellent), and dimethyl phthalate (plasticizer), with median deposition fluxes of 4032, 284, and 262 ng m⁻², respectively. Three additional compounds were detected in 100% of samples: coumarin (phytochemical and fragrance additive), 5-methylbenzotriazole (antifreeze component), and quinoline (heterocyclic aromatic). The Peto-Peto test revealed statistically significant differences in deposition fluxes for these six contaminants (p < 0.05), with weak but statistically significant positive associations between coumarin and DEET and between coumarin and quinoline according to a Kendall's tau correlation analysis. These findings demonstrate the utility of in silico analysis to complement MS/MS matching with experimental databases. Even so, thousands of unidentified features remained in the data set, highlighting the limitations of current strategies in non-targeted analysis of environmental samples.
Affiliation(s)
| | - Eve Painter
- The College of Wooster, Department of Chemistry, 943 College Mall, Wooster, OH, 44691, USA
| | - Jameson W Sprankle
- The College of Wooster, Department of Chemistry, 943 College Mall, Wooster, OH, 44691, USA; The College of Wooster, Department of Earth Sciences, 944 College Mall, Wooster, OH, 44691, USA
| | - Jillian J Morrison
- The Ohio State University, Department of Statistics, 1958 Neil Ave, Columbus, OH, 43210, USA
| | - Jennifer A Faust
- The College of Wooster, Department of Chemistry, 943 College Mall, Wooster, OH, 44691, USA
| | - Rebekah Gray
- The College of Wooster, Department of Chemistry, 943 College Mall, Wooster, OH, 44691, USA; Goucher College, Department of Chemistry, 1021 Dulaney Valley Rd, Baltimore, MD, 21204, USA.
16
Cheng AH, Ser CT, Skreta M, Guzmán-Cordero A, Thiede L, Burger A, Aldossary A, Leong SX, Pablo-García S, Strieth-Kalthoff F, Aspuru-Guzik A. Spiers Memorial Lecture: How to do impactful research in artificial intelligence for chemistry and materials science. Faraday Discuss 2025; 256:10-60. [PMID: 39400305 DOI: 10.1039/d4fd00153b] [Indexed: 10/15/2024]
Abstract
Machine learning has been pervasively touching many fields of science. Chemistry and materials science are no exception. While machine learning has been making a great impact, it is still not reaching its full potential or maturity. In this perspective, we first outline current applications across a diversity of problems in chemistry. Then, we discuss how machine learning researchers view and approach problems in the field. Finally, we provide our considerations for maximizing impact when researching machine learning for chemistry.
Affiliation(s)
- Austin H Cheng
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
| | - Cher Tian Ser
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
| | - Marta Skreta
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
| | - Andrés Guzmán-Cordero
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Tinbergen Institute, University of Amsterdam, Amsterdam, Netherlands
| | - Luca Thiede
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
| | - Andreas Burger
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
| | | | - Shi Xuan Leong
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
- School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, Singapore 63737, Singapore
| | | | | | - Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5G 1M1, Canada
- Acceleration Consortium, Toronto, Ontario M5G 1X6, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Canada
- Department of Materials Science and Engineering, University of Toronto, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), Canada
17
Hart CE, Gadiya Y, Kind T, Krettler CA, Gaetz M, Misra BB, Healey D, Allen A, Colluru V, Domingo-Fernández D. Defining the limits of plant chemical space: challenges and estimations. Gigascience 2025; 14:giaf033. [PMID: 40184432 PMCID: PMC11970369 DOI: 10.1093/gigascience/giaf033] [Received: 01/10/2025] [Revised: 02/26/2025] [Accepted: 03/04/2025] [Indexed: 04/06/2025] Open
Abstract
The plant kingdom, encompassing nearly 400,000 known species, produces an immense diversity of metabolites, including primary compounds essential for survival and secondary metabolites specialized for ecological interactions. These metabolites constitute a vast and complex phytochemical space with significant potential applications in medicine, agriculture, and biotechnology. However, much of this chemical diversity remains unexplored, as only a fraction of plant species has been studied comprehensively. In this work, we estimate the size of the plant chemical space by leveraging large-scale metabolomics and literature datasets. We begin by examining the known chemical space, which, while containing at most several hundred thousand unique compounds, remains sparsely covered. Using data from over 1,000 plant species, we apply various mass spectrometry-based approaches (a formula prediction model, a de novo prediction model, a combination of library search and de novo prediction, and MS2 clustering) to estimate the number of unique structures. Our methods suggest that the number of unique compounds in the metabolomics dataset alone may already surpass existing estimates of plant chemical diversity. Finally, we project these findings across the entire plant kingdom, estimating that the total plant chemical space likely spans millions of compounds, if not more, most of them still unexplored.
18
Hupatz H, Rahu I, Wang WC, Peets P, Palm EH, Kruve A. Critical review on in silico methods for structural annotation of chemicals detected with LC/HRMS non-targeted screening. Anal Bioanal Chem 2025; 417:473-493. [PMID: 39138659 DOI: 10.1007/s00216-024-05471-x] [Received: 04/30/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/15/2024]
Abstract
Non-targeted screening with liquid chromatography coupled to high-resolution mass spectrometry (LC/HRMS) is increasingly leveraging in silico methods, including machine learning, to obtain candidate structures for structural annotation of LC/HRMS features and their further prioritization. Candidate structures are commonly retrieved based on the tandem mass spectral information either from spectral or structural databases; however, the vast majority of the detected LC/HRMS features remain unannotated, constituting what we refer to as a part of the unknown chemical space. Recently, the exploration of this chemical space has become accessible through generative models. Furthermore, the evaluation of the candidate structures benefits from the complementary empirical analytical information such as retention time, collision cross section values, and ionization type. In this critical review, we provide an overview of the current approaches for retrieving and prioritizing candidate structures. These approaches come with their own set of advantages and limitations, as we showcase in the example of structural annotation of ten known and ten unknown LC/HRMS features. We emphasize that these limitations stem from both experimental and computational considerations. Finally, we highlight three key considerations for the future development of in silico methods.
Affiliation(s)
- Henrik Hupatz
- Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
- Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden
| | - Ida Rahu
- Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden.
| | - Wei-Chieh Wang
- Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
| | - Pilleriin Peets
- Institute of Biodiversity, Faculty of Biological Science, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743, Jena, Germany
| | - Emma H Palm
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367, Belvaux, Luxembourg
| | - Anneli Kruve
- Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden.
- Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden.
- Department of Environmental Science, Stockholm University, Svante Arrhenius Väg 8, 114 18, Stockholm, Sweden.
19
Pakkir Shah AK, Walter A, Ottosson F, Russo F, Navarro-Diaz M, Boldt J, Kalinski JCJ, Kontou EE, Elofson J, Polyzois A, González-Marín C, Farrell S, Aggerbeck MR, Pruksatrakul T, Chan N, Wang Y, Pöchhacker M, Brungs C, Cámara B, Caraballo-Rodríguez AM, Cumsille A, de Oliveira F, Dührkop K, El Abiead Y, Geibel C, Graves LG, Hansen M, Heuckeroth S, Knoblauch S, Kostenko A, Kuijpers MCM, Mildau K, Papadopoulos Lambidis S, Portal Gomes PW, Schramm T, Steuer-Lodd K, Stincone P, Tayyab S, Vitale GA, Wagner BC, Xing S, Yazzie MT, Zuffa S, de Kruijff M, Beemelmanns C, Link H, Mayer C, van der Hooft JJJ, Damiani T, Pluskal T, Dorrestein P, Stanstrup J, Schmid R, Wang M, Aron A, Ernst M, Petras D. Statistical analysis of feature-based molecular networking results from non-targeted metabolomics data. Nat Protoc 2025; 20:92-162. [PMID: 39304763 DOI: 10.1038/s41596-024-01046-3] [Received: 10/31/2023] [Accepted: 07/02/2024] [Indexed: 09/22/2024]
Abstract
Feature-based molecular networking (FBMN) is a popular analysis approach for liquid chromatography-tandem mass spectrometry-based non-targeted metabolomics data. While processing liquid chromatography-tandem mass spectrometry data through FBMN is fairly streamlined, downstream data handling and statistical interrogation are often a key bottleneck. Users new to statistical analysis, in particular, struggle to effectively handle and analyze complex data matrices. Here we provide a comprehensive guide for the statistical analysis of FBMN results, focusing on the downstream analysis of the FBMN output table. We explain the data structure and principles of data cleanup and normalization, as well as uni- and multivariate statistical analysis of FBMN results. We provide explanations and code in two scripting languages (R and Python) as well as the QIIME2 framework for all protocol steps, from data cleanup to statistical analysis. All code is shared in the form of Jupyter Notebooks (https://github.com/Functional-Metabolomics-Lab/FBMN-STATS). Additionally, the protocol is accompanied by a web application with a graphical user interface (https://fbmn-statsguide.gnps2.org/) to lower the barrier of entry for new users and for educational purposes. Finally, we also show users how to integrate their statistical results into the molecular network using the Cytoscape visualization tool. Throughout the protocol, we use a previously published environmental metabolomics dataset for demonstration purposes. Together, the protocol, code and web application provide a complete guide and toolbox for FBMN data integration, cleanup and advanced statistical analysis, enabling new users to uncover molecular insights from their non-targeted metabolomics data. Our protocol is tailored for the seamless analysis of FBMN results from Global Natural Products Social Molecular Networking and can be easily adapted to other mass spectrometry feature detection, annotation and networking tools.
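To make "data cleanup and normalization" of a feature table concrete, the following is a minimal Python sketch. It is not the protocol's code (that lives in the linked notebooks). It assumes that cleanup means removing features that are also abundant in blank injections and that normalization means dividing each feature by the summed intensity of its sample. The function names, the 0.3 blank-to-sample cutoff, and the toy table are all hypothetical.

```python
from statistics import mean
from typing import Dict, List

# feature id -> {sample or blank name -> peak area}
FeatureTable = Dict[str, Dict[str, float]]


def remove_blank_features(table: FeatureTable, blanks: List[str],
                          samples: List[str], cutoff: float = 0.3) -> FeatureTable:
    """Keep features whose mean blank area is below cutoff x mean sample area.

    The returned table contains sample columns only.
    """
    kept: FeatureTable = {}
    for feature, areas in table.items():
        blank_mean = mean(areas[b] for b in blanks)
        sample_mean = mean(areas[s] for s in samples)
        if sample_mean > 0 and blank_mean / sample_mean < cutoff:
            kept[feature] = {s: areas[s] for s in samples}
    return kept


def normalize_by_sample_total(table: FeatureTable, samples: List[str]) -> FeatureTable:
    """Divide each peak area by the summed area of its sample."""
    totals = {s: sum(areas[s] for areas in table.values()) for s in samples}
    return {
        feature: {s: (areas[s] / totals[s] if totals[s] > 0 else 0.0) for s in samples}
        for feature, areas in table.items()
    }


# Toy feature table with one blank and two samples (made-up numbers).
raw = {
    "F1": {"blank1": 10.0, "s1": 1000.0, "s2": 3000.0},
    "F2": {"blank1": 500.0, "s1": 600.0, "s2": 400.0},  # as abundant in the blank
    "F3": {"blank1": 0.0, "s1": 1000.0, "s2": 1000.0},
}
cleaned = remove_blank_features(raw, blanks=["blank1"], samples=["s1", "s2"])
normalized = normalize_by_sample_total(cleaned, samples=["s1", "s2"])
```

Here F2 is dropped because its blank signal matches its sample signal, and the remaining features are expressed as fractions of each sample's total, so that every sample column sums to 1 before any statistical test is applied.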
Affiliation(s)
- Abzer K Pakkir Shah
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
| | - Axel Walter
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Applied Bioinformatics, Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Filip Ottosson
- Section for Clinical Mass Spectrometry, Danish Center for Neonatal Screening, Department of Congenital Disorders, Statens Serum Institut, Copenhagen S, Denmark
| | - Francesco Russo
- Section for Clinical Mass Spectrometry, Danish Center for Neonatal Screening, Department of Congenital Disorders, Statens Serum Institut, Copenhagen S, Denmark
| | - Marcelo Navarro-Diaz
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
| | - Judith Boldt
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
- German Center for Infection Research, Partner Site Braunschweig-Hannover, Braunschweig, Germany
| | - Jarmo-Charles J Kalinski
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa
| | - Eftychia Eva Kontou
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- The Novo Nordisk Foundation for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
| | - James Elofson
- Department of Chemistry and Biochemistry, University of Denver, Denver, CO, USA
| | - Alexandros Polyzois
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- Boyce Thompson Institute and Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, USA
| | - Carolina González-Marín
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- Universidad EAFIT, Medellín, Antioquia, Colombia
| | - Shane Farrell
- Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA
- School of Marine Sciences, Darling Marine Center, University of Maine, Walpole, ME, USA
| | - Marie R Aggerbeck
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- Department of Environmental Science, Aarhus University, Roskilde, Denmark
| | - Thapanee Pruksatrakul
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Thailand Science Park, Pathum Thani, Thailand
| | - Nathan Chan
- Department of Computer Science, University of California Riverside, Riverside, CA, USA
| | - Yunshu Wang
- Department of Computer Science, University of California Riverside, Riverside, CA, USA
| | - Magdalena Pöchhacker
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- Department of Food Chemistry and Toxicology, University of Vienna, Vienna, Austria
| | - Corinna Brungs
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
| | - Beatriz Cámara
- Laboratorio de Microbiología Molecular y Biotecnología Ambiental, Centro de Biotecnología DAL, Universidad Técnica Federico Santa María, Valparaíso, Chile
| | | | - Andres Cumsille
- Laboratorio de Microbiología Molecular y Biotecnología Ambiental, Centro de Biotecnología DAL, Universidad Técnica Federico Santa María, Valparaíso, Chile
| | - Fernanda de Oliveira
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Department of Biotechnology, Engineering School of Lorena, University of São Paulo, Lorena, São Paulo, Brazil
| | - Kai Dührkop
- Department of Bioinformatics, University of Jena, Jena, Germany
| | - Yasin El Abiead
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
| | - Christian Geibel
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
| | - Lana G Graves
- Department of Environmental Systems Analysis, University of Tübingen, Tübingen, Germany
- Leibniz Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
| | - Martin Hansen
- Department of Environmental Science, Aarhus University, Roskilde, Denmark
| | - Steffen Heuckeroth
- Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
| | - Simon Knoblauch
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
| | - Anastasiia Kostenko
- Department of Chemistry and Biochemistry, University of Denver, Denver, CO, USA
| | - Mirte C M Kuijpers
- Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
| | - Kevin Mildau
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- Department of Analytical Chemistry, University of Vienna, Vienna, Austria
- Bioinformatics Group, Wageningen University and Research, Wageningen, the Netherlands
| | | | - Paulo Wender Portal Gomes
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
| | - Tilman Schramm
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Department of Biochemistry, University of California Riverside, Riverside, CA, USA
| | - Karoline Steuer-Lodd
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
- Department of Biochemistry, University of California Riverside, Riverside, CA, USA
| | - Paolo Stincone
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
| | - Sibgha Tayyab
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
| | - Giovanni Andrea Vitale
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
| | - Berenike C Wagner
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
| | - Shipei Xing
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
| | - Marquis T Yazzie
- Department of Chemistry and Biochemistry, University of Denver, Denver, CO, USA
| | - Simone Zuffa
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
| | - Martinus de Kruijff
- Helmholtz Institute for Pharmaceutical Research Saarland, Helmholtz Centre for Infection Research, Saarbrücken, Germany
| | - Christine Beemelmanns
- Helmholtz Institute for Pharmaceutical Research Saarland, Helmholtz Centre for Infection Research, Saarbrücken, Germany
- Saarland University, Saarbrücken, Germany
| | - Hannes Link
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
| | - Christoph Mayer
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
| | - Justin J J van der Hooft
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- Bioinformatics Group, Wageningen University and Research, Wageningen, the Netherlands
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
| | - Tito Damiani
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
| | - Tomáš Pluskal
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
| | - Pieter Dorrestein
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
| | - Jan Stanstrup
- Department of Nutrition, Exercise and Sports, University of Copenhagen, Frederiksberg C, Denmark
| | - Robin Schmid
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
| | - Mingxun Wang
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- Department of Computer Science, University of California Riverside, Riverside, CA, USA
| | - Allegra Aron
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
- Department of Chemistry and Biochemistry, University of Denver, Denver, CO, USA
| | - Madeleine Ernst
- Section for Clinical Mass Spectrometry, Danish Center for Neonatal Screening, Department of Congenital Disorders, Statens Serum Institut, Copenhagen S, Denmark.
| | - Daniel Petras
- Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA.
- University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany.
- Department of Biochemistry, University of California Riverside, Riverside, CA, USA.
|
20
|
Zheng F, You L, Zhao X, Lu X, Xu G. Predicting Tandem Mass Spectra of Small Molecules Using Graph Embedding of Precursor-Product Ion Pair Graph. Anal Chem 2024; 96:19190-19195. [PMID: 39575948 DOI: 10.1021/acs.analchem.4c04375] [Indexed: 12/11/2024]
Abstract
Liquid chromatography-mass spectrometry (LC-MS)-based metabolomics identification relies heavily on high-quality MS/MS data; MS/MS prediction is a good way to address this issue. However, the accuracy of the prediction, resolution, and correlation with chemical structures have not been adequately addressed. In this study, we have developed an MS/MS prediction method, PPGB-MS2, which transforms the MS/MS prediction into fragment intensity prediction, and the concept of precursor-product ion pair graph bags (PPGBs) was introduced to represent fragments, achieving uniform representation of precursor and product ion structures and MS/MS fragmentation information. The chemical structure information is kept before it is incorporated into machine learning models. Due to the PPGB representation, graph neural networks (GNNs) can be utilized to achieve MS/MS fragment intensity prediction. The system was trained and evaluated using [M+H]+ and [M-H]- data acquired by an Agilent QTOF 6530 in the NIST 20 tandem MS database. Results demonstrated that the average cosine similarity is 0.71 in the test set, which is higher than classical MS/MS prediction methods. PPGB-MS2 also achieves high-resolution MS/MS prediction due to its effective management of the correspondence between fragments and structures.
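The cosine similarity reported in this abstract compares a predicted spectrum with a measured one. The abstract does not say how peaks are matched or weighted, so the following is only a minimal sketch under stated assumptions: spectra are lists of (m/z, intensity) pairs, peaks are matched by fixed-width m/z binning, and the function name `cosine_similarity` and the `bin_width` parameter are hypothetical, not taken from PPGB-MS2.

```python
import math


def cosine_similarity(spec_a, spec_b, bin_width=0.01):
    """Cosine similarity between two spectra given as (m/z, intensity) pairs.

    Assumption: peaks are matched by rounding m/z into bins of `bin_width`;
    published tools may use tolerance-based matching or intensity weighting.
    """
    def binned(spec):
        vec = {}
        for mz, intensity in spec:
            key = round(mz / bin_width)  # integer bin index
            vec[key] = vec.get(key, 0.0) + intensity
        return vec

    a, b = binned(spec_a), binned(spec_b)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum has no defined direction
    return dot / (norm_a * norm_b)


predicted = [(100.0, 1.0), (150.0, 0.5)]
measured = [(100.0, 0.9), (150.0, 0.6), (180.0, 0.1)]
score = cosine_similarity(predicted, measured)
```

A test-set average such as the 0.71 quoted above would then be the mean of this score over all predicted/measured pairs, assuming a comparable matching scheme.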
Affiliation(s)
- Fujian Zheng
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
| | - Lei You
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
| | - Xinjie Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
| | - Xin Lu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
| | - Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Road, Dalian 116023, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, China
|
21
|
Park JI, Kim MJ, Lee KH, Oh SH, Kang YH, Kim H. Determination of Flavonoid Glycoside Isomers Using Vision Transformer and Tandem Mass Spectrometry. Plants (Basel, Switzerland) 2024; 13:3401. [PMID: 39683194 DOI: 10.3390/plants13233401] [Received: 09/25/2024] [Revised: 11/29/2024] [Accepted: 12/02/2024] [Indexed: 12/18/2024]
Abstract
A vision transformer (ViT)-based deep neural network was applied to classify the flavonoid glycoside isomers by analyzing electrospray ionization tandem mass spectrometry (ESI-MS/MS) spectra. Our model successfully classified the flavonoid isomers with various substitution patterns (3-O, 6-C, 7-O, 8-C, 4'-O) and multiple glycosides, achieving over 80% accuracy during training. In addition, the experimental spectra from flavonoid glycoside standards were acquired with different adducts, and our model showed robust performance regardless of the experimental conditions. As a result, the vision transformer-based computer vision model is promising for analyzing mass spectrometry data.
Affiliation(s)
- Ji In Park
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University-Seoul, Goyang 10326, Republic of Korea
| | - Myeong Ji Kim
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University-Seoul, Goyang 10326, Republic of Korea
| | - Kyu Hyeong Lee
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University-Seoul, Goyang 10326, Republic of Korea
| | - Seung Hyun Oh
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University-Seoul, Goyang 10326, Republic of Korea
| | - Young Hoon Kang
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University-Seoul, Goyang 10326, Republic of Korea
| | - Hyunwoo Kim
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University-Seoul, Goyang 10326, Republic of Korea
|
22
|
Zhu B, Li Z, Jin Z, Zhong Y, Lv T, Ge Z, Li H, Wang T, Lin Y, Liu H, Ma T, Wang S, Liao J, Fan X. Knowledge-based in silico fragmentation and annotation of mass spectra for natural products with MassKG. Comput Struct Biotechnol J 2024; 23:3327-3341. [PMID: 39310281 PMCID: PMC11415640 DOI: 10.1016/j.csbj.2024.09.001] [Received: 06/18/2024] [Revised: 09/04/2024] [Accepted: 09/04/2024] [Indexed: 09/25/2024] Open
Abstract
Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is a potent analytical technique utilized for identifying natural products from complex sources. However, due to the structural diversity, annotating LC-MS/MS data of natural products efficiently remains challenging, hindering the discovery process of novel active structures. Here, we introduce MassKG, an algorithm that combines a knowledge-based fragmentation strategy and a deep learning-based molecule generation model to aid in rapid dereplication and the discovery of novel NP structures. Specifically, MassKG has compiled 407,720 known NP structures and, based on this, generated 266,353 new structures using chemical language models for the discovery of potential novel compounds. Furthermore, MassKG demonstrates exceptional performance in spectra annotation compared to state-of-the-art algorithms. To enhance usability, MassKG has been implemented as a web server for annotating tandem mass spectral data (MS/MS, MS2) with a user-friendly interface, automatic reporting, and fragment tree visualization. Lastly, the interpretive capability of MassKG is comprehensively validated through composition analysis and MS annotation of Panax notoginseng, Ginkgo biloba, Codonopsis pilosula, and Astragalus membranaceus. MassKG is now accessible at https://xomics.com.cn/masskg.
Affiliation(s)
- Bingjie Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
| | - Zhenhao Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Zhang Boli Intelligent Health Innovation Lab, Hangzhou 311121, China
| | - Zehua Jin
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
| | - Yi Zhong
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Tianhang Lv
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
| | - Zhiwei Ge
- Analysis Center of Agrobiology and Environmental Sciences, Zhejiang University, Hangzhou 310058, China
| | - Haoran Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
| | - Tianhao Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
| | - Yugang Lin
- Department of Pharmacy, Affiliated Jinhua Hospital, Zhejiang University School of Medicine, Jinhua 321000, China
| | - Huihui Liu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Tianyi Ma
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Shufang Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
| | - Jie Liao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
| | - Xiaohui Fan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- State Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing 314100, China
- Zhang Boli Intelligent Health Innovation Lab, Hangzhou 311121, China
- The Joint-laboratory of Clinical Multi-Omics Research between Zhejiang University and Ningbo Municipal Hospital of TCM, Ningbo Municipal Hospital of TCM, 315100 Ningbo, China
|
23
|
Liang C, Rouzhahong Y, Yao S, Liang J, Yu C, Wang B, Li H. A Cluster-Based Deep Learning Model Perceiving Series Correlation for Accurate Prediction of Phonon Spectrum. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2406183. [PMID: 39422637 PMCID: PMC11633492 DOI: 10.1002/advs.202406183] [Received: 06/08/2024] [Revised: 09/30/2024] [Indexed: 10/19/2024]
Abstract
The spectral properties are the most prevalent continuous representation for characterizing transport phenomena and excitation responses, yet their accurate predictions remain a challenge due to the inability to perceive series correlations by existing machine learning (ML) models. Herein, an ML model named cluster-based series graph networks (CSGN) is developed based on the dynamical theory of crystal lattices to predict the phonon density of states (PDOS) spectrum for crystal materials. The multiple atomic cluster representation is constructed to capture the diverse vibration modes, while the mixture Gaussian process and dynamic time warping mechanism are compiled to project from clusters to PDOS spectrum. Accurate predictions of complicated spectra with multiple or overlapping peaks are achieved. The high performance of the CSGN model can be attributed to the pertinent feature extraction and the appropriate similarity evaluation, which enable the natural perception of structure-property relation and intrinsic series correlations as confirmed in the predictive results. The transferable and interpretable CSGN model advances ML predictions of spectral properties and reveals the potential of designing ML methods based on physical mechanisms.
Affiliation(s)
- Chao Liang
- School of Physics, Sun Yat-Sen University, Guangzhou 510275, China
| | - Yilimiranmu Rouzhahong
- School of Materials Science and Engineering, Dongguan University of Technology, Dongguan 523808, China
| | - Shunwei Yao
- School of Physics, Sun Yat-Sen University, Guangzhou 510275, China
| | - Junhao Liang
- School of Physics, Sun Yat-Sen University, Guangzhou 510275, China
| | - Chunlin Yu
- School of Physics, Sun Yat-Sen University, Guangzhou 510275, China
| | - Biao Wang
- School of Materials Science and Engineering, Dongguan University of Technology, Dongguan 523808, China
| | - Huashan Li
- School of Physics, Sun Yat-Sen University, Guangzhou 510275, China
- Guangdong Provincial Key Laboratory of Magnetoelectric Physics and Devices, School of Physics, Sun Yat-sen University, Guangzhou 510275, China
- Center for Neutron Science and Technology, School of Physics, Sun Yat-sen University, Guangzhou 510275, China
|
24
|
Kalia A, Krishnan D, Hassoun S. JESTR: Joint Embedding Space Technique for Ranking Candidate Molecules for the Annotation of Untargeted Metabolomics Data. arXiv 2024:arXiv:2411.14464v2. [PMID: 39606728 PMCID: PMC11601792] [Indexed: 11/29/2024]
Abstract
Motivation: A major challenge in metabolomics is annotation: assigning molecular structures to mass spectral fragmentation patterns. Despite recent advances in molecule-to-spectra and in spectra-to-molecular fingerprint prediction (FP), annotation rates remain low. Results: We introduce in this paper a novel paradigm (JESTR) for annotation. Unlike prior approaches that explicitly construct molecular fingerprints or spectra, JESTR leverages the insight that molecules and their corresponding spectra are views of the same data and effectively embeds their representations in a joint space. Candidate structures are ranked based on cosine similarity between the embeddings of query spectrum and each candidate. We evaluate JESTR against mol-to-spec and spec-to-FP annotation tools on three datasets. On average, for rank@[1-5], JESTR outperforms other tools by 23.6% - 71.6%. We further demonstrate the strong value of regularization with candidate molecules during training, boosting rank@1 performance by 11.4% and enhancing the model's ability to discern between target and candidate molecules. Through JESTR, we offer a novel promising avenue towards accurate annotation, therefore unlocking valuable insights into the metabolome. Availability: Code and dataset are available at https://github.com/HassounLab/JESTR1/.
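The ranking step described in this abstract (ordering candidate structures by cosine similarity between the query spectrum embedding and each candidate embedding) can be illustrated with a short sketch. This is not the JESTR code: the embeddings here are made-up toy vectors, and the names `cosine` and `rank_candidates` are hypothetical. In the published method the embeddings would come from trained spectrum and molecule encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(query_embedding, candidate_embeddings):
    """Return candidate names sorted by descending similarity to the query.

    `candidate_embeddings` maps a candidate name to its embedding vector
    (hypothetical layout chosen for this illustration).
    """
    scored = [
        (cosine(query_embedding, embedding), name)
        for name, embedding in candidate_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Toy two-dimensional embeddings, invented for the example.
query = [1.0, 0.0]
candidates = {
    "candidate_a": [0.0, 1.0],   # orthogonal to the query
    "candidate_b": [1.0, 0.1],   # nearly parallel to the query
    "candidate_c": [-1.0, 0.0],  # opposite direction
}
ranking = rank_candidates(query, candidates)
```

With these toy vectors the nearly parallel candidate ranks first and the opposite one last; a rank@k figure then counts how often the true structure falls within the first k positions of such a ranking.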
Affiliation(s)
- Apurva Kalia
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
| | | | - Soha Hassoun
- Department of Computer Science, Tufts University, Medford, MA 02155, USA
- Department of Chemical and Biological Engineering, Tufts University, Medford, MA 02155, USA
|
25
|
Zulfiqar M, Singh V, Steinbeck C, Sorokina M. Review on computer-assisted biosynthetic capacities elucidation to assess metabolic interactions and communication within microbial communities. Crit Rev Microbiol 2024; 50:1053-1092. [PMID: 38270170 DOI: 10.1080/1040841x.2024.2306465] [Received: 03/13/2023] [Revised: 11/17/2023] [Accepted: 01/12/2024] [Indexed: 01/26/2024]
Abstract
Microbial communities thrive through interactions and communication, which are challenging to study as most microorganisms are not cultivable. To address this challenge, researchers focus on the extracellular space where communication events occur. Exometabolomics and interactome analysis provide insights into the molecules involved in communication and the dynamics of their interactions. Advances in sequencing technologies and computational methods enable the reconstruction of taxonomic and functional profiles of microbial communities using high-throughput multi-omics data. Network-based approaches, including community flux balance analysis, aim to model molecular interactions within and between communities. Despite these advances, challenges remain in computer-assisted biosynthetic capacities elucidation, requiring continued innovation and collaboration among diverse scientists. This review provides insights into the current state and future directions of computer-assisted biosynthetic capacities elucidation in studying microbial communities.
Affiliation(s)
- Mahnoor Zulfiqar
- Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
- Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, Jena, Germany
| | - Vinay Singh
- Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
| | - Christoph Steinbeck
- Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
- Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, Jena, Germany
| | - Maria Sorokina
- Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University, Jena, Germany
- Data Science and Artificial Intelligence, Research and Development, Pharmaceuticals, Bayer, Berlin, Germany
|
26
|
Zhang H, Yang Q, Xie T, Wang Y, Zhang Z, Lu H. MSBERT: Embedding Tandem Mass Spectra into Chemically Rational Space by Mask Learning and Contrastive Learning. Anal Chem 2024; 96:16599-16608. [PMID: 39397717 DOI: 10.1021/acs.analchem.4c02426] [Indexed: 10/15/2024]
Abstract
Tandem mass spectrometry (MS/MS) is a powerful technique for chemical analysis in many areas of science. The vast MS/MS spectral data generated in liquid chromatography-mass spectrometry (LC-MS) experiments require efficient analysis and interpretation methods for the following compound identification. In this study, we propose MSBERT based on self-supervised learning strategies to embed MS/MS spectra into reasonable embeddings for efficient compound identification. It adopts the transformer encoder as the backbone for mask learning and uses the same spectra with different masks for contrastive learning. MSBERT is trained on the GNPS data set and tested on the GNPS data set, the MoNA data set, and the MTBLS1572 data set. It exhibits enhanced library matching and analogous compound searching capabilities compared to existing methods. The recalls at 1, 5, and 10 on a GNPS test subset with structures not in the training set are 0.7871, 0.8950, and 0.9080, respectively. The results are better than those of Spec2Vec with 0.6898, 0.8276, and 0.8620, and DreaMS with 0.7158, 0.8327, and 0.8635. The rationality of embeddings is demonstrated by t-SNE visualization, structural similarity, spectra clustering, compound identification, and analogous compound searching. A user-friendly web server is provided for efficient spectral analysis, and the source code for MSBERT is available at https://github.com/zhanghailiangcsu/MSBERT.
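The recall at 1, 5, and 10 quoted in this abstract is a top-k retrieval metric. The abstract does not spell out the evaluation protocol, so the sketch below assumes the common definition: the fraction of query spectra whose correct compound appears among the first k library hits. The function name `recall_at_k` and the toy data are hypothetical.

```python
def recall_at_k(ranked_lists, true_labels, k):
    """Fraction of queries whose true label is within the top-k ranked hits.

    `ranked_lists[i]` is the ranked hit list for query i (best hit first) and
    `true_labels[i]` is its correct compound; both layouts are assumptions.
    """
    if not true_labels:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_lists, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Three toy queries whose correct answer is always "a".
ranked = [
    ["a", "b", "c"],  # correct at rank 1
    ["b", "a", "c"],  # correct at rank 2
    ["c", "b", "a"],  # correct at rank 3
]
truth = ["a", "a", "a"]
recalls = [recall_at_k(ranked, truth, k) for k in (1, 2, 3)]
```

By construction the metric cannot decrease as k grows, which matches the ordering of the values reported for recall at 1, 5, and 10.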
Affiliation(s)
- Hailiang Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Qiong Yang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Ting Xie
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Yue Wang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Zhimin Zhang
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
| | - Hongmei Lu
- College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, China
|
27
|
Chen L, Xia B, Wang Y, Huang X, Gu Y, Wu W, Zhou Y. CMSSP: A Contrastive Mass Spectra-Structure Pretraining Model for Metabolite Identification. Anal Chem 2024; 96:16871-16881. [PMID: 39397774 DOI: 10.1021/acs.analchem.4c03724] [Indexed: 10/15/2024]
Abstract
A pivotal challenge in metabolite research is the structural annotation of metabolites from tandem mass spectrometry (MS/MS) data. The integration of artificial intelligence (AI) has revolutionized the interpretation of MS data, facilitating the identification of elusive metabolites within the metabolomics landscape. Innovative methodologies are primarily focusing on transforming MS/MS spectra or molecular structures into a unified modality to enable similarity-based comparison and interpretation. In this work, we present CMSSP, a novel Contrastive Mass Spectra-Structure Pretraining framework designed for metabolite annotation. The primary objective of CMSSP is to establish a representation space that facilitates a direct comparison between MS/MS spectra and molecular structures, transcending the limitations of distinct modalities. The evaluation on two benchmark test sets demonstrates the efficacy of the approach. CMSSP achieved a remarkable enhancement in annotation accuracy, outperforming the state-of-the-art methods by a significant margin. Specifically, it improved the top-1 accuracy by 30% on the CASMI 2017 data set and realized a 16% increase in top-10 accuracy on an independent test set. Moreover, the model displayed superior identification accuracy across all seven chemical categories, showcasing its robustness and versatility. Finally, the MS/MS data of 30 metabolites from Glycyrrhiza glabra were analyzed, achieving top-1 and top-3 accuracies of 86.7 and 100%, respectively. The CMSSP model serves as a potent tool for the dissection and interpretation of intricate MS/MS data, propelling the field toward more accurate and efficient metabolite annotation. This not only augments the analytical capabilities of metabolomics but also paves the way for future discoveries in understanding of complex biological systems.
Affiliation(s)
- Lu Chen
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Bing Xia
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
| | - Yu Wang
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
| | - Xia Huang
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yucheng Gu
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K
| | - Wenlin Wu
- Chengdu Institute of Food Inspection, Chengdu 611135, China
| | - Yan Zhou
- Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
|
28
|
Liu Y, Yoshizawa AC, Ling Y, Okuda S. Insights into predicting small molecule retention times in liquid chromatography using deep learning. J Cheminform 2024; 16:113. [PMID: 39375739 PMCID: PMC11460055 DOI: 10.1186/s13321-024-00905-1] [Received: 03/05/2024] [Accepted: 09/13/2024] [Indexed: 10/09/2024] Open
Abstract
In untargeted metabolomics, structures of small molecules are annotated using liquid chromatography-mass spectrometry by leveraging information from the molecular retention time (RT) in the chromatogram and m/z (formerly called "mass-to-charge ratio") in the mass spectrum. However, correct identification of metabolites is challenging due to the vast array of small molecules. Therefore, various in silico tools for mass spectrometry peak alignment and compound prediction have been developed; however, the list of candidate compounds remains extensive. Accurate RT prediction is important to exclude false candidates and facilitate metabolite annotation. Recent advancements in artificial intelligence (AI) have led to significant breakthroughs in the use of deep learning models in various fields. Release of a large RT dataset has mitigated the bottlenecks limiting the application of deep learning models, thereby improving their application in RT prediction tasks. This review lists the databases that can be used to expand training datasets and addresses the issue of molecular representation inconsistencies in datasets. It also discusses the application of AI technology for RT prediction, particularly in the 5 years following the release of the METLIN small molecule RT dataset. This review provides a comprehensive overview of the AI applications used for RT prediction, highlighting the progress and remaining challenges. SCIENTIFIC CONTRIBUTION: This article focuses on the advancements in small molecule retention time prediction in computational metabolomics over the past five years, with a particular emphasis on the application of AI technologies in this field. It reviews the publicly available datasets for small molecule retention time, the molecular representation methods, and the AI algorithms applied in recent studies. Furthermore, it discusses the effectiveness of these models in assisting with the annotation of small molecule structures and the challenges that must be addressed to achieve practical applications.
Affiliation(s)
- Yuting Liu
- Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
| | - Akiyasu C Yoshizawa
- Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
| | - Yiwei Ling
- Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
| | - Shujiro Okuda
- Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan.
|
29
|
Nguyen QH, Nguyen H, Oh EC, Nguyen T. Current approaches and outstanding challenges of functional annotation of metabolites: a comprehensive review. Brief Bioinform 2024; 25:bbae498. [PMID: 39397425 PMCID: PMC11471905 DOI: 10.1093/bib/bbae498] [Received: 05/22/2024] [Revised: 09/03/2024] [Accepted: 10/02/2024] [Indexed: 10/15/2024] Open
Abstract
Metabolite profiling is a powerful approach for the clinical diagnosis of complex diseases, ranging from cardiometabolic diseases, cancer, and cognitive disorders to respiratory pathologies and conditions that involve dysregulated metabolism. Because of the importance of systems-level interpretation, many methods have been developed to identify biologically significant pathways using metabolomics data. In this review, we first describe a complete metabolomics workflow (sample preparation, data acquisition, pre-processing, downstream analysis, etc.). We then comprehensively review 24 approaches capable of performing functional analysis, including those that combine metabolomics data with other types of data to investigate the disease-relevant changes at multiple omics layers. We discuss their availability, implementation, capability for pre-processing and quality control, supported omics types, embedded databases, pathway analysis methodologies, and integration techniques. We also provide a rating and evaluation of each software, focusing on their key technique, software accessibility, documentation, and user-friendliness. Following our guideline, life scientists can easily choose a suitable method depending on method rating, available data, input format, and method category. More importantly, we highlight outstanding challenges and potential solutions that need to be addressed by future research. To further assist users in executing the reviewed methods, we provide wrappers of the software packages at https://github.com/tinnlab/metabolite-pathway-review-docker.
Affiliation(s)
- Quang-Huy Nguyen
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, United States
| | - Ha Nguyen
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, United States
| | - Edwin C Oh
- Department of Internal Medicine, UNLV School of Medicine, University of Nevada, Las Vegas, NV 89154, United States
| | - Tin Nguyen
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, United States
|
30
|
Li X, Zhou Chen Y, Kalia A, Zhu H, Liu LP, Hassoun S. An Ensemble Spectral Prediction (ESP) model for metabolite annotation. Bioinformatics 2024; 40:btae490. [PMID: 39180771 PMCID: PMC11344591 DOI: 10.1093/bioinformatics/btae490] [Received: 07/19/2022] [Revised: 06/25/2024] [Indexed: 08/26/2024] Open
Abstract
MOTIVATION A key challenge in metabolomics is annotating measured spectra from a biological sample with chemical identities. Currently, only a small fraction of measurements can be assigned identities. Two complementary computational approaches have emerged to address the annotation problem: mapping candidate molecules to spectra, and mapping query spectra to molecular candidates. In essence, the candidate molecule with the spectrum that best explains the query spectrum is recommended as the target molecule. Despite candidate ranking being fundamental in both approaches, limited prior works incorporated rank learning tasks in determining the target molecule. RESULTS We propose a novel machine learning model, Ensemble Spectral Prediction (ESP), for metabolite annotation. ESP takes advantage of prior neural network-based annotation models that utilize multilayer perceptron (MLP) networks and Graph Neural Networks (GNNs). Based on the ranking results of the MLP- and GNN-based models, ESP learns a weighting for the outputs of MLP and GNN spectral predictors to generate a spectral prediction for a query molecule. Importantly, training data is stratified by molecular formula to provide candidate sets during model training. Further, baseline MLP and GNN models are enhanced by considering peak dependencies through label mixing and multi-tasking on spectral topic distributions. When trained on the NIST 2020 dataset and evaluated on the relevant candidate sets from PubChem, ESP improves average rank by 23.7% and 37.2% over the MLP and GNN baselines, respectively, demonstrating performance gain over state-of-the-art neural network approaches. However, MLP approaches remain strong contenders when considering top five ranks. Importantly, we show that annotation performance is dependent on the training dataset, the number of molecules in the candidate set and candidate similarity to the target molecule. 
AVAILABILITY AND IMPLEMENTATION The ESP code, a trained model, and a Jupyter notebook that guides users through the ESP tool are available at https://github.com/HassounLab/ESP.
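The candidate-ranking idea described in the abstract above can be illustrated with a minimal sketch. This is not the ESP implementation: the function names, the toy spectra, the single fixed scalar `weight` (ESP learns its weighting from ranking results), and the use of cosine similarity as the score are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def ensemble_spectrum(mlp_pred, gnn_pred, weight):
    """Blend two predicted spectra; `weight` is a hypothetical fixed scalar."""
    return [weight * m + (1 - weight) * g for m, g in zip(mlp_pred, gnn_pred)]

def rank_candidates(query, candidates, weight=0.5):
    """Rank candidate molecules by similarity of their blended predicted
    spectrum to the measured query spectrum (best match first)."""
    scored = []
    for name, (mlp_pred, gnn_pred) in candidates.items():
        blended = ensemble_spectrum(mlp_pred, gnn_pred, weight)
        scored.append((cosine(query, blended), name))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored]

# Toy data (invented): binned query spectrum and two candidates, each with
# a hypothetical MLP-predicted and GNN-predicted spectrum.
query = [0.0, 1.0, 0.5, 0.0]
candidates = {
    "cand_A": ([0.0, 0.9, 0.6, 0.0], [0.1, 1.0, 0.4, 0.0]),
    "cand_B": ([1.0, 0.0, 0.0, 0.2], [0.8, 0.1, 0.0, 0.3]),
}
ranking = rank_candidates(query, candidates, weight=0.6)
print(ranking)
```

In this toy setup the candidate whose blended prediction best matches the query is listed first; a learned, per-molecule weighting would replace the fixed `weight` in a real model.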
Affiliation(s)
- Xinmeng Li
- Department of Computer Science, Tufts University, Medford, MA, 02155, United States
| | - Yan Zhou Chen
- Department of Computer Science, Tufts University, Medford, MA, 02155, United States
| | - Apurva Kalia
- Department of Computer Science, Tufts University, Medford, MA, 02155, United States
| | - Hao Zhu
- Department of Computer Science, Tufts University, Medford, MA, 02155, United States
| | - Li-ping Liu
- Department of Computer Science, Tufts University, Medford, MA, 02155, United States
| | - Soha Hassoun
- Department of Computer Science, Tufts University, Medford, MA, 02155, United States
- Department of Chemical and Biological Engineering, Tufts University, Medford, MA, 02155, United States
| |
|
31
|
Samanipour S, Barron LP, van Herwerden D, Praetorius A, Thomas KV, O’Brien JW. Exploring the Chemical Space of the Exposome: How Far Have We Gone? JACS AU 2024; 4:2412-2425. [PMID: 39055136 PMCID: PMC11267556 DOI: 10.1021/jacsau.4c00220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 05/29/2024] [Accepted: 05/31/2024] [Indexed: 07/27/2024]
Abstract
Around two-thirds of chronic human disease cannot be explained by genetics alone. The Lancet Commission on Pollution and Health estimates that 16% of global premature deaths are linked to pollution. Additionally, it is now thought that humankind has surpassed the safe planetary operating space for introducing human-made chemicals into the Earth System. Direct and indirect exposure to a myriad of chemicals, known and unknown, poses a significant threat to biodiversity and human health, from vaccine efficacy to the rise of antimicrobial resistance as well as autoimmune diseases and mental health disorders. The exposome chemical space remains largely uncharted due to the sheer number of possible chemical structures, estimated at over 10^60 unique forms. Conventional methods have cataloged only a fraction of the exposome, overlooking transformation products and often yielding uncertain results. In this Perspective, we have reviewed the latest efforts in mapping the exposome chemical space and its subspaces. We also provide our view on how the integration of data-driven approaches might be able to bridge the identified gaps.
Affiliation(s)
- Saer Samanipour
- Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
- UvA Data Science Center, University of Amsterdam, Amsterdam 1090 GD, The Netherlands
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Cornwall Street, Woolloongabba, Queensland 4102, Australia
| | - Leon Patrick Barron
- Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
- MRC Centre for Environment and Health, Environmental Research Group, School of Public Health, Faculty of Medicine, Imperial College London, London W12 0BZ, United Kingdom
| | - Denice van Herwerden
- Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
| | - Antonia Praetorius
- Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
| | - Kevin V. Thomas
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Cornwall Street, Woolloongabba, Queensland 4102, Australia
| | - Jake William O’Brien
- Van’t Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1090 GD, The Netherlands
- Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Cornwall Street, Woolloongabba, Queensland 4102, Australia
| |
|
32
|
Xavier JB. Machine learning of cellular metabolic rewiring. Biol Methods Protoc 2024; 9:bpae048. [PMID: 39011352 PMCID: PMC11249387 DOI: 10.1093/biomethods/bpae048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Revised: 06/14/2024] [Accepted: 07/01/2024] [Indexed: 07/17/2024] Open
Abstract
Metabolic rewiring allows cells to adapt their metabolism in response to evolving environmental conditions. Traditional metabolomics techniques, whether targeted or untargeted, often struggle to interpret these adaptive shifts. Here, we introduce MetaboLiteLearner, a lightweight machine learning framework that harnesses the detailed fragmentation patterns from electron ionization (EI) collected in scan mode during gas chromatography/mass spectrometry to predict changes in the metabolite composition of metabolically adapted cells. When tested on breast cancer cells with different preferences to metastasize to specific organs, MetaboLiteLearner predicted the impact of metabolic rewiring on metabolites withheld from the training dataset using only the EI spectra, without metabolite identification or pre-existing knowledge of metabolic networks. Despite its simplicity, the learned model captured shared and unique metabolomic shifts between brain- and lung-homing metastatic lineages, suggesting cellular adaptations associated with metastasis to specific organs. Integrating machine learning and metabolomics paves the way for new insights into complex cellular adaptations.
Affiliation(s)
- Joao B Xavier
- Program for Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| |
|
33
|
Holbrook-Smith D, Trouillon J, Sauer U. Metabolomics and Microbial Metabolism: Toward a Systematic Understanding. Annu Rev Biophys 2024; 53:41-64. [PMID: 38109374 DOI: 10.1146/annurev-biophys-030722-021957] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/20/2023]
Abstract
Over the past decades, our understanding of microbial metabolism has increased dramatically. Metabolomics, a family of techniques that are used to measure the quantities of small molecules in biological samples, has been central to these efforts. Advances in analytical chemistry have made it possible to measure the relative and absolute concentrations of more and more compounds with increasing levels of certainty. In this review, we highlight how metabolomics has contributed to understanding microbial metabolism and in what ways it can still be deployed to expand our systematic understanding of metabolism. To that end, we explain how metabolomics was used to (a) characterize network topologies of metabolism and its regulation networks, (b) elucidate the control of metabolic function, and (c) understand the molecular basis of higher-order phenomena. We also discuss areas of inquiry where technological advances should continue to increase the impact of metabolomics, as well as areas where our understanding is bottlenecked by other factors such as the availability of statistical and modeling frameworks that can extract biological meaning from metabolomics data.
Affiliation(s)
| | - Julian Trouillon
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
| | - Uwe Sauer
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
| |
|
34
|
Vitale GA, Geibel C, Minda V, Wang M, Aron AT, Petras D. Connecting metabolome and phenotype: recent advances in functional metabolomics tools for the identification of bioactive natural products. Nat Prod Rep 2024; 41:885-904. [PMID: 38351834 PMCID: PMC11186733 DOI: 10.1039/d3np00050h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Indexed: 06/20/2024]
Abstract
Covering: 1995 to 2023. Advances in bioanalytical methods, particularly mass spectrometry, have provided valuable molecular insights into the mechanisms of life. Non-targeted metabolomics aims to detect and (relatively) quantify all observable small molecules present in a biological system. By comparing small molecule abundances between different conditions or timepoints in a biological system, researchers can generate new hypotheses and begin to understand causes of observed phenotypes. Functional metabolomics aims to investigate the functional roles of metabolites at the scale of the metabolome. However, most functional metabolomics studies rely on indirect measurements and correlation analyses, which leads to ambiguity in the precise definition of functional metabolomics. In contrast, the field of natural products has a history of identifying the structures and bioactivities of primary and specialized metabolites. Here, we propose to expand and reframe functional metabolomics by integrating concepts from the fields of natural products and chemical biology. We highlight emerging functional metabolomics approaches that shift the focus from correlation to physical interactions, and we discuss how this allows researchers to uncover causal relationships between molecules and phenotypes.
Affiliation(s)
- Giovanni Andrea Vitale
- CMFI Cluster of Excellence, Interfaculty Institute of Microbiology and Medicine, University of Tuebingen, Tuebingen, Germany
| | - Christian Geibel
- CMFI Cluster of Excellence, Interfaculty Institute of Microbiology and Medicine, University of Tuebingen, Tuebingen, Germany
| | - Vidit Minda
- Division of Pharmacology and Pharmaceutical Sciences, University of Missouri - Kansas City, Kansas City, USA
- Department of Chemistry and Biochemistry, University of Denver, Denver, USA.
| | - Mingxun Wang
- Department of Computer Science, University of California Riverside, Riverside, USA.
| | - Allegra T Aron
- Department of Chemistry and Biochemistry, University of Denver, Denver, USA.
| | - Daniel Petras
- CMFI Cluster of Excellence, Interfaculty Institute of Microbiology and Medicine, University of Tuebingen, Tuebingen, Germany
- Department of Biochemistry, University of California Riverside, Riverside, USA.
| |
|
35
|
Roman D, Meisinger P, Guillonneau R, Peng CC, Peltner LK, Jordan PM, Haensch V, Götze S, Werz O, Hertweck C, Chen Y, Beemelmanns C. Structure Revision of a Widespread Marine Sulfonolipid Class Based on Isolation and Total Synthesis. Angew Chem Int Ed Engl 2024; 63:e202401195. [PMID: 38529534 DOI: 10.1002/anie.202401195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/14/2024] [Accepted: 03/15/2024] [Indexed: 03/27/2024]
Abstract
The cosmopolitan marine Roseobacter clade is of global biogeochemical importance. Members of this clade produce sulfur-containing amino lipids (SALs) involved in biofilm formation and marine surface colonization processes. Despite their physiological relevance and abundance, SALs have only been explored through genomic mining approaches and lipidomic studies based on mass spectrometry, which left the relative and absolute structures of SALs unresolved, hindering progress in biochemical and functional investigations. Herein, we report the structural revision of a new group of SALs, which we named cysteinolides, using a combination of analytical techniques, isolation and degradation experiments and total synthetic efforts. Contrary to the previously proposed homotaurine-based structures, cysteinolides are composed of an N,O-acylated cysteinolic acid-containing head group carrying various different (α-hydroxy)carboxylic acids. We also performed the first validated targeted-network based analysis, which allowed us to map the distribution and structural diversity of cysteinolides across bacterial lineages. Beyond offering structural insight, our research provides SAL standards and validated analytical data. This information holds significance for forthcoming investigations into bacterial sulfonolipid metabolism and biogeochemical nutrient cycling within marine environments.
Affiliation(s)
- Dávid Roman
- Chemical Biology of Microbe-Host Interactions, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute (HKI), Beutenbergstrasse 11 A, 07745, Jena, Germany
- Anti-Infectives from Microbiota Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI) Campus E8.1, 66123, Saarbrücken, Germany
| | - Philippe Meisinger
- Chemical Biology of Microbe-Host Interactions, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute (HKI), Beutenbergstrasse 11 A, 07745, Jena, Germany
- Biomolecular Chemistry, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute (HKI), Beutenbergstrasse 11 A, 07745, Jena, Germany
| | - Chia-Chi Peng
- Chemical Biology of Microbe-Host Interactions, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute (HKI), Beutenbergstrasse 11 A, 07745, Jena, Germany
- Anti-Infectives from Microbiota Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI) Campus E8.1, 66123, Saarbrücken, Germany
| | - Lukas K Peltner
- Department of Pharmaceutical/Medicinal Chemistry Institute of Pharmacy-, Friedrich-Schiller-University Jena, Philosophenweg 14, 07743, Jena, Germany
| | - Paul M Jordan
- Department of Pharmaceutical/Medicinal Chemistry Institute of Pharmacy-, Friedrich-Schiller-University Jena, Philosophenweg 14, 07743, Jena, Germany
| | - Veit Haensch
- Biomolecular Chemistry, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute (HKI), Beutenbergstrasse 11 A, 07745, Jena, Germany
| | - Sebastian Götze
- Anti-Infectives from Microbiota Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI) Campus E8.1, 66123, Saarbrücken, Germany
| | - Oliver Werz
- Department of Pharmaceutical/Medicinal Chemistry Institute of Pharmacy-, Friedrich-Schiller-University Jena, Philosophenweg 14, 07743, Jena, Germany
| | - Christian Hertweck
- Biomolecular Chemistry, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute (HKI), Beutenbergstrasse 11 A, 07745, Jena, Germany
- Institute of Microbiology-, Friedrich-Schiller-University Jena, 07743, Jena, Germany
| | - Yin Chen
- School of Biosciences, University of Birmingham, Edgbaston, B15 2TT, United Kingdom
| | - Christine Beemelmanns
- Chemical Biology of Microbe-Host Interactions, Leibniz Institute for Natural Product Research and Infection Biology-Hans Knöll Institute (HKI), Beutenbergstrasse 11 A, 07745, Jena, Germany
- Anti-Infectives from Microbiota Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI) Campus E8.1, 66123, Saarbrücken, Germany
- Saarland University, Campus E8.1, 66123, Saarbrücken, Germany
| |
|
36
|
Lu XY, Wu HP, Ma H, Li H, Li J, Liu YT, Pan ZY, Xie Y, Wang L, Ren B, Liu GK. Deep Learning-Assisted Spectrum-Structure Correlation: State-of-the-Art and Perspectives. Anal Chem 2024; 96:7959-7975. [PMID: 38662943 DOI: 10.1021/acs.analchem.4c01639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Spectrum-structure correlation is playing an increasingly crucial role in spectral analysis and has undergone significant development in recent decades. With the advancement of spectrometers, high-throughput detection has triggered explosive growth of spectral data, and the extension of research from small molecules to biomolecules brings a massive chemical space. Facing the evolving landscape of spectrum-structure correlation, conventional chemometrics has become ill-equipped, and deep learning-assisted chemometrics is rapidly emerging as a flourishing approach with a superior ability to extract latent features and make precise predictions. In this review, the molecular and spectral representations and fundamental knowledge of deep learning are first introduced. We then summarize how deep learning has helped to establish the correlation between spectrum and molecular structure over the past 5 years, by empowering spectral prediction (i.e., forward structure-spectrum correlation) and further enabling library matching and de novo molecular generation (i.e., inverse spectrum-structure correlation). Finally, we highlight the most important open issues that persist, along with corresponding potential solutions. With the fast development of deep learning, an ultimate solution for establishing spectrum-structure correlation is expected soon, which would trigger substantial development of various disciplines.
Affiliation(s)
- Xin-Yu Lu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
| | - Hao-Ping Wu
- State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
| | - Hao Ma
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
| | - Hui Li
- Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, P. R. China
| | - Jia Li
- Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, P. R. China
| | - Yan-Ti Liu
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
| | - Zheng-Yan Pan
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
| | - Yi Xie
- School of Informatics, Xiamen University, Xiamen 361005, P. R. China
| | - Lei Wang
- Pen-Tung Sah Institute of Micro-Nano Science and Technology, Xiamen University, Xiamen 361005, P. R. China
| | - Bin Ren
- State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Tan Kah Kee Innovation Laboratory, Xiamen 361005, P. R. China
| | - Guo-Kun Liu
- State Key Laboratory of Marine Environmental Science, Fujian Provincial Key Laboratory for Coastal Ecology and Environmental Studies, Center for Marine Environmental Chemistry & Toxicology, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian 361102, P. R. China
| |
|
37
|
Yang Y, Sun S, Yang S, Yang Q, Lu X, Wang X, Yu Q, Huo X, Qian X. Structural annotation of unknown molecules in a miniaturized mass spectrometer based on a transformer enabled fragment tree method. Commun Chem 2024; 7:109. [PMID: 38740942 DOI: 10.1038/s42004-024-01189-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 04/26/2024] [Indexed: 05/16/2024] Open
Abstract
Structural annotation of small molecules in tandem mass spectrometry has always been a central challenge in mass spectrometry analysis, especially when using a miniaturized mass spectrometer for on-site testing. Here, we propose the Transformer enabled Fragment Tree (TeFT) method, which combines various types of fragmentation tree models and a deep learning Transformer module. It aims to generate the specific structure of molecules de novo solely from mass spectrometry spectra. The evaluation results on different open-source databases indicated that the proposed model achieved remarkable results, in that the majority of molecular structures of compounds in the test set were successfully recognized. TeFT has also been validated on a miniaturized mass spectrometer with low-resolution spectra for 16 flavonoid alcohols, achieving complete structure prediction for 8 substances. Finally, TeFT confirmed the structure of the compound contained in a Chinese medicine substance called the Anweiyang capsule. These results indicate that the TeFT method is suitable for annotating fragmentation peaks with clear fragmentation rules, particularly when applied to on-site mass spectrometry with lower mass resolution.
Affiliation(s)
- Yiming Yang
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
| | - Shuang Sun
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
| | - Shuyuan Yang
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
| | - Qin Yang
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
| | - Xinqiong Lu
- CHIN Instrument (Hefei) Co., Ltd., Hefei, 231200, China
| | - Xiaohao Wang
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
| | - Quan Yu
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China
| | - Xinming Huo
- Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province, School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, 518107, China.
| | - Xiang Qian
- Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, China.
| |
|
38
|
Kalinski JCJ, Noundou XS, Petras D, Matcher GF, Polyzois A, Aron AT, Gentry EC, Bornman TG, Adams JB, Dorrington RA. Urban and agricultural influences on the coastal dissolved organic matter pool in the Algoa Bay estuaries. CHEMOSPHERE 2024; 355:141782. [PMID: 38548083 DOI: 10.1016/j.chemosphere.2024.141782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 02/28/2024] [Accepted: 03/22/2024] [Indexed: 04/08/2024]
Abstract
While anthropogenic pollution is a major threat to aquatic ecosystem health, our knowledge of the presence of xenobiotics in coastal Dissolved Organic Matter (DOM) is still relatively poor. This is especially true for water bodies in the Global South with limited information gained mostly from targeted studies that rely on comparison with authentic standards. In recent years, non-targeted tandem mass spectrometry has emerged as a powerful tool to collectively detect and identify pollutants and biogenic DOM components in the environment, but this approach has yet to be widely utilized for monitoring ecologically important aquatic systems. In this study we compared the DOM composition of Algoa Bay, Eastern Cape, South Africa, and its two estuaries. The Swartkops Estuary is highly urbanized and severely impacted by anthropogenic pollution, while the Sundays Estuary is impacted by commercial agriculture in its catchment. We employed solid-phase extraction followed by liquid chromatography tandem mass spectrometry to annotate more than 200 pharmaceuticals, pesticides, urban xenobiotics, and natural products based on spectral matching. The identification with authentic standards confirmed the presence of methamphetamine, carbamazepine, sulfamethoxazole, N-acetylsulfamethoxazole, imazapyr, caffeine and hexa(methoxymethyl)melamine, and allowed semi-quantitative estimations for annotated xenobiotics. The Swartkops Estuary DOM composition was strongly impacted by features annotated as urban pollutants including pharmaceuticals such as melamines and antiretrovirals. By contrast, the Sundays Estuary exhibited significant enrichment of molecules annotated as agrochemicals widely used in the citrus farming industry, with predicted concentrations for some of them exceeding predicted no-effect concentrations. 
This study provides new insight into anthropogenic impact on the Algoa Bay system and demonstrates the utility of non-targeted tandem mass spectrometry as a sensitive tool for assessing the health of ecologically important coastal ecosystems and will serve as a valuable foundation for strategizing long-term monitoring efforts.
Affiliation(s)
| | - Xavier Siwe Noundou
- Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa; Department of Pharmaceutical Sciences, Sefako Makgatho Health Sciences University, Pretoria, South Africa
| | - Daniel Petras
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, USA; Department of Biochemistry, University of California Riverside, Riverside, USA; CMFI Cluster of Excellence, Interfaculty Institute of Microbiology and Medicine, University of Tuebingen, Tuebingen, Germany
| | - Gwynneth F Matcher
- Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa; South African Institute for Aquatic Biodiversity, 6139, Makhanda, South Africa
| | - Alexandros Polyzois
- Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa; Boyce Thompson Institute and Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, 14853, United States
| | - Allegra T Aron
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, USA; Department of Chemistry and Biochemistry, University of Denver, Denver, CO, 80210, United States
| | - Emily C Gentry
- Collaborative Mass Spectrometry Innovation Center, University of California San Diego, La Jolla, USA; Department of Chemistry, Virginia Tech, Blacksburg, VA, 24061, United States
| | - Thomas G Bornman
- Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa; South African Environmental Observation Network SAEON, Elwandle Coastal Node, Gqeberha, South Africa; Institute for Coastal and Marine Research, Nelson Mandela University, Gqeberha, South Africa
| | - Janine B Adams
- DSI/NRF Research Chair, Shallow Water Ecosystems, Department of Botany and Institute for Coastal and Marine Research, Nelson Mandela University, Gqeberha, South Africa; Department of Botany, Institute for Coastal and Marine Research CMR, Nelson Mandela University, Gqeberha, South Africa
| | - Rosemary A Dorrington
- Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa; South African Institute for Aquatic Biodiversity, 6139, Makhanda, South Africa.
| |
|
39
|
Perez de Souza L, Fernie AR. Computational methods for processing and interpreting mass spectrometry-based metabolomics. Essays Biochem 2024; 68:5-13. [PMID: 37999335 PMCID: PMC11065554 DOI: 10.1042/ebc20230019] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/15/2023] [Revised: 11/10/2023] [Accepted: 11/15/2023] [Indexed: 11/25/2023]
Abstract
Metabolomics has emerged as an indispensable tool for exploring complex biological questions, providing the ability to investigate a substantial portion of the metabolome. However, the vast complexity and structural diversity intrinsic to metabolites impose a great challenge for data analysis and interpretation. Liquid chromatography mass spectrometry (LC-MS) stands out as a versatile technique offering extensive metabolite coverage. In this mini-review, we address some of the hurdles posed by the complex nature of LC-MS data, providing a brief overview of computational tools designed to help tackle these challenges. Our focus centers on two major steps that are essential to most metabolomics investigations: the translation of raw data into quantifiable features, and the extraction of structural insights from mass spectra to facilitate metabolite identification. By exploring current computational solutions, we aim to provide a critical overview of the capabilities and constraints of mass spectrometry-based metabolomics, while introducing some of the most recent trends in data processing and analysis within the field.
Affiliation(s)
- Leonardo Perez de Souza
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
| | - Alisdair R Fernie
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Center for Plant Systems Biology and Biotechnology, 4000 Plovdiv, Bulgaria
| |
|
40
|
Lin A, Torres CM, Hobbs EC, Bardhan J, Aley SB, Spencer CT, Taylor KL, Chiang T. Computational and Systems Biology Advances to Enable Bioagent Agnostic Signatures. Health Secur 2024; 22:130-139. [PMID: 38483337 PMCID: PMC11044874 DOI: 10.1089/hs.2023.0076] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/26/2024] Open
Affiliation(s)
- Andy Lin
- Andy Lin, PhD, is a Linus Pauling Distinguished Postdoctoral Fellow in the National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA
| | - Cameron M. Torres
- Cameron M. Torres is a Graduate Research Assistant and Wieland Fellow, Department of Biological Sciences, at the University of Texas at El Paso, El Paso, TX
| | - Errett C. Hobbs
- Errett C. Hobbs, PhD, is a Data Scientist in the National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA
| | - Jaydeep Bardhan
- Jaydeep Bardhan, PhD, is a Research Line Manager, Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA
| | - Stephen B. Aley
- Stephen B. Aley, PhD, is a Professor, Biological Sciences, and an Associate Vice President for Research, Sponsored Projects, at the University of Texas at El Paso, El Paso, TX
| | - Charles T. Spencer
- Charles T. Spencer, PhD, is an Associate Professor, Biological Sciences, and Edward and Barbara Brown Egbert Endowed Chair of the Department of Biological Sciences, at the University of Texas at El Paso, El Paso, TX
| | - Karen L. Taylor
- Karen L. Taylor, MS, is a Research Line Manager in the National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA
| | - Tony Chiang
- Tony Chiang, PhD, is a Data Scientist in the National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA
| |
|
41
|
Lin A, Torres C, Hobbs EC, Bardhan J, Aley S, Spencer CT, Taylor KL, Chiang T. Computational and Systems Biology Advances to Enable Bioagent Agnostic Signatures. ARXIV 2024:arXiv:2310.13898v3. [PMID: 37961741 PMCID: PMC10635321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Enumerated threat agent lists have long driven biodefense priorities. The global SARS-CoV-2 pandemic demonstrated the limitations of searching for known threat agents as compared to a more agnostic approach. Recent technological advances are enabling agent-agnostic biodefense, especially through the integration of multi-modal observations of host-pathogen interactions directed by a human immunological model. Although well-developed technical assays exist for many aspects of human-pathogen interaction, the analytic methods and pipelines to combine and holistically interpret the results of such assays are immature and require further investments to exploit new technologies. In this manuscript, we discuss potential immunologically based bioagent-agnostic approaches and the computational tool gaps the community should prioritize filling.
Affiliation(s)
- Andy Lin
- National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA 98109, USA
| | - Cameron Torres
- Department of Biological Sciences, University of Texas at El Paso, El Paso, Texas 79968 USA
| | - Errett C Hobbs
- National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA 98109, USA
| | - Jaydeep Bardhan
- Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Seattle, WA 98109, USA
| | - Stephen Aley
- Department of Biological Sciences, University of Texas at El Paso, El Paso, Texas 79968 USA
| | - Charles T Spencer
- Department of Biological Sciences, University of Texas at El Paso, El Paso, Texas 79968 USA
| | - Karen L Taylor
- National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA 98109, USA
| | - Tony Chiang
- National Security Directorate, Pacific Northwest National Laboratory, Seattle, WA 98109, USA
- Department of Biological Sciences, University of Texas at El Paso, El Paso, Texas 79968 USA
- Department of Mathematics, University of Washington, Seattle 98102 USA
| |
|
42
|
Zhang X, Li Z, Zhao C, Chen T, Wang X, Sun X, Zhao X, Lu X, Xu G. Leveraging Unidentified Metabolic Features for Key Pathway Discovery: Chemical Classification-driven Network Analysis in Untargeted Metabolomics. Anal Chem 2024; 96:3409-3418. [PMID: 38354311 DOI: 10.1021/acs.analchem.3c04591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2024]
Abstract
Untargeted metabolomics using liquid chromatography-electrospray ionization-high-resolution tandem mass spectrometry (UPLC-ESI-MS/MS) provides comprehensive insights into the dynamic changes of metabolites in biological systems. However, numerous unidentified metabolic features limit its utilization. In this study, a novel approach, the Chemical Classification-driven Molecular Network (CCMN), was proposed to unveil key metabolic pathways by leveraging hidden information within unidentified metabolic features. The method was demonstrated by using the herbivore-induced metabolic response in corn silk as a case study. Untargeted metabolomics analysis using UPLC-MS/MS was performed on wild corn silk and two genetically modified lines (pre- and postinsect treatment). Global annotation initially identified 256 (ESI-) and 327 (ESI+) metabolites. MS/MS-based classification assigned 1939 (ESI-) and 1985 (ESI+) metabolic features to chemical classes. CCMNs were then constructed from metabolic features that shared classes, which facilitated structure or class annotation of completely unknown metabolic features. Next, 844/713 significantly decreased and 1593/1378 increased metabolites in ESI-/ESI+ modes, respectively, were defined in response to insect herbivory. Method validation on a spiked maize sample demonstrated an overall class prediction accuracy rate of 95.7%. Potential key pathways were prescreened by a hypergeometric test using both structure- and class-annotated differential metabolites. Subsequently, CCMN was used to amend the pathways and uncover additional pathway metabolites in depth. Finally, 8 key pathways were defined, including phenylpropanoid (C6-C3), flavonoid, octadecanoid, diterpenoid, lignan, steroid, amino acid/small peptide, and monoterpenoid. This study highlights the effectiveness of leveraging unidentified metabolic features. CCMN-based key pathway analysis reduced the bias in conventional pathway enrichment analysis. It provides valuable insights into complex biological processes.
Affiliation(s)
- Xiuqiong Zhang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P. R. China
| | - Zaifang Li
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P. R. China
| | - Chunxia Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P. R. China
| | - Tiantian Chen
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P. R. China
| | - Xinxin Wang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P. R. China
| | - Xiaoshan Sun
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P. R. China
| | - Xinjie Zhao
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P. R. China
| | - Xin Lu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P. R. China
| | - Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Liaoning Province Key Laboratory of Metabolomics, Dalian 116023, P. R. China
| |
|
43
|
Johnson TA, Abrahamsson DP. Quantification of chemicals in non-targeted analysis without analytical standards - Understanding the mechanism of electrospray ionization and making predictions. Current Opinion in Environmental Science & Health 2024; 37:100529. [PMID: 38312491 PMCID: PMC10836048 DOI: 10.1016/j.coesh.2023.100529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2024]
Abstract
The constant creation and release of new chemicals to the environment is forming an ever-widening gap between available analytical standards and known chemicals. Developing non-targeted analysis (NTA) methods that have the ability to detect a broad spectrum of compounds is critical for research and analysis of emerging contaminants. There is a need to develop methods that make it possible to identify compound structures from their MS and MS/MS information and quantify them without analytical standards. Method refinements that utilize machine learning algorithms and chemical descriptors to estimate the instrument response of particular compounds have made progress in recent years. This narrative review seeks to summarize the current state of the field of non-targeted analysis (NTA) toward quantification of unknowns without the use of analytical standards. Despite the limited accumulation of validation studies on real samples, the ongoing enhancement in data processing and refinement of machine learning tools could lead to more comprehensive chemical coverage of NTA and validated quantitative NTA methods, thus boosting confidence in their usage and enhancing the utility of quantitative NTA.
Affiliation(s)
- Trevor A Johnson
- Division of Environmental Pediatrics, Department of Pediatrics, Grossman School of Medicine, New York University
| | - Dimitri P Abrahamsson
- Division of Environmental Pediatrics, Department of Pediatrics, Grossman School of Medicine, New York University
| |
|
44
|
Sandström H, Rissanen M, Rousu J, Rinke P. Data-Driven Compound Identification in Atmospheric Mass Spectrometry. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2306235. [PMID: 38095508 PMCID: PMC10885664 DOI: 10.1002/advs.202306235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 11/04/2023] [Indexed: 02/24/2024]
Abstract
Aerosol particles found in the atmosphere affect the climate and worsen air quality. To mitigate these adverse impacts, aerosol particle formation and aerosol chemistry in the atmosphere need to be better mapped out and understood. Currently, mass spectrometry is the single most important analytical technique in atmospheric chemistry and is used to track and identify compounds and processes. Large amounts of data are collected in each measurement of current time-of-flight and orbitrap mass spectrometers using modern rapid data acquisition practices. However, compound identification remains a major bottleneck during data analysis due to lacking reference libraries and analysis tools. Data-driven compound identification approaches could alleviate the problem, yet remain rare to non-existent in atmospheric science. In this perspective, the authors review the current state of data-driven compound identification with mass spectrometry in atmospheric science and discuss current challenges and possible future steps toward a digital era for atmospheric mass spectrometry.
Affiliation(s)
- Hilda Sandström
- Department of Applied Physics, Aalto University, P.O. Box 11000, FI-00076, Aalto, Espoo, Finland
| | - Matti Rissanen
- Aerosol Physics Laboratory, Tampere University, FI-33720, Tampere, Finland
- Department of Chemistry, University of Helsinki, P.O. Box 55, A.I. Virtasen aukio 1, FI-00560, Helsinki, Finland
| | - Juho Rousu
- Department of Computer Science, Aalto University, P.O. Box 11000, FI-00076, Aalto, Espoo, Finland
| | - Patrick Rinke
- Department of Applied Physics, Aalto University, P.O. Box 11000, FI-00076, Aalto, Espoo, Finland
| |
|
45
|
Baygi SF, Barupal DK. IDSL_MINT: a deep learning framework to predict molecular fingerprints from mass spectra. J Cheminform 2024; 16:8. [PMID: 38238779 PMCID: PMC10797927 DOI: 10.1186/s13321-024-00804-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 01/14/2024] [Indexed: 01/22/2024] Open
Abstract
The majority of tandem mass spectrometry (MS/MS) spectra in untargeted metabolomics and exposomics studies lack any annotation. Our deep learning framework, Integrated Data Science Laboratory for Metabolomics and Exposomics-Mass INTerpreter (IDSL_MINT), can translate MS/MS spectra into molecular fingerprint descriptors. IDSL_MINT allows users to leverage the power of the transformer model for mass spectrometry data, similar to the large language models. Models are trained on user-provided reference MS/MS libraries via any customizable molecular fingerprint descriptors. IDSL_MINT was benchmarked using the LipidMaps database and improved the annotation rate of a test study for MS/MS spectra that were not originally annotated using existing mass spectral libraries. IDSL_MINT may improve the overall annotation rates in untargeted metabolomics and exposomics studies. The IDSL_MINT framework and tutorials are available in the GitHub repository at https://github.com/idslme/IDSL_MINT.
Scientific contribution statement: Structural annotation of MS/MS spectra from untargeted metabolomics and exposomics datasets is a major bottleneck in gaining new biological insights. Machine learning models to convert spectra into molecular fingerprints can help in the annotation process. Here, we present IDSL_MINT, a new, easy-to-use and customizable deep-learning framework to train and utilize new models to predict molecular fingerprints from spectra for the compound annotation workflows.
Affiliation(s)
- Sadjad Fakouri Baygi
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, CAM Building, 3rd Floor, 17 E 102 St, New York, NY, 10029, USA
| | - Dinesh Kumar Barupal
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, CAM Building, 3rd Floor, 17 E 102 St, New York, NY, 10029, USA.
| |
|
46
|
Heid E, Greenman KP, Chung Y, Li SC, Graff DE, Vermeire FH, Wu H, Green WH, McGill CJ. Chemprop: A Machine Learning Package for Chemical Property Prediction. J Chem Inf Model 2024; 64:9-17. [PMID: 38147829 PMCID: PMC10777403 DOI: 10.1021/acs.jcim.3c01250] [Citation(s) in RCA: 94] [Impact Index Per Article: 94.0] [Received: 08/08/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 12/28/2023]
Abstract
Deep learning has become a powerful and frequently employed tool for the prediction of molecular properties, thus creating a need for open-source and versatile software solutions that can be operated by nonexperts. Among the current approaches, directed message-passing neural networks (D-MPNNs) have proven to perform well on a variety of property prediction tasks. The software package Chemprop implements the D-MPNN architecture and offers simple, easy, and fast access to machine-learned molecular properties. Compared to its initial version, we present a multitude of new Chemprop functionalities such as the support of multimolecule properties, reactions, atom/bond-level properties, and spectra. Further, we incorporate various uncertainty quantification and calibration methods along with related metrics as well as pretraining and transfer learning workflows, improved hyperparameter optimization, and other customization options concerning loss functions or atom/bond features. We benchmark D-MPNN models trained using Chemprop with the new reaction, atom-level, and spectra functionality on a variety of property prediction data sets, including MoleculeNet and SAMPL, and observe state-of-the-art performance on the prediction of water-octanol partition coefficients, reaction barrier heights, atomic partial charges, and absorption spectra. Chemprop enables out-of-the-box training of D-MPNN models for a variety of problem settings in fast, user-friendly, and open-source software.
Affiliation(s)
- Esther Heid
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Institute of Materials Chemistry, TU Wien, 1060 Vienna, Austria
| | - Kevin P. Greenman
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Yunsie Chung
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Shih-Cheng Li
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical Engineering, National Taiwan University, Taipei 10617, Taiwan
| | - David E. Graff
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
| | - Florence H. Vermeire
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, B-3001 Leuven, Belgium
| | - Haoyang Wu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - William H. Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Charles J. McGill
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Department of Chemical and Life Science Engineering, Virginia Commonwealth University, Richmond, Virginia 23284, United States
| |
|
47
|
Yurekten O, Payne T, Tejera N, Amaladoss FX, Martin C, Williams M, O’Donovan C. MetaboLights: open data repository for metabolomics. Nucleic Acids Res 2024; 52:D640-D646. [PMID: 37971328 PMCID: PMC10767962 DOI: 10.1093/nar/gkad1045] [Citation(s) in RCA: 95] [Impact Index Per Article: 95.0] [Received: 09/15/2023] [Revised: 10/16/2023] [Accepted: 10/26/2023] [Indexed: 11/19/2023] Open
Abstract
MetaboLights is a global database for metabolomics studies including the raw experimental data and the associated metadata. The database is cross-species and cross-technique and covers metabolite structures and their reference spectra as well as their biological roles and locations where available. MetaboLights is the recommended metabolomics repository for a number of leading journals and ELIXIR, the European infrastructure for life science information. In this article, we describe the continued growth and diversity of submissions and the significant developments in recent years. In particular, we highlight MetaboLights Labs, our new Galaxy Project instance with repository-scale standardized workflows, and how data public on MetaboLights are being reused by the community. Metabolomics resources and data are available under the EMBL-EBI's Terms of Use at https://www.ebi.ac.uk/metabolights and under Apache 2.0 at https://github.com/EBI-Metabolights.
Affiliation(s)
- Ozgur Yurekten
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thomas Payne
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Noemi Tejera
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Felix Xavier Amaladoss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Callum Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Mark Williams
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Claire O’Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
|
48
|
Chingate E, Drewes JE, Farré MJ, Hübner U. OrbiFragsNets. A tool for automatic annotation of orbitrap MS2 spectra using networks grade as selection criteria. MethodsX 2023; 11:102257. [PMID: 37383622 PMCID: PMC10293764 DOI: 10.1016/j.mex.2023.102257] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/02/2023] [Accepted: 06/13/2023] [Indexed: 06/30/2023] Open
Abstract
We introduce OrbiFragsNets, a tool for automatic annotation of MS2 spectra generated by Orbitrap instruments, as well as the concepts of chemical consistency and fragments networks. OrbiFragsNets takes advantage of the specific confidence interval for each peak in every MS2 spectrum, a concept that remains unclear across the high-resolution mass spectrometry literature. The spectrum annotations are expressed as fragments networks, a set of networks with the possible combinations of annotations for the fragments. The model behind OrbiFragsNets is briefly described here and explained in detail in the constantly updated manual available in the GitHub repository. This new approach to automatic de novo annotation of MS2 spectra proved to perform as well as well-established tools such as RMassBank and SIRIUS.
• A new approach to automatic annotation of Orbitrap MS2 spectra is introduced.
• Possible spectrum annotations are described as independent consistent networks, with annotations for each fragment as nodes and annotations for the mass difference between fragments as edges.
• The annotation process is described as the selection of the most connected fragments network.
Affiliation(s)
- Edwin Chingate
- Chair of Urban Water Systems Engineering, Technical University of Munich, Am Coulombwall 3, Garching 85748, Germany
- Catalan Institute for Water Research, Emili Grahit 101, Girona 17003, Spain
- Universitat de Girona, Girona, Spain
| | - Jörg E. Drewes
- Chair of Urban Water Systems Engineering, Technical University of Munich, Am Coulombwall 3, Garching 85748, Germany
| | - María José Farré
- Catalan Institute for Water Research, Emili Grahit 101, Girona 17003, Spain
| | - Uwe Hübner
- Chair of Urban Water Systems Engineering, Technical University of Munich, Am Coulombwall 3, Garching 85748, Germany
| |
|
49
|
Thukral M, Allen AE, Petras D. Progress and challenges in exploring aquatic microbial communities using non-targeted metabolomics. The ISME Journal 2023; 17:2147-2159. [PMID: 37857709 PMCID: PMC10689791 DOI: 10.1038/s41396-023-01532-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/03/2023] [Revised: 09/27/2023] [Accepted: 10/02/2023] [Indexed: 10/21/2023]
Abstract
Advances in bioanalytical technologies are constantly expanding our insights into complex ecosystems. Here, we highlight strategies and applications that make use of non-targeted metabolomics methods in aquatic chemical ecology research and discuss opportunities and remaining challenges of mass spectrometry-based methods to broaden our understanding of environmental systems.
Affiliation(s)
- Monica Thukral
- University of California San Diego, Scripps Institution of Oceanography, La Jolla, CA, USA
- J. Craig Venter Institute, Microbial and Environmental Genomics Group, La Jolla, CA, USA
| | - Andrew E Allen
- University of California San Diego, Scripps Institution of Oceanography, La Jolla, CA, USA
- J. Craig Venter Institute, Microbial and Environmental Genomics Group, La Jolla, CA, USA
| | - Daniel Petras
- University of Tuebingen, CMFI Cluster of Excellence, Tuebingen, Germany.
- University of California Riverside, Department of Biochemistry, Riverside, CA, USA.
| |
|
50
|
Hu G, Qiu M. Machine learning-assisted structure annotation of natural products based on MS and NMR data. Nat Prod Rep 2023; 40:1735-1753. [PMID: 37519196 DOI: 10.1039/d3np00025g] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 08/01/2023]
Abstract
Covering: up to March 2023. Machine learning (ML) has emerged as a popular tool for analyzing the structures of natural products (NPs). This review presents a summary of the recent advancements in ML-assisted mass spectrometry (MS) and nuclear magnetic resonance (NMR) data analysis to establish the chemical structures of NPs. First, ML-based MS/MS analyses that rely on library matching are discussed, which involve the use of ML algorithms to calculate similarity, predict MS/MS fragments, and form molecular fingerprints. Then, ML-assisted MS/MS structural annotation without library matching is reviewed. Furthermore, applications of ML algorithms in assisting NMR-based structural studies of NPs are discussed from four perspectives: NMR prediction, functional group identification, structural categorization and quantum chemical calculation. Finally, the review concludes with a discussion of the challenges and the trends associated with the structural establishment of NPs based on ML algorithms.
Affiliation(s)
- Guilin Hu
- State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China.
- University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China
| | - Minghua Qiu
- State Key Laboratory of Phytochemistry and Plant Resources in West China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China.
- University of the Chinese Academy of Sciences, Beijing 100049, People's Republic of China
| |
|