1
Wang Y, Wang Y, Zhang Z, Xu K, Fang Q, Wu X, Ma S. Molecular networking: An efficient tool for discovering and identifying natural products. J Pharm Biomed Anal 2025; 259:116741. [PMID: 40014895 DOI: 10.1016/j.jpba.2025.116741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Revised: 02/06/2025] [Accepted: 02/08/2025] [Indexed: 03/01/2025]
Abstract
Natural products (NPs) play a crucial role in drug development. However, the discovery of NPs is often accidental, and conventional identification methods lack accuracy. To overcome these challenges, an increasing number of researchers are directing their attention to molecular networking (MN). MN based on tandem mass spectrometry has become an important tool for the separation, purification and structural identification of NPs. However, most new tools are not well known. This review starts with the most basic MN tool and explains its principle, workflow, and applications. It then introduces the principles and workflows of the remaining eight new types of MN tools. The reliability of the various MN tools is mainly verified based on applications in phytochemistry and metabolomics. Subsequently, the principles and applications of 12 structural annotation tools are introduced. For the first time, the scope of the 9 MN tools is compared side by side, and the 12 structural annotation tools are classified by the type of compound structure they are suited to identify. The advantages and disadvantages of the various tools are summarized, and suggestions are made for future application directions and the development of computing tools. MN tools are expected to enhance the efficiency of NP discovery and identification.
Affiliation(s)
- Yongjian Wang
- National Institutes for Food and Drug Control, Beijing 102629, China; Hebei University of Chinese Medicine, Shijiazhuang 050091, China
- Yadan Wang
- National Institutes for Food and Drug Control, Beijing 102629, China; State Key Laboratory of Drug Regulatory Science, Beijing 100050, China
- Zhongmou Zhang
- School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing 102488, China
- Kailing Xu
- National Institutes for Food and Drug Control, Beijing 102629, China
- Qiufang Fang
- Shenyang Pharmaceutical University, Shenyang 110179, China
- Xianfu Wu
- National Institutes for Food and Drug Control, Beijing 102629, China
- Shuangcheng Ma
- State Key Laboratory of Drug Regulatory Science, Beijing 100050, China; Chinese Pharmacopoeia Commission, Beijing 100061, China
2
Quinlan ZA, Nelson CE, Koester I, Petras D, Nothias L, Comstock J, White BM, Aluwihare LI, Bailey BA, Carlson CA, Dorrestein PC, Haas AF, Wegley Kelly L. Microbial Community Metabolism of Coral Reef Exometabolomes Broadens the Chemodiversity of Labile Dissolved Organic Matter. Environ Microbiol 2025; 27:e70064. [PMID: 40108841 PMCID: PMC11923415 DOI: 10.1111/1462-2920.70064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 01/27/2025] [Accepted: 02/04/2025] [Indexed: 03/22/2025]
Abstract
Dissolved organic matter (DOM) comprises diverse compounds with variable bioavailability across aquatic ecosystems. The sources and quantities of DOM can influence microbial growth and community structure with effects on biogeochemical processes. To investigate the chemodiversity of labile DOM in tropical reef waters, we tracked microbial utilisation of over 3000 untargeted mass spectrometry ion features exuded from two coral and three algal species. Roughly half of these features clustered into over 500 biologically labile spectral subnetworks annotated to diverse structural superclasses, including benzenoids, lipids, organic acids, heterocyclics and phenylpropanoids, comprising on average one-third of the ion richness and abundance within each chemical class. Distinct subsets of these labile compounds were exuded by algae and corals during the day and night, driving differential microbial growth and substrate utilisation. This study expands the chemical diversity of labile marine DOM with implications for carbon cycling in coastal environments.
Affiliation(s)
- Craig E. Nelson
- Daniel K. Inouye Center for Microbial Oceanography: Research and Education, Department of Oceanography and Sea Grant College Program, School of Ocean and Earth Science and Technology, University of Hawaiʻi at Mānoa, Honolulu, Hawaiʻi, USA
- Irina Koester
- Scripps Institution of Oceanography, UC San Diego, La Jolla, California, USA
- Daniel Petras
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, California, USA
- Controlling Microbes to Fight Infections Cluster of Excellence, University of Tuebingen, Tuebingen, Germany
- Louis‐Felix Nothias
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, California, USA
- Université Côte d'Azur, CNRS, ICN, Nice, France
- Jacqueline Comstock
- Department of Ecology, Evolution and Marine Biology and Marine Science Institute, University of California, Santa Barbara, California, USA
- Brandie M. White
- Department of Mathematics and Statistics, San Diego State University, San Diego, California, USA
- Barbara A. Bailey
- Department of Mathematics and Statistics, San Diego State University, San Diego, California, USA
- Craig A. Carlson
- Department of Ecology, Evolution and Marine Biology and Marine Science Institute, University of California, Santa Barbara, California, USA
- Pieter C. Dorrestein
- Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, California, USA
- Andreas F. Haas
- NIOZ Royal Netherlands Institute for Sea Research and Utrecht University, Texel, the Netherlands
3
Calvert D, Dew T, Gadon A, Gros J, Cook D. Valorisation of hop leaves for their bioactive compounds: Identification and quantification of phenolics across different varieties, crop years and stages of development. Food Chem 2025; 465:142005. [PMID: 39577263 DOI: 10.1016/j.foodchem.2024.142005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Revised: 10/18/2024] [Accepted: 11/09/2024] [Indexed: 11/24/2024]
Abstract
Hop leaves, a by-product from hop cone harvesting, contain phenolic compounds of potential value for food or beverage applications. However, the abundant phenolics in hop leaves remain largely unquantified. This study quantified phenolics in hop leaves over two crop years, for three commercially significant varieties, at different developmental stages post-flowering. Ethanolic hop extracts were characterised using LC-ESI-qTOF-MS/MS and HPLC-DAD for the annotation and quantification of phenolics and bitter resins. Hop leaf phenolic profile exhibited considerable structural diversity, differing significantly from that of respective cones. Kaempferol/quercetin 3-O-glycosides and chlorogenic acids were the most abundant sub-groups with phenolic acids, procyanidins, prenylflavonoids and bitter resins also present. Phenolic profile was mainly variety-dependent with some crop year and developmental effects. Flavonol 3-O-glycosides were the main compounds driving varietal differences. Findings demonstrate the structural diversity and high concentrations of phenolic compounds in hop leaf extracts and their potential as a source of bioactives for valorisation.
Affiliation(s)
- Duncan Calvert
- International Centre for Brewing Science, University of Nottingham
- Tristan Dew
- Division of Food, Nutrition and Dietetics, University of Nottingham
- Arthur Gadon
- Anheuser-Busch InBev nv/sa, Brouwerijplein 1, 3000 Leuven, Belgium
- Jacques Gros
- Anheuser-Busch InBev nv/sa, Brouwerijplein 1, 3000 Leuven, Belgium
- David Cook
- International Centre for Brewing Science, University of Nottingham
4
Chau HYK, Zhang X, Ressom HW. Deep Learning-Based Molecular Fingerprint Prediction for Metabolite Annotation. Metabolites 2025; 15:132. [PMID: 39997757 PMCID: PMC11857613 DOI: 10.3390/metabo15020132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2025] [Revised: 02/07/2025] [Accepted: 02/10/2025] [Indexed: 02/26/2025] Open
Abstract
Background/Objectives: Liquid chromatography coupled with mass spectrometry (LC-MS) is a commonly used platform for many metabolomics studies. However, metabolite annotation has been a major bottleneck in these studies, in part due to the limited publicly available spectral libraries, which consist of tandem mass spectrometry (MS/MS) data acquired from just a fraction of known compounds. Application of deep learning methods is increasingly reported as an alternative to spectral matching due to their ability to map complex relationships between molecular fingerprints and mass spectrometric measurements. The objectives of this study are to investigate deep learning methods for molecular fingerprint prediction based on MS/MS spectra and to rank putative metabolite IDs according to the similarity of their known and predicted molecular fingerprints. Methods: We trained three types of deep learning methods to model the relationships between molecular fingerprints and MS/MS spectra. Prior to training, various data processing steps, including scaling, binning, and filtering, were performed on MS/MS spectra obtained from the National Institute of Standards and Technology (NIST), MassBank of North America (MoNA), and Human Metabolome Database (HMDB). Furthermore, selection of the most relevant m/z bins and molecular fingerprints was conducted. The trained deep learning models were evaluated on ranking putative metabolite IDs obtained from a compound database for the challenges in the Critical Assessment of Small Molecule Identification (CASMI) 2016, CASMI 2017, and CASMI 2022 benchmark datasets. Results: Feature selection methods effectively reduced redundant molecular and spectral features prior to model training. Deep learning methods trained with the truncated features showed performance comparable to CSI:FingerID on ranking putative metabolite IDs. Conclusion: The results demonstrate the promising potential of deep learning methods for metabolite annotation.
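The ranking step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used, so Tanimoto similarity over the sets of "on" fingerprint bits is an assumption here, and the function names and toy data are hypothetical rather than taken from the paper.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of 'on' bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    # Two empty fingerprints share nothing; define their similarity as 0.
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(predicted_bits, candidates):
    """Order candidate IDs by similarity of their known fingerprint to the predicted one.

    `candidates` maps a candidate ID to its known fingerprint (set of bit indices).
    Ties are broken alphabetically by ID (an arbitrary, illustrative choice).
    """
    scored = [(tanimoto(predicted_bits, bits), name) for name, bits in candidates.items()]
    return [name for score, name in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Toy example: a fingerprint predicted from an MS/MS spectrum and three candidates.
predicted = {1, 2, 3, 4}
candidates = {"cand_x": {1, 2, 3, 4}, "cand_y": {1, 2}, "cand_z": {7, 8}}
ranking = rank_candidates(predicted, candidates)
```

In this toy case the candidate whose known fingerprint matches the prediction exactly is ranked first, and the candidate sharing no bits is ranked last.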
Affiliation(s)
- Habtom W. Ressom
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC 20057, USA; (H.Y.K.C.); (X.Z.)
5
Subrahmaniam HJ, Picó FX, Bataillon T, Salomonsen CL, Glasius M, Ehlers BK. Natural variation in root exudate composition in the genetically structured Arabidopsis thaliana in the Iberian Peninsula. THE NEW PHYTOLOGIST 2025; 245:1437-1449. [PMID: 39658885 PMCID: PMC11754937 DOI: 10.1111/nph.20314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Accepted: 11/14/2024] [Indexed: 12/12/2024]
Abstract
Plant root exudates are involved in nutrient acquisition, microbial partnerships, and inter-organism signaling. Yet, little is known about the genetic and environmental drivers of root exudate variation at large geographical scales, which may help understand the evolutionary trajectories of plants in heterogeneous environments. We quantified natural variation in the chemical composition of Arabidopsis thaliana root exudates in 105 Iberian accessions. We identified up to 373 putative compounds using ultra-high-performance liquid chromatography coupled with mass spectrometry. We estimated the broad-sense heritability of compounds and conducted a genome-wide association (GWA) study. We associated variation in root exudates to variation in geographic, environmental, life history, and genetic attributes of Iberian accessions. Only 25 of 373 compounds exhibited broad-sense heritability values significantly different from zero. GWA analysis identified polymorphisms associated with 12 root exudate compounds and 26 known genes involved in metabolism, defense, signaling, and nutrient transport. The genetic structure influenced root exudate composition involving terpenoids. We detected five terpenoids related to plant defense significantly varying in mean abundances in two genetic clusters. Our study provides first insights into the extent of root exudate natural variation at a regional scale depicting a diversified evolutionary trajectory among A. thaliana genetic clusters chiefly mediated by terpenoid composition.
Affiliation(s)
- Harihar Jaishree Subrahmaniam
- Department of Ecoscience, Aarhus University, Aarhus C 8000, Denmark
- Institut für Pflanzenwissenschaften und Mikrobiologie, Universität Hamburg, Hamburg 22609, Germany
- F. Xavier Picó
- Departamento de Ecología y Evolución, Estación Biológica de Doñana, Consejo Superior de Investigaciones Científicas, Sevilla 41092, Spain
- Thomas Bataillon
- Department of Molecular Biology and Genetics, Bioinformatics Research Centre, Aarhus University, Aarhus C 8000, Denmark
- Bodil K. Ehlers
- Department of Ecoscience, Aarhus University, Aarhus C 8000, Denmark
6
Russo FF, Nowatzky Y, Jaeger C, Parr MK, Benner P, Muth T, Lisec J. Machine learning methods for compound annotation in non-targeted mass spectrometry-A brief overview of fingerprinting, in silico fragmentation and de novo methods. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2024; 38:e9876. [PMID: 39180507 DOI: 10.1002/rcm.9876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 07/03/2024] [Accepted: 07/12/2024] [Indexed: 08/26/2024]
Abstract
Non-targeted screenings (NTS) are essential tools in different fields, such as forensics, health and environmental sciences. NTSs often employ mass spectrometry (MS) methods due to their high throughput and sensitivity in comparison to, for example, nuclear magnetic resonance-based methods. Because the identification of mass spectral signals, called annotation, is labour intensive, supporting tools based on machine learning (ML) have been developed for it. However, both the diversity of mass spectral signals and the sheer quantity of different ML tools developed for compound annotation present a challenge for researchers in maintaining a comprehensive overview of the field. In this work, we illustrate which ML-based methods are available for compound annotation in non-targeted MS experiments and provide a nuanced comparison of the ML models used in MS data analysis, unravelling their unique features and performance metrics. Through this overview we help researchers to apply these tools judiciously in their daily research. This review also offers a detailed exploration of methods and datasets to show gaps in current methods and promising target areas, offering a starting point for developers intending to improve existing methodologies.
Affiliation(s)
- Francesco F Russo
- Department of Analytical Chemistry and Reference Materials, Organic Trace Analysis and Food Analysis, Bundesanstalt für Materialforschung und -prüfung (BAM), Berlin, Germany
- Yannek Nowatzky
- eScience, Bundesanstalt für Materialprüfung und -forschung, Berlin, Germany
- Carsten Jaeger
- Department of Analytical Chemistry and Reference Materials, Environmental Analysis, Bundesanstalt für Materialforschung und -prüfung (BAM), Berlin, Germany
- Maria K Parr
- Institute of Pharmacy, Pharmaceutical and Medicinal Chemistry (Pharmaceutical Analyses), Freie Universität, Berlin, Germany
- Phillipp Benner
- eScience, Bundesanstalt für Materialprüfung und -forschung, Berlin, Germany
- Thilo Muth
- Department MF 2, Domain Specific Data Competence Centre, Robert Koch Institut, Berlin, Germany
- Jan Lisec
- Department of Analytical Chemistry and Reference Materials, Organic Trace Analysis and Food Analysis, Bundesanstalt für Materialforschung und -prüfung (BAM), Berlin, Germany
7
Nguyen J, Overstreet R, King E, Ciesielski D. Advancing the Prediction of MS/MS Spectra Using Machine Learning. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2024; 35:2256-2266. [PMID: 39258761 DOI: 10.1021/jasms.4c00154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]
Abstract
Tandem mass spectrometry (MS/MS) is an important tool for the identification of small molecules and metabolites where resultant spectra are most commonly identified by matching them with spectra in MS/MS reference libraries. While popular, this strategy is limited by the contents of existing reference libraries. In response to this limitation, various methods are being developed for the in silico generation of spectra to augment existing libraries. Recently, machine learning and deep learning techniques have been applied to predict spectra with greater speed and accuracy. Here, we investigate the challenges these algorithms face in achieving fast and accurate predictions on a wide range of small molecules. The challenges are often amplified by the use of generic machine learning benchmarking tactics, which lead to misleading accuracy scores. Curating data sets, only predicting spectra for sufficiently high collision energies, and working more closely with experimental mass spectrometrists are recommended strategies to improve overall prediction accuracy in this nuanced field.
Affiliation(s)
- Julia Nguyen
- Computing and Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Richard Overstreet
- Signature Science and Technology Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Ethan King
- Computing and Analytics Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Danielle Ciesielski
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
8
Dutertre Q, Guy PA, Sutour S, Peitsch MC, Ivanov NV, Glauser G, von Reuss S. Identification of Granatane Alkaloids from Duboisia myoporoides (Solanaceae) using Molecular Networking and Semisynthesis. JOURNAL OF NATURAL PRODUCTS 2024; 87:1914-1920. [PMID: 39038492 PMCID: PMC11348422 DOI: 10.1021/acs.jnatprod.4c00304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 07/05/2024] [Accepted: 07/05/2024] [Indexed: 07/24/2024]
Abstract
The Solanaceae plant family contains at least 98 genera and over 2700 species. The Duboisia genus stands out for its ability to produce pyridine and tropane alkaloids, which are relatively poorly characterized at the phytochemical level. In this study, we analyzed dried leaves of Duboisia spp. using supercritical CO2 extraction and ultra-high-pressure liquid chromatography coupled to high-resolution tandem mass spectrometry, followed by feature-based molecular networking. Thirty-one known tropane alkaloids were putatively annotated, and the identities of six (atropine, scopolamine, anisodamine, aposcopolamine, apoatropine, and noratropine) were confirmed using reference standards. Two new granatane alkaloids connected in the molecular network were highlighted from Duboisia myoporoides, and their α-granatane tropate and α-granatane isovalerate structures were unambiguously established by semisynthesis.
Affiliation(s)
- Quentin Dutertre
- Philip Morris Product SA, Quai Jeanrenaud 3, Neuchâtel 2000, Switzerland
- Laboratory of Bioanalytical Chemistry, University of Neuchâtel, Neuchâtel 2000, Switzerland
- Philippe A. Guy
- Philip Morris Product SA, Quai Jeanrenaud 3, Neuchâtel 2000, Switzerland
- Sylvain Sutour
- Neuchâtel Platform of Analytical Chemistry (NPAC), University of Neuchâtel, Neuchâtel 2000, Switzerland
- Manuel C. Peitsch
- Philip Morris Product SA, Quai Jeanrenaud 3, Neuchâtel 2000, Switzerland
- Nikolai V. Ivanov
- Philip Morris Product SA, Quai Jeanrenaud 3, Neuchâtel 2000, Switzerland
- Gaetan Glauser
- Neuchâtel Platform of Analytical Chemistry (NPAC), University of Neuchâtel, Neuchâtel 2000, Switzerland
- Stephan von Reuss
- Laboratory of Bioanalytical Chemistry, University of Neuchâtel, Neuchâtel 2000, Switzerland
- Neuchâtel Platform of Analytical Chemistry (NPAC), University of Neuchâtel, Neuchâtel 2000, Switzerland
9
Yazdani A, Mendez-Giraldez R, Yazdani A, Wang RS, Schaid DJ, Kong SW, Hadi MR, Samiei A, Samiei E, Wittenbecher C, Lasky-Su J, Clish CB, Muehlschlegel JD, Marotta F, Loscalzo J, Mora S, Chasman DI, Larson MG, Elsea SH. Broadcasters, receivers, functional groups of metabolites, and the link to heart failure by revealing metabolomic network connectivity. Metabolomics 2024; 20:71. [PMID: 38972029 PMCID: PMC12060728 DOI: 10.1007/s11306-024-02141-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 06/10/2024] [Indexed: 07/08/2024]
Abstract
BACKGROUND AND OBJECTIVE Blood-based small molecule metabolites offer easy accessibility and hold significant potential for insights into health processes, the impact of lifestyle, and genetic variation on disease, enabling precise risk prevention. In a prospective study with records of heart failure (HF) incidence, we present metabolite profiling data from individuals without HF at baseline. METHODS We uncovered the interconnectivity of metabolites using data-driven and causal networks augmented with polygenic factors. Exploring the networks, we identified metabolite broadcasters, receivers, mediators, and subnetworks corresponding to functional classes of metabolites, and provided insights into the link between metabolomic architecture and regulation in health. We incorporated the network structure into the identification of metabolites associated with HF to control the effect of confounding metabolites. RESULTS We identified metabolites associated with higher and lower risk of HF incidence, such as glycine, ureidopropionic and glycocholic acids, and LPC 18:2. These associations were not confounded by the other metabolites due to uncovering the connectivity among metabolites and adjusting each association for the confounding metabolites. Examples of our findings include the direct influence of asparagine on glycine, both of which were inversely associated with HF. These two metabolites were influenced by polygenic factors and only essential amino acids, which are not synthesized in the human body and are obtained directly from the diet. CONCLUSION Metabolites may play a critical role in linking genetic background and lifestyle factors to HF incidence. Revealing the underlying connectivity of metabolites associated with HF strengthens the findings and facilitates studying complex conditions like HF.
Affiliation(s)
- Azam Yazdani
- Division of Preventive Medicine, Department of Medicine, Brigham Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Harvard Data Science Initiative, The Broad Institute, Harvard Medical School, Boston, USA
- Akram Yazdani
- Division of Clinical and Translational Sciences, Department of Internal Medicine, The University of Texas Health Science Center at Houston, McGovern Medical School, Houston, USA
- Rui-Sheng Wang
- Department of Medicine, Brigham Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Daniel J. Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55902, USA
- Sek Won Kong
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA
- M. Reza Hadi
- School of Mathematics, University of Science and Technology of Iran, Tehran, Iran
- Ahmad Samiei
- Division of Pulmonary Medicine, Boston Children’s Hospital, Boston, USA
- Clemens Wittenbecher
- Division of Food and Nutrition Science, Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden
- Jessica Lasky-Su
- Department of Medicine, Brigham Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Francesco Marotta
- ReGenera R&D International for Aging Intervention and Vitality & Longevity Medical Science Commission, Femtec, Milano, Italy
- Joseph Loscalzo
- The Division of Cardiovascular Medicine, Department of Medicine, Brigham Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Samia Mora
- Department of Medicine, Brigham Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Daniel I. Chasman
- Department of Medicine, Brigham Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Martin G. Larson
- Department of Biostatistics, Boston University, Boston, MA 02118, USA
10
Yazdani A. WITHDRAWN: Broadcasters, receivers, functional groups of metabolites and the link to heart failure using polygenic factors. RESEARCH SQUARE 2024:rs.3.rs-3272974. [PMID: 37674714 PMCID: PMC10479558 DOI: 10.21203/rs.3.rs-3272974/v2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
The full text of this preprint has been withdrawn, as it was submitted in error. Therefore, the authors do not wish this work to be cited as a reference. Questions should be directed to the corresponding author.
11
Yazdani A. WITHDRAWN: Broadcasters, receivers, functional groups of metabolites and the link to heart failure using polygenic factors. RESEARCH SQUARE 2024:rs.3.rs-3272974. [PMID: 37674714 PMCID: PMC10479558 DOI: 10.21203/rs.3.rs-3272974/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
The full text of this preprint has been withdrawn, as it was submitted in error. Therefore, the authors do not wish this work to be cited as a reference. Questions should be directed to the corresponding author.
12
Yazdani A, Mendez-Giraldez R, Yazdani A, Schaid D, Won Kong S, Hadi M, Samiei A, Wittenbecher C, Lasky-Su J, Clish C, Marotta F, Kosorok M, Mora S, Muehlschlegel J, Chasman D, Larson M, Elsea S. Broadcasters, receivers, functional groups of metabolites and the link to heart failure progression using polygenic factors. RESEARCH SQUARE 2023:rs.3.rs-3246406. [PMID: 37645766 PMCID: PMC10462252 DOI: 10.21203/rs.3.rs-3246406/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
In a prospective study with records of heart failure (HF) incidence, we present metabolite profiling data from individuals without HF at baseline. We uncovered the interconnectivity of metabolites using data-driven and causal networks augmented with polygenic factors. Exploring the networks, we identified metabolite broadcasters, receivers, mediators, and subnetworks corresponding to functional classes of metabolites, and provided insights into the link between metabolomic architecture and regulation in health. We incorporated the network structure into the identification of metabolites associated with HF to control the effect of confounding metabolites. We identified metabolites associated with higher or lower risk of HF incidence, such as glycine, ureidopropionic and glycocholic acids, and LPC 18:2; these associations were not confounded by the other metabolites. We revealed the underlying relationships of the findings. For example, asparagine directly influenced glycine, and both were inversely associated with HF. These two metabolites were influenced by polygenic factors and only essential amino acids, which are not synthesized in the human body and come directly from the diet. Metabolites may play a critical role in linking genetic background and lifestyle factors to HF progression. Revealing the underlying connectivity of metabolites associated with HF strengthens the findings and facilitates a mechanistic understanding of HF progression.
Affiliation(s)
- Akram Yazdani
- Division of Clinical and Translational Sciences, Department of Internal Medicine, The University of Texas Health Science Center at Houston, McGovern Medical School
- Daniel Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55902
- Mohamad Hadi
- School of Mathematics, University of Science and Technology of Iran, Tehran
- Ahmad Samiei
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
- Samia Mora
- Brigham and Women's Hospital and Harvard Medical School
13
Karunaratne E, Hill DW, Dührkop K, Böcker S, Grant DF. Combining Experimental with Computational Infrared and Mass Spectra for High-Throughput Nontargeted Chemical Structure Identification. Anal Chem 2023; 95:11901-11907. [PMID: 37540774 DOI: 10.1021/acs.analchem.3c00937] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 08/06/2023]
Abstract
The inability to identify the structures of most metabolites detected in environmental or biological samples limits the utility of nontargeted metabolomics. The most widely used analytical approaches combine mass spectrometry and machine learning methods to rank candidate structures contained in large chemical databases. Given the large chemical space typically searched, the use of additional orthogonal data may improve the identification rates and reliability. Here, we present results of combining experimental and computational mass and IR spectral data for high-throughput nontargeted chemical structure identification. Experimental MS/MS and gas-phase IR data for 148 test compounds were obtained from NIST. Candidate structures for each of the test compounds were obtained from PubChem (mean = 4444 candidate structures per test compound). Our workflow used CSI:FingerID to initially score and rank the candidate structures. The top 1000 ranked candidates were subsequently used for IR spectra prediction, scoring, and ranking using density functional theory (DFT-IR). Final ranking of the candidates was based on a composite score calculated as the average of the CSI:FingerID and DFT-IR rankings. This approach resulted in the correct identification of 88 of the 148 test compounds (59%). 129 of the 148 test compounds (87%) were ranked within the top 20 candidates. These identification rates are the highest yet reported when candidate structures are used from PubChem. Combining experimental and computational MS/MS and IR spectral data is a potentially powerful option for prioritizing candidates for final structure verification.
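The composite ranking described in this abstract (the average of the CSI:FingerID and DFT-IR rankings) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary inputs, the alphabetical tie-breaking, and the toy ranks are hypothetical, and the code does not call CSI:FingerID or any DFT software.

```python
def composite_ranking(ranks_ms, ranks_ir):
    """Combine two rankings of the same candidates by averaging their ranks.

    Each argument maps a candidate ID to its rank (1 = best) under one method.
    Only candidates ranked by both methods are kept. Ties in the averaged rank
    are broken alphabetically, which is an arbitrary choice made for this sketch.
    """
    shared = ranks_ms.keys() & ranks_ir.keys()
    mean_rank = {c: (ranks_ms[c] + ranks_ir[c]) / 2 for c in shared}
    return sorted(mean_rank, key=lambda c: (mean_rank[c], c))


# Toy ranks standing in for MS/MS-based and IR-based rankings of three candidates.
ms_ranks = {"A": 1, "B": 2, "C": 3}
ir_ranks = {"A": 3, "B": 1, "C": 2}
final_order = composite_ranking(ms_ranks, ir_ranks)
```

Here candidate B has the lowest mean rank (1.5) and ends up first even though it is not the top MS/MS hit, which is the kind of reordering that orthogonal data is meant to allow.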
Affiliation(s)
- Erandika Karunaratne
- Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269, United States
| | - Dennis W Hill
- Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269, United States
| | - Kai Dührkop
- Chair for Bioinformatics, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena 07743, Germany
| | - Sebastian Böcker
- Chair for Bioinformatics, Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena 07743, Germany
| | - David F Grant
- Department of Pharmaceutical Sciences, University of Connecticut, Storrs, Connecticut 06269, United States
|
14
|
Boelrijk J, van Herwerden D, Ensing B, Forré P, Samanipour S. Predicting RP-LC retention indices of structurally unknown chemicals from mass spectrometry data. J Cheminform 2023; 15:28. [PMID: 36829215 PMCID: PMC9960388 DOI: 10.1186/s13321-023-00699-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/29/2022] [Accepted: 02/13/2023] [Indexed: 02/26/2023] [Open Access]
Abstract
Non-target analysis combined with liquid chromatography high resolution mass spectrometry is considered one of the most comprehensive strategies for the detection and identification of known and unknown chemicals in complex samples. However, many compounds remain unidentified due to data complexity and the limited number of structures in chemical databases. In this work, we have developed and validated a novel machine learning algorithm to predict the retention index (r_i) values for structurally (un)known chemicals based on their measured fragmentation pattern. The developed model, for the first time, enabled the prediction of r_i values without the need for the exact structure of the chemicals, with an R² of 0.91 and 0.77 and root mean squared error (RMSE) of 47 and 67 r_i units for the NORMAN ([Formula: see text]) and amide ([Formula: see text]) test sets, respectively. This fragment-based model showed comparable accuracy in r_i prediction compared to conventional descriptor-based models that rely on known chemical structure, which obtained an R² of 0.85 with an RMSE of 67.
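The abstract reports a root mean squared error alongside a goodness-of-fit value, taken here to be the coefficient of determination R². A minimal sketch of how both are computed from observed and predicted retention indices is given below; the function names and the four data points are hypothetical and are not taken from the paper.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between paired observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical retention indices (illustrative values only).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

RMSE is expressed in the same units as the retention index, which is why the abstract quotes it in retention index units, whereas R² is dimensionless.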
Affiliation(s)
- Jim Boelrijk
- AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands; Institute for Informatics, University of Amsterdam, Amsterdam, The Netherlands
| | - Denice van Herwerden
- Van't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, The Netherlands
| | - Bernd Ensing
- AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands; Computational Chemistry Group, Van't Hoff Institute for Molecular Sciences (HIMS), Amsterdam, The Netherlands
| | - Patrick Forré
- AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands; Institute for Informatics, University of Amsterdam, Amsterdam, The Netherlands
| | - Saer Samanipour
- Van't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, The Netherlands; UvA Data Science Center, University of Amsterdam, Amsterdam, The Netherlands; Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Woolloongabba, Australia
|
15
|
Huang YJ, Mukherjee R, Hsiao CK. Probabilistic edge inference of gene networks with Markov random field-based Bayesian learning. Front Genet 2022; 13:1034946. [DOI: 10.3389/fgene.2022.1034946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 10/24/2022] [Indexed: 11/11/2022] [Open Access]
Abstract
Current algorithms for gene regulatory network construction based on Gaussian graphical models focus on the deterministic decision of whether an edge exists. Both the probabilistic inference of edge existence and the relative strength of edges are often overlooked, either because the computational algorithms cannot account for this uncertainty or because it is not straightforward to implement. In this study, we combine the Bayesian Markov random field and the conditional autoregressive (CAR) model to tackle these two tasks simultaneously. The uncertainty of edge existence and the relative strength of edges can be measured and quantified based on a Bayesian model such as the CAR model and the spike-and-slab lasso prior. In addition, the strength of the edges can be utilized to prioritize the importance of the edges in a network graph. Simulations and a glioblastoma cancer study were carried out to assess the proposed model’s performance and to compare it with existing methods when a binary decision is of interest. The proposed approach shows stable performance and may provide novel structures with biological insights.
|
16
|
Yang J, Cai Y, Zhao K, Xie H, Chen X. Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov Today 2022; 27:103356. [PMID: 36113834 DOI: 10.1016/j.drudis.2022.103356] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 12/07/2021] [Revised: 07/28/2022] [Accepted: 09/08/2022] [Indexed: 11/22/2022]
Abstract
Molecular fingerprints are used to represent chemical (structural, physicochemical, etc.) properties of large-scale chemical sets in a low computational cost way. They have a prominent role in transforming chemical data sets into consistent input formats (bit strings or numeric values) suitable for in silico approaches. In this review, we summarize and classify common and state-of-the-art fingerprints into eight different types (dictionary based, circular, topological, pharmacophore, protein-ligand interaction, shape based, reinforced, and multi). We also highlight applications of fingerprints in early drug research and development (R&D). Thus, this review provides a guide for the selection of appropriate fingerprints of compounds (or ligand-protein complexes) for use in drug R&D.
Affiliation(s)
- Jingbo Yang
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
| | - Yiyang Cai
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
| | - Kairui Zhao
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
| | - Hongbo Xie
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China.
| | - Xiujie Chen
- Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China.
|
17
|
Kuhn S, Tumer E, Colreavy-Donnelly S, Moreira Borges R. A pilot study for fragment identification using 2D NMR and deep learning. Magn Reson Chem 2022; 60:1052-1060. [PMID: 34480494 DOI: 10.1002/mrc.5212] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/14/2021] [Revised: 06/05/2021] [Accepted: 08/23/2021] [Indexed: 06/13/2023]
Abstract
This paper presents a proof of concept of a method to identify substructures in 2D NMR spectra of mixtures using a bespoke image-based convolutional neural network application. This is done using HSQC and HMBC spectra separately and in combination. The application can reliably detect substructures in pure compounds, using a simple network. Results indicate that it can work for mixtures when trained on pure compounds only. HMBC data and the combination of HMBC and HSQC show better results than HSQC alone in this pilot study.
Affiliation(s)
- Stefan Kuhn
- School of Computer Science and Informatics, De Montfort University, Leicester, UK
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Ricardo Moreira Borges
- Instituto de Pesquisas de Produtos Naturais Walter Mors, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
|
18
|
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem 2023 update. Nucleic Acids Res 2022; 51:D1373-D1380. [PMID: 36305812 PMCID: PMC9825602 DOI: 10.1093/nar/gkac956] [Citation(s) in RCA: 1306] [Impact Index Per Article: 435.3] [Received: 09/15/2022] [Revised: 10/06/2022] [Accepted: 10/13/2022] [Indexed: 01/30/2023] [Open Access]
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Jie Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Asta Gindulyte
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Jia He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Siqian He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Benjamin A Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Paul A Thiessen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Bo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Leonid Zaslavsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Jian Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Evan E Bolton
- To whom correspondence should be addressed. Tel: +1 301 451 1811; Fax: +1 301 480 4559;
|
19
|
MSNovelist: de novo structure generation from mass spectra. Nat Methods 2022; 19:865-870. [PMID: 35637304 PMCID: PMC9262714 DOI: 10.1038/s41592-022-01486-3] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 07/06/2021] [Accepted: 04/07/2022] [Indexed: 12/29/2022]
Abstract
Current methods for structure elucidation of small molecules rely on finding similarity with spectra of known compounds, but do not predict structures de novo for unknown compound classes. We present MSNovelist, which combines fingerprint prediction with an encoder–decoder neural network to generate structures de novo solely from tandem mass spectrometry (MS2) spectra. In an evaluation with 3,863 MS2 spectra from the Global Natural Product Social Molecular Networking site, MSNovelist predicted 25% of structures correctly on first rank, retrieved 45% of structures overall and reproduced 61% of correct database annotations, without having ever seen the structure in the training phase. Similarly, for the CASMI 2016 challenge, MSNovelist correctly predicted 26% and retrieved 57% of structures, recovering 64% of correct database annotations. Finally, we illustrate the application of MSNovelist in a bryophyte MS2 dataset, in which de novo structure prediction substantially outscored the best database candidate for seven spectra. MSNovelist is ideally suited to complement library-based annotation in the case of poorly represented analyte classes and novel compounds.
|
20
|
Abstract
Motivation: Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but these libraries are vastly incomplete; in silico methods search in structure databases, allowing us to overcome this limitation. The best-performing in silico methods use machine learning to predict a molecular fingerprint from tandem mass spectra, then use the predicted fingerprint to search in a molecular structure database. Predicted molecular fingerprints are also of great interest for compound class annotation, de novo structure elucidation, and other tasks. So far, kernel support vector machines are the best tool for fingerprint prediction. However, they cannot be trained on all publicly available reference spectra because their training time scales cubically with the number of training data. Results: We use the Nyström approximation to transform the kernel into a linear feature map. We evaluate two methods that use this feature map as input: a linear support vector machine and a deep neural network (DNN). For evaluation, we use a cross-validated dataset of 156 017 compounds and three independent datasets with 1734 compounds. We show that the combination of kernel method and DNN outperforms the kernel support vector machine, which is the current gold standard, as well as a DNN on tandem mass spectra on all evaluation datasets. Availability and implementation: The deep kernel learning method for fingerprint prediction is part of the SIRIUS software, available at https://bio.informatik.uni-jena.de/software/sirius.
Affiliation(s)
- Kai Dührkop
- To whom correspondence should be addressed. E-mail:
|
21
|
High-confidence structural annotation of metabolites absent from spectral libraries. Nat Biotechnol 2021; 40:411-421. [PMID: 34650271 PMCID: PMC8926923 DOI: 10.1038/s41587-021-01045-9] [Citation(s) in RCA: 131] [Impact Index Per Article: 32.8] [Received: 04/09/2021] [Accepted: 08/04/2021] [Indexed: 12/14/2022]
Abstract
Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and two structures using synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries.
|
22
|
Wang F, Liigand J, Tian S, Arndt D, Greiner R, Wishart DS. CFM-ID 4.0: More Accurate ESI-MS/MS Spectral Prediction and Compound Identification. Anal Chem 2021; 93:11692-11700. [PMID: 34403256 PMCID: PMC9064193 DOI: 10.1021/acs.analchem.1c01465] [Citation(s) in RCA: 180] [Impact Index Per Article: 45.0] [Indexed: 01/03/2023]
Abstract
In the field of metabolomics, mass spectrometry (MS) is the method most commonly used for identifying and annotating metabolites. As this typically involves matching a given MS spectrum against an experimentally acquired reference spectral library, this approach is limited by the coverage and size of such libraries (which typically number in the thousands). These experimental libraries can be greatly extended by predicting the MS spectra of known chemical structures (which number in the millions) to create computational reference spectral libraries. To facilitate the generation of predicted spectral reference libraries, we developed CFM-ID, a computer program that can accurately predict the ESI-MS/MS spectrum for a given compound structure. CFM-ID is one of the best-performing methods for compound-to-mass-spectrum prediction and also one of the top tools for in silico mass-spectrum-to-compound identification. This work improves CFM-ID's ability to predict ESI-MS/MS spectra from compounds by (1) learning parameters from features based on the molecular topology, (2) adding a new approach to ring cleavage that models such cleavage as a sequence of simple chemical bond dissociations, and (3) expanding its hand-written rule-based predictor to cover more chemical classes, including acylcarnitines, acylcholines, flavonols, flavones, flavanones, and flavonoid glycosides. We demonstrate that this new version of CFM-ID (version 4.0) is significantly more accurate than previous CFM-ID versions in terms of both ESI-MS/MS spectral prediction and compound identification. CFM-ID 4.0 is available at http://cfmid4.wishartlab.com/ as a web server, and Docker images can be downloaded at https://hub.docker.com/r/wishartlab/cfmid.
Affiliation(s)
- Fei Wang
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Alberta Machine Intelligence Institute, Edmonton, AB T5J 3B1, Canada
| | - Jaanus Liigand
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Institute of Chemistry, University of Tartu, Tartu 50411, Estonia
| | - Siyang Tian
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
| | - David Arndt
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
| | - Russell Greiner
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Department of Psychiatry, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Alberta Machine Intelligence Institute, Edmonton, AB T5J 3B1, Canada
| | - David S Wishart
- Department of Computing Science, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
|
23
|
Collins SL, Koo I, Peters JM, Smith PB, Patterson AD. Current Challenges and Recent Developments in Mass Spectrometry-Based Metabolomics. Annu Rev Anal Chem (Palo Alto Calif) 2021; 14:467-487. [PMID: 34314226 DOI: 10.1146/annurev-anchem-091620-015205] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 06/13/2023]
Abstract
High-resolution mass spectrometry (MS) has advanced the study of metabolism in living systems by allowing many metabolites to be measured in a single experiment. Although improvements in mass detector sensitivity have facilitated the detection of greater numbers of analytes, compound identification strategies, feature reduction software, and data sharing have not kept up with the influx of MS data. Here, we discuss the ongoing challenges with MS-based metabolomics, including de novo metabolite identification from mass spectra, differentiation of metabolites from environmental contamination, chromatographic separation of isomers, and incomplete MS databases. Because of their popularity and sensitive detection of small molecules, this review focuses on the challenges of liquid chromatography-mass spectrometry-based methods. We then highlight important instrumentational, experimental, and computational tools that have been created to address these challenges and how they have enabled the advancement of metabolomics research.
Affiliation(s)
- Stephanie L Collins
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Imhoi Koo
- Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Jeffrey M Peters
- Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Philip B Smith
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Andrew D Patterson
- Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
24
|
Tinte MM, Chele KH, van der Hooft JJJ, Tugizimana F. Metabolomics-Guided Elucidation of Plant Abiotic Stress Responses in the 4IR Era: An Overview. Metabolites 2021; 11:445. [PMID: 34357339 PMCID: PMC8305945 DOI: 10.3390/metabo11070445] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/25/2021] [Revised: 06/30/2021] [Accepted: 07/03/2021] [Indexed: 12/27/2022] [Open Access]
Abstract
Plants are constantly challenged by changing environmental conditions that include abiotic stresses. These are limiting their development and productivity and are subsequently threatening our food security, especially when considering the pressure of the increasing global population. Thus, there is an urgent need for the next generation of crops with high productivity and resilience to climate change. The dawn of a new era characterized by the emergence of fourth industrial revolution (4IR) technologies has redefined the ideological boundaries of research and applications in plant sciences. Recent technological advances and machine learning (ML)-based computational tools and omics data analysis approaches are allowing scientists to derive comprehensive metabolic descriptions and models for the target plant species under specific conditions. Such accurate metabolic descriptions are imperatively essential for devising a roadmap for the next generation of crops that are resilient to environmental deterioration. By synthesizing the recent literature and collating data on metabolomics studies on plant responses to abiotic stresses, in the context of the 4IR era, we point out the opportunities and challenges offered by omics science, analytical intelligence, computational tools and big data analytics. Specifically, we highlight technological advancements in (plant) metabolomics workflows and the use of machine learning and computational tools to decipher the dynamics in the chemical space that define plant responses to abiotic stress conditions.
Affiliation(s)
- Morena M. Tinte
- Department of Biochemistry, University of Johannesburg, Auckland Park, Johannesburg 2006, South Africa
| | - Kekeletso H. Chele
- Department of Biochemistry, University of Johannesburg, Auckland Park, Johannesburg 2006, South Africa
| | - Fidele Tugizimana
- Department of Biochemistry, University of Johannesburg, Auckland Park, Johannesburg 2006, South Africa
- International Research and Development Division, Omnia Group, Ltd., Johannesburg 2021, South Africa
|
25
|
Mass spectrometry based untargeted metabolomics for plant systems biology. Emerg Top Life Sci 2021; 5:189-201. [DOI: 10.1042/etls20200271] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/24/2020] [Revised: 02/04/2021] [Accepted: 02/22/2021] [Indexed: 12/12/2022]
Abstract
Untargeted metabolomics enables the identification of key changes to standard pathways, but also aids in revealing other important and possibly novel metabolites or pathways for further analysis. Much progress has been made in this field over the past decade and yet plant metabolomics seems to still be an emerging approach because of the high complexity of plant metabolites and the number one challenge of untargeted metabolomics, metabolite identification. This final and critical stage remains the focus of current research. The intention of this review is to give a brief current state of LC–MS based untargeted metabolomics approaches for plant specific samples and to review the emerging solutions in mass spectrometer hardware and computational tools that can help predict a compound's molecular structure to improve the identification rate.
|
27
|
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res 2021; 49:D1388-D1395. [PMID: 33151290 PMCID: PMC7778930 DOI: 10.1093/nar/gkaa971] [Citation(s) in RCA: 2091] [Impact Index Per Article: 522.8] [Received: 09/14/2020] [Revised: 10/06/2020] [Accepted: 10/11/2020] [Indexed: 02/06/2023] [Open Access]
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves the scientific community as well as the general public, with millions of unique users per month. In the past two years, PubChem made substantial improvements. Data from more than 100 new data sources were added to PubChem, including chemical-literature links from Thieme Chemistry, chemical and physical property links from SpringerMaterials, and patent links from the World Intellectual Property Organization (WIPO). PubChem's homepage and individual record pages were updated to help users find desired information faster. This update involved a data model change for the data objects used by these pages as well as by programmatic users. Several new services were introduced, including the PubChem Periodic Table and Element pages, Pathway pages, and Knowledge panels. Additionally, in response to the coronavirus disease 2019 (COVID-19) outbreak, PubChem created a special data collection that contains PubChem data related to COVID-19 and the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Jie Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Asta Gindulyte
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Jia He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Siqian He
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Qingliang Li
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Benjamin A Shoemaker
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Paul A Thiessen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Bo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Leonid Zaslavsky
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Jian Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| | - Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
| |
|
28
|
|
29
|
Ludwig M, Nothias LF, Dührkop K, Koester I, Fleischauer M, Hoffmann MA, Petras D, Vargas F, Morsy M, Aluwihare L, Dorrestein PC, Böcker S. Database-independent molecular formula annotation using Gibbs sampling through ZODIAC. Nat Mach Intell 2020. [DOI: 10.1038/s42256-020-00234-6] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Indexed: 12/19/2022]
|
30
|
Li Y, Kuhn M, Gavin AC, Bork P. Identification of metabolites from tandem mass spectra with a machine learning approach utilizing structural features. Bioinformatics 2020; 36:1213-1218. [PMID: 31605112 PMCID: PMC7703789 DOI: 10.1093/bioinformatics/btz736] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/20/2019] [Revised: 07/30/2019] [Accepted: 09/25/2019] [Indexed: 01/11/2023]
Abstract
Motivation: Untargeted mass spectrometry (MS/MS) is a powerful method for detecting metabolites in biological samples. However, fast and accurate identification of the metabolites’ structures from MS/MS spectra is still a great challenge. Results: We present a new analysis method, called SubFragment-Matching (SF-Matching), that is based on the hypothesis that molecules with similar structural features will exhibit similar fragmentation patterns. We combine information on fragmentation patterns of molecules with shared substructures and then use random forest models to predict whether a given structure can yield a certain fragmentation pattern. These models can then be used to score candidate molecules for a given mass spectrum. For rapid identification, we pre-compute such scores for common biological molecular structure databases. Using benchmarking datasets, we find that our method has similar performance to CSI:FingerID and that very high accuracies can be achieved by combining our method with CSI:FingerID. Rarefaction analysis of the training dataset shows that the performance of our method will increase as more experimental data become available. Availability and implementation: SF-Matching is available from http://www.bork.embl.de/Docu/sf_matching. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuanyue Li
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Michael Kuhn
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Anne-Claude Gavin
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Molecular Medicine Partnership Unit (MMPU), 69117 Heidelberg, Germany
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Molecular Medicine Partnership Unit (MMPU), 69117 Heidelberg, Germany; Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany; Department of Bioinformatics, Biocenter, University of Würzburg, 97074 Würzburg, Germany
| |
|
31
|
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences Part I: Progress. Angew Chem Int Ed Engl 2020; 59:22858-22893. [DOI: 10.1002/anie.201909987] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Received: 08/06/2019] [Indexed: 01/05/2023]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Natalie S. Eyke
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Klavs F. Jensen
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| |
|
32
|
Coley CW, Eyke NS, Jensen KF. Autonome Entdeckung in den chemischen Wissenschaften, Teil I: Fortschritt [Autonomous Discovery in the Chemical Sciences, Part I: Progress]. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.201909987] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Natalie S. Eyke
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| | - Klavs F. Jensen
- Department of Chemical Engineering Massachusetts Institute of Technology Cambridge MA 02139 USA
| |
|
33
|
Leavell MD, Singh AH, Kaufmann-Malaga BB. High-throughput screening for improved microbial cell factories, perspective and promise. Curr Opin Biotechnol 2020; 62:22-28. [DOI: 10.1016/j.copbio.2019.07.002] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 06/05/2019] [Revised: 07/24/2019] [Accepted: 07/27/2019] [Indexed: 01/11/2023]
|
34
|
Abstract
SIRIUS 4 is the best-in-class computational tool for metabolite identification from high-resolution tandem mass spectrometry data. It offers de novo molecular formula annotation with outstanding accuracy. When searching fragmentation spectra in a structure database, it reaches over 70% correct identifications. A predicted fingerprint, which indicates the presence or absence of thousands of molecular properties, helps to deduce information about the compound of interest even if it is not contained in any structure database. Here, we present best practices and describe how to leverage the full potential of SIRIUS 4, how to incorporate it into your own workflow, and how it adds value to the analysis of mass spectrometry data beyond spectral library search.
|
35
|
SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat Methods 2019; 16:299-302. [PMID: 30886413 DOI: 10.1038/s41592-019-0344-8] [Citation(s) in RCA: 933] [Impact Index Per Article: 155.5] [Received: 03/20/2018] [Accepted: 02/06/2019] [Indexed: 12/17/2022]
Abstract
Mass spectrometry is a predominant experimental technique in metabolomics and related fields, but metabolite structural elucidation remains highly challenging. We report SIRIUS 4 (https://bio.informatik.uni-jena.de/sirius/), which provides a fast computational approach for molecular structure identification. SIRIUS 4 integrates CSI:FingerID for searching in molecular structure databases. Using SIRIUS 4, we achieved identification rates of more than 70% on challenging metabolomics datasets.
|
36
|
Misra BB, Mohapatra S. Tools and resources for metabolomics research community: A 2017-2018 update. Electrophoresis 2018; 40:227-246. [PMID: 30443919 DOI: 10.1002/elps.201800428] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 10/11/2018] [Revised: 11/09/2018] [Accepted: 11/09/2018] [Indexed: 01/09/2023]
Abstract
The scale at which MS- and NMR-based platforms generate metabolomics datasets for research, core, and clinical facilities to address challenges in the various sciences, ranging from biomedical to agricultural, is underappreciated. Thus, metabolomics efforts spanning microbe, environment, plant, animal, and human systems have led to continual and concomitant growth of in silico resources for analysis and interpretation of these datasets. These software tools, resources, and databases drive the field forward and help it keep pace with the amount of data being generated and with the sophisticated and diverse analytical platforms used to generate these metabolomics datasets. To address challenges in data preprocessing, metabolite annotation, statistical interrogation, visualization, interpretation, and integration, the metabolomics and informatics research community comes up with hundreds of tools every year. The purpose of the present review is to provide a brief and useful summary of more than 95 metabolomics tools, software, and databases that were either developed or significantly improved during 2017-2018. We hope this review helps readers, developers, and researchers obtain informed access to these thorough lists of resources for further improvement, implementation, and application in due course.
Affiliation(s)
- Biswapriya B Misra
- Department of Internal Medicine, Section of Molecular Medicine, Medical Center Boulevard, Winston-Salem, NC, USA
| | | |
|