Reference Citation Analysis

For: Nguyen DH, Nguyen CH, Mamitsuka H. ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra. Bioinformatics 2020;35:i164-i172. [PMID: 31510641 PMCID: PMC6612897 DOI: 10.1093/bioinformatics/btz319] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/26/2022] Open

Cited by Other Article(s)

Liu Y, De Vijlder T, Bittremieux W, Laukens K, Heyndrickx W. Current and future deep learning algorithms for tandem mass spectrometry (MS/MS)-based small molecule structure elucidation. Rapid Communications in Mass Spectrometry 2025;39 Suppl 1:e9120. [PMID: 33955607 DOI: 10.1002/rcm.9120] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 01/13/2021] [Revised: 04/13/2021] [Accepted: 04/29/2021] [Indexed: 06/12/2023]

Abstract

RATIONALE

Structure elucidation of small molecules has been one of the cornerstone applications of mass spectrometry for decades. Despite the increasing availability of software tools, structure elucidation from tandem mass spectrometry (MS/MS) data remains a challenging task, leaving many spectra unidentified. However, as an increasing number of reference MS/MS spectra are being curated at a repository scale and shared on public servers, there is an exciting opportunity to develop powerful new deep learning (DL) models for automated structure elucidation.

ARCHITECTURES

Recent early-stage DL frameworks mostly follow a "two-step approach" that first predicts molecular descriptors from MS/MS spectra and then matches them to database structures. The related architectures could suffer from: (1) computational complexity because of the separate training of descriptor-specific classifiers, (2) the high-dimensional nature of mass spectral data and information loss due to data preprocessing, and (3) the low substructure coverage and class imbalance of predefined molecular fingerprints. Inspired by successful DL frameworks employed in drug discovery, we have conceptualized and designed hypothetical DL architectures to tackle these issues. For (1), we recommend multitask learning to achieve better performance with fewer classifiers by grouping structurally related descriptors. For (2) and (3), we introduce feature engineering to extract condensed and higher-order information from spectra and structure data. For instance, encoding spectra with subtrees and pre-calculated spectral patterns adds peak interactions to the model input. Encoding structures with graph convolutional networks incorporates connectivity within a molecule. The joint embedding of spectra and structures can enable simultaneous spectral library and molecular database search.
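As an illustration of the "two-step approach" described above, the following minimal Python sketch thresholds per-bit fingerprint probabilities (step 1, stubbed with made-up numbers) and then ranks database candidates by Tanimoto similarity (step 2). The function names, the 0.5 threshold, the bit probabilities, and the candidate fingerprints are all hypothetical and are not taken from any of the cited frameworks.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(
    predicted_bits: Set[int], candidates: Dict[str, Set[int]]
) -> List[Tuple[str, float]]:
    """Score each candidate against the predicted fingerprint, best match first."""
    scored = [(name, tanimoto(predicted_bits, bits)) for name, bits in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Step 1 (stand-in): per-bit probabilities that hypothetical descriptor
# classifiers might output for one MS/MS spectrum. Values are made up.
bit_probabilities = {0: 0.9, 1: 0.2, 2: 0.8, 3: 0.7, 4: 0.1}
predicted_bits = {bit for bit, p in bit_probabilities.items() if p >= 0.5}

# Step 2: rank hypothetical database structures by fingerprint similarity.
candidates = {
    "candidate_A": {0, 2, 3},
    "candidate_B": {0, 1, 4},
    "candidate_C": {2, 3, 4},
}
ranking = rank_candidates(predicted_bits, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy setup one classifier per bit would be needed for step 1, which is the kind of per-descriptor training cost that point (1) refers to.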

CONCLUSIONS

In principle, given enough training data, adapted DL architectures, optimal hyperparameters and computing power, DL frameworks can predict small molecule structures, completely or at least partially, from MS/MS spectra. However, their performance and general applicability should be fairly evaluated against classical machine learning frameworks.

Zheng F, You L, Zhao X, Lu X, Xu G. Predicting Tandem Mass Spectra of Small Molecules Using Graph Embedding of Precursor-Product Ion Pair Graph. Anal Chem 2024;96:19190-19195. [PMID: 39575948 DOI: 10.1021/acs.analchem.4c04375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/11/2024]

Russo FF, Nowatzky Y, Jaeger C, Parr MK, Benner P, Muth T, Lisec J. Machine learning methods for compound annotation in non-targeted mass spectrometry-A brief overview of fingerprinting, in silico fragmentation and de novo methods. Rapid Communications in Mass Spectrometry 2024;38:e9876. [PMID: 39180507 DOI: 10.1002/rcm.9876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 07/03/2024] [Accepted: 07/12/2024] [Indexed: 08/26/2024]

Beck A, Muhoberac M, Randolph CE, Beveridge CH, Wijewardhane PR, Kenttämaa HI, Chopra G. Recent Developments in Machine Learning for Mass Spectrometry. ACS Measurement Science Au 2024;4:233-246. [PMID: 38910862 PMCID: PMC11191731 DOI: 10.1021/acsmeasuresciau.3c00060] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 10/06/2023] [Revised: 12/27/2023] [Accepted: 01/22/2024] [Indexed: 06/25/2024]

Sandström H, Rissanen M, Rousu J, Rinke P. Data-Driven Compound Identification in Atmospheric Mass Spectrometry. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024;11:e2306235. [PMID: 38095508 PMCID: PMC10885664 DOI: 10.1002/advs.202306235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 11/04/2023] [Indexed: 02/24/2024]

Heid E, Greenman KP, Chung Y, Li SC, Graff DE, Vermeire FH, Wu H, Green WH, McGill CJ. Chemprop: A Machine Learning Package for Chemical Property Prediction. J Chem Inf Model 2024;64:9-17. [PMID: 38147829 PMCID: PMC10777403 DOI: 10.1021/acs.jcim.3c01250] [Citation(s) in RCA: 94] [Impact Index Per Article: 94.0] [Received: 08/08/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 12/28/2023]

Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00577-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/24/2022]

Ljoncheva M, Stepišnik T, Kosjek T, Džeroski S. Machine learning for identification of silylated derivatives from mass spectra. J Cheminform 2022;14:62. [PMID: 36109826 PMCID: PMC9476372 DOI: 10.1186/s13321-022-00636-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2022] [Accepted: 07/31/2022] [Indexed: 11/10/2022] Open

Abstract

MOTIVATION

Compound structure identification relies on increasingly sophisticated computational tools, among which machine learning tools are a recent addition that is quickly gaining importance. These tools, of which the method titled Compound Structure Identification:Input Output Kernel Regression (CSI:IOKR) is an excellent example, have been used to elucidate compound structure from mass spectral (MS) data with significant accuracy, confidence and speed. They have, however, largely focused on data coming from liquid chromatography coupled to tandem mass spectrometry (LC–MS). Gas chromatography coupled to mass spectrometry (GC–MS) is an alternative which offers several advantages as compared to LC–MS, including higher data reproducibility. Of special importance is the substantial compound coverage offered by GC–MS, further expanded by derivatization procedures, such as silylation, which can improve the volatility, thermal stability and chromatographic peak shape of semi-volatile analytes. Despite these advantages and the increasing size of compound databases and MS libraries, GC–MS data have not yet been used by machine learning approaches to compound structure identification.

RESULTS

This study presents a successful application of the CSI:IOKR machine learning method for the identification of environmental contaminants from GC–MS spectra. We use CSI:IOKR as an alternative to exhaustive search of MS libraries, independent of instrumental platform and data processing software. We use a comprehensive dataset of GC–MS spectra of trimethylsilyl derivatives and their molecular structures, derived from a large commercially available MS library, to train a model that maps between spectra and molecular structures. We test the learned model on a different dataset of GC–MS spectra of trimethylsilyl derivatives of environmental contaminants, generated in-house and made publicly available. The results show that 37% (resp. 50%) of the tested compounds are correctly ranked among the top 10 (resp. 20) candidate compounds suggested by the model. Even though spectral comparisons with reference standards or de novo structural elucidations are necessary to validate the predictions, machine learning provides efficient candidate prioritization and reduction of the time spent for compound annotation.
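The top-10 and top-20 figures reported in this abstract are instances of a top-k accuracy metric. The short Python sketch below shows how such a figure can be computed from the rank of the correct structure in each candidate list. The helper name and the rank values are made up for illustration and do not reproduce the study's data.

```python
from typing import Sequence


def top_k_accuracy(true_ranks: Sequence[int], k: int) -> float:
    """Fraction of compounds whose correct structure is ranked at position k or better (1-based)."""
    if not true_ranks:
        raise ValueError("true_ranks must not be empty")
    hits = sum(1 for rank in true_ranks if rank <= k)
    return hits / len(true_ranks)


# Hypothetical ranks of the correct structure for eight test compounds.
true_ranks = [1, 4, 12, 18, 35, 7, 60, 15]

top10 = top_k_accuracy(true_ranks, 10)
top20 = top_k_accuracy(true_ranks, 20)
print(f"top-10: {top10:.1%}, top-20: {top20:.1%}")
```

Because a compound counted in the top 10 is also counted in the top 20, the metric can only stay equal or increase as k grows.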

Tian Z, Liu F, Li D, Fernie AR, Chen W. Strategies for structure elucidation of small molecules based on LC–MS/MS data from complex biological samples. Comput Struct Biotechnol J 2022;20:5085-5097. [PMID: 36187931 PMCID: PMC9489805 DOI: 10.1016/j.csbj.2022.09.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/18/2022] [Revised: 09/03/2022] [Accepted: 09/03/2022] [Indexed: 11/06/2022] Open

Bach E, Rogers S, Williamson J, Rousu J. Probabilistic framework for integration of mass spectrum and retention time information in small molecule identification. Bioinformatics 2021;37:1724-1731. [PMID: 33244585 PMCID: PMC8289373 DOI: 10.1093/bioinformatics/btaa998] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/18/2020] [Revised: 10/27/2020] [Accepted: 11/17/2020] [Indexed: 11/14/2022] Open

Krettler CA, Thallinger GG. A map of mass spectrometry-based in silico fragmentation prediction and compound identification in metabolomics. Brief Bioinform 2021;22:6184408. [PMID: 33758925 DOI: 10.1093/bib/bbab073] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 11/26/2020] [Revised: 01/29/2021] [Accepted: 02/12/2021] [Indexed: 12/27/2022] Open

Perez De Souza L, Alseekh S, Brotman Y, Fernie AR. Network-based strategies in metabolomics data analysis and interpretation: from molecular networking to biological interpretation. Expert Rev Proteomics 2020;17:243-255. [PMID: 32380880 DOI: 10.1080/14789450.2020.1766975] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Indexed: 12/28/2022]

O'Shea K, Misra BB. Software tools, databases and resources in metabolomics: updates from 2018 to 2019. Metabolomics 2020;16:36. [PMID: 32146531 DOI: 10.1007/s11306-020-01657-3] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 12/03/2019] [Accepted: 03/01/2020] [Indexed: 12/24/2022]

González-Riano C, Dudzik D, Garcia A, Gil-de-la-Fuente A, Gradillas A, Godzien J, López-Gonzálvez Á, Rey-Stolle F, Rojo D, Ruperez FJ, Saiz J, Barbas C. Recent Developments along the Analytical Process for Metabolomics Workflows. Anal Chem 2019;92:203-226. [PMID: 31625723 DOI: 10.1021/acs.analchem.9b04553] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Indexed: 12/14/2022]

Affiliation(s)

Carolina González-Riano: Centre for Metabolomics and Bioanalysis (CEMBIO), Chemistry and Biochemistry Department, Pharmacy Faculty, Universidad San Pablo-CEU, Boadilla del Monte, 28668 Madrid, Spain
Danuta Dudzik: Centre for Metabolomics and Bioanalysis (CEMBIO), Chemistry and Biochemistry Department, Pharmacy Faculty, Universidad San Pablo-CEU, Boadilla del Monte, 28668 Madrid, Spain; Department of Biopharmaceutics and Pharmacodynamics, Faculty of Pharmacy, Medical University of Gdańsk, 80-210 Gdańsk, Poland
Antonia Garcia: Centre for Metabolomics and Bioanalysis (CEMBIO), Chemistry and Biochemistry Department, Pharmacy Faculty, Universidad San Pablo-CEU, Boadilla del Monte, 28668 Madrid, Spain
Alberto Gil-de-la-Fuente: Department of Information Technology, Escuela Politécnica Superior, Universidad San Pablo-CEU, 28003 Madrid, Spain
Ana Gradillas: Centre for Metabolomics and Bioanalysis (CEMBIO), Chemistry and Biochemistry Department, Pharmacy Faculty, Universidad San Pablo-CEU, Boadilla del Monte, 28668 Madrid, Spain
Joanna Godzien: Centre for Metabolomics and Bioanalysis (CEMBIO), Chemistry and Biochemistry Department, Pharmacy Faculty, Universidad San Pablo-CEU, Boadilla del Monte, 28668 Madrid, Spain; Clinical Research Centre, Medical University of Bialystok, 15-089 Bialystok, Poland
Ángeles López-Gonzálvez: Centre for Metabolomics and Bioanalysis (CEMBIO), Chemistry and Biochemistry Department, Pharmacy Faculty, Universidad San Pablo-CEU, Boadilla del Monte, 28668 Madrid, Spain
Fernanda Rey-Stolle: Centre for Metabolomics and Bioanalysis (CEMBIO), Chemistry and Biochemistry Department, Pharmacy Faculty, Universidad San Pablo-CEU, Boadilla del Monte, 28668 Madrid, Spain
David Rojo: Centre for Metabolomics and Bioanalysis (CEMBIO), Chemistry and Biochemistry Department, Pharmacy Faculty, Universidad San Pablo-CEU, Boadilla del Monte, 28668 Madrid, Spain
Francisco J Ruperez: Centre for Metabolomics and Bioanalysis (CEMBIO), Chemistry and Biochemistry Department, Pharmacy Faculty, Universidad San Pablo-CEU, Boadilla del Monte, 28668 Madrid, Spain
Jorge Saiz: Centre for Metabolomics and Bioanalysis (CEMBIO), Chemistry and Biochemistry Department, Pharmacy Faculty, Universidad San Pablo-CEU, Boadilla del Monte, 28668 Madrid, Spain
Coral Barbas: Centre for Metabolomics and Bioanalysis (CEMBIO), Chemistry and Biochemistry Department, Pharmacy Faculty, Universidad San Pablo-CEU, Boadilla del Monte, 28668 Madrid, Spain

Brouard C, Bassé A, d'Alché-Buc F, Rousu J. Improved Small Molecule Identification through Learning Combinations of Kernel Regression Models. Metabolites 2019;9:E160. [PMID: 31374904 PMCID: PMC6724104 DOI: 10.3390/metabo9080160] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/29/2019] [Revised: 07/30/2019] [Accepted: 07/31/2019] [Indexed: 01/15/2023] Open