1
Zakir M, LeVatte MA, Wishart DS. RT-Pred: A web server for accurate, customized liquid chromatography retention time prediction of chemicals. J Chromatogr A 2025; 1747:465816. [PMID: 40023050 DOI: 10.1016/j.chroma.2025.465816] [Received: 12/16/2024] [Revised: 02/21/2025] [Accepted: 02/23/2025] [Indexed: 03/04/2025]
Abstract
High-performance liquid chromatography (HPLC) together with mass spectrometry (MS) is routinely used to separate, identify and quantify chemicals. HPLC data also provide retention times (RTs), which can be aligned with structural data. Recent developments in machine learning (ML) have improved our ability to predict RTs from known or postulated chemical structures, allowing RT data to be used more effectively in LC-MS-based compound identification. However, RT data are highly specific to each chromatographic method (CM), and hundreds of different CMs with interdependent parameters are used in separations. This has limited the application of ML-based RT predictions in compound identification. Here we introduce an easy-to-use RT prediction web server (called RT-Pred) that predicts RTs for molecules across most chromatographic setups. RT-Pred not only supports its own in-house CM-specific RT predictors but also allows users to easily train a custom RT-Pred model on RT data from their own CM and to predict RTs with that custom model. RT-Pred also supports RT and compound searches against its own database of millions of predicted RTs spanning >40 different CMs. RT-Pred is also uniquely capable of accurately identifying compounds that will elute in the void volume or be retained on the column. Including this void/retained/eluted classifier significantly improves RT-Pred's performance. Tests indicate that RT-Pred achieved an average coefficient of determination (R²) of 0.95 over 20 different CMs. Comparisons against other RT predictors showed that RT-Pred achieved lower mean absolute errors and higher R² scores than any other published RT predictor. RT-Pred is freely available at https://rtpred.ca.
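The abstract describes combining a void/retained/eluted classifier with CM-specific RT regression but gives no implementation details. The following is a minimal sketch of how such a two-stage gate could work, assuming a hypothetical classifier `classify_elution` and a hypothetical regressor `predict_rt` supplied by the caller; neither is RT-Pred's actual code or API.

```python
from typing import Callable, Optional


def gated_rt_prediction(
    smiles: str,
    classify_elution: Callable[[str], str],  # hypothetical: returns "void", "retained" or "eluted"
    predict_rt: Callable[[str], float],  # hypothetical CM-specific RT regressor (minutes)
    void_time_min: float,  # column dead time for this CM (minutes)
) -> Optional[float]:
    """Return a predicted RT in minutes, or None if the compound stays on the column."""
    label = classify_elution(smiles)
    if label == "void":
        # Unretained compounds elute with the solvent front, at the dead time.
        return void_time_min
    if label == "retained":
        # Compounds that never elute under this CM have no meaningful RT.
        return None
    if label == "eluted":
        # Only compounds expected to elute normally are passed to the regressor.
        return predict_rt(smiles)
    raise ValueError(f"unknown elution class: {label!r}")


# Toy usage with stand-in models (illustrative only).
print(gated_rt_prediction("CCO", lambda s: "eluted", lambda s: 4.2, 0.8))  # 4.2
```

Routing void and retained compounds away from the regressor keeps those extreme cases out of the regression step, which is one plausible reason such a classifier would help; the abstract itself only reports that it improves performance.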
Affiliation(s)
- Mahi Zakir
  - Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
- Marcia A LeVatte
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
- David S Wishart
  - Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada
  - Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada
  - Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, AB T6G 2B7, Canada
  - Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB T6G 2H7, Canada
2
Stienstra CMK, Nazdrajić E, Hopkins WS. From Reverse Phase Chromatography to HILIC: Graph Transformers Power Method-Independent Machine Learning of Retention Times. Anal Chem 2025; 97:4461-4472. [PMID: 39972614 DOI: 10.1021/acs.analchem.4c05859] [Indexed: 02/21/2025]
Abstract
Liquid chromatography (LC) is a cornerstone of analytical separations, but comparing retention times (RTs) across different LC methods is challenging because of variations in experimental parameters such as column type and solvent gradient. Nevertheless, RTs are powerful metrics in tandem mass spectrometry (MS2) that can reduce false positive rates for metabolite annotation, differentiate isobaric species, and improve peptide identification. Here, we present Graphormer-RT, a novel graph transformer that performs the first single-model, method-independent prediction of RTs. We use the RepoRT data set, which contains 142,688 reverse phase (RP) RTs (from 191 methods) and 4,373 HILIC RTs (from 49 methods). Our best RP model (trained and tested on 191 methods) achieved a test set mean absolute error (MAE) of 29.3 ± 0.6 s, comparable to the performance of the state-of-the-art model, which was trained on only a single LC method. Our best-performing HILIC model achieved a test MAE of 42.4 ± 2.9 s. We expect that Graphormer-RT can be used as an LC "foundation model", where transfer learning can reduce the amount of training data needed for highly accurate "specialist" models applied to method-specific RP and HILIC tasks. These frameworks could enable the machine optimization of automated LC workflows, improved filtering of candidate structures using predicted RTs, and the in silico annotation of unknown analytes in LC-MS2 measurements.
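Results such as "29.3 ± 0.6 s" are usually a mean over repeated train/test splits together with a measure of spread. A minimal sketch of computing the MAE per run and summarizing it, assuming the ± denotes the sample standard deviation across runs (the abstract does not say which spread statistic is used):

```python
import statistics
from typing import Sequence, Tuple


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |observed - predicted| over all compounds (same units as RT, e.g. s)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize_runs(runs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> Tuple[float, float]:
    """Return (mean MAE, sample std of MAE) over replicate test sets; needs >= 2 runs."""
    maes = [mean_absolute_error(t, p) for t, p in runs]
    return statistics.mean(maes), statistics.stdev(maes)


# Two toy replicate test sets: (observed RTs, predicted RTs) in seconds.
runs = [([10.0, 20.0], [12.0, 18.0]), ([10.0, 20.0], [14.0, 16.0])]
mean_mae, sd_mae = summarize_runs(runs)
print(f"MAE = {mean_mae:.1f} ± {sd_mae:.1f} s")  # MAE = 3.0 ± 1.4 s
```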
Affiliation(s)
- Cailum M K Stienstra
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Emir Nazdrajić
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- W Scott Hopkins
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
  - WaterFEL Free Electron Laser Laboratory, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
  - Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
3
Qu X, Jiang C, Shan M, Ke W, Chen J, Zhao Q, Hu Y, Liu J, Qin LP, Cheng G. Prediction of Proteolysis-Targeting Chimeras Retention Time Using XGBoost Model Incorporated with Chromatographic Conditions. J Chem Inf Model 2025; 65:613-625. [PMID: 39786356 DOI: 10.1021/acs.jcim.4c01732] [Indexed: 01/12/2025]
Abstract
Proteolysis-targeting chimeras (PROTACs) are heterobifunctional molecules that target undruggable proteins, enhance selectivity and prevent target accumulation through catalytic activity. The unique structure of PROTACs presents challenges in structural identification and drug design. Liquid chromatography (LC) combined with mass spectrometry (MS) enhances compound annotation by providing essential retention time (RT) data, especially when MS alone is insufficient. However, predicting RT for PROTACs remains challenging. To address this, we compiled the PROTAC-RT data set from the literature and evaluated four machine learning algorithms, namely extreme gradient boosting (XGBoost), random forest (RF), K-nearest neighbor (KNN) and support vector machines (SVM), as well as a deep learning model, a fully connected neural network (FCNN), using 24 molecular fingerprints and descriptors. By screening combinations of molecular fingerprints, descriptors and chromatographic condition descriptors (CCs), we developed an optimized XGBoost model (XGBoost + moe206 + Path + Charge + CCs) that achieved an R² of 0.958 ± 0.027 and an RMSE of 0.934 ± 0.412. After hyperparameter tuning, the model's R² improved to 0.963 ± 0.023, with an RMSE of 0.896 ± 0.374. The model showed strong predictive accuracy under new chromatographic separation conditions and was validated using six experimentally determined compounds. SHapley Additive exPlanations (SHAP) analysis not only highlights the advantages of XGBoost but also emphasizes the importance of CCs and molecular features such as bond variability, van der Waals surface area, and atomic charge states. The optimized XGBoost model combines moe206, path, and charge descriptors with CCs, providing a fast and precise method for predicting the RT of PROTAC compounds and thus facilitating their annotation.
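For reference, the two metrics reported here (R² and RMSE) can be computed as below. This is a generic sketch, not the authors' evaluation code, and the abstract does not state the units of the RMSE.

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the retention times."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


observed = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 2.0]  # always predicting the mean of the observations gives R² = 0
print(r2_score(observed, predicted), rmse(observed, predicted))
```

Because RMSE squares each residual, a few large RT errors raise it more than they raise the MAE, which is why the two metrics are often reported together.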
Affiliation(s)
- Xinhao Qu
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
- Chen Jiang
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
  - Universal Identification Technology (Hangzhou) Co., Ltd., Hangzhou 311199, China
- Mengyi Shan
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
- Wenhao Ke
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, 1 Xiangshanzhi Road, Hangzhou 310024, China
- Jing Chen
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, People's Republic of China
- Qiming Zhao
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
- Youhong Hu
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, 1 Xiangshanzhi Road, Hangzhou 310024, China
- Jia Liu
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, 1 Xiangshanzhi Road, Hangzhou 310024, China
- Lu-Ping Qin
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
- Gang Cheng
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, People's Republic of China
4
Xu H, Wu W, Chen Y, Zhang D, Mo F. Explicit relation between thin film chromatography and column chromatography conditions from statistics and machine learning. Nat Commun 2025; 16:832. [PMID: 39828717 PMCID: PMC11743788 DOI: 10.1038/s41467-025-56136-x] [Received: 05/29/2024] [Accepted: 01/09/2025] [Indexed: 01/22/2025] [Open access]
Abstract
In chemistry, empirical paradigms prevail, especially in chromatography, where the selection of separation conditions frequently relies on the chemist's experience. However, the underlying rationale for such experiential knowledge has not been established or analysed. Using statistical analysis and machine learning, this study explicitly elucidates how chemists use thin-layer chromatography (TLC) to determine column chromatography (CC) conditions. An experimental CC dataset is generated with an automated platform developed in this study. On this basis, an "artificial intelligence (AI) experience" is generated through a knowledge discovery framework, in which the relationship between the retardation factor (Rf) from TLC and the retention volume from CC is expressed as explicit equations. These equations demonstrate satisfactory accuracy and generalizability, providing a scientific basis for selecting experimental conditions and contributing to a better understanding of chromatography.
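The explicit equations derived in this paper are not given in the abstract. For orientation only, a widely used rule of thumb in flash chromatography links a TLC Rf value to elution on the column: a compound needs roughly 1/Rf column volumes (CV) to elute, and two compounds are easier to separate the larger their difference in CV (1/Rf1 - 1/Rf2). The sketch below assumes that idealized relation; it is not the authors' model, and the function names are illustrative.

```python
def column_volumes_from_rf(rf: float) -> float:
    """Rule of thumb (assumption): column volumes (CV) needed to elute a spot with this TLC Rf."""
    if not 0.0 < rf <= 1.0:
        raise ValueError("Rf must be in (0, 1]")
    return 1.0 / rf


def retention_volume_ml(rf: float, column_volume_ml: float) -> float:
    """Approximate elution volume (mL) for a column whose single CV is column_volume_ml."""
    return column_volume_ml * column_volumes_from_rf(rf)


def delta_cv(rf_a: float, rf_b: float) -> float:
    """Separation between two compounds in column volumes; larger is easier to resolve."""
    return abs(column_volumes_from_rf(rf_a) - column_volumes_from_rf(rf_b))


# Two TLC spots at Rf 0.5 and 0.25 on a column with a 10 mL column volume.
print(retention_volume_ml(0.5, 10.0), retention_volume_ml(0.25, 10.0), delta_cv(0.5, 0.25))  # 20.0 40.0 2.0
```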
Affiliation(s)
- Hao Xu
  - AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
  - BIC-ESAT, ERE, and SKLTCS, College of Engineering, Peking University, 100871, Beijing, P. R. China
  - Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, Zhejiang, 315200, P. R. China
- Wenchao Wu
  - AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
  - School of Materials Science and Engineering, Peking University, 100871, Beijing, P. R. China
- Yuntian Chen
  - Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, Zhejiang, 315200, P. R. China
  - Zhejiang Key Laboratory of Industrial Intelligence and Digital Twin, Eastern Institute of Technology, Ningbo, Zhejiang, 315200, China
- Dongxiao Zhang
  - Zhejiang Key Laboratory of Industrial Intelligence and Digital Twin, Eastern Institute of Technology, Ningbo, Zhejiang, 315200, China
- Fanyang Mo
  - AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
  - School of Materials Science and Engineering, Peking University, 100871, Beijing, P. R. China
  - School of Advanced Materials, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
  - Guangdong Provincial Key Laboratory of Nano-Micro Materials Research, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
5
Hupatz H, Rahu I, Wang WC, Peets P, Palm EH, Kruve A. Critical review on in silico methods for structural annotation of chemicals detected with LC/HRMS non-targeted screening. Anal Bioanal Chem 2025; 417:473-493. [PMID: 39138659 DOI: 10.1007/s00216-024-05471-x] [Received: 04/30/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/15/2024]
Abstract
Non-targeted screening with liquid chromatography coupled to high-resolution mass spectrometry (LC/HRMS) is increasingly leveraging in silico methods, including machine learning, to obtain candidate structures for structural annotation of LC/HRMS features and their further prioritization. Candidate structures are commonly retrieved based on the tandem mass spectral information either from spectral or structural databases; however, the vast majority of the detected LC/HRMS features remain unannotated, constituting what we refer to as a part of the unknown chemical space. Recently, the exploration of this chemical space has become accessible through generative models. Furthermore, the evaluation of the candidate structures benefits from the complementary empirical analytical information such as retention time, collision cross section values, and ionization type. In this critical review, we provide an overview of the current approaches for retrieving and prioritizing candidate structures. These approaches come with their own set of advantages and limitations, as we showcase in the example of structural annotation of ten known and ten unknown LC/HRMS features. We emphasize that these limitations stem from both experimental and computational considerations. Finally, we highlight three key considerations for the future development of in silico methods.
Affiliation(s)
- Henrik Hupatz
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
  - Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden
- Ida Rahu
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
- Wei-Chieh Wang
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
- Pilleriin Peets
  - Institute of Biodiversity, Faculty of Biological Science, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743, Jena, Germany
- Emma H Palm
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, 4367, Belvaux, Luxembourg
- Anneli Kruve
  - Department of Materials and Environmental Chemistry, Stockholm University, Svante Arrhenius Väg 16, 114 18, Stockholm, Sweden
  - Stockholm University Center for Circular and Sustainable Systems (SUCCeSS), Stockholm University, 106 91, Stockholm, Sweden
  - Department of Environmental Science, Stockholm University, Svante Arrhenius Väg 8, 114 18, Stockholm, Sweden
6
Liu Y, Yoshizawa AC, Ling Y, Okuda S. Insights into predicting small molecule retention times in liquid chromatography using deep learning. J Cheminform 2024; 16:113. [PMID: 39375739 PMCID: PMC11460055 DOI: 10.1186/s13321-024-00905-1] [Received: 03/05/2024] [Accepted: 09/13/2024] [Indexed: 10/09/2024] [Open access]
Abstract
In untargeted metabolomics, structures of small molecules are annotated using liquid chromatography-mass spectrometry by leveraging information from the molecular retention time (RT) in the chromatogram and the m/z (formerly called "mass-to-charge ratio") in the mass spectrum. However, correct identification of metabolites is challenging due to the vast array of small molecules. Therefore, various in silico tools for mass spectrometry peak alignment and compound prediction have been developed; however, the list of candidate compounds remains extensive. Accurate RT prediction is important to exclude false candidates and facilitate metabolite annotation. Recent advancements in artificial intelligence (AI) have led to significant breakthroughs in the use of deep learning models in various fields. The release of a large RT dataset has eased the bottleneck that limited the application of deep learning models to RT prediction tasks. This review lists databases that can be used to expand training datasets and addresses inconsistencies in molecular representation across datasets. It also discusses the application of AI technology for RT prediction, particularly in the 5 years following the release of the METLIN small molecule RT dataset. This review provides a comprehensive overview of the AI applications used for RT prediction, highlighting the progress and remaining challenges. SCIENTIFIC CONTRIBUTION: This article focuses on the advancements in small molecule retention time prediction in computational metabolomics over the past five years, with a particular emphasis on the application of AI technologies in this field. It reviews the publicly available datasets for small molecule retention time, the molecular representation methods, and the AI algorithms applied in recent studies. Furthermore, it discusses the effectiveness of these models in assisting with the annotation of small molecule structures and the challenges that must be addressed to achieve practical applications.
Affiliation(s)
- Yuting Liu
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
- Akiyasu C Yoshizawa
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
- Yiwei Ling
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
- Shujiro Okuda
  - Medical AI Center, Niigata University School of Medicine, Niigata City, Niigata, 951-8514, Japan
7
Zhang Y, Liu F, Li XQ, Gao Y, Li KC, Zhang QH. Retention time dataset for heterogeneous molecules in reversed-phase liquid chromatography. Sci Data 2024; 11:946. [PMID: 39209861 PMCID: PMC11362277 DOI: 10.1038/s41597-024-03780-5] [Received: 02/01/2024] [Accepted: 08/14/2024] [Indexed: 09/04/2024] [Open access]
Abstract
Quantitative structure-property relationships have been extensively studied in the field of predicting retention times in liquid chromatography (LC). However, making transferable predictions is inherently complex because retention times are influenced by both the structure of the molecule and the chromatographic method used. Despite decades of development and numerous published machine learning models, the practical application of predicting small molecule retention time remains limited. The resulting models are typically limited to specific chromatographic conditions and the molecules used in their training and evaluation. Here, we have developed a comprehensive dataset comprising over 10,000 experimental retention times. These times were derived from 30 different reversed-phase liquid chromatography methods and pertain to a collection of 343 small molecules representing a wide range of chemical structures. These chromatographic methods encompass common LC setups for studying the retention behavior of small molecules. They offer a wide range of examples for modeling retention time with different LC setups.
Affiliation(s)
- Yan Zhang
  - Key Laboratory of Groundwater Conservation of MWR, China University of Geosciences, Beijing, 100083, People's Republic of China
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Fei Liu
  - Key Laboratory of Groundwater Conservation of MWR, China University of Geosciences, Beijing, 100083, People's Republic of China
- Xiu Qin Li
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Yan Gao
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Kang Cong Li
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Qing He Zhang
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
8
Zhang Y, Liu F, Li XQ, Gao Y, Li KC, Zhang QH. Generic and accurate prediction of retention times in liquid chromatography by post-projection calibration. Commun Chem 2024; 7:54. [PMID: 38459241 PMCID: PMC10923921 DOI: 10.1038/s42004-024-01135-0] [Received: 07/01/2023] [Accepted: 02/21/2024] [Indexed: 03/10/2024] [Open access]
Abstract
Retention time predictions from molecule structures in liquid chromatography (LC) are increasingly used in MS-based targeted and untargeted analyses, providing supplementary evidence for molecule annotation and reducing experimental measurements. Nevertheless, because LC setups differ (e.g., in gradient, column, and/or mobile phase), many prediction models can accurately predict retention times only for a specific chromatographic method (CM). Here, a generic and accurate method is presented for predicting retention times across different CMs by introducing the concept of post-projection calibration. This concept builds on the direct projection of retention times between different CMs and uses 35 external calibrants to eliminate the impact of LC setups on projection accuracy. Results showed that post-projection calibration consistently achieved a median projection error below 3.2% of the elution time. Rankings of putative candidates reached similar levels across different CMs. This work opens up broad possibilities for coordinating retention times between different laboratories and developing extensive retention databases.
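The abstract does not detail the projection or calibration steps. As a simplified illustration of the general idea of a direct projection, RTs measured on a source CM can be mapped to a target CM by piecewise-linear interpolation between calibrant compounds run on both methods. This is a generic assumption, not the authors' post-projection calibration, and the percentage error here is taken relative to a user-supplied elution window, which is only one possible reading of "% of the elution time"; all function names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def project_rt(rt_source: float, calib_source: Sequence[float], calib_target: Sequence[float]) -> float:
    """Map an RT from a source CM to a target CM using calibrants run on both.

    calib_source must be strictly increasing and aligned with calib_target
    (same calibrant at the same index). Values outside the calibrant range
    are extrapolated linearly from the nearest segment.
    """
    if len(calib_source) != len(calib_target) or len(calib_source) < 2:
        raise ValueError("need at least two paired calibrants")
    # Index of the calibrant segment containing rt_source, clamped to valid segments.
    i = bisect_right(calib_source, rt_source) - 1
    i = min(max(i, 0), len(calib_source) - 2)
    x0, x1 = calib_source[i], calib_source[i + 1]
    y0, y1 = calib_target[i], calib_target[i + 1]
    return y0 + (rt_source - x0) * (y1 - y0) / (x1 - x0)


def projection_error_pct(predicted: float, observed: float, elution_window: float) -> float:
    """Absolute projection error as a percentage of the elution window (assumed definition)."""
    return abs(predicted - observed) / elution_window * 100.0


calib_a = [1.0, 3.0, 5.0]  # hypothetical calibrant RTs (min) on method A
calib_b = [2.0, 6.0, 8.0]  # the same calibrants on method B
print(project_rt(2.0, calib_a, calib_b))  # 4.0
```

Interpolation between shared calibrants only works well when elution order is preserved between the two methods; the paper's additional calibration step is presumably aimed at correcting what a plain projection misses.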
Affiliation(s)
- Yan Zhang
  - Key Laboratory of Groundwater Conservation of MWR, China University of Geosciences, Beijing, 100083, People's Republic of China
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Fei Liu
  - Key Laboratory of Groundwater Conservation of MWR, China University of Geosciences, Beijing, 100083, People's Republic of China
- Xiu Qin Li
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Yan Gao
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Kang Cong Li
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
- Qing He Zhang
  - Division of Chemical Metrology and Analytical Science, National Institute of Metrology, Beijing, 100029, People's Republic of China
  - Key Laboratory of Chemical Metrology and Applications on Nutrition and Health for State Market Regulation, Beijing, 100029, China
9
Witting M. (Re-)use and (re-)analysis of publicly available metabolomics data. Proteomics 2023; 23:e2300032. [PMID: 37670538 DOI: 10.1002/pmic.202300032] [Received: 06/23/2023] [Revised: 08/23/2023] [Accepted: 08/24/2023] [Indexed: 09/07/2023]
Abstract
Metabolomics, the systematic measurement of small molecules (<1000 Da) in a given biological sample, is a fast-growing field with many different applications. In contrast to transcriptomics and proteomics, sharing of data is not as widespread in metabolomics, though more scientists are sharing their data nowadays. However, to improve data analysis tools and develop new data analytical approaches and to improve metabolite annotation and identification, sharing of reference data is crucial. Here, different possibilities to share (metabolomics) data are reviewed and some recent approaches and applications regarding the (re-)use and (re-)analysis are highlighted.
Affiliation(s)
- Michael Witting
  - Metabolomics and Proteomics Core, Helmholtz Zentrum München, Neuherberg, Germany
  - Chair of Analytical Food Chemistry, TUM School of Life Sciences, Freising-Weihenstephan, Germany
10
Kwon Y, Kwon H, Han J, Kang M, Kim JY, Shin D, Choi YS, Kang S. Retention Time Prediction through Learning from a Small Training Data Set with a Pretrained Graph Neural Network. Anal Chem 2023; 95:17273-17283. [PMID: 37955847 DOI: 10.1021/acs.analchem.3c03177] [Indexed: 11/14/2023]
Abstract
Graph neural networks (GNNs) have shown remarkable performance in predicting the retention time (RT) for small molecules. However, the training data set for a particular target chromatographic system tends to exhibit scarcity, which poses a challenge because the experimental process for measuring RT is costly. To address this challenge, transfer learning has been used to leverage an abundant training data set from a related source task. In this study, we present an improved transfer learning method to better predict the RT of molecules for a target chromatographic system by learning from a small training data set with a pretrained GNN. We use a graph isomorphism network as the architecture of the GNN. The GNN is pretrained on the METLIN-SMRT data set and is then fine-tuned on the target training data set for a fixed number of training iterations using the limited-memory Broyden-Fletcher-Goldfarb-Shanno optimizer with a learning rate decay. We demonstrate that the proposed method achieves superior predictive performance on various chromatographic systems compared with that of the existing transfer learning methods, especially when only a small training data set is available for use. A potential avenue for future research is to leverage multiple small training data sets from different chromatographic systems to further enhance the generalization performance.
Affiliation(s)
- Youngchun Kwon
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Hyukju Kwon
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
  - Department of Chemistry, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Jongmin Han
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Myeonginn Kang
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
- Ji-Yeong Kim
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Dongyeeb Shin
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Youn-Suk Choi
  - Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon 16678, Republic of Korea
- Seokho Kang
  - Department of Industrial Engineering, Sungkyunkwan University, 2066 Seobu-ro, Jangan-gu, Suwon 16419, Republic of Korea
11
Lin W, Mellinghaus K, Rodriguez-Mateos A, Globisch D. Identification of nutritional biomarkers through highly sensitive and chemoselective metabolomics. Food Chem 2023; 425:136481. [PMID: 37276670 DOI: 10.1016/j.foodchem.2023.136481] [Received: 10/28/2022] [Revised: 04/19/2023] [Accepted: 05/26/2023] [Indexed: 06/07/2023]
Abstract
The importance of a healthy diet for humans has been known for decades. However, the elucidation of the key molecules responsible for beneficial and adverse dietary effects has progressed slowly because suitable tools are lacking. Carbonyl-containing metabolites are common products of the microbiome's conversion of dietary components. Here, we have used our recently developed mass spectrometric methodology based on chemoselective conjugation of carbonyl metabolites. The method has been applied for the first time to the analysis of urine samples from a dietary (poly)phenol intervention study (N = 78 individuals). We identified a series of carbonyl metabolites of dietary origin, and the chemical structures of 30 of these metabolites were validated. Our sensitive analysis led to the discovery of four previously unknown dietary markers with high sensitivity and selectivity (AUC > 0.91). Our chemical metabolomics method has been successfully applied for large-scale analysis and provides the basis for targeted metabolomics to identify unknown nutritional and disease-related biomarkers.
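The AUC reported for the dietary markers is the area under the ROC curve. A minimal sketch of computing it from marker intensities via its rank (Mann-Whitney) interpretation, namely the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sample values are hypothetical and not from the study.

```python
from typing import Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as P(score_pos > score_neg), counting ties as 0.5 (pairwise O(n*m) version)."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical marker intensities: intervention (positive) vs control (negative) urine samples.
print(roc_auc([0.9, 0.3], [0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in sample size; for large cohorts a rank-based computation yields the same value faster.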
Affiliation(s)
- Weifeng Lin
- Department of Chemistry - BMC, Science for Life Laboratory, Uppsala University, Box 576, SE-75124 Uppsala, Sweden
| | - Kiana Mellinghaus
- Department of Chemistry - BMC, Science for Life Laboratory, Uppsala University, Box 576, SE-75124 Uppsala, Sweden
| | - Ana Rodriguez-Mateos
- Department of Nutritional Sciences, School of Life Course and Population Sciences, Faculty of Life Sciences and Medicine, King's College London, UK
- Daniel Globisch
- Department of Chemistry - BMC, Science for Life Laboratory, Uppsala University, Box 576, SE-75124 Uppsala, Sweden.
12
Xu H, Lin J, Zhang D, Mo F. Retention time prediction for chromatographic enantioseparation by quantile geometry-enhanced graph neural network. Nat Commun 2023; 14:3095. [PMID: 37248214 DOI: 10.1038/s41467-023-38853-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/21/2022] [Accepted: 05/17/2023] [Indexed: 05/31/2023]
Abstract
The enantioseparation of chiral molecules is a crucial and challenging task in the field of experimental chemistry, often requiring extensive trial and error with different experimental settings. To overcome this challenge, here we show a research framework that employs machine learning techniques to predict retention times of enantiomers and facilitate chromatographic enantioseparation. A documentary dataset of chiral molecular retention times in high-performance liquid chromatography (CMRT dataset) is established to handle the challenge of data acquisition. A quantile geometry-enhanced graph neural network is proposed to learn the molecular structure-retention time relationship, which shows a satisfactory predictive ability for enantiomers. The domain knowledge of chromatography is incorporated into the machine learning model to achieve multi-column prediction, which paves the way for chromatographic enantioseparation prediction by calculating the separation probability. The proposed research framework works well in retention time prediction and chromatographic enantioseparation facilitation, which sheds light on the application of machine learning techniques to the experimental scene and improves the efficiency of experimenters to speed up scientific discovery.
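The abstract mentions calculating a separation probability from multi-column predictions, but it does not give the formula. The following is therefore only a sketch under stated assumptions, not the paper's method: each enantiomer's predicted retention time is treated as an independent normal distribution fitted to hypothetical 10th/50th/90th quantile predictions, and "separated" is taken to mean that the two retention times differ by more than a user-chosen gap `min_gap`. The helper names are hypothetical; only Python's standard `statistics.NormalDist` is used.

```python
from statistics import NormalDist


def normal_from_quantiles(q10, q50, q90):
    """Approximate a predicted RT distribution as normal from its 10th/50th/90th quantiles.

    Assumption: the predictive distribution is roughly symmetric, so the median
    serves as the mean and the 10-90 interquantile range fixes the spread.
    """
    z90 = NormalDist().inv_cdf(0.9)  # about 1.2816
    sigma = (q90 - q10) / (2 * z90)
    if sigma <= 0:
        raise ValueError("q90 must exceed q10")
    return NormalDist(q50, sigma)


def separation_probability(rt_a, rt_b, min_gap):
    """P(|RT_a - RT_b| > min_gap), assuming independent normal retention times."""
    # The difference of two independent normals is normal with summed variances.
    diff = NormalDist(rt_a.mean - rt_b.mean, (rt_a.variance + rt_b.variance) ** 0.5)
    return 1.0 - (diff.cdf(min_gap) - diff.cdf(-min_gap))


# Hypothetical quantile predictions (minutes) for two enantiomers on one column.
a = normal_from_quantiles(11.2, 12.0, 12.8)
b = normal_from_quantiles(13.1, 13.9, 14.7)
p = separation_probability(a, b, min_gap=0.5)
print(round(p, 3))
```

Repeating the calculation for each candidate column and ranking the columns by `p` would be one simple way to turn per-column RT predictions into a column recommendation.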
Affiliation(s)
- Hao Xu
- School of Materials Science and Engineering, Peking University, 100871, Beijing, P. R. China
- BIC-ESAT, ERE, and SKLTCS, College of Engineering, Peking University, 100871, Beijing, P. R. China
- Jinglong Lin
- School of Materials Science and Engineering, Peking University, 100871, Beijing, P. R. China
- Dongxiao Zhang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, 315200, Ningbo, Zhejiang, P. R. China.
- Department of Mathematics and Theories, Peng Cheng Laboratory, 518000, Shenzhen, Guangdong, P. R. China.
- Fanyang Mo
- School of Materials Science and Engineering, Peking University, 100871, Beijing, P. R. China.
- AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, 518055, Shenzhen, P. R. China.
13
Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00577-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Structural annotation of small molecules in biological samples remains a key bottleneck in untargeted metabolomics, despite rapid progress in predictive methods and tools during the past decade. Liquid chromatography–tandem mass spectrometry, one of the most widely used analysis platforms, can detect thousands of molecules in a sample, the vast majority of which remain unidentified even with best-of-class methods. Here we present LC-MS2Struct, a machine learning framework for structural annotation of small-molecule data arising from liquid chromatography–tandem mass spectrometry (LC-MS2) measurements. LC-MS2Struct jointly predicts the annotations for a set of mass spectrometry features in a sample, using a novel structured prediction model trained to optimally combine the output of state-of-the-art MS2 scorers and observed retention orders. We evaluate our method on a dataset covering all publicly available reversed-phase LC-MS2 data in the MassBank reference database, including 4,327 molecules measured using 18 different LC conditions from 16 contributors, greatly expanding the chemical analytical space covered in previous multi-MS2 scorer evaluations. LC-MS2Struct obtains significantly higher annotation accuracy than earlier methods and improves the annotation accuracy of state-of-the-art MS2 scorers by up to 106%. The use of stereochemistry-aware molecular fingerprints improves prediction performance, which highlights limitations in existing approaches and has strong implications for future computational LC-MS2 developments.
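LC-MS2Struct combines MS2 scores with observed retention orders through a trained structured prediction model. The toy sketch below is not that model: it enumerates every joint assignment of candidates to features, sums the MS2 scores, and adds a fixed hypothetical bonus `weight` for each feature pair whose predicted retention order (from hypothetical per-candidate retention scores) matches the observed elution order. It only illustrates why scoring annotations jointly can overturn an MS2-only choice; all names and numbers are hypothetical.

```python
from itertools import combinations, product


def joint_annotation(features, weight=1.0):
    """Pick one candidate per feature, maximizing MS2 score plus retention-order agreement.

    Toy scorer, not LC-MS2Struct.
    features: list of (observed_rt, [(candidate_name, ms2_score, predicted_retention_score), ...])
    Exhaustive enumeration: only feasible for tiny examples.
    """
    best, best_score = None, float("-inf")
    candidate_lists = [cands for _, cands in features]
    for choice in product(*candidate_lists):
        score = sum(c[1] for c in choice)
        for i, j in combinations(range(len(features)), 2):
            rt_i, rt_j = features[i][0], features[j][0]
            pred_i, pred_j = choice[i][2], choice[j][2]
            # Reward pairs whose predicted order matches the observed elution order.
            if (rt_i - rt_j) * (pred_i - pred_j) > 0:
                score += weight
        if score > best_score:
            best, best_score = choice, score
    return [c[0] for c in best], best_score


# Hypothetical features: (observed RT in minutes, candidate list).
features = [
    (2.0, [("A", 0.6, 1.0), ("B", 0.7, 5.0)]),   # B has the better MS2 score but is predicted to elute late
    (6.0, [("C", 0.7, 2.0), ("D", 0.65, 0.5)]),
]
print(joint_annotation(features, weight=0.0)[0])  # MS2 only
print(joint_annotation(features, weight=1.0)[0])  # MS2 plus retention order
```

With `weight=0.0` the MS2 scores alone favor B for the first feature; adding the order bonus switches it to A, whose predicted early elution agrees with the observed retention time. Because the number of joint assignments grows exponentially with the number of features, this brute-force form is only practical for toy examples.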
14
Harrieder EM, Kretschmer F, Böcker S, Witting M. Current state-of-the-art of separation methods used in LC-MS based metabolomics and lipidomics. J Chromatogr B Analyt Technol Biomed Life Sci 2021; 1188:123069. [PMID: 34879285 DOI: 10.1016/j.jchromb.2021.123069] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 08/05/2021] [Revised: 11/10/2021] [Accepted: 11/24/2021] [Indexed: 12/23/2022]
Abstract
Metabolomics deals with the large-scale analysis of metabolites, belonging to numerous compound classes and showing an extremely high chemical diversity and complexity. Lipidomics, being a subcategory of metabolomics, analyzes the cellular lipid species. Both require state-of-the-art analytical methods capable of accessing the underlying chemical complexity. One of the major techniques used for the analysis of metabolites and lipids is Liquid Chromatography-Mass Spectrometry (LC-MS), offering both different selectivities in LC separation and high sensitivity in MS detection. Chromatography can be divided into different modes, based on the properties of the employed separation system. The most popular ones are Reversed-Phase (RP) separation for non- to mid-polar molecules and Hydrophilic Interaction Liquid Chromatography (HILIC) for polar molecules. So far, no single analysis method exists that can cover the entire range of metabolites or lipids, due to the huge chemical diversity. Consequently, different separation methods have been used for different applications and research questions. In this review, we explore the current use of LC-MS in metabolomics and lipidomics. As a proxy, we examined the use of chromatographic methods in the public repositories EBI MetaboLights and NIH Metabolomics Workbench. We extracted 1484 method descriptions, collected separation metadata and generated an overview on the current use of columns, eluents, etc. Based on this overview, we reviewed current practices and identified potential future trends as well as required improvements that may allow us to increase metabolite coverage, throughput or both simultaneously.
Affiliation(s)
- Eva-Maria Harrieder
- Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Fleming Kretschmer
- Chair of Bioinformatics, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany
- Sebastian Böcker
- Chair of Bioinformatics, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany
- Michael Witting
- Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; Metabolomics and Proteomics Core, Helmholtz Zentrum München, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany; Chair of Analytical Food Chemistry, Technical University of Munich, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany.