1
Xu Y, Cao L, Chen Y, Zhang Z, Liu W, Li H, Ding C, Pu J, Qian K, Xu W. Integrating Machine Learning in Metabolomics: A Path to Enhanced Diagnostics and Data Interpretation. SMALL METHODS 2024:e2400305. [PMID: 38682615 DOI: 10.1002/smtd.202400305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Revised: 04/07/2024] [Indexed: 05/01/2024]
Abstract
Metabolomics, leveraging techniques like NMR and MS, is crucial for understanding biochemical processes in pathophysiological states. This field, however, faces challenges in metabolite sensitivity, data complexity, and omics data integration. Recent machine learning advancements have enhanced data analysis and disease classification in metabolomics. This study explores machine learning integration with metabolomics to improve metabolite identification, data efficiency, and diagnostic methods. Using deep learning and traditional machine learning, it presents advancements in metabolic data analysis, including novel algorithms for accurate peak identification, robust disease classification from metabolic profiles, and improved metabolite annotation. It also highlights multiomics integration, demonstrating machine learning's potential in elucidating biological phenomena and advancing disease diagnostics. This work contributes significantly to metabolomics by merging it with machine learning, offering innovative solutions to analytical challenges and setting new standards for omics data analysis.
Affiliation(s)
- Yudian Xu
- Department of Traditional Chinese Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, P. R. China
- Linlin Cao
- State Key Laboratory for Oncogenes and Related Genes, Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
- Yifan Chen
- State Key Laboratory for Oncogenes and Related Genes, Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
- Ziyue Zhang
- School of Biomedical Engineering, Institute of Medical Robotics and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Wanshan Liu
- School of Biomedical Engineering, Institute of Medical Robotics and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- He Li
- Department of Traditional Chinese Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, P. R. China
- Chenhuan Ding
- Department of Traditional Chinese Medicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, P. R. China
- Jun Pu
- State Key Laboratory for Oncogenes and Related Genes, Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
- Kun Qian
- State Key Laboratory for Oncogenes and Related Genes, Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
- School of Biomedical Engineering, Institute of Medical Robotics and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, P. R. China
- Wei Xu
- State Key Laboratory for Oncogenes and Related Genes, Division of Cardiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, 160 Pujian Road, Shanghai, 200127, P. R. China
2
Cheng KP, Shen WX, Jiang YY, Chen Y, Chen YZ, Tan Y. Deep learning of 2D-Restructured gene expression representations for improved low-sample therapeutic response prediction. Comput Biol Med 2023; 164:107245. [PMID: 37480677 DOI: 10.1016/j.compbiomed.2023.107245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 06/27/2023] [Accepted: 07/07/2023] [Indexed: 07/24/2023]
Abstract
Clinical outcome prediction is important for stratified therapeutics. Machine learning (ML) and deep learning (DL) methods facilitate therapeutic response prediction from transcriptomic profiles of cells and clinical samples. Clinical transcriptomic DL is challenged by the low-sample sizes (34-286 subjects), high-dimensionality (up to 21,653 genes) and unordered nature of clinical transcriptomic data. The established methods rely on ML algorithms at accuracy levels of 0.6-0.8 AUC/ACC values. Low-sample DL algorithms are needed for enhanced prediction capability. Here, an unsupervised manifold-guided algorithm was employed for restructuring transcriptomic data into ordered image-like 2D-representations, followed by efficient DL of these 2D-representations with deep ConvNets. Our DL models significantly outperformed the state-of-the-art (SOTA) ML models on 82% of 17 low-sample benchmark datasets (53% with >0.05 AUC/ACC improvement). They are more robust than the SOTA models in cross-cohort prediction tasks, and in identifying robust biomarkers and response-dependent variational patterns consistent with experimental indications.
Affiliation(s)
- Kai Ping Cheng
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China
- Wan Xiang Shen
- Bioinformatics and Drug Design Group, Department of Pharmacy, Center for Computational Science and Engineering, National University of Singapore, 117543, Singapore
- Yu Yang Jiang
- School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, PR China
- Yan Chen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China
- Yu Zong Chen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen, 518132, PR China
- Ying Tan
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; The Institute of Drug Discovery Technology, Ningbo University, Ningbo, 315211, PR China; Shenzhen Kivita Innovative Drug Discovery Institute, Shenzhen, 518110, PR China
3
Wang T, Tan Y, Chen YZ, Tan C. Infrared Spectral Analysis for Prediction of Functional Groups Based on Feature-Aggregated Deep Learning. J Chem Inf Model 2023; 63:4615-4622. [PMID: 37531205 DOI: 10.1021/acs.jcim.3c00749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/03/2023]
Abstract
Infrared (IR) spectroscopy is a powerful and versatile tool for analyzing functional groups in organic compounds. A complex and time-consuming interpretation of massive unknown spectra usually requires knowledge of chemistry and spectroscopy. This paper presents a new deep learning method for transforming IR spectral features into intuitive imagelike feature maps and prediction of major functional groups. We obtained 8272 gas-phase IR spectra from the NIST Chemistry WebBook. Feature maps are constructed using the intrinsic correlation of spectral data, and prediction models are developed based on convolutional neural networks. Twenty-one major functional groups for each molecule are successfully identified using binary and multilabel models without expert guidance and feature selection. The multilabel classification model can produce all prediction results simultaneously for rapid characterization. Further analysis of the detailed substructures indicates that our model is capable of obtaining abundant structural information from IR spectra for a comprehensive investigation. The interpretation of our model reveals that the peaks of most interest are similar to those often considered by spectroscopists. In addition to demonstrating great potential for spectral identification, our method may contribute to the development of automated analyses in many fields.
Affiliation(s)
- Tianyi Wang
- The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Ying Tan
- The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Yu Zong Chen
- The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Institute of Biomedical Health Technology and Engineering, Shenzhen Bay Laboratory, Shenzhen 518132, P. R. China
- Chunyan Tan
- The State Key Laboratory of Chemical Oncogenomics, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
4
Greenberg ZF, Graim KS, He M. Towards artificial intelligence-enabled extracellular vesicle precision drug delivery. Adv Drug Deliv Rev 2023:114974. [PMID: 37356623 DOI: 10.1016/j.addr.2023.114974] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/06/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 06/27/2023]
Abstract
Extracellular Vesicles (EVs), particularly exosomes, recently exploded into nanomedicine as an emerging drug delivery approach due to their superior biocompatibility, circulating stability, and bioavailability in vivo. However, EV heterogeneity makes molecular targeting precision a critical challenge. Deciphering key molecular drivers for controlling EV tissue targeting specificity is in great need. Artificial intelligence (AI) brings powerful prediction ability for guiding the rational design of engineered EVs in precision control for drug delivery. This review focuses on cutting-edge nano-delivery via integrating large-scale EV data with AI to develop AI-directed EV therapies and illuminate the clinical translation potential. We briefly review the current status of EVs in drug delivery, including the current frontier, limitations, and considerations to advance the field. Subsequently, we detail the future of AI in drug delivery and its impact on precision EV delivery. Our review discusses the current universal challenge of standardization and critical considerations when using AI combined with EVs for precision drug delivery. Finally, we will conclude this review with a perspective on future clinical translation led by a combined effort of AI and EV research.
Affiliation(s)
- Zachary F Greenberg
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida, 32610, USA
- Kiley S Graim
- Department of Computer & Information Science & Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, 32610, USA
- Mei He
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida, 32610, USA
5
Shen WX, Chen YZ. Toward ordered -omics data science: Researchers on the magic of turning metagenomic chaos into image-like patterns. PATTERNS (NEW YORK, N.Y.) 2023; 4:100673. [PMID: 36699736 PMCID: PMC9868643 DOI: 10.1016/j.patter.2022.100673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
Abstract
Wan Xiang Shen, a postdoctoral researcher at National University of Singapore, and Yu Zong Chen, the PI of the Bioinformatics and Drug Design (BIDD) group, have developed an AI pipeline for enhanced deep learning of metagenomic data. Their Patterns paper highlights the advantages of unsupervised data restructuring in microbiome-based disease prediction and biomarker discovery. They talk about their view of data science and the backstory of the article published in Patterns.
Affiliation(s)
- Wan Xiang Shen
- Bioinformatics and Drug Design (BIDD) Group and Center for Computational Science and Engineering, Department of Pharmacy, National University of Singapore, Singapore 117559, Singapore; Department of Chemistry, Faculty of Science, National University of Singapore, Singapore 117543, Singapore; Corresponding author
- Yu Zong Chen
- Bioinformatics and Drug Design (BIDD) Group and Center for Computational Science and Engineering, Department of Pharmacy, National University of Singapore, Singapore 117559, Singapore; The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, PR China; Corresponding author
6
Lee Y, Cappellato M, Di Camillo B. Machine learning-based feature selection to search stable microbial biomarkers: application to inflammatory bowel disease. Gigascience 2022; 12:giad083. [PMID: 37882604 PMCID: PMC10600917 DOI: 10.1093/gigascience/giad083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 08/23/2023] [Accepted: 09/17/2023] [Indexed: 10/27/2023]
Abstract
BACKGROUND: Biomarker discovery exploiting feature importance of machine learning has risen recently in the microbiome landscape with its high predictive performance in several disease states. To have a concrete selection among a high number of features, recursive feature elimination (RFE) has been widely used in the bioinformatics field. However, machine learning-based RFE has factors that decrease the stability of feature selection. In this article, we suggested methods to improve stability while sustaining performance. RESULTS: We exploited the abundance matrices of the gut microbiome (283 taxa at species level and 220 at genus level) to classify between patients with inflammatory bowel disease (IBD) and healthy controls (1,569 samples). We found that applying an already published data transformation before RFE improves feature stability significantly. Moreover, we performed an in-depth evaluation of different variants of the data transformation and identified those that demonstrate better improvement in stability while not sacrificing classification performance. To ensure a robust comparison, we evaluated stability using various similarity metrics, distances, the common number of features, and the ability to filter out noise features. We were able to confirm that the mapping by the Bray-Curtis similarity matrix before RFE consistently improves the stability while maintaining good performance. The multilayer perceptron algorithm exhibited the highest performance among 8 different machine learning algorithms when a large number of features (a few hundred) were considered based on the best performance across 100 bootstrapped internal test sets. Conversely, when utilizing only a limited number of biomarkers as a trade-off between optimal performance and method generalizability, the random forest algorithm demonstrated the best performance. Using the optimal pipeline we developed, we identified 14 biomarkers for IBD at the species level and analyzed their roles using Shapley additive explanations. CONCLUSION: Taken together, our work not only showed how to improve biomarker discovery in the metataxonomic field without sacrificing classification performance but also provided useful insights for future comparative studies.
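The abstract above names a Bray-Curtis similarity matrix but does not say whether it is computed between samples or between taxa, nor how the mapping is applied before RFE. As a minimal sketch of the similarity itself only, the following Python computes 1 minus the standard Bray-Curtis dissimilarity between non-negative abundance vectors. The function names and the toy abundance values are hypothetical, chosen for illustration, and are not taken from the paper.

```python
def bray_curtis_similarity(x, y):
    """Return 1 - Bray-Curtis dissimilarity for two non-negative abundance vectors."""
    if len(x) != len(y):
        raise ValueError("abundance vectors must have the same length")
    total = sum(x) + sum(y)
    if total == 0:
        # Convention chosen for this sketch: two all-zero profiles count as identical.
        return 1.0
    diff = sum(abs(a - b) for a, b in zip(x, y))
    return 1.0 - diff / total


def similarity_matrix(profiles):
    """Pairwise Bray-Curtis similarity between every pair of profiles."""
    n = len(profiles)
    return [
        [bray_curtis_similarity(profiles[i], profiles[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy abundance profiles (made-up counts over three hypothetical taxa).
profiles = [
    [10, 0, 5],
    [8, 2, 5],
    [0, 12, 1],
]
S = similarity_matrix(profiles)
for row in S:
    print(["%.3f" % value for value in row])
```

The matrix is symmetric with ones on the diagonal, and profiles that share more of their abundance score closer to 1. Whether such a matrix is then multiplied into the abundance table or used in some other way is left open here, since the abstract does not specify it.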
Affiliation(s)
- Youngro Lee
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, Korea
- Institute of Engineering Research at Seoul National University, Seoul, 08826, Korea
- Marco Cappellato
- Department of Information Engineering, University of Padova, Padova, 35122, Italy
- Barbara Di Camillo
- Department of Information Engineering, University of Padova, Padova, 35122, Italy
7
Shen WX, Liang SR, Jiang YY, Chen YZ. Enhanced metagenomic deep learning for disease prediction and consistent signature recognition by restructured microbiome 2D representations. PATTERNS (NEW YORK, N.Y.) 2022; 4:100658. [PMID: 36699735 PMCID: PMC9868677 DOI: 10.1016/j.patter.2022.100658] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/01/2022] [Revised: 07/15/2022] [Accepted: 11/15/2022] [Indexed: 12/23/2022]
Abstract
Metagenomic analysis has been explored for disease diagnosis and biomarker discovery. Low sample sizes, high dimensionality, and sparsity of metagenomic data challenge metagenomic investigations. Here, an unsupervised microbial embedding, grouping, and mapping algorithm (MEGMA) was developed to transform metagenomic data into individualized multichannel microbiome 2D representation by manifold learning and clustering of microbial profiles (e.g., composition, abundance, hierarchy, and taxonomy). These 2D representations enable enhanced disease prediction by established ConvNet-based AggMapNet models, outperforming the commonly used machine learning and deep learning models in metagenomic benchmark datasets. These 2D representations combined with AggMapNet explainable module robustly identified more reliable and replicable disease-prediction microbes (biomarkers). Employing the MEGMA-AggMapNet pipeline for biomarker identification from 5 disease datasets, 84% of the identified biomarkers have been described in over 74 distinct works as important for these diseases. Moreover, the method also discovered highly consistent sets of biomarkers in cross-cohort colorectal cancer (CRC) patients and microbial shifts in different CRC stages.
Affiliation(s)
- Wan Xiang Shen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543, Singapore
- Shu Ran Liang
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Yu Yang Jiang
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; Corresponding author
- Yu Zong Chen
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; Shenzhen Bay Laboratory, Shenzhen 518000, China; Corresponding author
8
Jin Y, Du N, Huang Y, Shen W, Tan Y, Chen YZ, Dou WT, He XP, Yang Z, Xu N, Tan C. Fluorescence Analysis of Circulating Exosomes for Breast Cancer Diagnosis Using a Sensor Array and Deep Learning. ACS Sens 2022; 7:1524-1532. [PMID: 35512281 DOI: 10.1021/acssensors.2c00259] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 12/19/2022]
Abstract
Emerging liquid biopsy methods for investigating biomarkers in bodily fluids such as blood, saliva, or urine can be used to perform noninvasive cancer detection. However, the complexity and heterogeneity of exosomes require improved methods to achieve the desired sensitivity and accuracy. Herein, we report our study on developing a breast cancer liquid biopsy system, including a fluorescence sensor array and deep learning (DL) tool AggMapNet. In particular, we used a 12-unit sensor array composed of conjugated polyelectrolytes, fluorophore-labeled peptides, and monosaccharides or glycans to collect fluorescence signals from cells and exosomes. Linear discriminant analysis (LDA) processed the fluorescence spectral data of cells and cell-derived exosomes, demonstrating successful discrimination between normal and different cancerous cells and 100% accurate classification of different BC cells. For heterogeneous plasma-derived exosome analysis, CNN-based DL tool AggMapNet was applied to transform the unordered fluorescence spectra into feature maps (Fmaps), which gave a straightforward visual demonstration of the difference between healthy donors and BC patients with 100% prediction accuracy. Our work indicates that our fluorescent sensor array and DL model can be used as a promising noninvasive method for BC diagnosis.
Affiliation(s)
- Yuyao Jin
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Nan Du
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Yuanfang Huang
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Wanxiang Shen
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Singapore 117543, Singapore
- Ying Tan
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Yu Zong Chen
- Shenzhen Bay Laboratory, Shenzhen 518055, P. R. China
- Wei-Tao Dou
- Key Laboratory for Advanced Materials and Joint International Research Laboratory of Precision Chemistry and Molecular Engineering, Feringa Nobel Prize Scientist Joint Research Center, School of Chemistry and Molecular Engineering, Frontiers Center for Materiobiology and Dynamic Chemistry, East China University of Science and Technology, 130 Meilong RD, Shanghai 200237, P. R. China
- Xiao-Peng He
- Key Laboratory for Advanced Materials and Joint International Research Laboratory of Precision Chemistry and Molecular Engineering, Feringa Nobel Prize Scientist Joint Research Center, School of Chemistry and Molecular Engineering, Frontiers Center for Materiobiology and Dynamic Chemistry, East China University of Science and Technology, 130 Meilong RD, Shanghai 200237, P. R. China
- Zijian Yang
- Department of Breast and Thyroid Surgery, Peking University Shenzhen Hospital, Shenzhen 518034, P. R. China
- Naihan Xu
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Chunyan Tan
- The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China
- Open FIESTA, Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, P. R. China