1
Alvarez MRS, Holmes XA, Oloumi A, Grijaldo-Alvarez SJ, Schindler R, Zhou Q, Yadlapati A, Silsirivanit A, Lebrilla CB. Integration of RNAseq transcriptomics and N-glycomics reveal biosynthetic pathways and predict structure-specific N-glycan expression. Chem Sci 2025; 16:7155-7172. [PMID: 40191131 PMCID: PMC11970275 DOI: 10.1039/d5sc00467e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2025] [Accepted: 03/20/2025] [Indexed: 04/09/2025] Open
Abstract
The processes involved in protein N-glycosylation represent new therapeutic targets for diseases but their stepwise and overlapping biosynthetic processes make it challenging to identify the specific glycogenes involved. In this work, we aimed to elucidate the interactions between glycogene expression and N-glycan abundance by constructing supervised machine-learning models for each N-glycan composition. Regression models were trained to predict N-glycan abundance (response variable) from glycogene expression (predictors) using paired LC-MS/MS N-glycomic and 3'-TagSeq transcriptomic datasets from cells derived from multiple tissue origins and treatment conditions. The datasets include cells from several tissue origins - B cell, brain, colon, lung, muscle, prostate - encompassing nearly 400 N-glycan compounds and over 160 glycogenes filtered from an 18 000-gene transcriptome. Accurate models (validation R² > 0.8) predicted N-glycan abundance across cell types, including GLC01 (lung cancer), CCD19-Lu (lung fibroblast), and Tib-190 (B cell). Model importance scores ranked glycogene contributions to N-glycan predictions, revealing significant glycogene associations with specific N-glycan types. The predictions were consistent across input cell quantities, unlike LC-MS/MS glycomics which showed inconsistent results. This suggests that the models can reliably predict N-glycosylation even in samples with low cell amounts and by extension, single-cell samples. These findings can provide insights into cellular N-glycosylation machinery, offering potential therapeutic strategies for diseases linked to aberrant glycosylation, such as cancer, and neurodegenerative and autoimmune disorders.
Affiliation(s)
- Xavier A Holmes
  - Department of Chemistry, University of California, Davis, Davis, California, USA
- Armin Oloumi
  - Department of Chemistry, University of California, Davis, Davis, California, USA
- Ryan Schindler
  - Department of Chemistry, University of California, Davis, Davis, California, USA
- Qingwen Zhou
  - Department of Chemistry, University of California, Davis, Davis, California, USA
- Anirudh Yadlapati
  - Department of Chemistry, University of California, Davis, Davis, California, USA
- Atit Silsirivanit
  - Department of Biochemistry, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
- Carlito B Lebrilla
  - Department of Chemistry, University of California, Davis, Davis, California, USA
  - Department of Chemistry, Biochemistry, Molecular, Cellular and Developmental Biology Graduate Group, University of California, Davis, Davis, California, USA

2
Chrysinas P, Venkatesan S, Ang I, Ghosh V, Chen C, Neelamegham S, Gunawan R. Cell- and tissue-specific glycosylation pathways informed by single-cell transcriptomics. NAR Genom Bioinform 2024; 6:lqae169. [PMID: 39703423 PMCID: PMC11655298 DOI: 10.1093/nargab/lqae169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Revised: 11/06/2024] [Accepted: 11/21/2024] [Indexed: 12/21/2024] Open
Abstract
While single-cell studies have made significant impacts in various subfields of biology, they lag in the Glycosciences. To address this gap, we analyzed single-cell glycogene expressions in the Tabula Sapiens dataset of human tissues and cell types using a recent glycosylation-specific gene ontology (GlycoEnzOnto). At the median sequencing (count) depth, ∼40-50 out of 400 glycogenes were detected in individual cells. Upon increasing the sequencing depth, the number of detectable glycogenes saturates at ∼200 glycogenes, suggesting that the average human cell expresses about half of the glycogene repertoire. Hierarchies in glycogene and glycopathway expressions emerged from our analysis: nucleotide-sugar synthesis and transport exhibited the highest gene expressions, followed by genes for core enzymes, glycan modification and extensions, and finally terminal modifications. Interestingly, the same cell types showed variable glycopathway expressions based on their organ or tissue origin, suggesting nuanced cell- and tissue-specific glycosylation patterns. Probing deeper into the transcription factors (TFs) of glycogenes, we identified distinct groupings of TFs controlling different aspects of glycosylation: core biosynthesis, terminal modifications, etc. We present webtools to explore the interconnections across glycogenes, glycopathways and TFs regulating glycosylation in human cell/tissue types. Overall, the study presents an overview of glycosylation across multiple human organ systems.
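The saturation of detected glycogenes with sequencing depth can be illustrated with a simple sampling model. Assuming reads are drawn independently and gene i receives a fraction p_i of them, the expected number of genes with at least one read at depth N is the sum over genes of 1 - (1 - p_i)^N, which levels off at the number of genes that are expressed at all. This is an illustrative assumption, not the analysis used in the study; the panel sizes and abundance range below are hypothetical.

```python
def expected_detected(fractions, depth):
    """Expected number of genes with at least one read when `depth` reads are
    drawn independently and gene i captures a fraction p_i of the reads."""
    return sum(1.0 - (1.0 - p) ** depth for p in fractions)


# Hypothetical panel: 400 genes, of which 200 are expressed at all.
# Expressed genes span four orders of magnitude in relative abundance.
expressed = [10 ** (-3 - 4 * i / 199) for i in range(200)]
panel = expressed + [0.0] * 200

# Expected detections at increasing sequencing depth.
depths = (10**3, 10**5, 10**7, 10**9)
curve = {depth: expected_detected(panel, depth) for depth in depths}
```

Under this model the curve can never exceed the count of genes with nonzero p_i, so a plateau well below the panel size points to genes that are not expressed rather than to insufficient depth.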
Affiliation(s)
- Panagiotis Chrysinas
  - Department of Chemical and Biological Engineering, University at Buffalo-SUNY, 308 Furnas Hall, Buffalo, NY 14260, USA
- Shriramprasad Venkatesan
  - Department of Chemical and Biological Engineering, University at Buffalo-SUNY, 308 Furnas Hall, Buffalo, NY 14260, USA
- Isaac Ang
  - Department of Computer Science, University of Illinois Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL 61801, USA
- Vishnu Ghosh
  - Department of Chemical and Biological Engineering, University at Buffalo-SUNY, 308 Furnas Hall, Buffalo, NY 14260, USA
- Changyou Chen
  - Department of Computer Science and Engineering, University at Buffalo-SUNY, 338 Davis Hall, Buffalo, NY 14260, USA
- Sriram Neelamegham
  - Department of Chemical and Biological Engineering, University at Buffalo-SUNY, 308 Furnas Hall, Buffalo, NY 14260, USA
- Rudiyanto Gunawan
  - Department of Chemical and Biological Engineering, University at Buffalo-SUNY, 308 Furnas Hall, Buffalo, NY 14260, USA

3
Keisham S, Tateno H. Emerging technologies for single-cell glycomics. BBA Advances 2024; 6:100125. [PMID: 39687516 PMCID: PMC11646792 DOI: 10.1016/j.bbadva.2024.100125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Revised: 11/20/2024] [Accepted: 11/25/2024] [Indexed: 12/18/2024] Open
Abstract
Glycans are present on virtually all cellular surfaces and are important regulators of multicellular communications. Advances in single-cell omics technologies have revolutionized life science research by elucidating cellular heterogeneity through integrated multimodal analyses, providing a comprehensive view of cellular functions. However, dissecting the heterogeneity of glycans at the single-cell level has been challenging due to their structural complexity and unamplifiable nature. Recently, we developed a novel technology called single-cell glycan and RNA sequencing (scGR-seq), which converts glycan information into genetic information using DNA-barcoded lectins, amplifies it by PCR, and simultaneously measures the glycome and transcriptome in thousands of single cells on a next-generation sequencer. In this mini-review, we review the recent advances in single-cell glycomics, focusing on our scGR-seq technology.
Affiliation(s)
- Sunanda Keisham
  - Cellular and Molecular Biotechnology Research Institute, Multicellular System Regulation Research Group, National Institute of Advanced Industrial Science and Technology (AIST), Central 6, 1-1-1 Higashi, Tsukuba, Ibaraki 305-8566, Japan
  - Ph.D. Program in Human Biology, School of Integrative and Global Majors, University of Tsukuba, Tsukuba, Japan
- Hiroaki Tateno
  - Cellular and Molecular Biotechnology Research Institute, Multicellular System Regulation Research Group, National Institute of Advanced Industrial Science and Technology (AIST), Central 6, 1-1-1 Higashi, Tsukuba, Ibaraki 305-8566, Japan
  - Ph.D. Program in Human Biology, School of Integrative and Global Majors, University of Tsukuba, Tsukuba, Japan

4
Gao S, Wang Y, Wang J, Dong Y. Leveraging explainable deep learning methodologies to elucidate the biological underpinnings of Huntington's disease using single-cell RNA sequencing data. BMC Genomics 2024; 25:930. [PMID: 39367331 PMCID: PMC11451194 DOI: 10.1186/s12864-024-10855-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Accepted: 09/30/2024] [Indexed: 10/06/2024] Open
Abstract
BACKGROUND Huntington's disease (HD) is a hereditary neurological disorder caused by mutations in HTT, leading to neuronal degeneration. Traditionally, HD is associated with the misfolding and aggregation of mutant huntingtin due to an extended polyglutamine domain encoded by an expanded CAG tract. Recent research has also highlighted the role of global transcriptional dysregulation in HD pathology; however, understanding the intricate relationship between mRNA expression and HD at the cellular level remains challenging. Our study aimed to elucidate the underlying mechanisms of HD pathology using single-cell sequencing data.
RESULTS We used single-cell RNA sequencing analysis to determine differential gene expression patterns between healthy and HD cells. HD cells were effectively modeled using a residual neural network (ResNet), which outperformed traditional and convolutional neural networks, reaching an F1 score of 96.53% on the test set. Using the SHapley Additive exPlanations (SHAP) algorithm, we identified genes influencing HD prediction and revealed their roles in HD pathobiology, such as in the regulation of cellular iron metabolism and mitochondrial function. SHAP analysis also revealed low-abundance genes that were overlooked by traditional differential expression analysis, emphasizing its effectiveness in identifying biologically relevant genes for distinguishing between healthy and HD cells. Overall, the integration of single-cell RNA sequencing data and deep learning models provides valuable insights into HD pathology.
CONCLUSION We developed a model capable of analyzing HD at the single-cell transcriptomic level.
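The attribution idea behind SHAP can be shown on a toy function by computing exact Shapley values, where each feature's contribution is its average marginal effect over all coalitions of the other features. This is a minimal sketch under stated assumptions: features outside a coalition are replaced by a fixed baseline, the scoring function `toy_model` is hypothetical, and the exhaustive enumeration only scales to a handful of features (practical SHAP implementations rely on approximations). It is not the code used in the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).
    Features outside a coalition are set to their baseline value."""
    n = len(x)

    def value(coalition):
        return predict([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


# Hypothetical scoring function with an interaction between features 0 and 1.
def toy_model(features):
    return 3.0 * features[0] + features[0] * features[1] - 2.0 * features[2]


phi = shapley_values(toy_model, x=[1.0, 2.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

The attributions always sum to the difference between the prediction at `x` and at the baseline, and the interaction term is split evenly between the two features that share it.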
Affiliation(s)
- Shichen Gao
  - School of Life Sciences, Anhui University, Hefei, 230601, China
  - College of Biology and Food Engineering, Chuzhou University, Chuzhou, 239000, China
- Yadong Wang
  - School of Life Sciences, Anhui University, Hefei, 230601, China
  - College of Biology and Food Engineering, Chuzhou University, Chuzhou, 239000, China
- Jiajia Wang
  - College of Biology and Food Engineering, Chuzhou University, Chuzhou, 239000, China
- Yan Dong
  - College of Biology and Food Engineering, Chuzhou University, Chuzhou, 239000, China

5
van Hilten A, Katz S, Saccenti E, Niessen WJ, Roshchupkin GV. Designing interpretable deep learning applications for functional genomics: a quantitative analysis. Brief Bioinform 2024; 25:bbae449. [PMID: 39293804 PMCID: PMC11410376 DOI: 10.1093/bib/bbae449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 08/07/2024] [Accepted: 08/28/2024] [Indexed: 09/20/2024] Open
Abstract
Deep learning applications have had a profound impact on many scientific fields, including functional genomics. Deep learning models can learn complex interactions between and within omics data; however, interpreting and explaining these models can be challenging. Interpretability is essential not only to help progress our understanding of the biological mechanisms underlying traits and diseases but also for establishing trust in these models' efficacy for healthcare applications. Recognizing this importance, recent years have seen the development of numerous diverse interpretability strategies, making it increasingly difficult to navigate the field. In this review, we present a quantitative analysis of the challenges arising when designing interpretable deep learning solutions in functional genomics. We explore design choices related to the characteristics of genomics data, the neural network architectures applied, and strategies for interpretation. By quantifying the current state of the field with a predefined set of criteria, we find the most frequent solutions, highlight exceptional examples, and identify unexplored opportunities for developing interpretable deep learning models in genomics.
Affiliation(s)
- Arno van Hilten
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- Sonja Katz
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology, Wageningen University & Research, 6700 HB Wageningen WE, The Netherlands
- Wiro J Niessen
  - Department of Imaging Physics, Delft University of Technology, 2628 CD Delft, The Netherlands
- Gennady V Roshchupkin
  - Department of Radiology and Nuclear Medicine, Erasmus MC, 3015 GD Rotterdam, The Netherlands
  - Department of Epidemiology, Erasmus MC, 3015 GD Rotterdam, The Netherlands

6
Li M, Guo H, Wang K, Kang C, Yin Y, Zhang H. AVBAE-MODFR: A novel deep learning framework of embedding and feature selection on multi-omics data for pan-cancer classification. Comput Biol Med 2024; 177:108614. [PMID: 38796884 DOI: 10.1016/j.compbiomed.2024.108614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 02/27/2024] [Accepted: 05/11/2024] [Indexed: 05/29/2024]
Abstract
Integration analysis of cancer multi-omics data for pan-cancer classification has the potential for clinical applications in various aspects such as tumor diagnosis, analyzing clinically significant features, and providing precision medicine. In these applications, the embedding and feature selection on high-dimensional multi-omics data is clinically necessary. Recently, deep learning algorithms have become the most promising methods for cancer multi-omics integration analysis, owing to their powerful capability of capturing nonlinear relationships. Developing effective deep learning architectures for cancer multi-omics embedding and feature selection remains a challenge for researchers in view of high dimensionality and heterogeneity. In this paper, we propose a novel two-phase deep learning model named AVBAE-MODFR for pan-cancer classification. AVBAE-MODFR achieves embedding by a multi2multi autoencoder based on the adversarial variational Bayes method and further performs feature selection utilizing a dual-net-based feature ranking method. AVBAE-MODFR utilizes AVBAE to pre-train the network parameters, which improves the classification performance and enhances feature ranking stability in MODFR. Firstly, AVBAE learns high-quality representation among multiple omics features for unsupervised pan-cancer classification. We design an efficient discriminator architecture to distinguish the latent distributions for updating forward variational parameters. Secondly, we propose MODFR to simultaneously evaluate multi-omics feature importance for feature selection by training a designed multi2one selector network, where the efficient evaluation approach based on the average gradient of random mask subsets can avoid bias caused by input feature drift. We conduct experiments on the TCGA pan-cancer dataset and compare it with four state-of-the-art methods for each phase. The results show the superiority of AVBAE-MODFR over SOTA methods.
Affiliation(s)
- Minghe Li
  - National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Huike Guo
  - National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Keao Wang
  - National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Chuanze Kang
  - National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China
- Yanbin Yin
  - Department of Food Science and Technology, University of Nebraska - Lincoln, NE, USA
- Han Zhang
  - National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin, China

7
Keisham S, Saito S, Kowashi S, Tateno H. Droplet-Based Glycan and RNA Sequencing for Profiling the Distinct Cellular Glyco-States in Single Cells. Small Methods 2024; 8:e2301338. [PMID: 38164999 DOI: 10.1002/smtd.202301338] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/02/2023] [Revised: 12/18/2023] [Indexed: 01/03/2024]
Abstract
Plate-based single-cell glycan and RNA sequencing (scGR-seq) was previously developed to realize the integrated analysis of glycome and transcriptome in single cells. However, the sample size is limited to only a few hundred cells. Here, a droplet-based scGR-seq is developed to address this issue by adopting a 10x Chromium platform to simultaneously profile the glycome and transcriptome of ten thousand single cells. To establish droplet-based scGR-seq, a comparative analysis of two distinct cell lines is performed: pancreatic ductal adenocarcinoma cells and normal pancreatic duct cells. Droplet-based scGR-seq revealed distinct glycan profiles between the two cell lines that showed a strong correlation with the results obtained by flow cytometry. Next, droplet-based scGR-seq is applied to a more complex sample: peripheral blood mononuclear cells (PBMC) containing various immune cells. The method can systematically map the glycan signature for each immune cell in PBMC as well as glycan alterations by cell lineage. Prediction of the association between the glycan expression and the gene expression using regression analysis ultimately leads to the identification of a glycan epitope that impacts cellular functions. In conclusion, the droplet-based scGR-seq realizes the high-throughput profiling of the distinct cellular glyco-states in single cells.
Affiliation(s)
- Sunanda Keisham
  - Cellular and Molecular Biotechnology Research Institute, Multicellular System Regulation Research Group, National Institute of Advanced Industrial Science and Technology (AIST), Central 6, 1-1-1 Higashi, Tsukuba, Ibaraki, 305-8566, Japan
  - Ph.D. Program in Human Biology, School of Integrative and Global Majors, University of Tsukuba, Tsukuba, Ibaraki, 305-8566, Japan
- Sayoko Saito
  - Cellular and Molecular Biotechnology Research Institute, Multicellular System Regulation Research Group, National Institute of Advanced Industrial Science and Technology (AIST), Central 6, 1-1-1 Higashi, Tsukuba, Ibaraki, 305-8566, Japan
- Satori Kowashi
  - Cellular and Molecular Biotechnology Research Institute, Multicellular System Regulation Research Group, National Institute of Advanced Industrial Science and Technology (AIST), Central 6, 1-1-1 Higashi, Tsukuba, Ibaraki, 305-8566, Japan
- Hiroaki Tateno
  - Cellular and Molecular Biotechnology Research Institute, Multicellular System Regulation Research Group, National Institute of Advanced Industrial Science and Technology (AIST), Central 6, 1-1-1 Higashi, Tsukuba, Ibaraki, 305-8566, Japan
  - Ph.D. Program in Human Biology, School of Integrative and Global Majors, University of Tsukuba, Tsukuba, Ibaraki, 305-8566, Japan

8
Luo Z, Wang R, Sun Y, Liu J, Chen Z, Zhang YJ. Interpretable feature extraction and dimensionality reduction in ESM2 for protein localization prediction. Brief Bioinform 2024; 25:bbad534. [PMID: 38279650 PMCID: PMC10818170 DOI: 10.1093/bib/bbad534] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 07/27/2023] [Revised: 11/19/2023] [Accepted: 12/15/2023] [Indexed: 01/28/2024] Open
Abstract
As the application of large language models (LLMs) has broadened into the realm of biological predictions, leveraging their capacity for self-supervised learning to create feature representations of amino acid sequences, these models have set a new benchmark in tackling downstream challenges, such as subcellular localization. However, previous studies have primarily focused on either the structural design of models or differing strategies for fine-tuning, largely overlooking investigations into the nature of the features derived from LLMs. In this research, we propose different ESM2 representation extraction strategies, considering both the character type and position within the ESM2 input sequence. Using model dimensionality reduction, predictive analysis and interpretability techniques, we have illuminated potential associations between diverse feature types and specific subcellular localizations. In particular, predictions for the mitochondrion and Golgi apparatus favor segment features closer to the N-terminus, and phosphorylation site-based features could mirror phosphorylation properties. We also evaluate the prediction performance and interpretability robustness of Random Forest and Deep Neural Networks with varied feature inputs. This work offers novel insights into maximizing LLMs' utility, understanding their mechanisms, and extracting biological domain knowledge. Furthermore, we have made the code, feature extraction API, and all relevant materials available at https://github.com/yujuan-zhang/feature-representation-for-LLMs.
Affiliation(s)
- Zeyu Luo
  - Chongqing Key Laboratory of Vector Insects, Chongqing Key Laboratory of Animal Biology, College of Life Science, Chongqing Normal University, Chongqing 401331, China
- Rui Wang
  - Chongqing Key Laboratory of Vector Insects, Chongqing Key Laboratory of Animal Biology, College of Life Science, Chongqing Normal University, Chongqing 401331, China
- Yawen Sun
  - Chongqing Key Laboratory of Vector Insects, Chongqing Key Laboratory of Animal Biology, College of Life Science, Chongqing Normal University, Chongqing 401331, China
- Junhao Liu
  - Chongqing Key Laboratory of Vector Insects, Chongqing Key Laboratory of Animal Biology, College of Life Science, Chongqing Normal University, Chongqing 401331, China
- Zongqing Chen
  - School of Mathematical Sciences, Chongqing Normal University, Chongqing 400047, China
- Yu-Juan Zhang
  - Chongqing Key Laboratory of Vector Insects, Chongqing Key Laboratory of Animal Biology, College of Life Science, Chongqing Normal University, Chongqing 401331, China

9
Widmalm G. Glycan Shape, Motions, and Interactions Explored by NMR Spectroscopy. JACS Au 2024; 4:20-39. [PMID: 38274261 PMCID: PMC10807006 DOI: 10.1021/jacsau.3c00639] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/20/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 01/27/2024]
Abstract
Glycans in the form of oligosaccharides, polysaccharides, and glycoconjugates are ubiquitous in nature, and their structures range from linear assemblies to highly branched and decorated constructs. Solution state NMR spectroscopy facilitates elucidation of preferred conformations and shapes of the saccharides, motions, and dynamic aspects related to processes over time as well as the study of transient interactions with proteins. Identification of intermolecular networks at the atomic level of detail in recognition events by carbohydrate-binding proteins known as lectins, unraveling interactions with antibodies, and revealing substrate scope and action of glycosyl transferases employed for synthesis of oligo- and polysaccharides may efficiently be analyzed by NMR spectroscopy. By utilizing NMR active nuclei present in glycans and derivatives thereof, including isotopically enriched compounds, highly detailed information can be obtained by the experiments. Subsequent analysis may be aided by quantum chemical calculations of NMR parameters, machine learning-based methodologies and artificial intelligence. Interpretation of the results from NMR experiments can be complemented by extensive molecular dynamics simulations to obtain three-dimensional dynamic models, thereby clarifying molecular recognition processes involving the glycans.
Affiliation(s)
- Göran Widmalm
  - Department of Organic Chemistry, Arrhenius Laboratory, Stockholm University, S-106 91 Stockholm, Sweden

10
Prasad SS, Deo RC, Salcedo-Sanz S, Downs NJ, Casillas-Pérez D, Parisi AV. Enhanced joint hybrid deep neural network explainable artificial intelligence model for 1-hr ahead solar ultraviolet index prediction. Computer Methods and Programs in Biomedicine 2023; 241:107737. [PMID: 37573641 DOI: 10.1016/j.cmpb.2023.107737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 07/16/2023] [Accepted: 07/27/2023] [Indexed: 08/15/2023]
Abstract
BACKGROUND AND OBJECTIVE Exposure to solar ultraviolet (UV) radiation can cause malignant keratinocyte cancer and eye disease. Developing a user-friendly, portable, real-time solar UV alert system, especially for wearable electronic mobile devices, can help reduce exposure to UV as a key measure for personal and occupational management of UV risks. This research aims to design an artificial intelligence-inspired early warning tool tailored for short-term forecasting of the UV index (UVI), integrating satellite-derived and ground-based predictors for Australian hotspots receiving high UV exposures. The study further improves the trustworthiness of the newly designed tool using an explainable artificial intelligence approach.
METHODS An enhanced joint hybrid explainable deep neural network model (called EJH-X-DNN) is constructed involving two phases of feature selection and hyperparameter tuning using Bayesian optimization. A comprehensive assessment of EJH-X-DNN is conducted with six other competing benchmarked models. The proposed model is explained locally and globally using robust model-agnostic explainable artificial intelligence frameworks such as Local Interpretable Model-Agnostic Explanations (LIME), Shapley additive explanations (SHAP), and permutation feature importance (PFI).
RESULTS The newly proposed model outperformed all benchmarked models for forecasting UVI at the hourly horizon, with correlation coefficients of 0.900, 0.960, 0.897, and 0.913, respectively, for the Darwin, Alice Springs, Townsville, and Emerald hotspots. According to the combined local and global explainable model outcomes, the site-based results indicate that antecedent lagged memory of UVI and solar zenith angle are influential features. Predictions made by the EJH-X-DNN model are strongly influenced by factors such as ozone effect, cloud conditions, and precipitation.
CONCLUSION With its superiority and skillful interpretation, the UVI prediction system reaffirms its benefits for providing real-time UV alerts to mitigate risks of skin and eye health complications, reducing healthcare costs and contributing to outdoor exposure policy.
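Of the model-agnostic techniques named in the methods, permutation feature importance is the simplest to sketch: shuffle one input column, re-score the model, and record how much the error grows. The example below is a minimal illustration under stated assumptions; `toy_forecaster`, its coefficients, and the synthetic inputs are hypothetical stand-ins for a trained forecaster and are not taken from the study.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE when one feature column is shuffled across rows."""
    rng = random.Random(seed)
    base = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only the chosen column replaced.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            score = mean_squared_error(targets, [predict(r) for r in permuted])
            increases.append(score - base)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in for a trained forecaster: it uses the first two inputs only.
def toy_forecaster(row):
    lagged_uvi, zenith_cosine, unused = row
    return 0.8 * lagged_uvi + 4.0 * zenith_cosine


data_rng = random.Random(1)
rows = [
    [data_rng.uniform(0, 12), data_rng.uniform(0, 1), data_rng.uniform(0, 1)]
    for _ in range(200)
]
targets = [toy_forecaster(r) for r in rows]
importances = permutation_importance(toy_forecaster, rows, targets)
```

A feature the model ignores gets an importance of zero because shuffling it leaves every prediction unchanged, while features the model relies on heavily show the largest error increase. Averaging over several shuffles reduces the noise from any single permutation.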
Affiliation(s)
- Salvin S Prasad
  - School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, QLD 4300, Australia
- Ravinesh C Deo
  - School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, QLD 4300, Australia
- Sancho Salcedo-Sanz
  - School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, QLD 4300, Australia; Department of Signal Processing and Communications, Universidad de Alcalá, Alcalá de Henares, 28805, Madrid, Spain
- Nathan J Downs
  - School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD 4350, Australia
- David Casillas-Pérez
  - Department of Signal Processing and Communications, Universidad Rey Juan Carlos, Fuenlabrada, 28942, Madrid, Spain
- Alfio V Parisi
  - School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD 4350, Australia