1
Xiong Y, Chen C, He C, Yang X, Cheng W. Identification of shared gene signatures and biological mechanisms between preeclampsia and polycystic ovary syndrome. Heliyon 2024; 10:e29225. [PMID: 38638956 PMCID: PMC11024567 DOI: 10.1016/j.heliyon.2024.e29225] [Received: 12/26/2023] [Revised: 03/24/2024] [Accepted: 04/03/2024] [Indexed: 04/20/2024] Open
Abstract
Preeclampsia (PE) is one of the most common complications of pregnancy, and polycystic ovary syndrome (PCOS) is a prevalent metabolic and endocrine disorder in women of reproductive age. The objective of this study was to identify the shared genetic signatures and molecular mechanisms between PCOS and PE. The intersections of WGCNA module genes, PPI module genes, and PPI hub genes revealed that 8 immunity-related genes might be shared causative genes of PE and PCOS. Further, qRT-PCR results showed that TSIX/miR-223-3p/DDX58 might play a crucial role in immune dysregulation in PE and PCOS, and Spearman rank correlation analysis illustrated the potential of DDX58 as a novel diagnostic and therapeutic target for PE and PCOS. Our study demonstrated a common disease pathway model, TSIX/miR-223-3p/DDX58, illustrating that immune dysregulation may be a possible mechanism of PE and PCOS, and revealed that DDX58 might be a novel predictive target for PE and PCOS.
Affiliation(s)
- Yaoxi Xiong
  - International Peace Maternity and Child Health Hospital, School of Medicine, Shanghai Jiao Tong University, 200030, Shanghai, China
  - Shanghai Key Laboratory of Embryo Original Disease, 200030, Shanghai, China
- Chao Chen
  - International Peace Maternity and Child Health Hospital, School of Medicine, Shanghai Jiao Tong University, 200030, Shanghai, China
  - Shanghai Key Laboratory of Embryo Original Disease, 200030, Shanghai, China
- Chengrong He
  - International Peace Maternity and Child Health Hospital, School of Medicine, Shanghai Jiao Tong University, 200030, Shanghai, China
  - Shanghai Key Laboratory of Embryo Original Disease, 200030, Shanghai, China
- Xingyu Yang
  - International Peace Maternity and Child Health Hospital, School of Medicine, Shanghai Jiao Tong University, 200030, Shanghai, China
  - Shanghai Key Laboratory of Embryo Original Disease, 200030, Shanghai, China
- Weiwei Cheng
  - International Peace Maternity and Child Health Hospital, School of Medicine, Shanghai Jiao Tong University, 200030, Shanghai, China
2
Barakat A, Munro G, Heegaard AM. Finding new analgesics: Computational pharmacology faces drug discovery challenges. Biochem Pharmacol 2024; 222:116091. [PMID: 38412924 DOI: 10.1016/j.bcp.2024.116091] [Received: 10/02/2023] [Revised: 01/10/2024] [Accepted: 02/23/2024] [Indexed: 02/29/2024]
Abstract
Despite the worldwide prevalence and huge burden of pain, pain is an undertreated phenomenon. Currently used analgesics have several limitations regarding their efficacy and safety. The discovery of analgesics possessing a novel mechanism of action has faced multiple challenges, including a limited understanding of biological processes underpinning pain and analgesia and poor animal-to-human translation. Computational pharmacology is currently employed to face these challenges. In this review, we discuss the theory, methods, and applications of computational pharmacology in pain research. Computational pharmacology encompasses a wide variety of theoretical concepts and practical methodological approaches, with the overall aim of gaining biological insight through data acquisition and analysis. Data are acquired from patients or animal models with pain or analgesic treatment, at different levels of biological organization (molecular, cellular, physiological, and behavioral). Distinct methodological algorithms can then be used to analyze and integrate data. This helps to facilitate the identification of biological molecules and processes associated with pain phenotype, build quantitative models of pain signaling, and extract translatable features between humans and animals. However, computational pharmacology has several limitations, and its predictions can provide false positive and negative findings. Therefore, computational predictions are required to be validated experimentally before drawing solid conclusions. In this review, we discuss several case study examples of combining and integrating computational tools with experimental pain research tools to meet drug discovery challenges.
Affiliation(s)
- Ahmed Barakat
  - Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Pharmacology and Toxicology, Faculty of Pharmacy, Assiut University, Assiut, Egypt
- Anne-Marie Heegaard
  - Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
3
Bao LX, Luo ZM, Zhu XL, Xu YY. Automated identification of protein expression intensity and classification of protein cellular locations in mouse brain regions from immunofluorescence images. Med Biol Eng Comput 2024; 62:1105-1119. [PMID: 38150111 DOI: 10.1007/s11517-023-02985-x] [Received: 03/30/2023] [Accepted: 11/28/2023] [Indexed: 12/28/2023]
Abstract
Knowledge of protein expression in mammalian brains at regional and cellular levels can facilitate understanding of protein functions and associated diseases. As the mouse brain is a typical mammalian brain considering cell type and structure, several studies have been conducted to analyze protein expression in mouse brains. However, labeling protein expression using biotechnology is costly and time-consuming. Therefore, automated models that can accurately recognize protein expression are needed. Here, we constructed machine learning models to automatically annotate the protein expression intensity and cellular location in different mouse brain regions from immunofluorescence images. The brain regions and sub-regions were segmented through learning image features using an autoencoder and then performing K-means clustering and registration to align with the anatomical references. The protein expression intensities for those segmented structures were computed on the basis of the statistics of the image pixels, and patch-based weakly supervised methods and multi-instance learning were used to classify the cellular locations. Results demonstrated that the models achieved high accuracy in the expression intensity estimation, and the F1 score of the cellular location prediction was 74.5%. This work established an automated pipeline for analyzing mouse brain images and provided a foundation for further study of protein expression and functions.
Affiliation(s)
- Lin-Xia Bao
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510623, China
- Zhuo-Ming Luo
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510623, China
- Xi-Liang Zhu
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510623, China
- Ying-Ying Xu
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510623, China
4
Luximon DC, Neylon J, Ritter T, Agazaryan N, Hegde JV, Steinberg ML, Low DA, Lamb JM. Results of an Artificial Intelligence-Based Image Review System to Detect Patient Misalignment Errors in a Multi-institutional Database of Cone Beam Computed Tomography-Guided Radiation Therapy. Int J Radiat Oncol Biol Phys 2024:S0360-3016(24)00392-4. [PMID: 38485098 DOI: 10.1016/j.ijrobp.2024.02.065] [Received: 09/12/2023] [Revised: 02/15/2024] [Accepted: 02/28/2024] [Indexed: 04/17/2024]
Abstract
PURPOSE Present knowledge of patient setup and alignment errors in image guided radiation therapy (IGRT) relies on voluntary reporting, which is thought to underestimate error frequencies. A manual retrospective patient-setup misalignment error search is infeasible owing to the bulk of cases to be reviewed. We applied a deep learning-based misalignment error detection algorithm (EDA) to perform a fully automated retrospective error search of clinical IGRT databases and determine an absolute gross patient misalignment error rate. METHODS AND MATERIALS The EDA was developed to analyze the registration between planning scans and pretreatment cone beam computed tomography scans, outputting a misalignment score ranging from 0 (most unlikely) to 1 (most likely). The algorithm was trained using simulated translational errors on a data set obtained from 680 patients treated at 2 radiation therapy clinics between 2017 and 2022. A receiver operating characteristic analysis was performed to obtain target thresholds. DICOM Query and Retrieval software was integrated with the EDA to interact with the clinical database and fully automate data retrieval and analysis during a retrospective error search from 2016 to 2017 and from 2021 to 2022 for the 2 institutions, respectively. Registrations were flagged for human review using both a hard-thresholding method and a prediction trending analysis over each individual patient's treatment course. Flagged registrations were manually reviewed and categorized as errors (>1 cm misalignment at the target) or nonerrors. RESULTS A total of 17,612 registrations were analyzed by the EDA, resulting in 7.7% flagged events. Three previously reported errors were successfully flagged by the EDA, and 4 previously unreported vertebral body misalignment errors were discovered during case reviews. False positive cases often displayed substantial image artifacts, patient rotation, and soft tissue anatomy changes. 
CONCLUSIONS Our results validated the clinical utility of the EDA for bulk image reviews and highlighted the reliability and safety of IGRT, with an absolute gross patient misalignment error rate of 0.04% ± 0.02% per delivered fraction.
Affiliation(s)
- Dishane C Luximon
  - Department of Radiation Oncology, David Geffen School of Medicine, University of California, Los Angeles, California
- Jack Neylon
  - Department of Radiation Oncology, David Geffen School of Medicine, University of California, Los Angeles, California
- Timothy Ritter
  - Department of Medical Physics, Virginia Commonwealth University, Richmond, Virginia
- Nzhde Agazaryan
  - Department of Radiation Oncology, David Geffen School of Medicine, University of California, Los Angeles, California
- John V Hegde
  - Department of Radiation Oncology, David Geffen School of Medicine, University of California, Los Angeles, California
- Michael L Steinberg
  - Department of Radiation Oncology, David Geffen School of Medicine, University of California, Los Angeles, California
- Daniel A Low
  - Department of Radiation Oncology, David Geffen School of Medicine, University of California, Los Angeles, California
- James M Lamb
  - Department of Radiation Oncology, David Geffen School of Medicine, University of California, Los Angeles, California
5
Wang T, Li Z, Zhao S, Liu Y, Guo W, Alarcòn Rodrìguez R, Wu Y, Wei R. Characterizing hedgehog pathway features in senescence associated osteoarthritis through Integrative multi-omics and machine learning analysis. Front Genet 2024; 15:1255455. [PMID: 38444758 PMCID: PMC10912584 DOI: 10.3389/fgene.2024.1255455] [Received: 07/11/2023] [Accepted: 02/06/2024] [Indexed: 03/07/2024] Open
Abstract
Purpose: Osteoarthritis (OA) is a disease of senescence and inflammation. Hedgehog's role in OA mechanisms is unclear. This study combines Bulk RNA-seq and scRNA-seq to identify Hedgehog-associated genes in OA, investigating their impact on the pathogenesis of OA. Materials and methods: Download and merge eight bulk-RNA seq datasets from GEO, also obtain a scRNA-seq dataset for validation and analysis. Analyze Hedgehog pathway activity in OA using bulk-RNA seq datasets. Use ten machine learning algorithms to identify important Hedgehog-associated genes, validate predictive models. Perform GSEA to investigate functional implications of identified Hedgehog-associated genes. Assess immune infiltration in OA using Cibersort and MCP-counter algorithms. Utilize ConsensusClusterPlus package to identify Hedgehog-related subgroups. Conduct WGCNA to identify key modules enriched based on Hedgehog-related subgroups. Characterization of genes by methylation and GWAS analysis. Evaluate Hedgehog pathway activity, expression of hub genes, pseudotime, and cell communication, in OA chondrocytes using scRNA-seq dataset. Validate Hedgehog-associated gene expression levels through Real-time PCR analysis. Results: The activity of the Hedgehog pathway is significantly enhanced in OA. Additionally, nine important Hedgehog-associated genes have been identified, and the predictive models built using these genes demonstrate strong predictive capabilities. GSEA analysis indicates a significant positive correlation between all seven important Hedgehog-associated genes and lysosomes. Consensus clustering reveals the presence of two hedgehog-related subgroups. In Cluster 1, Hedgehog pathway activity is significantly upregulated and associated with inflammatory pathways. WGCNA identifies that genes in the blue module are most significantly correlated with Cluster 1 and Cluster 2, as well as being involved in extracellular matrix and collagen-related pathways. 
Single-cell analysis confirms the significant upregulation of the Hedgehog pathway in OA, along with expression changes observed in 5 genes during putative temporal progression. Cell communication analysis suggests an association between low-scoring chondrocytes and macrophages. Conclusion: The Hedgehog pathway is significantly activated in OA and is associated with the extracellular matrix and collagen proteins. It plays a role in regulating immune cells and immune responses.
Affiliation(s)
- Tao Wang
  - Department of Orthopedic Joint, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Zhengrui Li
  - Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shijian Zhao
  - Department of Cardiology, The Affiliated Cardiovascular Hospital of Kunming Medical University (Fuwai Yunnan Cardiovascular Hospital), Kunming, Yunnan, China
- Ying Liu
  - Department of Rehabilitation Medicine, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Wenliang Guo
  - Department of Rehabilitation Medicine, The Eighth Affiliated Hospital of Guangxi Medical University, Guigang, Guangxi, China
- Yinteng Wu
  - Department of Orthopedic and Trauma Surgery, the First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Ruqiong Wei
  - Department of Rehabilitation Medicine, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
6
Dube F, Delhomme N, Martin F, Hinas A, Åbrink M, Svärd S, Tydén E. Gene co-expression network analysis reveal core responsive genes in Parascaris univalens tissues following ivermectin exposure. PLoS One 2024; 19:e0298039. [PMID: 38359071 PMCID: PMC10868809 DOI: 10.1371/journal.pone.0298039] [Received: 12/10/2023] [Accepted: 01/17/2024] [Indexed: 02/17/2024] Open
Abstract
Anthelmintic resistance in the equine parasite Parascaris univalens compromises ivermectin (IVM) effectiveness and necessitates an in-depth understanding of its resistance mechanisms. Most research, primarily focused on holistic gene expression analyses, may overlook vital tissue-specific responses and often limit the scope of novel genes. This study leveraged gene co-expression network analysis to elucidate tissue-specific transcriptional responses and to identify core genes implicated in the IVM response in P. univalens. Adult worms (n = 28) were exposed to 10⁻¹¹ M and 10⁻⁹ M IVM in vitro for 24 hours. RNA-sequencing examined transcriptional changes in the anterior end and intestine. Differential expression analysis revealed pronounced tissue differences, with the intestine exhibiting substantially more IVM-induced transcriptional activity. Gene co-expression network analysis identified seven modules significantly associated with the response to IVM. Within these, 219 core genes were detected, largely expressed in the intestinal tissue and spanning diverse biological processes with unspecific patterns. After 10⁻¹¹ M IVM, intestinal tissue core genes showed transcriptional suppression, cell cycle inhibition, and ribosomal alterations. Interestingly, genes PgR028_g047 (sorb-1), PgB01_g200 (gmap-1) and PgR046_g017 (col-37 & col-102) switched from downregulation at 10⁻¹¹ M to upregulation at 10⁻⁹ M IVM. The 10⁻⁹ M concentration induced expression of cuticle and membrane integrity core genes in the intestinal tissue. No clear core gene patterns were visible in the anterior end after 10⁻¹¹ M IVM. However, after 10⁻⁹ M IVM, the anterior end mostly displayed downregulation, indicating disrupted transcriptional regulation. One interesting finding was the non-modular calcium-signaling gene, PgR047_g066 (gegf-1), which uniquely connected 71 genes across four modules.
These genes were enriched for transmembrane signaling activity, suggesting that PgR047_g066 (gegf-1) could have a key signaling role. By unveiling tissue-specific expression patterns and highlighting biological processes through unbiased core gene detection, this study reveals intricate IVM responses in P. univalens. These findings suggest alternative drug uptake of IVM and can guide functional validations to further IVM resistance mechanism understanding.
Affiliation(s)
- Faruk Dube
  - Department of Animal Biosciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Nicolas Delhomme
  - Umeå Plant Science Centre (UPSC), Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, Umeå, Sweden
- Frida Martin
  - Department of Animal Biosciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Andrea Hinas
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Magnus Åbrink
  - Department of Animal Biosciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Staffan Svärd
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Eva Tydén
  - Department of Animal Biosciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
7
Shafique A, Gonzalez R, Pantanowitz L, Tan PH, Machado A, Cree IA, Tizhoosh HR. A Preliminary Investigation into Search and Matching for Tumor Discrimination in World Health Organization Breast Taxonomy Using Deep Networks. Mod Pathol 2024; 37:100381. [PMID: 37939901 PMCID: PMC10891482 DOI: 10.1016/j.modpat.2023.100381] [Received: 08/23/2023] [Revised: 10/26/2023] [Accepted: 10/31/2023] [Indexed: 11/10/2023]
Abstract
Breast cancer is one of the most common cancers affecting women worldwide. It includes a group of malignant neoplasms with a variety of biological, clinical, and histopathologic characteristics. There are more than 35 different histologic forms of breast lesions that can be classified and diagnosed histologically according to cell morphology, growth, and architecture patterns. Recently, deep learning, in the field of artificial intelligence, has drawn a lot of attention for the computerized representation of medical images. Searchable digital atlases can provide pathologists with patch-matching tools, allowing them to search among evidently diagnosed and treated archival cases, a technology that may be regarded as computational second opinion. In this study, we indexed and analyzed the World Health Organization breast taxonomy (Classification of Tumors fifth ed.) spanning 35 tumor types. We visualized all tumor types using deep features extracted from a state-of-the-art deep-learning model, pretrained on millions of diagnostic histopathology images from the Cancer Genome Atlas repository. Furthermore, we tested the concept of a digital "atlas" as a reference for search and matching with rare test cases. The patch similarity search within the World Health Organization breast taxonomy data reached >88% accuracy when validating through "majority vote" and >91% accuracy when validating using top n tumor types. These results show for the first time that complex relationships among common and rare breast lesions can be investigated using an indexed digital archive.
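The "majority vote" validation mentioned in this abstract can be illustrated with a minimal sketch. It assumes each atlas patch is stored as a (feature vector, tumor type) pair, that similarity is measured with cosine similarity, and that the k = 3 most similar patches vote; the names `cosine_similarity` and `majority_vote_search`, the toy vectors, and the labels are all hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def majority_vote_search(query, atlas, k=3):
    """Return the most frequent label among the k atlas patches
    most similar to the query (assumed metric: cosine similarity)."""
    ranked = sorted(
        atlas,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Toy atlas: hypothetical 2-D "deep features" with made-up labels.
toy_atlas = [
    ([1.0, 0.0], "type_A"),
    ([0.9, 0.1], "type_A"),
    ([0.8, 0.2], "type_A"),
    ([0.0, 1.0], "type_B"),
    ([0.1, 0.9], "type_B"),
]
predicted = majority_vote_search([1.0, 0.05], toy_atlas, k=3)
```

A "top n tumor types" check, as also mentioned in the abstract, would presumably count a query as correct when its true type appears among the n most frequent labels rather than only the single winner.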
Affiliation(s)
- Abubakr Shafique
  - Rhazes Lab, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota
  - Kimia Lab, University of Waterloo, Waterloo, Ontario, Canada
- Ricardo Gonzalez
  - Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota
- Liron Pantanowitz
  - Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
- Puay Hoon Tan
  - Women's Imaging Centre, Luma Medical Centre, Singapore
- Alberto Machado
  - WHO Classification of Tumours Group, International Agency for Research on Cancer, Lyon, France
- Ian A Cree
  - WHO Classification of Tumours Group, International Agency for Research on Cancer, Lyon, France
- Hamid R Tizhoosh
  - Rhazes Lab, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota
  - Kimia Lab, University of Waterloo, Waterloo, Ontario, Canada
8
Khine AH, Wettayaprasit W, Duangsuwan J. A new word embedding model integrated with medical knowledge for deep learning-based sentiment classification. Artif Intell Med 2024; 148:102758. [PMID: 38325934 DOI: 10.1016/j.artmed.2023.102758] [Received: 07/07/2023] [Revised: 11/19/2023] [Accepted: 12/29/2023] [Indexed: 02/09/2024]
Abstract
The development of intelligent systems that use social media data for decision-making processes in numerous domains such as politics, business, marketing, and finance, has been made possible by the popularity of social media platforms. However, the utilization of textual data from social media in the healthcare management industry is still somewhat limited when it is compared to other industries. Investigating how current machine learning and natural language processing technologies can be used in the healthcare industry to gauge public sentiment is an important study. Earlier works on healthcare sentiment analysis have utilized traditional word embedding models trained on the general and medical corpus. However, integration of medical knowledge to pre-trained word embedding models has not been considered yet. Word embedding models trained on the general corpus led to the problem of lacking medical knowledge and the models trained on the small size of the medical corpus have limitations in capturing semantic and syntactic properties. This research proposes a new word embedding model named Word Embedding Integrated with Medical Knowledge Vector (WE-iMKVec). The proposed model integrates sentiment lexicons and medical knowledgebases into the pre-trained word embedding to enrich the properties of word embedding. A new medical-aware sentiment polarity score is proposed for the utilization in learning neural-network sentiment and these vectors incorporate with the original pre-trained word vectors. The resulting vectors are enriched with lexicon vectors and the medical knowledge vectors: Adverse Drug Reaction (ADR) vector and Unified Medical Language System (UMLS) vector are used to build the proposed WE-iMKVec model. WE-iMKVec is validated on the five different social media healthcare review datasets and the empirical results showed its superiority over traditional word embedding models in medical sentiment analysis. 
The highest improvement can be found in the patients.info medical condition dataset, where the proposed model outperforms three conventional word2vec models (Google-News, PubMed-PMC, and Drug Reviews) by 12.7%, 31.4%, and 25.4%, respectively, in terms of F1 score.
Affiliation(s)
- Aye Hninn Khine
  - Artificial Intelligence Research Lab, Division of Computational Science, Faculty of Science, Prince of Songkla University, Hat Yai, Thailand
- Wiphada Wettayaprasit
  - Artificial Intelligence Research Lab, Division of Computational Science, Faculty of Science, Prince of Songkla University, Hat Yai, Thailand
- Jarunee Duangsuwan
  - Artificial Intelligence Research Lab, Division of Computational Science, Faculty of Science, Prince of Songkla University, Hat Yai, Thailand
9
Xiang J, Sun Y, Wu X, Guo Y, Xue J, Niu Y, Cui X. Abnormal Spatial and Temporal Overlap of Time-Varying Brain Functional Networks in Patients with Schizophrenia. Brain Sci 2023; 14:40. [PMID: 38248255 PMCID: PMC10813230 DOI: 10.3390/brainsci14010040] [Received: 12/11/2023] [Revised: 12/25/2023] [Accepted: 12/27/2023] [Indexed: 01/23/2024] Open
Abstract
Schizophrenia (SZ) is a complex psychiatric disorder with unclear etiology and pathological features. Neuroscientists are increasingly proposing that schizophrenia is an abnormality in the dynamic organization of brain networks. Previous studies have found that the dynamic brain networks of people with SZ are abnormal in both space and time. However, little is known about the interactions and overlaps between hubs of the brain underlying spatiotemporal dynamics. In this study, we aimed to investigate different patterns of spatial and temporal overlap of hubs between SZ patients and healthy individuals. Specifically, we obtained resting-state functional magnetic resonance imaging data from the public dataset for 43 SZ patients and 49 healthy individuals. We derived a representation of time-varying functional connectivity using the Jackknife Correlation (JC) method. We employed the Betweenness Centrality (BC) method to identify the hubs of the brain's functional connectivity network. We then applied measures of temporal overlap, spatial overlap, and hierarchical clustering to investigate differences in the organization of brain hubs between SZ patients and healthy controls. Our findings suggest significant differences between SZ patients and healthy controls at the whole-brain and subnetwork levels. Furthermore, spatial overlap and hierarchical clustering analysis showed that quasi-periodic patterns were disrupted in SZ patients. Analyses of temporal overlap revealed abnormal pairwise engagement preferences in the hubs of SZ patients. These results provide new insights into the dynamic characteristics of the network organization of the SZ brain.
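The time-varying connectivity step named in this abstract can be illustrated with a minimal sketch of jackknife correlation for a single pair of regional time series. It assumes the commonly described leave-one-out form, in which the estimate at time point t is the sign-inverted Pearson correlation computed over all other time points; the helper names `pearson` and `jackknife_correlation` and the toy signals are hypothetical, and the article's own preprocessing and implementation are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def jackknife_correlation(x, y):
    """One connectivity estimate per time point: leave time point t out,
    correlate the remaining samples, and flip the sign (assumed convention)."""
    if len(x) != len(y) or len(x) < 4:
        raise ValueError("need two series of equal length >= 4")
    estimates = []
    for t in range(len(x)):
        x_rest = x[:t] + x[t + 1:]
        y_rest = y[:t] + y[t + 1:]
        estimates.append(-pearson(x_rest, y_rest))
    return estimates


# Toy signals for two hypothetical brain regions.
region_a = [1.0, 2.0, 3.0, 4.0, 5.0]
region_b = [2.0, 4.0, 6.0, 8.0, 10.0]
jc_series = jackknife_correlation(region_a, region_b)
```

In practice the same computation would be repeated for every pair of regions, giving one connectivity matrix per time point; any later rescaling of the estimates is left out of this sketch.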
Affiliation(s)
- Jie Xiang
  - College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
- Yumeng Sun
  - College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
- Xubin Wu
  - College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
- Yuxiang Guo
  - School of Software, Taiyuan University of Technology, Taiyuan 030024, China
- Jiayue Xue
  - College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
- Yan Niu
  - College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
- Xiaohong Cui
  - College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
10
Chen QS, Bergman O, Ziegler L, Baldassarre D, Veglia F, Tremoli E, Strawbridge RJ, Gallo A, Pirro M, Smit AJ, Kurl S, Savonen K, Lind L, Eriksson P, Gigante B. A machine learning based approach to identify carotid subclinical atherosclerosis endotypes. Cardiovasc Res 2023; 119:2594-2606. [PMID: 37475157 PMCID: PMC10730242 DOI: 10.1093/cvr/cvad106] [Received: 10/07/2022] [Revised: 03/12/2023] [Accepted: 05/05/2023] [Indexed: 07/22/2023] Open
Abstract
AIMS To define endotypes of carotid subclinical atherosclerosis. METHODS AND RESULTS We integrated demographic, clinical, and molecular data (n = 124) with ultrasonographic carotid measurements from study participants in the IMPROVE cohort (n = 3340). We applied a neural network algorithm and hierarchical clustering to identify carotid atherosclerosis endotypes. A measure of carotid subclinical atherosclerosis, the c-IMTmean-max, was used to extract atherosclerosis-related features and SHapley Additive exPlanations (SHAP) to reveal endotypes. The association of endotypes with carotid ultrasonographic measurements at baseline, after 30 months, and with the 3-year atherosclerotic cardiovascular disease (ASCVD) risk was estimated by linear (β, SE) and Cox [hazard ratio (HR), 95% confidence interval (CI)] regression models. Crude estimates were adjusted by common cardiovascular risk factors, and baseline ultrasonographic measures. Improvement in ASCVD risk prediction was evaluated by C-statistic and by net reclassification improvement with reference to SCORE2, c-IMTmean-max, and presence of carotid plaques. An ensemble stacking model was used to predict endotypes in an independent validation cohort, the PIVUS (n = 1061). We identified four endotypes able to differentiate carotid atherosclerosis risk profiles from mild (endotype 1) to severe (endotype 4). SHAP identified endotype-shared variables (age, biological sex, and systolic blood pressure) and endotype-specific biomarkers. In the IMPROVE, as compared to endotype 1, endotype 4 associated with the thickest c-IMT at baseline (β, SE) 0.36 (0.014), the highest number of plaques 1.65 (0.075), the fastest c-IMT progression 0.06 (0.013), and the highest ASCVD risk (HR, 95% CI) (1.95, 1.18-3.23). Baseline and progression measures of carotid subclinical atherosclerosis and ASCVD risk were associated with the predicted endotypes in the PIVUS. 
Endotypes consistently improved measures of ASCVD risk discrimination and reclassification in both study populations. CONCLUSIONS We report four replicable subclinical carotid atherosclerosis-endotypes associated with progression of atherosclerosis and ASCVD risk in two independent populations. Our approach based on endotypes can be applied for precision medicine in ASCVD prevention.
Affiliation(s)
- Qiao Sen Chen
- Division of Cardiovascular Medicine, Department of Medicine Solna, Karolinska Institutet, Solnavägen 30, 171 64 Stockholm, Sweden
- Otto Bergman
- Division of Cardiovascular Medicine, Department of Medicine Solna, Karolinska Institutet, Solnavägen 30, 171 64 Stockholm, Sweden
- Louise Ziegler
- Division of Medicine and Department of Clinical Sciences, Danderyd Hospital, Karolinska Institutet, Entrevägen 2, 182 88 Stockholm, Sweden
- Damiano Baldassarre
- Department of Medical Biotechnology and Translational Medicine, Università di Milano, Via Vanvitelli 32, 20133 Milan, Italy
- Centro Cardiologico Monzino, IRCCS, Via Carlo Parea 4, 20138 Milan, Italy
- Fabrizio Veglia
- Maria Cecilia Hospital, GVM Care & Research, Via Corriera 1, 48033 Cotignola (RA), Italy
- Elena Tremoli
- Maria Cecilia Hospital, GVM Care & Research, Via Corriera 1, 48033 Cotignola (RA), Italy
- Rona J Strawbridge
- Division of Cardiovascular Medicine, Department of Medicine Solna, Karolinska Institutet, Solnavägen 30, 171 64 Stockholm, Sweden
- Institute of Health and Wellbeing, University of Glasgow, Clarice Pears Building, 90 Byres Road, Glasgow G12 8TB, UK
- Health Data Research, Clarice Pears Building, 90 Byres Road, Glasgow G12 8TB, UK
- Antonio Gallo
- Lipidology and Cardiovascular Prevention Unit, Department of Nutrition, Sorbonne Université, INSERM UMR1166, APHP, Hôpital Pitié-Salpêtrière, 47 Boulevard de l'Hôpital, 75013 Paris, France
- Matteo Pirro
- Internal Medicine, Angiology and Arteriosclerosis Diseases, Department of Medicine, University of Perugia, Piazzale Menghini 1, 06129 Perugia, Italy
- Andries J Smit
- Department of Medicine, University Medical Center Groningen, Groningen & Isala Clinics Zwolle, Dokter Spanjaardweg 29B, 8025 BT Groningen, the Netherlands
- Sudhir Kurl
- Institute of Public Health and Clinical Nutrition, University of Eastern Finland, Kuopio Campus, Yliopistonranta 1 C, Canthia Building, B Wing, FI-70211 Kuopio, Finland
- Kai Savonen
- Kuopio Research Institute of Exercise Medicine, Haapaniementie 16, FI-70100 Kuopio, Finland
- Department of Clinical Physiology and Nuclear Medicine, Science Service Center, Kuopio University Hospital, Yliopistonranta 1F, FI-70211 Kuopio, Finland
- Lars Lind
- Department of Medical Sciences, Uppsala University, Uppsala Science Park, Dag Hammarskjöldsv 10B, 752 37 Uppsala, Sweden
- Per Eriksson
- Division of Cardiovascular Medicine, Department of Medicine Solna, Karolinska Institutet, Solnavägen 30, 171 64 Stockholm, Sweden
- Bruna Gigante
- Division of Cardiovascular Medicine, Department of Medicine Solna, Karolinska Institutet, Solnavägen 30, 171 64 Stockholm, Sweden
- Department of Cardiology, Danderyd University Hospital, Entrevägen 2, 182 88 Stockholm, Sweden
11
Su D, Xiong Y, Wang S, Wei H, Ke J, Li H, Wang T, Zuo Y, Yang L. Structural deep clustering network for stratification of breast cancer patients through integration of somatic mutation profiles. Comput Methods Programs Biomed 2023; 242:107808. [PMID: 37716222 DOI: 10.1016/j.cmpb.2023.107808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/15/2023] [Accepted: 09/10/2023] [Indexed: 09/18/2023]
Abstract
BACKGROUND AND OBJECTIVE Breast cancer is among the most malignant tumors occurring in women and is one of the leading causes of death from gynecologic malignancy worldwide. The high degree of heterogeneity that characterizes breast cancer makes it challenging to devise effective therapeutic strategies. Accumulating evidence highlights the crucial role of stratifying breast cancer patients into clinically significant subtypes to achieve better prognoses and treatments. The structural deep clustering network is a graph convolutional network-based clustering algorithm that integrates structural information and has achieved state-of-the-art performance in various applications. METHODS In this study, we employed the structural deep clustering network to integrate somatic mutation profiles for stratifying 2526 breast cancer patients from the Memorial Sloan Kettering Cancer Center into two clinically differentiable subtypes. RESULTS Breast cancer patients in cluster 1 exhibited a better prognosis than those in cluster 2, and the difference was statistically significant. The immunogenomic landscape further demonstrated that cluster 1 was associated with remarkable infiltration of tumor-infiltrating lymphocytes. The clustering subtype could be used to evaluate the therapeutic benefit of immunotherapy and chemotherapy in breast cancer patients. Furthermore, our approach effectively classified patients from eight different cancer types, demonstrating its generalizability. CONCLUSIONS Our study represents a step towards a generic methodology for classifying cancer patients using only somatic mutation data and structural deep clustering network approaches. Employing the structural deep clustering network to identify breast cancer subtypes is promising and can inform the development of more accurate and personalized therapies.
Affiliation(s)
- Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yuqiang Xiong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Haodong Wei
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Jiawei Ke
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Honghao Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Tao Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China; Digital College, Inner Mongolia Intelligent Union Big Data Academy, Inner Mongolia Wesure Date Technology Co., Ltd. Hohhot, 010010, China; Inner Mongolia International Mongolian Hospital, Hohhot 010065, China
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
12
Li L, Li H, Yang C, Tang Y, Wang Y, Yang H, Zhang W, Jiang F, Ji S. Multiscale levels CO2 decouple reinforcement in China. Environ Sci Pollut Res Int 2023; 30:121569-121583. [PMID: 37953427 DOI: 10.1007/s11356-023-30931-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 11/02/2023] [Indexed: 11/14/2023]
Abstract
Decoupling economic growth from CO2 emissions is imperative for China. Meanwhile, establishing a consistent and comprehensive decoupling inventory that includes national (N), regional and provincial (RP), and city and county (CC) levels is essential for further policy formulation. This research aims to investigate the decoupling status using the "N-RP-CC" approach while considering changes in decoupling trends at the different levels. A combination of the Tapio decoupling model and cluster analysis is employed to study the spatiotemporal characteristics and trends of decoupling. The study first calculates the decoupling values for the nation, 7 regions, 30 provinces, and 1501 CCs in China over 2006-2017. The results show that the decoupling trend continues to improve at the national level. Conversely, the regional scale exhibits a more vulnerable decoupling trend than the national level, with weak and extended negative decoupling observed in northeastern and northern China. Moreover, provincial heterogeneities are increasingly evident, with poor decoupling statuses appearing in Jilin, Heilongjiang, Liaoning, and Xinjiang, as well as many central provinces. Additionally, although more than half of CCs exhibit weak decoupling during most years, seven different states of decoupling were also identified during the time frame. These findings further indicate that spatiotemporal heterogeneities extend beyond RP scales within CCs. Taking the Yangtze River as a boundary line reveals a severe situation in northern areas along with rapid development trends observed in southern regions. Finally, we clustered 1414 CCs based on their industrial proportions for 2017, which further highlights increasingly prominent heterogeneities that should be carefully considered. Based on these findings, policy recommendations such as spatial organization and optimization and technique investment are proposed to achieve CO2 emission decoupling under the N-RP-CC levels.
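As a rough illustration of the Tapio decoupling model named in this abstract, the sketch below computes a decoupling elasticity (relative change in CO2 emissions divided by relative change in GDP) and maps it to one of the eight states commonly used in the Tapio framework. The abstract does not give thresholds or data, so the 0.8 and 1.2 cut-offs, the function name `tapio_state`, and all input numbers are assumptions for illustration only, not the authors' implementation.

```python
def tapio_state(co2_start, co2_end, gdp_start, gdp_end, low=0.8, high=1.2):
    """Return (elasticity, state) for one region and period.

    The 0.8 / 1.2 thresholds are the values commonly quoted for the
    Tapio framework; they are an assumption here, not taken from the paper.
    """
    d_co2 = (co2_end - co2_start) / co2_start   # relative change in emissions
    d_gdp = (gdp_end - gdp_start) / gdp_start   # relative change in GDP
    if d_gdp == 0:
        raise ValueError("GDP did not change; elasticity is undefined")
    e = d_co2 / d_gdp

    if d_gdp > 0:                               # growing economy
        if d_co2 < 0:
            return e, "strong decoupling"
        if e < low:
            return e, "weak decoupling"
        if e <= high:
            return e, "expansive coupling"
        return e, "expansive negative decoupling"

    # shrinking economy (d_gdp < 0)
    if d_co2 > 0:
        return e, "strong negative decoupling"
    if e < low:
        return e, "weak negative decoupling"
    if e <= high:
        return e, "recessive coupling"
    return e, "recessive decoupling"


# Hypothetical inputs: (CO2 start, CO2 end, GDP start, GDP end)
for row in [(100, 105, 100, 110), (100, 95, 100, 110), (100, 120, 100, 110)]:
    elasticity, state = tapio_state(*row)
    print(row, round(elasticity, 2), state)
```

Applying such a function to every unit and year would yield the kind of state table that can then be compared across the national, regional, provincial, and city/county levels.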
Affiliation(s)
- Lei Li
- School of Chemical Science and Technology, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, China
- Research Center of Lake Restoration Technology Engineering for Universities of Yunnan Province (Yunnan University), School of Chemical Science and Technology, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, China
- Huiying Li
- Research Center of Lake Restoration Technology Engineering for Universities of Yunnan Province (Yunnan University), School of Chemical Science and Technology, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, China
- Institute of International Rivers and Eco-Security, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, China
- Chuanhua Yang
- School of Chemical Science and Technology, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, China
- Research Center of Lake Restoration Technology Engineering for Universities of Yunnan Province (Yunnan University), School of Chemical Science and Technology, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, China
- Yue Tang
- School of Chemical Science and Technology, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, China
- Research Center of Lake Restoration Technology Engineering for Universities of Yunnan Province (Yunnan University), School of Chemical Science and Technology, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, China
- Yujian Wang
- School of Chemical Science and Technology, Yunnan Minzu University, 2929 Yuehua Street, Kunming, 650500, China
- HongJuan Yang
- Faculty of Management and Economics, Kunming University of Science and Technology, No. 727 Jingming South Road, Kunming, 650500, China
- Weishi Zhang
- School of Geographic and Environmental Sciences, Tianjin Normal University, No.393, Extension of Bin Shui West Road, Xi Qing District, Tianjin, 300387, China
- Fengzhi Jiang
- School of Chemical Science and Technology, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, China
- Research Center of Lake Restoration Technology Engineering for Universities of Yunnan Province (Yunnan University), School of Chemical Science and Technology, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, China
- Workstation of Academician Chen Jing of Yunnan Province, University City East Outer Ring South Road, Kunming, 650500, China
- Siping Ji
- School of Chemical Science and Technology, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, China
- Research Center of Lake Restoration Technology Engineering for Universities of Yunnan Province (Yunnan University), School of Chemical Science and Technology, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, China
- School of Chemistry Science and Engineering, Yunnan University, University City East Outer Ring South Road, Kunming, 650500, Yunnan Province, China
13
Bailleux C, Chardin D, Guigonis JM, Ferrero JM, Chateau Y, Humbert O, Pourcher T, Gal J. Survival analysis of patient groups defined by unsupervised machine learning clustering methods based on patient metabolomic data. Comput Struct Biotechnol J 2023; 21:5136-5143. [PMID: 37920813 PMCID: PMC10618114 DOI: 10.1016/j.csbj.2023.10.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 10/16/2023] [Accepted: 10/16/2023] [Indexed: 11/04/2023] Open
Abstract
Purpose: Meta-analyses failed to accurately identify patients with non-metastatic breast cancer who are likely to benefit from chemotherapy, and metabolomics could provide new answers. In our previously published work, patients were clustered using five different unsupervised machine learning (ML) methods, resulting in the identification of three clusters with distinct clinical and simulated survival data. The objective of this study was to evaluate the survival outcomes, with extended follow-up, using the same five unsupervised ML methods. Experimental design: Forty-nine patients with non-metastatic breast cancer (BC), diagnosed between 2013 and 2016, were included retrospectively. Median follow-up was extended to 85.8 months. A total of 449 metabolites were extracted from tumor resection samples by combined liquid chromatography-mass spectrometry (LC-MS). Survival analyses were reported grouping together clusters 1 and 2 versus cluster 3. Bootstrap optimization was applied. Results: PCA k-means, K-sparse and Spectral clustering were the most effective methods to predict 2-year progression-free survival with bootstrap optimization (PFSb); as an example, with the PCA k-means method, PFSb was 94% for clusters 1 and 2 versus 82% for cluster 3 (p = 0.01). The PCA k-means method performed best, with higher reproducibility (mean HR = 2 (95% CI [1.4-2.7]); probability of p ≤ 0.05: 85%). Cancer-specific survival (CSS) and overall survival (OS) analyses highlighted a discrepancy between the five unsupervised ML methods. Conclusion: Our study is a proof of principle that unsupervised ML methods can be applied to metabolomic data to predict PFS outcomes, with the best performance for PCA k-means. A larger population study is needed to draw conclusions from CSS and OS analyses.
Affiliation(s)
- Caroline Bailleux
- University Côte d′Azur, Centre Antoine Lacassagne, Medical Oncology Department, Nice F-06189, France
- University Côte d′Azur, Commissariat à l′Energie Atomique et aux énergies alternatives, Institut Frédéric Joliot, Service Hospitalier Frédéric Joliot, laboratory Transporters in Oncology and Radiotherapy in Oncology (TIRO), School of medicine, Nice F-06100, France
- David Chardin
- University Côte d′Azur, Commissariat à l′Energie Atomique et aux énergies alternatives, Institut Frédéric Joliot, Service Hospitalier Frédéric Joliot, laboratory Transporters in Oncology and Radiotherapy in Oncology (TIRO), School of medicine, Nice F-06100, France
- University Côte d′Azur, Centre Antoine Lacassagne, Nuclear medicine Department, Nice F-06189, France
- Jean-Marie Guigonis
- University Côte d′Azur, Commissariat à l′Energie Atomique et aux énergies alternatives, Institut Frédéric Joliot, Service Hospitalier Frédéric Joliot, laboratory Transporters in Oncology and Radiotherapy in Oncology (TIRO), School of medicine, Nice F-06100, France
- Jean-Marc Ferrero
- University Côte d′Azur, Centre Antoine Lacassagne, Medical Oncology Department, Nice F-06189, France
- Yann Chateau
- University Côte d′Azur, Centre Antoine Lacassagne, Epidemiology and Biostatistics Department, Nice F-06189, France
- Olivier Humbert
- University Côte d′Azur, Commissariat à l′Energie Atomique et aux énergies alternatives, Institut Frédéric Joliot, Service Hospitalier Frédéric Joliot, laboratory Transporters in Oncology and Radiotherapy in Oncology (TIRO), School of medicine, Nice F-06100, France
- University Côte d′Azur, Centre Antoine Lacassagne, Nuclear medicine Department, Nice F-06189, France
- Thierry Pourcher
- University Côte d′Azur, Commissariat à l′Energie Atomique et aux énergies alternatives, Institut Frédéric Joliot, Service Hospitalier Frédéric Joliot, laboratory Transporters in Oncology and Radiotherapy in Oncology (TIRO), School of medicine, Nice F-06100, France
- Jocelyn Gal
- University Côte d′Azur, Centre Antoine Lacassagne, Epidemiology and Biostatistics Department, Nice F-06189, France
14
Willie E, Yang P, Patrick E. The impact of similarity metrics on cell-type clustering in highly multiplexed in situ imaging cytometry data. Bioinform Adv 2023; 3:vbad141. [PMID: 37928340 PMCID: PMC10625459 DOI: 10.1093/bioadv/vbad141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 08/23/2023] [Accepted: 10/07/2023] [Indexed: 11/07/2023]
Abstract
Motivation: The advent of highly multiplexed in situ imaging cytometry assays has revolutionized the study of cellular systems, offering unparalleled detail in observing cellular activities and characteristics. These assays provide comprehensive insights by concurrently profiling the spatial distribution and molecular features of numerous cells. In navigating this complex data landscape, unsupervised machine learning techniques, particularly clustering algorithms, have become essential tools. They enable the identification and categorization of cell types and subsets based on their molecular characteristics. Despite their widespread adoption, most clustering algorithms in use were initially developed for cell suspension technologies, leading to a potential mismatch in application. There is a critical gap in the systematic evaluation of these methods, particularly in determining the properties that make them optimal for in situ imaging assays. Addressing this gap is vital for ensuring accurate, reliable analyses and fostering advancements in cellular biology research. Results: In our extensive investigation, we evaluated a range of similarity metrics, which are crucial in determining the relationships between cells during the clustering process. Our findings reveal substantial variations in clustering performance, contingent on the similarity metric employed. These variations underscore the importance of selecting appropriate metrics to ensure accurate cell type and subset identification. In response to these challenges, we introduce FuseSOM, a novel ensemble clustering algorithm that integrates hierarchical multiview learning of similarity metrics with self-organizing maps. Through a rigorous stratified subsampling analysis framework, we demonstrate that FuseSOM outperforms existing best-practice clustering methods specifically tailored for in situ imaging cytometry data. 
Our work not only provides critical insights into the performance of clustering algorithms in this novel context but also offers a robust solution, paving the way for more accurate and reliable in situ imaging cytometry data analysis. Availability and implementation: The FuseSOM R package is available on Bioconductor under the GPL-3 license. All code for the analyses performed can be found on GitHub.
Affiliation(s)
- Elijah Willie
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW 2006, Australia
- Pengyi Yang
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW 2006, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong, China
- Computational Systems Biology Group, Children’s Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
- Ellis Patrick
- Sydney Precision Data Science Centre, The University of Sydney, Camperdown, NSW 2006, Australia
- School of Mathematics and Statistics, The University of Sydney, Camperdown, NSW 2006, Australia
- Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong, China
- Centre for Cancer Research, The Westmead Institute for Medical Research, The University of Sydney, Westmead, NSW 2145, Australia
15
Hui X, Wang Y, Li W, Yuan Y, Tao X, Lv R. Nd-Mn Molecular Cluster with Searched Targets for Oral Cancer Imaging. Mol Imaging Biol 2023; 25:875-886. [PMID: 37256508 DOI: 10.1007/s11307-023-01828-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 04/10/2023] [Accepted: 05/11/2023] [Indexed: 06/01/2023]
Abstract
In this research, we designed a novel NIR II luminescence imaging probe with targeting effect to accurately track oral squamous cell carcinoma (OSCC) cells. Massive gene expression data were processed by weighted gene co-expression network analysis to establish a network of relationships between genes. After clustering, correlation of clinical information, and gene functional enrichment analysis, MMP1 was predicted to be a biomarker/therapeutic target for OSCC cells. To obtain rare-earth probes with better luminescence in the NIR II region, we adjusted the doping ratio of the rare-earth element (Nd, Gd, Er, and Yb) fraction of the Nd-Mn molecular cluster to optimize its luminescence properties. The results of in vitro targeting experiments showed that Nd-Mn-MMP1Ab can target Cal-27 cells, demonstrating at the cellular level that the MMP1 gene is a biomarker for oral cancer, which also proves that the cancer targets predicted by the bioinformatics approach are correct.
Affiliation(s)
- Xin Hui
- Engineering Research Center of Molecular and Neuro Imaging, Ministry of Education, School of Life Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
- Yanxing Wang
- Engineering Research Center of Molecular and Neuro Imaging, Ministry of Education, School of Life Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
- Wenjing Li
- Engineering Research Center of Molecular and Neuro Imaging, Ministry of Education, School of Life Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
- Ying Yuan
- Department of Medical Interdisciplinary Research, Xi'an Ninth Hospital Affiliated to Medical College of Xi'an Jiaotong University, Xi'an, 710054, Shaanxi, China
- Xiaofeng Tao
- Department of Medical Interdisciplinary Research, Xi'an Ninth Hospital Affiliated to Medical College of Xi'an Jiaotong University, Xi'an, 710054, Shaanxi, China
- Ruichan Lv
- Engineering Research Center of Molecular and Neuro Imaging, Ministry of Education, School of Life Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
16
Meng R, Yin S, Sun J, Hu H, Zhao Q. scAAGA: Single cell data analysis framework using asymmetric autoencoder with gene attention. Comput Biol Med 2023; 165:107414. [PMID: 37660567 DOI: 10.1016/j.compbiomed.2023.107414] [Citation(s) in RCA: 36] [Impact Index Per Article: 36.0] [Received: 07/02/2023] [Revised: 08/02/2023] [Accepted: 08/28/2023] [Indexed: 09/05/2023]
Abstract
In recent years, single-cell RNA sequencing (scRNA-seq) has emerged as a powerful technique for investigating cellular heterogeneity and structure. However, analyzing scRNA-seq data remains challenging, especially in the context of COVID-19 research. Single-cell clustering is a key step in analyzing scRNA-seq data, and deep learning methods have shown great potential in this area. In this work, we propose a novel scRNA-seq analysis framework called scAAGA. Specifically, we utilize an asymmetric autoencoder with a gene attention module to learn important gene features adaptively from scRNA-seq data, with the aim of improving the clustering effect. We apply scAAGA to COVID-19 peripheral blood mononuclear cell (PBMC) scRNA-seq data and compare its performance with state-of-the-art methods. Our results consistently demonstrate that scAAGA outperforms existing methods in terms of adjusted Rand index (ARI), normalized mutual information (NMI), and adjusted mutual information (AMI) scores, achieving improvements ranging from 2.8% to 27.8% in NMI scores. Additionally, we discuss a data augmentation technique to expand the datasets and improve the accuracy of scAAGA. Overall, scAAGA presents a robust tool for scRNA-seq data analysis, enhancing the accuracy and reliability of clustering results in COVID-19 research.
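For readers unfamiliar with the clustering scores cited in this abstract, the following is a minimal pure-Python sketch of two of them, ARI and NMI, computed from a contingency table of reference labels against predicted cluster labels. It is not the scAAGA code; the function names and the toy label vectors are hypothetical, the NMI here uses arithmetic-mean normalization (one of several conventions), and AMI is omitted because it additionally requires the expected mutual information under random labelings.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(truth, pred):
    """ARI: pair-counting agreement between two labelings, corrected for chance."""
    n = len(truth)
    if n < 2:
        raise ValueError("need at least two samples")
    cells = Counter(zip(truth, pred))            # contingency table entries
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:                    # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(truth, pred):
    """NMI with arithmetic-mean normalization of the two label entropies."""
    n = len(truth)
    rows, cols = Counter(truth), Counter(pred)
    cells = Counter(zip(truth, pred))

    def entropy(counts):
        return -sum(c / n * log(c / n) for c in counts.values())

    mi = sum(c / n * log(c * n / (rows[t] * cols[p]))
             for (t, p), c in cells.items())
    denom = (entropy(rows) + entropy(cols)) / 2
    return mi / denom if denom > 0 else 1.0


# Hypothetical toy labels: cluster IDs are permuted but the partition is identical.
reference = [0, 0, 1, 1]
predicted = [1, 1, 0, 0]
print(adjusted_rand_index(reference, predicted))
print(normalized_mutual_info(reference, predicted))
```

Both scores depend only on the partition, not on the cluster IDs themselves, which is why they suit comparisons between clustering methods that assign arbitrary labels.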
Affiliation(s)
- Rui Meng
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Shuaidong Yin
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Jianqiang Sun
- School of Information Science and Engineering, Linyi University, Linyi, 276000, China
- Huan Hu
- Institute of Applied Genomics, Fuzhou University, Fuzhou, 350108, China
- Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
17
Zeibich R, Kwan P, O’Brien TJ, Perucca P, Ge Z, Anderson A. Applications for Deep Learning in Epilepsy Genetic Research. Int J Mol Sci 2023; 24:14645. [PMID: 37834093 PMCID: PMC10572791 DOI: 10.3390/ijms241914645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 09/11/2023] [Accepted: 09/21/2023] [Indexed: 10/15/2023] Open
Abstract
Epilepsy is a group of brain disorders characterised by an enduring predisposition to generate unprovoked seizures. Fuelled by advances in sequencing technologies and computational approaches, more than 900 genes have now been implicated in epilepsy. The development and optimisation of tools and methods for analysing the vast quantity of genomic data is a rapidly evolving area of research. Deep learning (DL) is a subset of machine learning (ML) that brings opportunity for novel investigative strategies that can be harnessed to gain new insights into the genomic risk of people with epilepsy. DL is being harnessed to address limitations in accuracy of long-read sequencing technologies, which improve on short-read methods. Tools that predict the functional consequence of genetic variation can break new ground in addressing critical knowledge gaps, while methods that integrate independent but complementary data enhance the predictive power of genetic data. We provide an overview of these DL tools and discuss how they may be applied to the analysis of genetic data for epilepsy research.
Affiliation(s)
- Robert Zeibich
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Patrick Kwan
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Department of Neurology, Alfred Health, Melbourne, VIC 3004, Australia
- Department of Neurology, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Terence J. O’Brien
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Department of Neurology, Alfred Health, Melbourne, VIC 3004, Australia
- Department of Neurology, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Piero Perucca
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Department of Neurology, Alfred Health, Melbourne, VIC 3004, Australia
- Department of Neurology, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
- Epilepsy Research Centre, Department of Medicine, Austin Health, The University of Melbourne, Melbourne, VIC 3084, Australia
- Bladin-Berkovic Comprehensive Epilepsy Program, Department of Neurology, Austin Health, The University of Melbourne, Melbourne, VIC 3084, Australia
- Zongyuan Ge
- Faculty of Engineering, Monash University, Melbourne, VIC 3800, Australia
- Monash-Airdoc Research, Monash University, Melbourne, VIC 3800, Australia
- Alison Anderson
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, VIC 3800, Australia
- Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Parkville, VIC 3052, Australia
18
Karim MR, Islam T, Shajalal M, Beyan O, Lange C, Cochez M, Rebholz-Schuhmann D, Decker S. Explainable AI for Bioinformatics: Methods, Tools and Applications. Brief Bioinform 2023; 24:bbad236. [PMID: 37478371 DOI: 10.1093/bib/bbad236] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/10/2023] [Revised: 05/10/2023] [Accepted: 05/26/2023] [Indexed: 07/23/2023] Open
Abstract
Artificial intelligence (AI) systems utilizing deep neural networks and machine learning (ML) algorithms are widely used for solving critical problems in bioinformatics, biomedical informatics and precision medicine. However, complex ML models that are often perceived as opaque and black-box methods make it difficult to understand the reasoning behind their decisions. This lack of transparency can be a challenge for both end-users and decision-makers, as well as AI developers. In sensitive areas such as healthcare, explainability and accountability are not only desirable properties but also legally required for AI systems that can have a significant impact on human lives. Fairness is another growing concern, as algorithmic decisions should not show bias or discrimination towards certain groups or individuals based on sensitive attributes. Explainable AI (XAI) aims to overcome the opaqueness of black-box models and to provide transparency in how AI systems make decisions. Interpretable ML models can explain how they make predictions and identify factors that influence their outcomes. However, the majority of the state-of-the-art interpretable ML methods are domain-agnostic and have evolved from fields such as computer vision, automated reasoning or statistics, making direct application to bioinformatics problems challenging without customization and domain adaptation. In this paper, we discuss the importance of explainability and algorithmic transparency in the context of bioinformatics. We provide an overview of model-specific and model-agnostic interpretable ML methods and tools and outline their potential limitations. We discuss how existing interpretable ML methods can be customized and fit to bioinformatics research problems. Further, through case studies in bioimaging, cancer genomics and text mining, we demonstrate how XAI methods can improve transparency and decision fairness. 
Our review aims at providing valuable insights and serving as a starting point for researchers wanting to enhance explainability and decision transparency while solving bioinformatics problems. GitHub: https://github.com/rezacsedu/XAI-for-bioinformatics.
Affiliation(s)
- Md Rezaul Karim
- Computer Science 5 - Information Systems and Databases, RWTH Aachen University, Germany
- Department of Data Science and Artificial Intelligence, Fraunhofer FIT, Germany
| | - Tanhim Islam
- Computer Science 9 - Process and Data Science, RWTH Aachen University, Germany
| | | | - Oya Beyan
- Computer Science 5 - Information Systems and Databases, RWTH Aachen University, Germany
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Institute for Medical Informatics, Germany
| | - Christoph Lange
- Computer Science 5 - Information Systems and Databases, RWTH Aachen University, Germany
- Department of Data Science and Artificial Intelligence, Fraunhofer FIT, Germany
| | - Michael Cochez
- Department of Computer Science, Vrije Universiteit Amsterdam, the Netherlands
- Elsevier Discovery Lab, Amsterdam, the Netherlands
| | - Dietrich Rebholz-Schuhmann
- ZBMED - Information Center for Life Sciences, Cologne, Germany
- Faculty of Medicine, University of Cologne, Germany
| | - Stefan Decker
- Computer Science 5 - Information Systems and Databases, RWTH Aachen University, Germany
- Department of Data Science and Artificial Intelligence, Fraunhofer FIT, Germany
| |
|
19
|
Gao CX, Dwyer D, Zhu Y, Smith CL, Du L, Filia KM, Bayer J, Menssink JM, Wang T, Bergmeir C, Wood S, Cotton SM. An overview of clustering methods with guidelines for application in mental health research. Psychiatry Res 2023; 327:115265. [PMID: 37348404 DOI: 10.1016/j.psychres.2023.115265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 05/20/2023] [Accepted: 05/21/2023] [Indexed: 06/24/2023]
Abstract
Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements. In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles, are subsequently introduced. How to choose algorithms to address common issues, as well as methods for pre-clustering data processing, clustering evaluation and validation, are then discussed. Importantly, we also provide general guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and libraries.
Affiliation(s)
- Caroline X Gao
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia; Department of Epidemiology and Preventative Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia.
| | - Dominic Dwyer
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Ye Zhu
- School of Information Technology, Deakin University, Geelong, VIC, Australia
| | - Catherine L Smith
- Department of Epidemiology and Preventative Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
| | - Lan Du
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia
| | - Kate M Filia
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Johanna Bayer
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Jana M Menssink
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Teresa Wang
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia
| | - Christoph Bergmeir
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia; Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
| | - Stephen Wood
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Sue M Cotton
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| |
|
20
|
Vora LK, Gholap AD, Jetha K, Thakur RRS, Solanki HK, Chavda VP. Artificial Intelligence in Pharmaceutical Technology and Drug Delivery Design. Pharmaceutics 2023; 15:1916. [PMID: 37514102 PMCID: PMC10385763 DOI: 10.3390/pharmaceutics15071916] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Received: 05/06/2023] [Revised: 06/28/2023] [Accepted: 07/04/2023] [Indexed: 07/30/2023] Open
Abstract
Artificial intelligence (AI) has emerged as a powerful tool that harnesses anthropomorphic knowledge and provides expedited solutions to complex challenges. Remarkable advancements in AI technology and machine learning present a transformative opportunity in the drug discovery, formulation, and testing of pharmaceutical dosage forms. By utilizing AI algorithms that analyze extensive biological data, including genomics and proteomics, researchers can identify disease-associated targets and predict their interactions with potential drug candidates. This enables a more efficient and targeted approach to drug discovery, thereby increasing the likelihood of successful drug approvals. Furthermore, AI can contribute to reducing development costs by optimizing research and development processes. Machine learning algorithms assist in experimental design and can predict the pharmacokinetics and toxicity of drug candidates. This capability enables the prioritization and optimization of lead compounds, reducing the need for extensive and costly animal testing. Personalized medicine approaches can be facilitated through AI algorithms that analyze real-world patient data, leading to more effective treatment outcomes and improved patient adherence. This comprehensive review explores the wide-ranging applications of AI in drug discovery, drug delivery dosage form designs, process optimization, testing, and pharmacokinetics/pharmacodynamics (PK/PD) studies. This review provides an overview of various AI-based approaches utilized in pharmaceutical technology, highlighting their benefits and drawbacks. Nevertheless, the continued investment in and exploration of AI in the pharmaceutical industry offer exciting prospects for enhancing drug development processes and patient care.
Affiliation(s)
- Lalitkumar K Vora
- School of Pharmacy, Queen's University Belfast, 97 Lisburn Road, Belfast BT9 7BL, UK
| | - Amol D Gholap
- Department of Pharmaceutics, St. John Institute of Pharmacy and Research, Palghar 401404, Maharashtra, India
| | - Keshava Jetha
- Department of Pharmaceutics and Pharmaceutical Technology, L. M. College of Pharmacy, Ahmedabad 380009, Gujarat, India
- Ph.D. Section, Gujarat Technological University, Ahmedabad 382424, Gujarat, India
| | | | - Hetvi K Solanki
- Pharmacy Section, L. M. College of Pharmacy, Ahmedabad 380009, Gujarat, India
| | - Vivek P Chavda
- Department of Pharmaceutics and Pharmaceutical Technology, L. M. College of Pharmacy, Ahmedabad 380009, Gujarat, India
| |
|
21
|
Kalweit M, Burden AM, Boedecker J, Hügle T, Burkard T. Patient groups in Rheumatoid arthritis identified by deep learning respond differently to biologic or targeted synthetic DMARDs. PLoS Comput Biol 2023; 19:e1011073. [PMID: 37267387 DOI: 10.1371/journal.pcbi.1011073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2021] [Accepted: 04/04/2023] [Indexed: 06/04/2023] Open
Abstract
Cycling of biologic or targeted synthetic disease-modifying antirheumatic drugs (b/tsDMARDs) in rheumatoid arthritis (RA) patients due to non-response is a problem that prevents and delays disease control. We aimed to assess and validate treatment response to b/tsDMARDs among clusters of RA patients identified by deep learning. We clustered RA patients at first-time b/tsDMARD (cohort entry) in the Swiss Clinical Quality Management in Rheumatic Diseases registry (SCQM) [1999-2018]. We performed comparative effectiveness analyses of b/tsDMARDs (ref. adalimumab) using Cox proportional hazard regression. Within 15 months, we assessed b/tsDMARD stop due to non-response, and separately a ≥20% reduction in DAS28-esr as a response proxy. We validated results through stratified analyses according to the most distinctive patient characteristics of clusters. Clusters comprised between 362 and 1481 patients (3516 unique patients). Stratified (validation) analyses confirmed comparative effectiveness results among clusters: patients with ≥2 conventional synthetic DMARDs and prednisone at b/tsDMARD initiation, male patients, and patients with a lower disease burden responded better to tocilizumab than to adalimumab (hazard ratio [HR] 5.46, 95% confidence interval [CI] [1.76-16.94], HR 8.44 [3.43-20.74], and HR 3.64 [2.04-6.49], respectively). Furthermore, seronegative women without use of prednisone at b/tsDMARD initiation, as well as seropositive women with a higher disease burden and longer disease duration, had a higher risk of non-response with golimumab (HR 2.36 [1.03-5.40] and HR 5.27 [2.10-13.21], respectively) than with adalimumab. Our results suggest that RA patient clusters identified by deep learning may respond differently to first-line b/tsDMARDs. Such clustering may therefore point to an optimal first-line b/tsDMARD for certain RA patients, which is a step towards personalizing treatment. However, further research in other cohorts is needed to verify our results.
Affiliation(s)
- Maria Kalweit
- Department of Computer Science, University of Freiburg, Freiburg, Germany
| | - Andrea M Burden
- ETH Zurich, Department of Chemistry and Applied Biosciences, Zurich, Switzerland
| | - Joschka Boedecker
- Department of Computer Science, University of Freiburg, Freiburg, Germany
| | - Thomas Hügle
- Department of Rheumatology, Lausanne University Hospital, and University of Lausanne, Lausanne, Switzerland
| | - Theresa Burkard
- ETH Zurich, Department of Chemistry and Applied Biosciences, Zurich, Switzerland
| |
|
22
|
Zhang H, Kong W, Xie Y, Zhao X, Luo D, Chen S, Pan Z. Telomere-related genes as potential biomarkers to predict endometriosis and immune response: Development of a machine learning-based risk model. Front Med (Lausanne) 2023; 10:1132676. [PMID: 36968845 PMCID: PMC10034389 DOI: 10.3389/fmed.2023.1132676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 02/20/2023] [Indexed: 03/11/2023] Open
Abstract
Introduction: Endometriosis (EM) is an aggressive, pleomorphic, and common gynecological disease. Its clinical presentation includes abnormal menstruation, dysmenorrhea, and infertility, which seriously affect the patient's quality of life. However, the pathogenesis underlying EM and associated regulatory genes are unknown. Methods: Telomere-related genes (TRGs) were retrieved from TelNet. RNA-sequencing (RNA-seq) data of EM patients were obtained from three datasets (GSE5108, GSE23339, and GSE25628) in the GEO database, and a random forest approach was used to identify telomere signature genes and build nomogram prediction models. Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, and Gene Set Enrichment Analysis were used to identify the pathways involved in the action of the signature genes. Finally, the CAMP database was used to screen drugs for potential use in EM treatment. Results: Fifteen genes in total were screened as EM–telomere differentially expressed genes. Further screening by machine learning yielded six genes as characteristic predictors of EM. Immune-infiltration analysis of the telomeric genes showed that levels of immune cells, including macrophages and natural killer cells, were significantly higher in cluster A. Further enrichment analysis showed that the differential genes were mainly enriched in biological pathways such as cell cycle and extracellular matrix. Finally, the Connective Map database was used to screen 11 potential drugs for EM treatment. Discussion: TRGs play a crucial role in EM development, are associated with immune infiltration, and act on multiple pathways, including the cell cycle. Telomere signature genes can be valuable predictive markers for EM.
|
23
|
Hernández-Hernández S, Ballester PJ. On the Best Way to Cluster NCI-60 Molecules. Biomolecules 2023; 13:biom13030498. [PMID: 36979433 PMCID: PMC10046274 DOI: 10.3390/biom13030498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 03/02/2023] [Accepted: 03/06/2023] [Indexed: 03/30/2023] Open
Abstract
Machine learning-based models have been widely used in the early drug-design pipeline. To validate these models, cross-validation strategies have been employed, including those using clustering of molecules in terms of their chemical structures. However, the poor clustering of compounds will compromise such validation, especially on test molecules dissimilar to those in the training set. This study aims at finding the best way to cluster the molecules screened by the National Cancer Institute (NCI)-60 project by comparing hierarchical, Taylor-Butina, and uniform manifold approximation and projection (UMAP) clustering methods. The best-performing algorithm can then be used to generate clusters for model validation strategies. This study also aims at measuring the impact of removing outlier molecules prior to the clustering step. Clustering results are evaluated using three well-known clustering quality metrics. In addition, we compute an average similarity matrix to assess the quality of each cluster. The results show variation in clustering quality from method to method. The clusters obtained by the hierarchical and Taylor-Butina methods are more computationally expensive to use in cross-validation strategies, and both cluster the molecules poorly. In contrast, the UMAP method provides the best quality, and therefore we recommend it to analyze this highly valuable dataset.
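For readers unfamiliar with the Taylor-Butina method named in this abstract, the following is a minimal sketch of one common variant, not the authors' implementation. It assumes molecules are represented as sets of "on" fingerprint bits, uses Tanimoto similarity, and recounts neighbours among still-unassigned molecules at each step; the function names, the toy fingerprints, and the 0.5 threshold are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def butina_clusters(fingerprints, threshold=0.5):
    """Greedy sphere-exclusion clustering (a Taylor-Butina style variant).

    Repeatedly picks the unassigned molecule with the most unassigned
    neighbours (similarity >= threshold) as a cluster centre and assigns
    those neighbours to it. Returns lists of indices, centre first.
    """
    n = len(fingerprints)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fingerprints[i], fingerprints[j]) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)

    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Ties are broken in favour of the lowest index.
        center = max(sorted(unassigned),
                     key=lambda i: len(neighbors[i] & unassigned))
        members = (neighbors[center] & unassigned) | {center}
        clusters.append([center] + sorted(members - {center}))
        unassigned -= members
    return clusters


# Toy fingerprints (hypothetical bit sets, not NCI-60 data).
toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6},
           {7, 8, 9}, {7, 8, 10}, {20, 21}]
toy_clusters = butina_clusters(toy_fps, threshold=0.5)
print(toy_clusters)  # [[0, 1, 2], [3, 4], [5]]
```

The pairwise neighbour search is quadratic in the number of molecules, which is one reason such clusterings can become expensive on large screening sets.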
Affiliation(s)
- Saiveth Hernández-Hernández
- Cancer Research Center of Marseille (INSERM U1068, Institut Paoli-Calmettes, Aix-Marseille Université UM105, CNRS UMR7258), 13009 Marseille, France
| | - Pedro J Ballester
- Department of Bioengineering, Imperial College London, London SW7 2AZ, UK
| |
|
24
|
Nguyen R, Sokhansanj BA, Polikar R, Rosen GL. Complet+: a computationally scalable method to improve completeness of large-scale protein sequence clustering. PeerJ 2023; 11:e14779. [PMID: 36785708 PMCID: PMC9921987 DOI: 10.7717/peerj.14779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 01/03/2023] [Indexed: 02/10/2023] Open
Abstract
A major challenge for clustering algorithms is to balance the trade-off between homogeneity, i.e., the degree to which an individual cluster includes only related sequences, and completeness, i.e., the degree to which related sequences are kept in a single cluster rather than broken up into multiple clusters. Most algorithms are conservative in grouping sequences with other sequences. Remote homologs may fail to be clustered together and instead form unnecessarily distinct clusters. The resulting clusters have high homogeneity but completeness that is too low. We propose Complet+, a computationally scalable post-processing method to increase the completeness of clusters without an undue cost in homogeneity. Complet+ proves to effectively merge closely related clusters of proteins that have verified structural relationships in the SCOPe classification scheme, improving the completeness of clustering results at little cost to homogeneity. Applying Complet+ to clusters obtained using MMseqs2's clusterupdate achieves an increased V-measure of 0.09 and 0.05 at the SCOPe superfamily and family levels, respectively. Complet+ also creates more biologically representative clusters, as shown by a substantial increase in Adjusted Mutual Information (AMI) and Adjusted Rand Index (ARI) metrics when comparing predicted clusters to biological classifications. Complet+ similarly improves clustering metrics when applied to other methods, such as CD-HIT and linclust. Finally, we show that Complet+ runtime scales linearly with respect to the number of clusters being post-processed on a COG dataset of over 3 million sequences. Code and supplementary information are available on GitHub: https://github.com/EESI/Complet-Plus.
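The homogeneity/completeness trade-off and the V-measure mentioned in this abstract can be made concrete with a small worked example. The sketch below is an illustration only and is not taken from the Complet+ code: it assumes the usual entropy-based definitions (homogeneity h = 1 - H(C|K)/H(C), completeness c = 1 - H(K|C)/H(K), V-measure as their harmonic mean), and the function names and toy labels are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two parallel label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g])
                for (g, _), c in joint.items())


def homogeneity_completeness_v(true_classes, clusters):
    """Return (homogeneity, completeness, V-measure)."""
    h_c = entropy(true_classes)
    h_k = entropy(clusters)
    hom = 1.0 if h_c == 0 else 1.0 - conditional_entropy(true_classes, clusters) / h_c
    com = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, true_classes) / h_k
    v = 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)
    return hom, com, v


# Two toy families (A, B), each conservatively split into two clusters.
families = ["A", "A", "A", "A", "B", "B", "B", "B"]
over_split = [0, 0, 1, 1, 2, 2, 3, 3]
merged = [0, 0, 0, 0, 1, 1, 1, 1]  # after merging related clusters

split_scores = homogeneity_completeness_v(families, over_split)
merged_scores = homogeneity_completeness_v(families, merged)
print(split_scores)   # homogeneity 1.0, completeness about 0.5, V about 0.667
print(merged_scores)  # all three equal to 1.0
```

In the over-split case every cluster is pure, so homogeneity is perfect, but each family is scattered across two clusters, which halves completeness; merging the related clusters restores completeness without any loss of homogeneity in this toy case.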
Affiliation(s)
- Rachel Nguyen
- Drexel University, Philadelphia, United States of America
| | | | - Robi Polikar
- Rowan University, Glassboro, NJ, United States of America
| | - Gail L. Rosen
- Drexel University, Philadelphia, United States of America
| |
|
25
|
Alharbi F, Vakanski A. Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review. Bioengineering (Basel) 2023; 10:bioengineering10020173. [PMID: 36829667 PMCID: PMC9952758 DOI: 10.3390/bioengineering10020173] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 12/22/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread in different parts of the body. According to the World Health Organization (WHO), cancer is the second major cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissue and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods for gene expression data allow quantifying the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on the application of deep learning models due to their comparative advantages for identifying gene patterns that are distinctive for various types of cancers. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, as well as convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning for this task. Furthermore, we review pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, caused by a large number of genes present in data samples. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.
|
26
|
Gorla A, Sankararaman S, Burchard E, Flint J, Zaitlen N, Rahmani E. Phenotypic subtyping via contrastive learning. bioRxiv 2023:2023.01.05.522921. [PMID: 36711575 PMCID: PMC9881932 DOI: 10.1101/2023.01.05.522921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Abstract
Defining and accounting for subphenotypic structure has the potential to increase statistical power and provide a deeper understanding of the heterogeneity in the molecular basis of complex disease. Existing phenotype subtyping methods primarily rely on clinically observed heterogeneity or metadata clustering. However, they generally tend to capture the dominant sources of variation in the data, which often originate from variation that is not descriptive of the mechanistic heterogeneity of the phenotype of interest; in fact, such dominant sources of variation, such as population structure or technical variation, are, in general, expected to be independent of subphenotypic structure. We instead aim to find a subspace with signal that is unique to a group of samples for which we believe that subphenotypic variation exists (e.g., cases of a disease). To that end, we introduce Phenotype Aware Components Analysis (PACA), a contrastive learning approach leveraging canonical correlation analysis to robustly capture weak sources of subphenotypic variation. In the context of disease, PACA learns a gradient of variation unique to cases in a given dataset, while leveraging control samples to account for variation and imbalances of biological and technical confounders between cases and controls. We evaluated PACA using an extensive simulation study, as well as on various subtyping tasks using genotypes, transcriptomics, and DNA methylation data. Our results provide multiple lines of strong evidence that PACA allows us to robustly capture weak unknown variation of interest while being calibrated and well-powered, far surpassing the performance of alternative methods. This makes PACA a state-of-the-art tool for defining de novo subtypes that are more likely to reflect molecular heterogeneity, especially in challenging cases where the phenotypic heterogeneity may be masked by a myriad of strong unrelated effects in the data.
Affiliation(s)
- Aditya Gorla
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
| | - Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
| | - Esteban Burchard
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
| | - Jonathan Flint
- Department of Psychiatry and Behavioral Sciences, Brain Research Institute, University of California, Los Angeles, Los Angeles, CA, USA
| | - Noah Zaitlen
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Elior Rahmani
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| |
|
27
|
Sun J, Huang Q. Two stages biclustering with three populations. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
28
|
Sun J, Liu Q, Wang Y, Wang L, Song X, Zhao X. Five-year prognosis model of esophageal cancer based on genetic algorithm improved deep neural network. Ing Rech Biomed 2023. [DOI: 10.1016/j.irbm.2022.100748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
|
29
|
Johnson AC, Silva JAF, Kim SC, Larsen CP. Progress in kidney transplantation: The role for systems immunology. Front Med (Lausanne) 2022; 9:1070385. [PMID: 36590970 PMCID: PMC9800623 DOI: 10.3389/fmed.2022.1070385] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/14/2022] [Accepted: 11/16/2022] [Indexed: 12/23/2022] Open
Abstract
The development of systems biology represents an immense breakthrough in our ability to perform translational research and deliver personalized and precision medicine. A multidisciplinary approach, combined with novel techniques, allows for the extraction and analysis of vast quantities of data even from the volume- and source-limited samples that can be obtained from human subjects. Continued advances in microfluidics, the scalability and affordability of sequencing technologies, and the development of data analysis tools have made a multi-omics, or systems, approach more accessible for use outside of specialized centers. The study of alloimmune and protective immune responses after solid organ transplant offers innumerable opportunities for a multi-omics approach; however, transplant immunology labs are only just beginning to adopt the systems methodology. In this review, we focus on advances in biological techniques and how they are improving our understanding of the immune system and its interactions, highlighting potential applications in transplant immunology. First, we describe the techniques that are available, with emphasis on major advances that allow for increased scalability. Then, we review initial applications in the field of transplantation with a focus on topics that are nearing clinical integration. Finally, we examine major barriers to adopting these methods and discuss potential future developments.
|
30
|
Abstract
SARS-CoV-2’s population structure might have a substantial impact on public health management and diagnostics if it can be identified. It is critical to rapidly monitor and characterize the lineages circulating globally for a more accurate diagnosis, improved care, and faster treatment. For a clearer picture of the SARS-CoV-2 population structure, clustering the sequencing data is essential. Here, deep clustering techniques were used to automatically group 29,017 different strains of SARS-CoV-2 into clusters. We aim to identify the main clusters of the SARS-CoV-2 population structure based on a convolutional autoencoder (CAE) trained with numerical feature vectors mapped from coronavirus Spike peptide sequences. Our clustering findings revealed that there are six large SARS-CoV-2 population clusters (C1, C2, C3, C4, C5, C6). These clusters contained 43 unique lineages across which the 29,017 publicly accessible strains were dispersed. In all six resulting clusters, the genetic distances within the same cluster (intra-cluster distances) are smaller than the distances between clusters (inter-cluster distances) (P-value 0.0019, Wilcoxon rank-sum test). This indicates substantial evidence of a connection between each cluster’s lineages. Furthermore, the K-means and hierarchical clustering methods were compared against the proposed deep learning clustering method. The intra-cluster genetic distances of the proposed method were smaller than those of K-means alone and of hierarchical clustering. We used t-distributed stochastic neighbor embedding (t-SNE) to show the outcomes of the deep learning clustering. The strains were separated correctly between clusters in the t-SNE plot. Our results showed that cluster C5 includes only the Gamma lineage (P.1), suggesting that strains of P.1 in C5 are more diversified than those in the other clusters. 
Our study indicates that the genetic similarity between strains in the same cluster enables a better understanding of the major features of the unknown population lineages when compared to some of the more prevalent viral isolates. This information helps researchers understand how the virus changed over time and spread to people all over the world.
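The intra-cluster versus inter-cluster comparison described in this abstract can be sketched as follows. This is an illustrative assumption, not the study's pipeline: Hamming distance on equal-length peptide strings stands in for the genetic distance actually used, and the sequences, labels, and function names are hypothetical toy values.

```python
from itertools import combinations
from statistics import mean


def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def split_distances(sequences, labels):
    """Split all pairwise distances into intra- and inter-cluster lists."""
    intra, inter = [], []
    for i, j in combinations(range(len(sequences)), 2):
        d = hamming(sequences[i], sequences[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    return intra, inter


# Toy peptide fragments and cluster assignments (hypothetical).
toy_sequences = ["ACDE", "ACDF", "ACDE", "WYKL", "WYKM"]
toy_labels = [0, 0, 0, 1, 1]

intra, inter = split_distances(toy_sequences, toy_labels)
print(mean(intra), mean(inter))  # 0.75 4
```

The two resulting lists are the kind of samples that a rank-sum test such as `scipy.stats.ranksums` could then compare; that step is left out here to keep the sketch free of third-party dependencies.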
|
31
|
Abstract
Omics-based approaches have become increasingly influential in identifying disease mechanisms and drug responses. Considering that diseases and drug responses are co-expressed and regulated in the relevant omics data interactions, the traditional way of drawing omics data from single isolated layers cannot always yield valuable inference. Also, drugs have adverse effects that may harm patients, and launching new medicines for diseases is costly. To resolve the above difficulties, systems biology is applied to predict potential molecular interactions by integrating omics data from genomic, proteomic, transcriptional, and metabolic layers. Combined with known drug reactions, the resulting models improve medicines' therapeutic performance by repurposing existing drugs and combining drug molecules without off-target effects. Based on the identified computational models, drug administration control laws are designed to balance toxicity and efficacy. This review introduces biomedical applications and analyses of interactions among gene, protein and drug molecules for modeling disease mechanisms and drug responses. Therapeutic performance can be improved by combining the predictive and computational models with drug administration designed by control laws. The challenges for clinical use are also discussed in this work.
Affiliation(s)
- Rongting Yue
- Department of Electrical and Computer Engineering, University of Connecticut, 371 Fairfield Way, Storrs, CT, 06269, USA.
| | - Abhishek Dutta
- Department of Electrical and Computer Engineering, University of Connecticut, 371 Fairfield Way, Storrs, CT, 06269, USA
| |
|
32
|
Moebel E, Kervrann C. Towards unsupervised classification of macromolecular complexes in cryo electron tomography: Challenges and opportunities. Comput Methods Programs Biomed 2022; 225:107017. [PMID: 35901628 DOI: 10.1016/j.cmpb.2022.107017] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/15/2022] [Revised: 07/03/2022] [Accepted: 07/08/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVES: Cryo electron tomography visualizes native cells at nanometer resolution, but analysis is challenged by noise and artifacts. Recently, supervised deep learning methods have been applied to decipher the 3D spatial distribution of macromolecules. However, in order to discover unknown objects, unsupervised classification techniques are necessary. In this paper, we provide an overview of unsupervised deep learning techniques, discuss the challenges of analyzing cryo-ET data, and provide a proof of concept on real data. METHODS: We propose a weakly supervised subtomogram classification method based on transfer learning. We use a deep neural network to learn a clustering-friendly representation able to capture 3D shapes in the presence of noise and artifacts. This representation is learned here from a synthetic data set. RESULTS: We show that applying k-means clustering to a learning-based representation makes it possible to satisfactorily classify real subtomograms according to structural similarity. It is worth noting that no manual annotation is used for performing classification. CONCLUSIONS: We describe the advantages and limitations of our proof of concept and raise several perspectives for improving classification performance.
Affiliation(s)
- E Moebel
  - Inria Rennes: Inria Centre de Recherche Rennes Bretagne Atlantique, France
- C Kervrann
  - Inria Rennes: Inria Centre de Recherche Rennes Bretagne Atlantique, France
|
33
|
Guo W, Bai J, Zhang Q, Duan K, Zhang P, Zhang J, Zhao J, Zhang W, Kong D. Influence of thermal processing on the quality of hawthorn: quality markers of heat-processed hawthorn. J Sep Sci 2022; 45:3774-3785. [PMID: 35938469 DOI: 10.1002/jssc.202200222] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/10/2022] [Revised: 07/12/2022] [Accepted: 08/02/2022] [Indexed: 11/11/2022]
Abstract
Hawthorn and its derived products are used worldwide as foods as well as complementary medicine. Heating and thermal processing are frequently reported during the preparation of hawthorn. Thermal processing changes the medicinal purposes and modifies the efficacy of hawthorn. However, details such as the shift in chemical profile and the quality markers of heat-processed hawthorn are not well understood. In this paper, we analyzed hawthorn samples processed at different temperatures and for different times using ultraviolet-visible absorption spectroscopy and LC-MS combined with multivariate statistical analysis. It was revealed for the first time that thermal processing could greatly change the ultraviolet-visible absorption spectra and chemical profiles of hawthorn, even with heat treatment at 130°C for 10 minutes. The ultraviolet-visible absorption spectrum, especially the ratio value (RA500 nm/400 nm), was a descriptive and qualitative indicator of the degree of heating at the macroscopic level. Several components, such as hyperoside, chlorogenic acid, quercetin, and apigenin, decreased or increased in content during processing and could be used as chemical quality markers. The proposed quality markers for heat-processed hawthorn will be helpful for further optimizing the processing conditions of hawthorn.
Affiliation(s)
- Wenyan Guo
  - Department of Pharmacology of Chinese Materia Medica, School of Chinese Integrative Medicine, Hebei Medical University, Shijiazhuang, China
- Jing Bai
  - Department of Pharmacy, The Fourth Hospital of Hebei Medical University, Shijiazhuang, China
- Qingning Zhang
  - Department of Pharmacology of Chinese Materia Medica, School of Chinese Integrative Medicine, Hebei Medical University, Shijiazhuang, China
- Kunfeng Duan
  - Department of Pharmacy, The Third Hospital of Hebei Medical University, Shijiazhuang, China
- Panpan Zhang
  - Department of Pharmacology of Chinese Materia Medica, School of Chinese Integrative Medicine, Hebei Medical University, Shijiazhuang, China
- Jianghua Zhang
  - School of Chinese Integrative Medicine, Hebei Medical University, Shijiazhuang, China
- Jing Zhao
  - Department of Pharmacology of Chinese Materia Medica, School of Chinese Integrative Medicine, Hebei Medical University, Shijiazhuang, China
- Wei Zhang
  - Department of Pharmacology of Chinese Materia Medica, School of Chinese Integrative Medicine, Hebei Medical University, Shijiazhuang, China
- Dezhi Kong
  - Department of Pharmacology of Chinese Materia Medica, School of Chinese Integrative Medicine, Hebei Medical University, Shijiazhuang, China
|
34
|
Lee H, Choi Y, Son B, Lim J, Lee S, Kang JW, Kim KH, Kim EJ, Yang C, Lee JD. Deep autoencoder-powered pattern identification of sleep disturbance using multi-site cross-sectional survey data. Front Med (Lausanne) 2022; 9:950327. [PMID: 35966837 PMCID: PMC9374171 DOI: 10.3389/fmed.2022.950327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2022] [Accepted: 07/11/2022] [Indexed: 11/13/2022] Open
Abstract
Pattern identification (PI) is a diagnostic method used in Traditional East Asian medicine (TEAM) to select appropriate and personalized acupuncture points and herbal medicines for individual patients. Developing a reproducible PI model using clinical information is important as it would reflect the actual clinical setting and improve the effectiveness of TEAM treatment. In this paper, we suggest a novel deep learning-based PI model with feature extraction using a deep autoencoder and k-means clustering through a cross-sectional study of sleep disturbance patient data. The data were obtained from an anonymous electronic survey in the Republic of Korea Army (ROKA) members from August 16, 2021, to September 20, 2021. The survey instrument consisted of six sections: demographics, medical history, military duty, sleep-related assessments (Pittsburgh sleep quality index (PSQI), Berlin questionnaire, and sleeping environment), diet/nutrition-related assessments [dietary habit survey questionnaire and nutrition quotient (NQ)], and gastrointestinal-related assessments [gastrointestinal symptom rating scale (GSRS) and Bristol stool scale]. Principal component analysis (PCA) and a deep autoencoder were used to extract features, which were then clustered using the k-means clustering method. The Calinski-Harabasz index, silhouette coefficient, and within-cluster sum of squares were used for internal cluster validation and the final PSQI, Berlin questionnaire, GSRS, and NQ scores were used for external cluster validation. One-way analysis of variance followed by the Tukey test and chi-squared test were used for between-cluster comparisons. Among 4,869 survey responders, 2,579 patients with sleep disturbances were obtained after filtering using a PSQI score of >5. 
When comparing clustering performance using raw data and extracted features by PCA and the deep autoencoder, the best feature extraction method for clustering was the deep autoencoder (16 nodes for the first and third hidden layers, and two nodes for the second hidden layer). Our model could cluster three different PI types because the optimal number of clusters was determined to be three via the elbow method. After external cluster validation, three PI types were differentiated by changes in sleep quality, dietary habits, and concomitant gastrointestinal symptoms. This model may be applied to the development of artificial intelligence-based clinical decision support systems through electronic medical records and clinical trial protocols for evaluating the effectiveness of TEAM treatment.
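The internal validation step mentioned in this abstract can be illustrated with a short sketch. This is not the authors' code: it is a minimal pure-Python illustration that computes the within-cluster sum of squares (the quantity inspected in the elbow method) and the Calinski-Harabasz index for a given labeling. The toy 2-D points stand in for two-node bottleneck features, and the function names `wcss` and `calinski_harabasz` are hypothetical.

```python
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_by_label(points, labels):
    """Map each cluster label to the list of its member points."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    return clusters


def wcss(points, labels):
    """Within-cluster sum of squares: lower means more compact clusters."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        c = centroid(members)
        total += sum(sq_dist(p, c) for p in members)
    return total


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    clusters = group_by_label(points, labels)
    n, k = len(points), len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")
    overall = centroid(points)
    between = sum(len(m) * sq_dist(centroid(m), overall) for m in clusters.values())
    within = wcss(points, labels)  # assumed non-zero for this sketch
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy features: three well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 0), (10, 1), (11, 0)]
three_clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]
two_clusters = [0, 0, 0, 1, 1, 1, 1, 1, 1]
print(calinski_harabasz(points, three_clusters), calinski_harabasz(points, two_clusters))
```

On this toy data the three-cluster labeling scores higher than the two-cluster one, which is the kind of comparison used to choose the number of clusters.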
Affiliation(s)
- Hyeonhoon Lee
  - Department of Clinical Korean Medicine, Graduate School, Kyung Hee University, Seoul, South Korea
- Yujin Choi
  - KM Science Research Division, Korea Institute of Oriental Medicine, Daejeon, South Korea
- Byunwoo Son
  - Department of Korean Medicine, Combined Dispensary, 7th Corps, Republic of Korea Army, Icheon-si, South Korea
- Jinwoong Lim
  - Department of Clinical Korean Medicine, Graduate School, Kyung Hee University, Seoul, South Korea
  - Department of Acupuncture and Moxibustion, Wonkwang University Gwangju Korean Medicine Hospital, Gwangju, South Korea
- Seunghoon Lee
  - Department of Acupuncture and Moxibustion, College of Korean Medicine, Kyung Hee University, Seoul, South Korea
- Jung Won Kang
  - Department of Acupuncture and Moxibustion, College of Korean Medicine, Kyung Hee University, Seoul, South Korea
- Kun Hyung Kim
  - School of Korean Medicine, Pusan National University, Yangsan, South Korea
- Eun Jung Kim
  - Department of Acupuncture and Moxibustion Medicine, Dongguk University Bundang Oriental Hospital, Seongnam-si, South Korea
- Changsop Yang
  - KM Science Research Division, Korea Institute of Oriental Medicine, Daejeon, South Korea
  - *Correspondence: Changsop Yang
- Jae-Dong Lee
  - Department of Acupuncture and Moxibustion, College of Korean Medicine, Kyung Hee University, Seoul, South Korea
  - Jae-Dong Lee
|
35
|
Lukauskas M, Ruzgas T. A New Clustering Method Based on the Inversion Formula. Mathematics 2022; 10:2559. [DOI: 10.3390/math10152559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023]
Abstract
Data clustering is an area of data mining that falls into the class of unsupervised learning. Cluster analysis divides data into different classes by discovering the internal structure of data set objects and their relationships. This paper presents a new density clustering method based on the modified inversion formula density estimation. The new method should make it possible to improve on the performance and robustness of k-means, the Gaussian mixture model, and other methods. The proposed clustering algorithm consists of three main steps. First, we initialize the parameters and generate a T matrix. Second, we estimate the densities of each point and cluster. Third, we update the mean, sigma, and phi matrices. The new method based on the inversion formula works quite well on different datasets compared with k-means, the Gaussian mixture model, and the Bayesian Gaussian mixture model. On the other hand, the method has a limitation: in its current state it cannot work with higher-dimensional data (d > 15). This will be addressed in future versions of the model, as detailed in future work. Additionally, the results show that the MIDEv2 method works best on generated data with outliers in all datasets (0.5%, 1%, 2%, 4% outliers). Interestingly, the new method can also cluster data that contain no outliers, for example the popular Iris dataset.
|
36
|
Gao L, Chen Z, Zang L, Sun Z, Wang Q, Yu G. Midpalatal Suture CBCT Image Quantitive Characteristics Analysis Based on Machine Learning Algorithm Construction and Optimization. Bioengineering (Basel) 2022; 9:bioengineering9070316. [PMID: 35877367 PMCID: PMC9311955 DOI: 10.3390/bioengineering9070316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Revised: 06/20/2022] [Accepted: 07/01/2022] [Indexed: 11/16/2022] Open
Abstract
Background: Midpalatal suture maturation and ossification status is the basis for appraising maxillary transverse developmental status. Methods: We established a midpalatal suture cone-beam computed tomography (CBCT) normalized database of the growth population, including 1006 CBCT files from 690 participants younger than 24 years old. The midpalatal suture region of interest (ROI) labeling was completed by two experienced clinical experts. The CBCT image fusion algorithm and image texture feature analysis algorithm were constructed and optimized. The age range prediction convolutional neural network (CNN) was conducted and tested. Results: The midpalatal suture fusion images contain complete semantic information for appraising midpalatal suture maturation and ossification status during the fast growth and development period. Correlation and homogeneity are the two texture features with the strongest relevance to chronological age. The overall performance of the age range prediction CNN model is satisfactory, especially in the 4 to 10 years range and the 17 to 23 years range, while for the 13 to 14 years range, the model performance is compromised. Conclusions: The image fusion algorithm can help show the overall perspective of the midpalatal suture in one fused image effectively. Furthermore, clinical decisions for maxillary transverse deficiency should be appraised by midpalatal suture image features directly rather than by age, especially in the 13 to 14 years range.
Affiliation(s)
- Lu Gao
  - Department of Stomatology, Beijing Children’s Hospital, Capital Medical University, National Center for Children’s Health, Beijing 100045, China
- Zhiyu Chen
  - School of Software Engineering, North University of China, Taiyuan 030051, China
- Lin Zang
  - Pharmacovigilance Research Center for Information Technology and Data Science, Cross-Strait Tsinghua Research Institute, Xiamen 361000, China
- Zhipeng Sun
  - National Engineering Laboratory for Digital and Material Technology of Stomatology, Beijing Key Laboratory of Digital Stomatology, Department of Oral and Maxillofacial Radiology, Peking University School and Hospital of Stomatology, Beijing 100081, China
- Qing Wang
  - Pharmacovigilance Research Center for Information Technology and Data Science, Cross-Strait Tsinghua Research Institute, Xiamen 361000, China
  - Department of Automation, Tsinghua University, Beijing 100084, China
  - Correspondence: (Q.W.); (G.Y.)
- Guoxia Yu
  - Department of Stomatology, Beijing Children’s Hospital, Capital Medical University, National Center for Children’s Health, Beijing 100045, China
  - National Clinical Research Center for Respiratory Diseases, Beijing Children’s Hospital, Capital Medical University, National Center for Children’s Health, Beijing 100045, China
  - Correspondence: (Q.W.); (G.Y.)
|
37
|
Pan X, Lin X, Cao D, Zeng X, Yu PS, He L, Nussinov R, Cheng F. Deep learning for drug repurposing: Methods, databases, and applications. WIREs Comput Mol Sci 2022. [DOI: 10.1002/wcms.1597] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022]
Affiliation(s)
- Xiaoqin Pan
  - School of Computer Science and Engineering, Hunan University, Changsha, Hunan, China
- Xuan Lin
  - School of Computer Science, Xiangtan University, Xiangtan, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Xiangxiang Zeng
  - School of Computer Science and Engineering, Hunan University, Changsha, Hunan, China
- Philip S. Yu
  - Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, USA
- Lifang He
  - Department of Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania, USA
- Ruth Nussinov
  - Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, Maryland, USA
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Feixiong Cheng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
|
38
|
Wang X, Shan H, Yan X, Yu L, Yu Y. A Neural Network Model Secret-Sharing Scheme with Multiple Weights for Progressive Recovery. Mathematics 2022; 10:2231. [DOI: 10.3390/math10132231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
With the widespread use of deep-learning models in production environments, the value of deep-learning models has become more prominent. The key issues are the rights of the model trainers and the security of the specific scenarios using the models. In the commercial domain, consumers pay different fees and have access to different levels of services. Therefore, dividing the model into several shadow models with multiple weights is necessary. When holders want to use the model, they can recover the model whose performance corresponds to the number and weights of the collected shadow models so that access to the model can be controlled progressively, i.e., progressive recovery is significant. This paper proposes a neural network model secret sharing scheme (NNSS) with multiple weights for progressive recovery. The scheme uses Shamir’s polynomial to control model parameters’ sharing and embedding phase, which in turn enables hierarchical performance control in the secret model recovery phase. First, the important model parameters are extracted. Then, effective shadow parameters are assigned based on the holders’ weights in the sharing phase, and t shadow models are generated. The holders can obtain a sufficient number of shadow parameters for recovering the secret parameters with a certain probability during the recovery phase. As the number of shadow models obtained increases, the probability becomes larger, while the performance of the extracted models is related to the participants’ weights in the recovery phase. The probability is proportional to the number and weights of the shadow models obtained in the recovery phase, and the probability of the successful recovery of the shadow parameters is 1 when all t shadow models are obtained, i.e., the performance of the reconstruction model can reach the performance of the secret model. A series of experiments conducted on VGG19 verify the effectiveness of the scheme.
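The building block named in this abstract, Shamir's polynomial secret sharing, can be sketched as follows. This is only the basic (t, n) threshold scheme over a prime field, not the weighted, progressive NNSS scheme the paper proposes. The modulus `PRIME`, the function names, and the idea of treating one model parameter as an integer are assumptions made for illustration.

```python
import random

# Field modulus: the Mersenne prime 2**61 - 1 (an arbitrary choice for this sketch).
PRIME = 2**61 - 1


def make_shares(secret, threshold, n_shares, rng=None):
    """Split an integer secret into n_shares points on a random polynomial
    of degree threshold - 1 whose constant term is the secret."""
    rng = rng or random.SystemRandom()
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    if not 1 <= threshold <= n_shares:
        raise ValueError("need 1 <= threshold <= n_shares")
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0; needs at least `threshold` distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical usage: one quantized model parameter shared among 5 holders,
# any 3 of whom can reconstruct it.
shares = make_shares(123456789, threshold=3, n_shares=5)
print(recover_secret(shares[:3]) == 123456789)
```

With fewer than `threshold` shares the interpolated value is unrelated to the secret, which is the property a progressive scheme builds on.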
|
39
|
Davagdorj K, Wang L, Li M, Pham VH, Ryu KH, Theera-Umpon N. Discovering Thematically Coherent Biomedical Documents Using Contextualized Bidirectional Encoder Representations from Transformers-Based Clustering. Int J Environ Res Public Health 2022; 19:5893. [PMID: 35627429 DOI: 10.3390/ijerph19105893] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/20/2022] [Revised: 04/26/2022] [Accepted: 05/10/2022] [Indexed: 02/01/2023]
Abstract
The rapid growth of biomedical documents has increased the number of natural language textual resources related to current applications. Meanwhile, there has been great interest over the last decade in extracting useful information from meaningful, coherent groupings of textual documents. However, it is challenging to discover informative representations and identify relevant articles in the rapidly growing biomedical literature due to the unsupervised nature of document clustering. Moreover, empirical investigations have demonstrated that traditional text clustering methods produce unsatisfactory results because their non-contextualized vector space representations neglect the semantic relationships between biomedical texts. Recently, pre-trained language models have proven successful in a wide range of natural language processing applications. In this paper, we propose an efficient Gaussian Mixture Model-based clustering framework that incorporates substantially pre-trained BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) domain-specific language representations to enhance clustering accuracy. Our proposed framework consists of three main phases. First, classic text pre-processing techniques are applied to biomedical document data crawled from the PubMed repository. Second, representative vectors are extracted from a pre-trained BioBERT language model for biomedical text mining. Third, we employ the Gaussian Mixture Model as a clustering algorithm, which allows us to assign a label to each biomedical document. To demonstrate the efficiency of our proposed model, we conducted a comprehensive experimental analysis using several clustering algorithms combined with diverse embedding techniques. The experimental results show that the proposed model outperforms the benchmark models, reaching a Fowlkes-Mallows score, silhouette coefficient, adjusted Rand index, and Davies-Bouldin score of 0.7817, 0.3765, 0.4478, and 1.6849, respectively. We expect the outcomes of this study will assist domain specialists in comprehending thematically cohesive documents in the healthcare field.
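Two of the reported measures, the Fowlkes-Mallows score and the adjusted Rand index, compare a clustering against reference labels by counting pairs of documents. The sketch below shows the standard pair-counting definitions in pure Python; it is an illustration under assumed toy labels, not the evaluation code used in the paper, and the function names are hypothetical.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(true_labels, pred_labels):
    """Count pairs grouped together in both labelings, in the reference only,
    and in the prediction only, plus the total number of pairs."""
    contingency = Counter(zip(true_labels, pred_labels))
    same_both = sum(comb(c, 2) for c in contingency.values())
    same_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    return same_both, same_true, same_pred, comb(len(true_labels), 2)


def fowlkes_mallows(true_labels, pred_labels):
    """Geometric mean of pairwise precision and recall (1.0 is a perfect match)."""
    same_both, same_true, same_pred, _ = pair_counts(true_labels, pred_labels)
    if same_true == 0 or same_pred == 0:
        return 0.0
    return same_both / sqrt(same_true * same_pred)


def adjusted_rand_index(true_labels, pred_labels):
    """Rand index corrected for chance agreement (about 0 for random labelings)."""
    same_both, same_true, same_pred, total = pair_counts(true_labels, pred_labels)
    expected = same_true * same_pred / total
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (same_both - expected) / (max_index - expected)


# Hypothetical toy labels: six documents, one of them assigned to the wrong cluster.
reference = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 1]
print(fowlkes_mallows(reference, predicted), adjusted_rand_index(reference, predicted))
```

Both measures ignore the cluster names themselves, so a prediction that swaps the labels 0 and 1 still scores 1.0.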
|
40
|
Chen Z, Liu X, Zhao P, Li C, Wang Y, Li F, Akutsu T, Bain C, Gasser RB, Li J, Yang Z, Gao X, Kurgan L, Song J. iFeatureOmega: an integrative platform for engineering, visualization and analysis of features from molecular sequences, structural and ligand data sets. Nucleic Acids Res 2022; 50:W434-W447. [PMID: 35524557 PMCID: PMC9252729 DOI: 10.1093/nar/gkac351] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 03/21/2022] [Revised: 04/22/2022] [Accepted: 04/25/2022] [Indexed: 01/07/2023] Open
Abstract
The rapid accumulation of molecular data motivates development of innovative approaches to computationally characterize sequences, structures and functions of biological and chemical molecules in an efficient, accessible and accurate manner. Notwithstanding several computational tools that characterize protein or nucleic acids data, there are no one-stop computational toolkits that comprehensively characterize a wide range of biomolecules. We address this vital need by developing a holistic platform that generates features from sequence and structural data for a diverse collection of molecule types. Our freely available and easy-to-use iFeatureOmega platform generates, analyzes and visualizes 189 representations for biological sequences, structures and ligands. To the best of our knowledge, iFeatureOmega provides the largest scope when directly compared to the current solutions, in terms of the number of feature extraction and analysis approaches and coverage of different molecules. We release three versions of iFeatureOmega including a webserver, command line interface and graphical interface to satisfy needs of experienced bioinformaticians and less computer-savvy biologists and biochemists. With the assistance of iFeatureOmega, users can encode their molecular data into representations that facilitate construction of predictive models and analytical studies. We highlight benefits of iFeatureOmega based on three research applications, demonstrating how it can be used to accelerate and streamline research in bioinformatics, computational biology, and cheminformatics areas. The iFeatureOmega webserver is freely available at http://ifeatureomega.erc.monash.edu and the standalone versions can be downloaded from https://github.com/Superzchen/iFeatureOmega-GUI/ and https://github.com/Superzchen/iFeatureOmega-CLI/.
Affiliation(s)
- Zhen Chen
  - Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
  - Center for Crop Genome Engineering, Henan Agricultural University, Zhengzhou 450046, China
- Xuhan Liu
  - Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Einsteinweg 55, Leiden 2333 CC, The Netherlands
- Pei Zhao
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
- Chen Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Yanan Wang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Fuyi Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Chris Bain
  - Monash Data Future Institutes, Monash University, Melbourne, Victoria 3800, Australia
- Robin B Gasser
  - Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Junzhou Li
  - Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China
- Zuoren Yang
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
- Xin Gao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
  - Monash Data Future Institutes, Monash University, Melbourne, Victoria 3800, Australia
|
41
|
Hadipour H, Liu C, Davis R, Cardona ST, Hu P. Deep clustering of small molecules at large-scale via variational autoencoder embedding and K-means. BMC Bioinformatics 2022; 23:132. [PMID: 35428173 PMCID: PMC9011935 DOI: 10.1186/s12859-022-04667-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 04/04/2022] [Indexed: 11/13/2022] Open
Abstract
Background Converting molecules into computer-interpretable features with rich molecular information is a core problem of data-driven machine learning applications in chemical and drug-related tasks. Generally speaking, there are global and local features to represent a given molecule. As most algorithms have been developed based on one type of feature, a remaining bottleneck is to combine both feature sets for advanced molecule-based machine learning analysis. Here, we explored a novel analytical framework to make embeddings of the molecular features and apply them in the clustering of a large number of small molecules. Results In this novel framework, we first introduced a principal component analysis method encoding the molecule-specific atom and bond information. We then used a variational autoencoder (AE)-based method to make embeddings of the global chemical properties and the local atom and bond features. Next, using the embeddings from the encoded local and global features, we implemented and compared several unsupervised clustering algorithms to group the molecule-specific embeddings. The number of clusters was treated as a hyper-parameter and determined by the Silhouette method. Finally, we evaluated the corresponding results using three internal indices. Applying the analysis framework to a large chemical library of more than 47,000 molecules, we successfully identified 50 molecular clusters using the K-means method with 32 embeddings based on the AE method. We visualized the clustering result via t-SNE for the overall distribution of molecules and the similarity maps for the structural analysis of randomly selected cluster-specific molecules. Conclusions This study developed a novel analytical framework that comprises a feature engineering scheme for molecule-specific atomic and bonding features and a deep learning-based embedding strategy for different molecular features. 
By applying the identified embeddings, we show their usefulness for clustering a large molecule dataset. Our novel analytic algorithms can be applied to any virtual library of chemical compounds with diverse molecular structures. Hence, these tools have the potential of optimizing drug discovery, as they can decrease the number of compounds to be screened in any drug screening campaign.
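Treating the number of clusters as a hyper-parameter chosen by the silhouette method can be sketched as below. This is a minimal pure-Python illustration, not the authors' pipeline: the toy 2-D points stand in for learned embeddings, the small k-means uses a deterministic farthest-first initialization chosen only to keep the example reproducible, and all function names are hypothetical.

```python
from math import dist


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with farthest-first initialization; returns labels."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other) / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_k(points, candidates):
    """Pick the candidate k whose k-means labeling has the highest silhouette."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))


# Hypothetical toy embeddings: three well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 0), (10, 1), (11, 0)]
print(best_k(points, [2, 3, 4]))
```

The all-pairs distance computation is quadratic in the number of points, so for a library of tens of thousands of molecules a sampled or library-provided silhouette would be the practical choice.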
|
42
|
Soares IM, Camargo FHF, Marques A, Crook OM. Improving lab-of-origin prediction of genetically engineered plasmids via deep metric learning. Nat Comput Sci 2022; 2:253-264. [PMID: 38177551 DOI: 10.1038/s43588-022-00234-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/11/2021] [Accepted: 03/22/2022] [Indexed: 01/06/2024]
Abstract
Genome engineering is undergoing unprecedented development and is now becoming widely available. Genetic engineering attribution can make sequence-lab associations and assist forensic experts in ensuring responsible biotechnology innovation and reducing misuse of engineered DNA sequences. Here we propose a method based on metric learning to rank the most likely labs of origin while simultaneously generating embeddings for plasmid sequences and labs. These embeddings can be used to perform various downstream tasks, such as clustering DNA sequences and labs, as well as using them as features in machine learning models. Our approach employs a circular shift augmentation method and can correctly rank the lab of origin 90% of the time within its top-10 predictions. We also demonstrate that we can perform few-shot learning and obtain 76% top-10 accuracy using only 10% of the sequences. Finally, our approach can also extract key signatures in plasmid sequences for particular labs, allowing for an interpretable examination of the model's outputs.
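Because plasmids are circular, the position at which a sequence record starts is arbitrary, which is what makes circular shifting a natural augmentation. The sketch below illustrates that idea together with a top-k accuracy measure; the exact augmentation and evaluation used in the paper may differ, and the function names and toy inputs are hypothetical.

```python
import random


def circular_shift(sequence, offset):
    """Rotate a circular sequence so that it starts `offset` bases later."""
    if not sequence:
        return sequence
    offset %= len(sequence)
    return sequence[offset:] + sequence[:offset]


def augment(sequence, n_copies, rng):
    """Return n_copies randomly rotated versions of the same plasmid."""
    return [circular_shift(sequence, rng.randrange(len(sequence))) for _ in range(n_copies)]


def top_k_accuracy(ranked_labs, true_labs, k=10):
    """Fraction of sequences whose true lab appears in the first k ranked predictions."""
    hits = sum(true in ranked[:k] for ranked, true in zip(ranked_labs, true_labs))
    return hits / len(true_labs)


# Hypothetical toy data: a short sequence and two ranked prediction lists.
rng = random.Random(0)
print(augment("ACGTTGCA", n_copies=3, rng=rng))
print(top_k_accuracy([["lab_a", "lab_b"], ["lab_c", "lab_d"]], ["lab_b", "lab_d"], k=1))
```

Every rotated copy contains the same bases in the same cyclic order, so a model trained on such copies is encouraged to ignore the arbitrary start position.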
Affiliation(s)
- Oliver M Crook
  - Oxford Protein Informatics Group, University of Oxford, Oxford, UK
|
43
|
Jiménez P, Roldán JC, Corchuelo R. On exploring data lakes by finding compact, isolated clusters. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.12.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
|
44
|
Wanotayan R, Chousangsuntorn K, Petisiwaveth P, Anuttra T, Lertchanyaphan W, Jaikuna T, Jangpatarapongsa K, Uttayarat P, Tongloy T, Chousangsuntorn C, Boonsang S. A deep learning model (FociRad) for automated detection of γ-H2AX foci and radiation dose estimation. Sci Rep 2022; 12:5527. [PMID: 35365702 DOI: 10.1038/s41598-022-09180-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Accepted: 03/18/2022] [Indexed: 11/08/2022] Open
Abstract
DNA double-strand breaks (DSBs) are the most lethal form of damage to cells from irradiation. γ-H2AX (the phosphorylated form of the H2AX histone variant) has become one of the most reliable and sensitive biomarkers of DNA DSBs. However, the γ-H2AX foci assay is still limited by the time required for manual scoring and by possible variability between scorers. This study proposed a novel automated foci scoring method using a deep convolutional neural network based on the You-Only-Look-Once (YOLO) algorithm to quantify γ-H2AX foci in peripheral blood samples. FociRad, a two-stage deep learning approach, consisted of mononuclear cell (MNC) detection and γ-H2AX foci detection. Whole blood samples were irradiated with X-rays from a 6 MV linear accelerator at 1, 2, 4 or 6 Gy. Images were captured using confocal microscopy. Dose-response calibration curves were then established and applied to an unseen dataset. The results of the FociRad model were comparable with manual scoring. MNC detection yielded 96.6% accuracy, 96.7% sensitivity and 96.5% specificity. γ-H2AX foci detection showed very good F1 scores (> 0.9). Applying the calibration curve in the range of 0-4 Gy gave a mean absolute difference between estimated and actual doses of less than 1 Gy. In addition, the evaluation times of FociRad were very short (< 0.5 min per 100 images), while the time for manual scoring increased with the number of foci. In conclusion, FociRad was the first automated foci scoring method to use a YOLO algorithm with high detection performance and fast evaluation time, which opens the door for large-scale applications in radiation triage.
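The calibration step in the abstract above (fit a dose-response curve, then read doses off it for unseen samples) can be sketched as follows. The abstract does not state the functional form of the curve, so a straight line fitted by ordinary least squares is assumed here; the foci counts are invented placeholder values, not data from the study.

```python
def fit_line(doses: list[float], foci: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of foci = intercept + slope * dose."""
    n = len(doses)
    mean_x = sum(doses) / n
    mean_y = sum(foci) / n
    sxx = sum((x - mean_x) ** 2 for x in doses)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, foci))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_dose(mean_foci: float, intercept: float, slope: float) -> float:
    """Invert the calibration line; negative estimates are clipped to 0 Gy."""
    return max(0.0, (mean_foci - intercept) / slope)


def mean_absolute_difference(estimated: list[float], actual: list[float]) -> float:
    """Average absolute gap between estimated and delivered doses."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


if __name__ == "__main__":
    # Placeholder calibration points: dose in Gy, mean foci per cell.
    intercept, slope = fit_line([0.0, 1.0, 2.0, 4.0], [0.5, 3.5, 6.5, 12.5])
    print(intercept, slope, estimate_dose(9.5, intercept, slope))
```

If the measured response saturates at higher doses, a linear-quadratic or other nonlinear curve would be the natural replacement for `fit_line`.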
|
45
|
Zhou R, Wang P, Li Y, Mou X, Zhao Z, Chen X, Du L, Yang T, Zhan Q, Fang Z. Prediction of Pulmonary Function Parameters Based on a Combination Algorithm. Bioengineering (Basel) 2022; 9:136. [PMID: 35447696 PMCID: PMC9032560 DOI: 10.3390/bioengineering9040136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 03/18/2022] [Accepted: 03/23/2022] [Indexed: 11/19/2022] Open
Abstract
Objective: Pulmonary function parameters play a pivotal role in the assessment of respiratory diseases. However, the accuracy of the existing methods for the prediction of pulmonary function parameters is low. This study proposes a combination algorithm to improve the accuracy of pulmonary function parameter prediction. Methods: We first established a system to collect volumetric capnography and then processed the data with a combination algorithm to predict pulmonary function parameters. The algorithm consists of three main parts: a medical feature regression structure consisting of support vector machine (SVM) and extreme gradient boosting (XGBoost) algorithms, a sequence feature regression structure consisting of a one-dimensional convolutional neural network (1D-CNN), and an error correction structure using an improved K-nearest neighbor (KNN) algorithm. Results: In a ten-fold cross-validation experiment, the root mean square error (RMSE) of the pulmonary function parameters predicted by the combination algorithm was less than 0.39 L and the R² was greater than 0.85. Conclusion: Compared with the existing methods for predicting pulmonary function parameters, the present algorithm achieves higher accuracy. At the same time, this algorithm uses specific processing structures for different features, and the interpretability of the algorithm is ensured while mining the feature depth information.
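The evaluation reported in the abstract above rests on three standard pieces: RMSE, the coefficient of determination, and a ten-fold split. The sketch below shows one plain-Python way to compute them. The interleaved fold assignment is an assumption chosen for brevity; the abstract does not describe how the folds were drawn.

```python
import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean square error, in the same unit as the target (e.g. litres)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples: int, k: int = 10):
    """Yield (train, test) index lists; sample i goes to test fold i % k."""
    for fold in range(k):
        test = list(range(fold, n_samples, k))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test


if __name__ == "__main__":
    print(rmse([0.0, 0.0], [3.0, 4.0]))
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print([len(test) for _, test in k_fold_indices(25, k=10)])
```

In practice the samples would usually be shuffled before the split, and each metric would be averaged over the ten held-out folds.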
|
46
|
Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022; 23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Identifying disease-related genes is an important problem in computational biology. Module structure is widespread in biomolecular networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in these networks, which can provide useful insights for the study of disease-related genes. However, mining and effectively using the module structure remains challenging in problems such as disease-gene prediction. RESULTS: We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to predict disease-related genes more effectively. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for the integration of gene rankings is designed to integrate multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. Through a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-art methods, as well as its further performance improvement derived from the parameter estimation. CONCLUSIONS: The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of multiscale module structure and its application in problems such as disease-gene prediction.
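The idea of scoring genes by the abundance of known disease genes in their module can be made concrete with a small sketch for a single partition. The scoring rule used here (fraction of module members that are known disease genes) is an assumption for illustration; the statistic actually used by HyMM, and its integration across scales, are not specified in the abstract. Gene names are hypothetical.

```python
from collections import defaultdict


def module_abundance_scores(partition: dict[str, int], disease_genes: set[str]) -> dict[str, float]:
    """Score each gene by the fraction of known disease genes in its module."""
    members = defaultdict(list)
    for gene, module in partition.items():
        members[module].append(gene)
    fraction = {
        module: sum(g in disease_genes for g in genes) / len(genes)
        for module, genes in members.items()
    }
    return {gene: fraction[module] for gene, module in partition.items()}


def rank_candidates(partition: dict[str, int], disease_genes: set[str]) -> list[str]:
    """Rank genes not yet known to be disease genes, highest score first."""
    scores = module_abundance_scores(partition, disease_genes)
    candidates = [g for g in partition if g not in disease_genes]
    return sorted(candidates, key=lambda g: (-scores[g], g))


if __name__ == "__main__":
    # Hypothetical partition: gene -> module id.
    partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2}
    print(rank_candidates(partition, {"a", "b", "d"}))
```

Running the same scoring over partitions at several resolutions yields one ranking per scale, which is the kind of input a rank-integration model would then combine.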
Affiliation(s)
- Ju Xiang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Department of Basic Medical Sciences & Academician Workstation, Changsha Medical University, Changsha, Hunan 410219, China
| | - Xiangmao Meng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yichao Zhao
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
| | - Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
47
|
|
48
|
Gan Y, Huang X, Zou G, Zhou S, Guan J. Deep structural clustering for single-cell RNA-seq data jointly through autoencoder and graph neural network. Brief Bioinform 2022; 23:6529282. [PMID: 35172334 DOI: 10.1093/bib/bbac018] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/10/2021] [Revised: 12/27/2021] [Accepted: 01/13/2022] [Indexed: 12/20/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) permits researchers to study the complex mechanisms of cell heterogeneity and diversity. Unsupervised clustering is of central importance for the analysis of scRNA-seq data, as it can be used to identify putative cell types. However, owing to noise, high dimensionality and pervasive dropout events, clustering analysis of scRNA-seq data remains a computational challenge. Here, we propose a new deep structural clustering method for scRNA-seq data, named scDSC, which integrates structural information into the deep clustering of single cells. The proposed scDSC consists of a Zero-Inflated Negative Binomial (ZINB) model-based autoencoder, a graph neural network (GNN) module and a mutual-supervised module. To learn the data representation from the sparse and zero-inflated scRNA-seq data, we add a ZINB model to the basic autoencoder. The GNN module is introduced to capture the structural information among cells. By joining the ZINB-based autoencoder with the GNN module, the model transfers the data representation learned by the autoencoder to the corresponding GNN layer. Furthermore, we adopt a mutual-supervised strategy to unify these two different deep neural architectures and to guide the clustering task. Extensive experimental results on six real scRNA-seq datasets demonstrate that scDSC outperforms state-of-the-art methods in terms of clustering accuracy and scalability. Our method scDSC is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/DHUDBlab/scDSC.
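The ZINB model named in the abstract above mixes a point mass at zero (dropout) with a negative binomial count distribution. The following sketch writes out its probability mass function and negative log-likelihood in plain Python using the common mean/dispersion parameterization. It is a scalar illustration only: the parameter names `mu`, `theta` and `pi` are assumptions, and this is not the loss code from the scDSC repository.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """Log-probability of count x under NB with mean mu and dispersion theta."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_pmf(x: int, mu: float, theta: float, pi: float) -> float:
    """ZINB probability: extra mass pi at zero, NB scaled by (1 - pi) elsewhere."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


def zinb_nll(counts: list[int], mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of observed counts under shared parameters."""
    return -sum(math.log(zinb_pmf(x, mu, theta, pi)) for x in counts)


if __name__ == "__main__":
    print(zinb_pmf(0, mu=1.0, theta=1.0, pi=0.5))
    print(zinb_nll([0, 0, 3, 1], mu=2.0, theta=1.5, pi=0.3))
```

In an autoencoder setting, `mu`, `theta` and `pi` would be predicted per gene and per cell by the decoder, and this likelihood would serve as the reconstruction loss.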
Affiliation(s)
- Yanglan Gan
- School of Computer Science and Technology, Donghua University 201600, Shanghai, China
| | - Xingyu Huang
- School of Computer Science and Technology, Donghua University 201600, Shanghai, China
| | - Guobing Zou
- School of Computer Science and Technology, Shanghai University 200444, Shanghai, China
| | - Shuigeng Zhou
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University 200433, Shanghai, China
| | - Jihong Guan
- Computer Science and Technology, Tongji University 200092, Shanghai, China
|
49
|
Ioannides AA, Orphanides GA, Liu L. Rhythmicity in heart rate and its surges usher a special period of sleep, a likely home for PGO waves. Curr Res Physiol 2022; 5:118-141. [PMID: 35243361 PMCID: PMC8867048 DOI: 10.1016/j.crphys.2022.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2021] [Revised: 02/01/2022] [Accepted: 02/06/2022] [Indexed: 11/30/2022] Open
Abstract
High amplitude electroencephalogram (EEG) events, like the unitary K-complex (KC), are used to partition sleep into stages and hence define the hypnogram, a key instrument of sleep medicine. Throughout sleep the heart rate (HR) changes, often as a steady HR increase leading to a peak, known as a heart rate surge (HRS). The hypnogram is often unavailable when most needed, when sleep is disturbed and the graphoelements lose their identity. The hypnogram is also difficult to define during normal sleep, particularly at the start of sleep and the periods that precede and follow rapid eye movement (REM) sleep. Here, we use objective quantitative criteria that group together periods that cannot be assigned to a conventional sleep stage into what we call REM0 periods, with the presence of a HRS one of their defining properties. Extended REM0 periods are characterized by highly regular sequences of HRS that generate an infra-low oscillation around 0.05 Hz. During these regular sequences of HRS, and just before each HRS event, we find avalanches of high amplitude events for each one of the mass electrophysiological signals, i.e. related to eye movement, the motor system and the general neural activity. The most prominent features of long REM0 periods are sequences of three to five KCs which we label multiple K-complexes (KCm). Regarding HRS, a clear dissociation is demonstrated between the presence or absence of high gamma band spectral power (55-95 Hz) of the two types of KCm events: KCm events with strong high frequencies (KCmWSHF) cluster just before the peak of HRS, while KCm between HRS show no increase in high gamma band (KCmNOHF). Tomographic estimates of activity from magnetoencephalography (MEG) in pre-KC periods (single and multiple) showed common increases in the cholinergic Nucleus Basalis of Meynert in the alpha band. The direct contrast of KCmWSHF with KCmNOHF showed increases in all subjects in the high sigma band in the base of the pons and in three subjects in both the delta and high gamma bands in the medial Pontine Reticular Formation (mPRF), the putative Long Lead Initial pulse (LLIP) for Ponto-Geniculo-Occipital (PGO) waves.
Affiliation(s)
- Andreas A. Ioannides
- Lab. for Human Brain Dynamics, AAI Scientific Cultural Services Ltd., Nicosia, 1065, Cyprus
| | - Gregoris A. Orphanides
- Lab. for Human Brain Dynamics, AAI Scientific Cultural Services Ltd., Nicosia, 1065, Cyprus
- The English School, Nicosia, 1684, Cyprus
| | - Lichan Liu
- Lab. for Human Brain Dynamics, AAI Scientific Cultural Services Ltd., Nicosia, 1065, Cyprus
|
50
|
Trapotsi MA, Hosseini-Gerami L, Bender A. Computational analyses of mechanism of action (MoA): data, methods and integration. RSC Chem Biol 2022; 3:170-200. [PMID: 35360890 PMCID: PMC8827085 DOI: 10.1039/d1cb00069a] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/30/2021] [Accepted: 12/09/2021] [Indexed: 12/15/2022] Open
Abstract
The elucidation of a compound's Mechanism of Action (MoA) is a challenging task in the drug discovery process, but it is important in order to rationalise phenotypic findings and to anticipate potential side-effects. Bioinformatic approaches, advances in machine learning techniques and the increasing deposition of high-throughput data in public databases have significantly contributed to recent advances in the field, but it is not straightforward to decide which data and methods are most suitable to use in a given case. In this review, we focus on these methods and data and their applications in generating MoA hypotheses for subsequent experimental validation. We discuss compound-specific data such as -omics, cell morphology and bioactivity data, as well as commonly used supplementary prior knowledge such as network and pathway data, and provide information on databases where these data can be accessed. In terms of methodologies, we discuss both well-established methods (connectivity mapping, pathway enrichment) and emerging methods (neural networks and multi-omics integration). Finally, we review case studies where the MoA of a compound was successfully suggested from computational analysis by incorporating multiple data modalities and/or methodologies. Our aim for this review is to provide researchers with insights into the benefits and drawbacks of both the data and methods in terms of level of understanding, biases and interpretation, and to highlight future avenues of investigation which we foresee will improve the field of MoA elucidation, including greater public access to -omics data and methodologies which are capable of data integration.
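Pathway enrichment, listed in the abstract above among the well-established methods, is commonly implemented as an over-representation test. The sketch below computes the one-sided hypergeometric p-value for the overlap between a compound's responsive genes and one pathway. The choice of this particular test and all the numbers in the example are assumptions for illustration; the review covers several alternative enrichment approaches.

```python
from math import comb


def enrichment_p_value(n_background: int, n_pathway: int, n_hits: int, n_overlap: int) -> float:
    """P(overlap >= n_overlap) when n_hits genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_hits)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_hits - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20000 background genes, a 150-gene pathway,
    # 300 differentially expressed genes, 12 of them in the pathway.
    print(enrichment_p_value(20000, 150, 300, 12))
```

When many pathways are tested at once, the resulting p-values would normally be adjusted for multiple testing before any MoA hypothesis is drawn from them.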
Affiliation(s)
- Maria-Anna Trapotsi
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
| | - Layla Hosseini-Gerami
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
| | - Andreas Bender
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
|