1
Ghosh S, Mandal SD, Thakur S. Biomarker-driven drug repurposing for NAFLD-associated hepatocellular carcinoma using machine learning integrated ensemble feature selection. FRONTIERS IN BIOINFORMATICS 2025; 5:1522401. [PMID: 40313868 PMCID: PMC12043677 DOI: 10.3389/fbinf.2025.1522401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2024] [Accepted: 04/04/2025] [Indexed: 05/03/2025]
Abstract
The incidence of non-alcoholic fatty liver disease (NAFLD), encompassing the more severe non-alcoholic steatohepatitis (NASH), is rising alongside the surges in diabetes and obesity. Increasing evidence indicates that NASH is responsible for a significant share of idiopathic hepatocellular carcinoma (HCC) cases, a fatal cancer with a 5-year survival rate below 22%. Biomarkers can facilitate early screening and monitoring of at-risk NAFLD/NASH patients and assist in identifying potential drug candidates for treatment. This study utilized an ensemble feature selection framework to analyze transcriptomic data, identifying biomarker genes associated with the stage-wise progression of NAFLD-related HCC. Seven machine learning algorithms were assessed for disease stage classification. Twelve feature selection methods, including correlation-based techniques, mutual information-based methods, and embedded techniques, were used to rank the top genes as features. Combining multiple feature selection methods in this way yielded more robust features important to disease progression. Cox regression-based survival analysis was carried out to evaluate the biomarker potential of these genes. Furthermore, a multiphase drug repurposing strategy and molecular docking were employed to identify potential drug candidates against these biomarkers. Among the seven machine learning models initially evaluated, DISCR emerged as the most accurate disease stage classifier. Ensemble feature selection identified ten top genes, among which eight were recognized as potential biomarkers based on survival analysis. These include the genes ABAT, ABCB11, MBTPS1, and ZFP1, which are mostly involved in alanine and glutamate metabolism, butanoate metabolism, and ER protein processing. Through drug repurposing, 81 candidate drugs were found to be effective against these marker genes, with Diosmin, Esculin, Lapatinib, and Phenelzine as the best candidates screened through molecular docking and MMGBSA. The consensus derived from multiple methods enhances the accuracy of identifying relevant robust biomarkers for NAFLD-associated HCC. The use of these biomarkers in a multiphase drug repurposing strategy highlights potential therapeutic options for early intervention, which is essential to stop disease progression and improve outcomes.
Affiliation(s)
- Subhajit Ghosh
  - Department of Bioinformatics, University of North Bengal, Darjeeling, West Bengal, India
- Sukhen Das Mandal
  - Department of Computer Science and Engineering, Ghani Khan Choudhury Institute of Engineering and Technology (GKCIET), Malda, India
- Subarna Thakur
  - Department of Bioinformatics, University of North Bengal, Darjeeling, West Bengal, India
2
Dan Y, Ruan J, Zhu Z, Yu H. Predicting the Toxicity of Drug Molecules with Selecting Effective Descriptors Using a Binary Ant Colony Optimization (BACO) Feature Selection Approach. Molecules 2025; 30:1548. [PMID: 40286190 PMCID: PMC11990530 DOI: 10.3390/molecules30071548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2025] [Revised: 03/26/2025] [Accepted: 03/26/2025] [Indexed: 04/29/2025]
Abstract
Predicting the toxicity of drug molecules using in silico quantitative structure-activity relationship (QSAR) approaches is very helpful for guiding safe drug development and accelerating the drug development procedure. The ongoing development of machine learning techniques has made this task easier and more accurate, but it still suffers negative effects from both the severely skewed distribution of active/inactive chemicals and the relatively high-dimensional feature distribution. To simultaneously address both of these issues, a binary ant colony optimization feature selection algorithm, called BACO, is proposed in this study. Specifically, it divides the labeled drug molecules into a training set and a validation set multiple times; with each division, the ant colony seeks an optimal feature group that aims to maximize the weighted combination of three specific class imbalance performance metrics (F-measure, G-mean, and MCC) on the validation set. Then, after running all divisions, the frequency of each feature (descriptor) that emerges in the optimal feature groups is calculated and ranked in descending order. Only the high-frequency features are used to train a support vector machine (SVM) and construct the structure-activity relationship (SAR) prediction model. The experimental results for the 12 datasets in the Tox21 challenge, represented by the Mordred descriptor calculator, show that the proposed BACO method significantly outperforms several traditional feature selection approaches that have been widely used in QSAR analysis. It requires only a few to a few dozen descriptors for most datasets to exhibit its best performance, which shows its effectiveness and potential application value in cheminformatics.
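Two pieces of the procedure described above lend themselves to a short sketch: the fitness value that combines F-measure, G-mean, and MCC, and the frequency ranking of descriptors across data divisions. The abstract does not give the weights, so the equal weights below are an assumption, and the function names and descriptor labels are hypothetical. The ant colony search itself is not shown.

```python
import math
from collections import Counter

def imbalance_fitness(tp, fp, tn, fn, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted combination of F-measure, G-mean and MCC from a confusion matrix.

    Assumption: equal weights; the paper's actual weights are not given here.
    Note that MCC lies in [-1, 1] while the other two lie in [0, 1].
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    g_mean = math.sqrt(recall * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    w_f, w_g, w_m = weights
    return w_f * f_measure + w_g * g_mean + w_m * mcc

def rank_by_frequency(selected_subsets):
    """Count how often each descriptor appears in the per-division optimal
    subsets and return (descriptor, count) pairs, most frequent first."""
    counts = Counter(d for subset in selected_subsets for d in subset)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical optimal subsets found on three train/validation divisions.
subsets = [["d1", "d2"], ["d2", "d3"], ["d2", "d1"]]
print(rank_by_frequency(subsets))
print(imbalance_fitness(tp=10, fp=0, tn=90, fn=0))
```

Using metrics built from recall and specificity rather than plain accuracy keeps a classifier that ignores the minority (active) class from scoring well on skewed data.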
Affiliation(s)
- Yuanyuan Dan
  - School of Environmental and Chemical Engineering, Jiangsu University of Science and Technology, Zhenjiang 212100, China
- Junhao Ruan
  - School of Environmental and Chemical Engineering, Jiangsu University of Science and Technology, Zhenjiang 212100, China
- Zhenghua Zhu
  - School of Environmental and Chemical Engineering, Jiangsu University of Science and Technology, Zhenjiang 212100, China
- Hualong Yu
  - School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China
3
Xiang Z. VSS-SpatioNet: a multi-scale feature fusion network for multimodal image integrations. Sci Rep 2025; 15:9306. [PMID: 40102490 PMCID: PMC11920090 DOI: 10.1038/s41598-025-93143-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2024] [Accepted: 03/05/2025] [Indexed: 03/20/2025]
Abstract
Infrared and visible image fusion (vis-ir) enhances diagnostic accuracy in medical imaging and biological analysis. Existing CNN-based and Transformer-based methods face computational inefficiencies in modeling global dependencies. The author proposes VSS-SpatioNet, a lightweight architecture that replaces self-attention in Transformers with a Visual State Space (VSS) module for efficient dependency modeling. The framework employs an asymmetric encoder-decoder with a multi-scale autoencoder and a novel VSS-Spatial (VS) fusion block for local-global feature integration. Evaluations on TNO, Harvard Medical, and RoadScene datasets demonstrate superior performance. On TNO, VSS-SpatioNet achieves state-of-the-art Entropy (En = 7.0058) and Mutual Information (MI = 14.0116), outperforming 12 benchmark methods. For RoadScene, it attains gradient-based fusion performance ([Formula: see text]=0.5712), Piella's metric ([Formula: see text]=0.7926), and average gradient (AG = 5.2994), surpassing prior works. On Harvard Medical, the VS strategy improves Mean Gradient by 18.7% (0.0224 vs. 0.0198) against FusionGAN, validating enhanced feature preservation. Results confirm the framework's efficacy in medical applications, particularly precise tissue characterization.
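As context for the Entropy (En) figure quoted above, the following is a minimal sketch of Shannon entropy computed over pixel intensities, which is the usual meaning of En in image fusion benchmarks. Whether the paper uses exactly this definition is an assumption, and the pixel lists are made-up toy data.

```python
import math
from collections import Counter

def image_entropy(pixels):
    """Shannon entropy (in bits) of a flat sequence of pixel intensities.

    For an 8-bit image the value lies between 0 (constant image)
    and 8 (all 256 grey levels equally frequent).
    """
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy examples: two equally frequent grey levels carry exactly one bit.
print(image_entropy([0, 0, 255, 255]))
print(image_entropy([0, 64, 128, 192]))
```

A higher value indicates that the fused image spreads its intensities over more grey levels, which is commonly read as retaining more information from the source images.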
Affiliation(s)
- Zeyu Xiang
  - College of Information Engineering, Henan University of Science and Technology, Luoyang, 471000, Henan, China
4
Wang Y, Liu Z, Ma X. MuCST: restoring and integrating heterogeneous morphology images and spatial transcriptomics data with contrastive learning. Genome Med 2025; 17:21. [PMID: 40082941 PMCID: PMC11907906 DOI: 10.1186/s13073-025-01449-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2024] [Accepted: 03/07/2025] [Indexed: 03/16/2025]
Abstract
Spatially resolved transcriptomics (SRT) simultaneously measures spatial location, histology images, and transcriptional profiles of cells or regions in undissociated tissues. Integrative analysis of multi-modal SRT data holds immense potential for understanding biological mechanisms. Here, we present a flexible multi-modal contrastive learning method for the integration of SRT data (MuCST), which joins denoising, heterogeneity elimination, and compatible feature learning. MuCST accurately identifies spatial domains and is applicable to diverse datasets and platforms. Overall, MuCST provides an alternative for integrative analysis of multi-modal SRT data (https://github.com/xkmaxidian/MuCST).
Affiliation(s)
- Yu Wang
  - School of Computer Science and Technology, Xidian University, No.2 South Taibai Road, Xi'an, 710071, Shaanxi, China
  - Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xidian University, No.2 South Taibai Road, Xi'an, 710071, Shaanxi, China
- Zaiyi Liu
  - Department of Radiology, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, 106 Zhongshan Er Road, Guangzhou, 510080, Guangdong, China
  - Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application, 106 Zhongshan Er Road, Guangzhou, 510080, Guangdong, China
- Xiaoke Ma
  - School of Computer Science and Technology, Xidian University, No.2 South Taibai Road, Xi'an, 710071, Shaanxi, China
  - Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, Xidian University, No.2 South Taibai Road, Xi'an, 710071, Shaanxi, China
5
Zielosko B, Jabloński K, Dmytrenko A. Exploiting Data Distribution: A Multi-Ranking Approach. ENTROPY (BASEL, SWITZERLAND) 2025; 27:278. [PMID: 40149201 PMCID: PMC11940951 DOI: 10.3390/e27030278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2024] [Revised: 03/02/2025] [Accepted: 03/03/2025] [Indexed: 03/29/2025]
Abstract
Data heterogeneity is the result of increasing data volumes, technological advances, and growing business requirements in the IT environment. It means that data comes from different sources, may be dispersed in terms of location, and may be stored in different structures and formats. As a result, the management of distributed data requires special integration and analysis techniques to ensure coherent processing and a global view. Distributed learning systems often use entropy-based measures to assess the quality of local data and its impact on the global model. One important aspect of data processing is feature selection. This paper proposes a research methodology for multi-level attribute ranking construction for distributed data. The research was conducted on a publicly available dataset from the UCI Machine Learning Repository. In order to disperse the data, a table division into subtables was applied using reducts, which is a very well-known method from the rough sets theory. So-called local rankings were constructed for local data sources using an approach based on machine learning models, i.e., the greedy algorithm for the induction of decision rules. Two types of classifiers relating to explicit and implicit knowledge representation, i.e., gradient boosting and neural networks, were used to verify the research methodology. Extensive experiments, comparisons, and analysis of the obtained results show the merit of the proposed approach.
Affiliation(s)
- Beata Zielosko
  - Institute of Computer Science, University of Silesia in Katowice, Bȩdzińska 39, 41-200 Sosnowiec, Poland
6
Miao L, Li Z, Gao J. A multi-model machine learning framework for breast cancer risk stratification using clinical and imaging data. JOURNAL OF X-RAY SCIENCE AND TECHNOLOGY 2025; 33:360-375. [PMID: 39973793 DOI: 10.1177/08953996241308175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Purpose: This study presents a comprehensive machine learning framework for assessing breast cancer malignancy by integrating clinical features with imaging features derived from deep learning. Methods: The dataset included 1668 patients with documented breast lesions, incorporating clinical data (e.g., age, BI-RADS category, lesion size, margins, and calcifications) alongside mammographic images processed using four CNN architectures: EfficientNet, ResNet, DenseNet, and InceptionNet. Three predictive configurations were developed: an imaging-only model, a hybrid model combining imaging and clinical data, and a stacking-based ensemble model that aggregates both data types to enhance predictive accuracy. Twelve feature selection techniques, including ReliefF and Fisher Score, were applied to identify key predictive features. Model performance was evaluated using accuracy and AUC, with 5-fold cross-validation and hyperparameter tuning to ensure robustness. Results: The imaging-only models demonstrated strong predictive performance, with EfficientNet achieving an AUC of 0.76. The hybrid model combining imaging and clinical data reached the highest accuracy of 83% and an AUC of 0.87, underscoring the benefits of data integration. The stacking-based ensemble model further optimized accuracy, reaching a peak AUC of 0.94, demonstrating its potential as a reliable tool for malignancy risk assessment. Conclusion: This study highlights the importance of integrating clinical and deep imaging features for breast cancer risk stratification, with the stacking-based model.
Affiliation(s)
- Lu Miao
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
- Zidong Li
  - Department of Neurology and Psychiatry, Beijing Shijitan Hospital, Capital Medical University, Beijing, China
- Jinnan Gao
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
7
Arriero-País EM, Bajo-Rubio MA, Arrojo-García R, Sandoval P, González-Mateo GT, Albar-Vizcaíno P, Del Peso-Gilsanz G, Ossorio-González M, Majano P, López-Cabrera M. Biomarker and clinical data-based predictor tool (MAUXI) for ultrafiltration failure and cardiovascular outcome in peritoneal dialysis patients: a retrospective and longitudinal study. BMJ Health Care Inform 2025; 32:e101138. [PMID: 40021191 PMCID: PMC11873327 DOI: 10.1136/bmjhci-2024-101138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Accepted: 02/12/2025] [Indexed: 03/03/2025]
Abstract
OBJECTIVES To develop a machine learning-based software as a medical device to predict the endurance and outcomes of peritoneal dialysis (PD) patients in real time using effluent-measured biomarkers of the mesothelial-to-mesenchymal transition (MMT). METHODS Retrospective, longitudinal, triple-blind study in two independent hospitals (Spain), designed under information-theoretical approaches for feature selection and machine learning-based modelling techniques. A total of 151 (train set) and 32 (validation) PD patients in 1979-2022 were included. PD outcomes were analysed in four categories (endurance, exit from PD, cause of PD end, technical failure) by using MMT biomarkers in effluents and clinical databases. RESULTS MMT biomarkers and clinical data can predict PD endurance with a mean absolute error of 16.99 months by using an Extra Tree (ET) regressor. Linear discriminant analysis (LDA) discerns between transfer to haemodialysis and death, predicts whether the cause of PD end is ultrafiltration failure (UFF) or cardiovascular disease (CVD), and anticipates the type of CVD (area under the receiver operating characteristic curve >0.71). DISCUSSION Our combination of longitudinal PD datasets, attribute shrinkage and gold-standard algorithms with overfitting testing and class imbalance handling ensures robust predictions in PD. Biomarkers displayed proper mutual information and Shapley values, indicating that MMT processes may have a causal relationship in the development of UFF and CVD. CONCLUSIONS MMT biomarkers and clinical data may be associated in a causal manner with ultrafiltration failure (local effect) and cardiovascular events (systemic effect) in PD. The machine learning-based software MAUXI provides applicability of ET-LDA models with ≤38 variables to predict PD endurance and type of PD technique failure related to peritoneal membrane deterioration.
Affiliation(s)
- Eva María Arriero-País
  - Tissue and Organ Homeostasis Program, Cell-Cell Communication and Inflammation Unit, Centro de Biologia Molecular Severo Ochoa (CBM), CSIC-UAM, Fundacion General CSIC, Madrid, Spain
- María Auxiliadora Bajo-Rubio
  - Servicio de Nefrología, Hospital Universitario de la Princesa & Instituto de Investigación la Princesa (IP), Madrid, Spain
  - Redes de Investigación Cooperativa Orientadas a Resultados en Salud (RICORS 2040-Renal), Madrid, Spain
- Roberto Arrojo-García
  - Tissue and Organ Homeostasis Program, Cell-Cell Communication and Inflammation Unit, Centro de Biologia Molecular Severo Ochoa (CBM), CSIC-UAM, Fundacion General CSIC, Madrid, Spain
- Pilar Sandoval
  - Tissue and Organ Homeostasis Program, Cell-Cell Communication and Inflammation Unit, Centro de Biologia Molecular Severo Ochoa (CBM), CSIC-UAM, Fundacion General CSIC, Madrid, Spain
  - Redes de Investigación Cooperativa Orientadas a Resultados en Salud (RICORS 2040-Renal), Madrid, Spain
- Guadalupe Tirma González-Mateo
  - Redes de Investigación Cooperativa Orientadas a Resultados en Salud (RICORS 2040-Renal), Madrid, Spain
  - Premium Research, S.L, Guadalajara, Spain
- Patricia Albar-Vizcaíno
  - Servicio de Nefrología, Hospital Universitario La Paz & Instituto de Investigación Sanitaria la Paz (IdiPAZ), Madrid, Spain
- Gloria Del Peso-Gilsanz
  - Redes de Investigación Cooperativa Orientadas a Resultados en Salud (RICORS 2040-Renal), Madrid, Spain
  - Servicio de Nefrología, Hospital Universitario La Paz & Instituto de Investigación Sanitaria la Paz (IdiPAZ), Madrid, Spain
- Marta Ossorio-González
  - Redes de Investigación Cooperativa Orientadas a Resultados en Salud (RICORS 2040-Renal), Madrid, Spain
  - Servicio de Nefrología, Hospital Universitario La Paz & Instituto de Investigación Sanitaria la Paz (IdiPAZ), Madrid, Spain
- Pedro Majano
  - Unidad de Biología Molecular, Hospital Universitario de la Princesa & Instituto de Investigación la Princesa (IP), Madrid, Spain
- Manuel López-Cabrera
  - Tissue and Organ Homeostasis Program, Cell-Cell Communication and Inflammation Unit, Centro de Biologia Molecular Severo Ochoa (CBM), CSIC-UAM, Fundacion General CSIC, Madrid, Spain
  - Redes de Investigación Cooperativa Orientadas a Resultados en Salud (RICORS 2040-Renal), Madrid, Spain
8
Yang Y, Cui Y, Zeng X, Zhang Y, Loza M, Park SJ, Nakai K. STAIG: Spatial transcriptomics analysis via image-aided graph contrastive learning for domain exploration and alignment-free integration. Nat Commun 2025; 16:1067. [PMID: 39870633 PMCID: PMC11772580 DOI: 10.1038/s41467-025-56276-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 01/06/2025] [Indexed: 01/29/2025]
Abstract
Spatial transcriptomics is an essential application for investigating cellular structures and interactions and requires multimodal information to precisely study spatial domains. Here, we propose STAIG, a deep-learning model that integrates gene expression, spatial coordinates, and histological images using graph-contrastive learning coupled with high-performance feature extraction. STAIG can integrate tissue slices without prealignment and remove batch effects. Moreover, it is designed to accept data acquired from various platforms, with or without histological images. By performing extensive benchmarks, we demonstrate the capability of STAIG to recognize spatial regions with high precision and uncover new insights into tumor microenvironments, highlighting its promising potential in deciphering spatial biological intricacies.
Affiliation(s)
- Yitao Yang
  - Department of Computational Biology and Medical Science, Graduate School of Frontier Sciences, the University of Tokyo, Tokyo, Japan
- Yang Cui
  - Department of Computational Biology and Medical Science, Graduate School of Frontier Sciences, the University of Tokyo, Tokyo, Japan
- Xin Zeng
  - Department of Computational Biology and Medical Science, Graduate School of Frontier Sciences, the University of Tokyo, Tokyo, Japan
- Yubo Zhang
  - Department of Computational Biology and Medical Science, Graduate School of Frontier Sciences, the University of Tokyo, Tokyo, Japan
- Martin Loza
  - Human Genome Center, the Institute of Medical Science, the University of Tokyo, Tokyo, Japan
- Sung-Joon Park
  - Human Genome Center, the Institute of Medical Science, the University of Tokyo, Tokyo, Japan
- Kenta Nakai
  - Department of Computational Biology and Medical Science, Graduate School of Frontier Sciences, the University of Tokyo, Tokyo, Japan
  - Human Genome Center, the Institute of Medical Science, the University of Tokyo, Tokyo, Japan
9
Darshan BSD, Sampathila N, Bairy GM, Prabhu S, Belurkar S, Chadaga K, Nandish S. Differential diagnosis of iron deficiency anemia from aplastic anemia using machine learning and explainable Artificial Intelligence utilizing blood attributes. Sci Rep 2025; 15:505. [PMID: 39747241 PMCID: PMC11695698 DOI: 10.1038/s41598-024-84120-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Accepted: 12/20/2024] [Indexed: 01/04/2025]
Abstract
According to the World Health Organization, anemia is a highly prevalent blood disorder worldwide. Anemia is defined as a reduced number of red blood cells or a decrease in the number of healthy red blood cells. This condition also decreases the oxygen-carrying capacity of the blood. The main goal of this research is to develop a dependable method for diagnosing Aplastic Anemia and Iron Deficiency Anemia by examining blood test attributes. As of today, no studies use Interpretable Artificial Intelligence to perform this differential diagnosis. The dataset used in this study was collected from Kasturba Medical College, Manipal. It consists of various blood test attributes such as red blood cell count, hemoglobin level, and mean corpuscular volume. Explainable Artificial Intelligence (XAI) is one of the trending topics in machine learning; XAI methods are known to demystify machine learning outputs for all stakeholders. Hence, five XAI tools, namely SHAP, LIME, Eli5, Qlattice, and Anchor, are used to understand the model's predictions. The important characteristics according to the XAI models are PLT, PCT, MCV, PDW, HGB, ABS LYMP, WBC, MCH, and MCHC. are employed to train and test the data. The goal of using data analytic techniques is to give medical professionals a useful tool that improves decision-making, enhances resource management, and eventually raises the standard of patient care. For medical professionals who must rely on AI-assisted diagnosis and treatment suggestions, XAI considers the unique qualities of each patient and offers arguments that strengthen their faith in the model outcomes.
Affiliation(s)
- B S Dhruva Darshan
  - Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- Niranjana Sampathila
  - Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- G Muralidhar Bairy
  - Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- Srikanth Prabhu
  - Department of Computer Science & Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- Sushma Belurkar
  - Hematology and Clinical Pathology lab, Kasturba Medical College, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- Krishnaraj Chadaga
  - Department of Computer Science & Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- S Nandish
  - L&T Technology Services Limited, Mysore, Karnataka, India
10
Wunsch L, Hubold M, Nestler R, Notni G. Realisation of an Application Specific Multispectral Snapshot-Imaging System Based on Multi-Aperture-Technology and Multispectral Machine Learning Loops. SENSORS (BASEL, SWITZERLAND) 2024; 24:7984. [PMID: 39771722 PMCID: PMC11679387 DOI: 10.3390/s24247984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2024] [Revised: 12/09/2024] [Accepted: 12/12/2024] [Indexed: 01/11/2025]
Abstract
Multispectral imaging (MSI) enables the acquisition of spatial and spectral image-based information in one process. Spectral scene information can be used to determine the characteristics of materials based on reflection or absorption and thus their material compositions. This work focuses on so-called multi aperture imaging, which enables a simultaneous capture (snapshot) of spectrally selective and spatially resolved scene information. There are some limiting factors for the spectral resolution when implementing this imaging principle, e.g., usable sensor resolutions and area, and required spatial scene resolution or optical complexity. Careful analysis is therefore needed for the specification of the multispectral system properties and its realisation. In this work we present a systematic approach for the application-related implementation of this kind of MSI. We focus on spectral system modeling, data analysis, and machine learning to build a universally usable multispectral loop to find the best sensor configuration. The approach presented is demonstrated and tested on the classification of waste, a typical application for multispectral imaging.
Affiliation(s)
- Lennard Wunsch
  - Group of Quality Assurance and Industrial Image Processing, Faculty of Mechanical Engineering, Technische Universität Ilmenau, Gustav-Kirchhoff-Platz 2, 98693 Ilmenau, Germany
- Martin Hubold
  - Fraunhofer Institute for Applied Optics and Precision Engineering IOF Jena, Albert-Einstein-Str. 7, 07745 Jena, Germany
- Rico Nestler
  - Zentrum für Bild- und Signalverarbeitung e.V., Werner-von-Siemens-Str. 12, 98693 Ilmenau, Germany
- Gunther Notni
  - Group of Quality Assurance and Industrial Image Processing, Faculty of Mechanical Engineering, Technische Universität Ilmenau, Gustav-Kirchhoff-Platz 2, 98693 Ilmenau, Germany
  - Fraunhofer Institute for Applied Optics and Precision Engineering IOF Jena, Albert-Einstein-Str. 7, 07745 Jena, Germany
11
Shams Alden ZNAM, Ata O. A comprehensive analysis and performance evaluation for osteoporosis prediction models. PeerJ Comput Sci 2024; 10:e2338. [PMID: 39896405 PMCID: PMC11784534 DOI: 10.7717/peerj-cs.2338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Accepted: 08/28/2024] [Indexed: 02/04/2025]
Abstract
Medical data analysis is an expanding area of study that holds the promise of transforming the healthcare landscape. The use of available data by researchers gives guidelines to improve health practitioners' decision-making capacity, thus enhancing patients' lives. This study uses deep learning techniques to predict the onset of osteoporosis from the NHANES 2017-2020 dataset, which was preprocessed and arranged into SpineOsteo and FemurOsteo datasets. Two feature selection methods, namely mutual information (MI) and recursive feature elimination (RFE), were applied to sequential deep neural network models, convolutional neural network (CNN) models, and recurrent neural network models. The results show that the mutual information method achieved higher accuracy than recursive feature elimination, and the CNN model with MI feature selection performed best, reaching 99.15% accuracy on the SpineOsteo dataset and 99.94% accuracy on the FemurOsteo dataset. Key features identified in this study include family medical history, cases of fractures in patients and parental hip fractures, and regular use of medications like prednisone or cortisone. The research underscores the potential of deep learning in medical data processing, which opens the way for enhanced models for diagnosis and prognosis based on non-image medical data. These findings can help healthcare providers make better-informed decisions about patient outcomes.
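The mutual information (MI) filter mentioned in this abstract can be sketched for discrete features as follows. This is a minimal illustration of MI-based ranking in general, not the authors' pipeline; the feature names and toy values are hypothetical.

```python
import math
from collections import Counter

def mutual_information(feature, label):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, label))
    count_x = Counter(feature)
    count_y = Counter(label)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y).
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi

# Hypothetical toy data: one informative feature and one uninformative one.
label = [1, 1, 0, 0]
features = {
    "prior_fracture": [1, 1, 0, 0],
    "noise": [0, 1, 0, 1],
}
scores = {name: mutual_information(values, label) for name, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

Unlike recursive feature elimination, this filter scores each feature independently of any model, so it is cheap to compute but cannot account for interactions between features.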
Affiliation(s)
- Zahraa Noor Aldeen M. Shams Alden
  - Faculty of Tourism Science, University of Kerbala, Kerbala, Iraq
  - Department of Electrical and Computer Engineering, Altinbas University, Istanbul, Turkey
- Oguz Ata
  - Department of Software Engineering, Engineering and Architecture Faculty, Altinbas University, İstanbul, Turkey
12
Xiu YH, Sun SL, Zhou BW, Wan Y, Tang H, Long HX. DGSIST: Clustering spatial transcriptome data based on deep graph structure Infomax. Methods 2024; 231:226-236. [PMID: 39413889 DOI: 10.1016/j.ymeth.2024.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Revised: 09/26/2024] [Accepted: 10/04/2024] [Indexed: 10/18/2024]
Abstract
Although spatial transcriptomics data provide valuable insights into gene expression profiles and the spatial structure of tissues, most studies rely solely on gene expression information, underutilizing the spatial data. To fully leverage the potential of spatial transcriptomics and graph neural networks, the DGSI (Deep Graph Structure Infomax) model is proposed. This graph data processing model uses graph convolutional neural networks and employs an unsupervised learning approach. It maximizes the mutual information between graph-level and node-level representations, emphasizing flexible sampling and aggregation of nodes and their neighbors. This effectively captures and incorporates local information from nodes into the overall graph structure. Additionally, this paper develops the DGSIST framework, an unsupervised cell clustering method that integrates the DGSI model, the SVD dimensionality reduction algorithm, and the k-means++ clustering algorithm to identify cell types accurately. DGSIST makes full use of spatial transcriptomics data and outperforms existing methods in accuracy. Demonstrations of DGSIST's capability across various tissue types and technological platforms have shown its effectiveness in accurately identifying spatial domains in multiple tissue sections. Compared to other spatial clustering methods, DGSIST excels in cell clustering and effectively eliminates batch effects without needing batch correction. DGSIST excels in spatial clustering analysis, spatial variation identification, and differential gene expression detection, and applies directly to graph analysis tasks such as node classification, link prediction, or graph clustering. The DGSIST framework is expected to contribute to a deeper understanding of the spatial organizational structures of diseases such as cancer.
Affiliation(s)
- Yu-Han Xiu
  - College of Information Science Technology, Hainan Normal University, HaiKou City 571158, China; Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, HaiKou City 571158, China
- Si-Lin Sun
  - College of Information Science Technology, Hainan Normal University, HaiKou City 571158, China; Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, HaiKou City 571158, China
- Bing-Wei Zhou
  - College of Information Science Technology, Hainan Normal University, HaiKou City 571158, China; Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, HaiKou City 571158, China
- Ying Wan
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Hua Tang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China; Medical Engineering & Medical Informatics Integration and Transformational Medicine Key Laboratory of Luzhou City, Luzhou 646000, China
- Hai-Xia Long
  - College of Information Science Technology, Hainan Normal University, HaiKou City 571158, China; Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, HaiKou City 571158, China
|
13
|
Zhang Z, Wang M, Dai R, Wang Z, Lei L, Zhao X, Han K, Shi C, Guo Q. GraphCVAE: Uncovering cell heterogeneity and therapeutic target discovery through residual and contrastive learning. Life Sci 2024; 359:123208. [PMID: 39488267 DOI: 10.1016/j.lfs.2024.123208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 09/03/2024] [Accepted: 10/30/2024] [Indexed: 11/04/2024]
Abstract
Advancements in Spatial Transcriptomics (ST) technologies in recent years have transformed the analysis of tissue structure and function within spatial contexts. However, accurately identifying spatial domains remains challenging due to data sparsity and noise. Traditional clustering methods often fail to capture spatial dependencies, while spatial clustering methods struggle with batch effects and data integration. We introduce GraphCVAE, a model designed to enhance spatial domain identification by integrating spatial and morphological information, correcting batch effects, and managing heterogeneous data. GraphCVAE employs a multi-layer Graph Convolutional Network (GCN) and a variational autoencoder to improve the representation and integration of spatial information. Through contrastive learning, the model captures subtle differences between cell types and states. Extensive testing on various ST datasets demonstrates GraphCVAE's robustness and biological contributions. In the dorsolateral prefrontal cortex (DLPFC) dataset, it accurately delineates cortical layer boundaries. In glioblastoma, GraphCVAE reveals critical therapeutic targets such as TF and NFIB. In colorectal cancer, it explores the role of the extracellular matrix. The model's performance metrics consistently surpass those of existing methods, validating its effectiveness. GraphCVAE's advanced visualization capabilities further highlight its precision in resolving spatial structures, making it a powerful tool for spatial transcriptomics analysis and offering new insights into disease studies.
Affiliation(s)
- Zhiwei Zhang
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Mengqiu Wang
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Ruoyan Dai
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Zhenghui Wang
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Lixin Lei
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Xudong Zhao
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Kaitai Han
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Chaojing Shi
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Qianjin Guo
  - Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102617, China
|
14
|
Longkumer I, Mazumder DH. A novel parallel feature rank aggregation algorithm for gene selection applied to microarray data classification. Comput Biol Chem 2024; 112:108182. [PMID: 39197395 DOI: 10.1016/j.compbiolchem.2024.108182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 07/07/2024] [Accepted: 08/22/2024] [Indexed: 09/01/2024]
Abstract
Microarray data often comprise numerous genes, yet not all genes are relevant for predicting cancer. Feature selection becomes a crucial step to reduce the high dimensionality in these kinds of data. While no single feature selection method consistently outperforms others across diverse domains, the combination of multiple feature selectors or rankers tends to produce more effective results than relying on a single ranker alone. However, this approach can be computationally expensive, particularly when handling a large number of features. Hence, this paper presents a parallel feature rank aggregation that uses the Borda count as the rank aggregator. The concept of vertically partitioning the data along the feature space was adapted to ease the parallel execution of the aggregation task. Features were selected based on the final aggregated rank list, and their classification performances were evaluated. The model's execution time was also observed across multiple worker nodes of the cluster. The experiment was conducted on six benchmark microarray datasets. The results show the capability of the proposed distributed framework compared to the sequential version in all cases. They also illustrate the improved accuracy of the proposed method and its ability to select a minimal number of genes.
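As a rough illustration of the aggregation step mentioned in this abstract, the sketch below applies a plain sequential Borda count to several feature rankings. It does not reproduce the paper's parallel, vertically partitioned execution; the function name `borda_aggregate` and the gene labels are hypothetical.

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Combine rankings (each ordered best to worst over the same features).

    A feature at position p in a ranking of n features earns n - 1 - p points;
    features are returned by descending total, with ties broken by name.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - 1 - position
    return sorted(scores, key=lambda f: (-scores[f], f))

# Three hypothetical rankers ordering the same four genes.
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
aggregated = borda_aggregate(rankings)
```

Here `g1` collects 8 points, `g2` 6, `g3` 3, and `g4` 1, so the aggregated order is `g1, g2, g3, g4`.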
Affiliation(s)
- Imtisenla Longkumer
  - National Institute of Technology Nagaland, Chumukedima, Dimapur, Nagaland 797103, India
|
15
|
Zhang Q, Mao D, Tu Y, Wu YY. A New Fingerprint and Graph Hybrid Neural Network for Predicting Molecular Properties. J Chem Inf Model 2024; 64:5853-5866. [PMID: 39052623 DOI: 10.1021/acs.jcim.4c00586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
Machine learning plays a role in accelerating drug discovery, and the design of effective machine learning models is crucial for accurately predicting molecular properties. Characterizing molecules typically involves the use of molecular fingerprints and molecular graphs. These are input into a multilayer perceptron (MLP) and variants of graph neural networks, such as graph attention networks (GATs). Because fingerprints come in diverse types and large dimensions, models may contain many features that are relatively irrelevant or redundant; meanwhile, although the GAT excels in handling heterogeneous graph tasks, it lacks the ability to extract collaborative information from neighboring nodes, which matters when adjacent groups jointly influence an atom. To overcome these challenges, we introduce a hybrid model combining an improved GAT and an MLP. In the GAT, a recurrent neural network is employed to capture collaborative information. To address the dimensionality issue, we propose a feature selection algorithm based on the principle of maximizing relevance while minimizing redundancy. Through experiments on 13 public data sets and 14 breast cell lines, our model demonstrates superior performance compared to state-of-the-art deep learning and traditional machine learning algorithms. Additionally, a series of ablation experiments were conducted to demonstrate the advantages of our improved version, as well as its antinoise capability and interpretability. These results indicate that our model holds promising prospects for practical applications.
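The "maximize relevance, minimize redundancy" principle mentioned in this abstract can be sketched as a generic greedy selection. The abstract does not specify the authors' algorithm, so everything below is an assumption: relevance and redundancy are both measured by mutual information on discrete values, the score is their difference, and the names `mutual_information` and `mrmr` and the toy data are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

def mrmr(features, target, k):
    """Greedily pick k features, scoring each candidate by
    relevance to the target minus mean redundancy with already-selected features."""
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy
        # Iterate in sorted order so that ties are broken alphabetically.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy fingerprint bits: "b" duplicates "a", while "c" carries independent information.
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
target = [2 * a + c for a, c in zip(features["a"], features["c"])]
selected = mrmr(features, target, k=2)
```

All three bits are equally relevant on their own, but once `a` is chosen the duplicate `b` is penalized for redundancy, so `c` is picked second.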
Affiliation(s)
- Qingtian Zhang
  - College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Dangxin Mao
  - College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Yusong Tu
  - College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
- Yuan-Yan Wu
  - College of Physics Science and Technology, Yangzhou University, Jiangsu 225009, China
|
16
|
Sun D, Amiri M, Unnithan RR, French C. Protocol for calcium imaging and analysis of hippocampal CA1 activity evoked by non-spatial stimuli. STAR Protoc 2024; 5:103110. [PMID: 38843398 PMCID: PMC11216012 DOI: 10.1016/j.xpro.2024.103110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 04/22/2024] [Accepted: 05/15/2024] [Indexed: 06/26/2024] Open
Abstract
The hippocampus has a major role in processing spatial information, but recent studies have found that it also encodes non-spatial information from multisensory modalities. Here, we present a protocol for recording the calcium activity of hippocampal CA1 neuronal ensembles evoked by non-spatial stimuli (visual, auditory, and a combination of the two) in C57BL/6 mice using a miniaturized fluorescence microscope. We describe steps for experimental apparatus setup, surgical procedures, software development, and neuronal population activity analysis. For complete details on the use and execution of this protocol, please refer to Sun et al. [1].
Affiliation(s)
- Dechuan Sun
  - Neural Dynamics Laboratory, Department of Medicine, The University of Melbourne, Melbourne, VIC 3051, Australia; Department of Electrical and Electronic Engineering, The University of Melbourne, Melbourne, VIC 3051, Australia
- Mona Amiri
  - Neural Dynamics Laboratory, Department of Medicine, The University of Melbourne, Melbourne, VIC 3051, Australia
- Chris French
  - Neural Dynamics Laboratory, Department of Medicine, The University of Melbourne, Melbourne, VIC 3051, Australia
|
17
|
Kim J, Seok J. ctGAN: combined transformation of gene expression and survival data with generative adversarial network. Brief Bioinform 2024; 25:bbae325. [PMID: 38980369 PMCID: PMC11232285 DOI: 10.1093/bib/bbae325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 05/29/2024] [Accepted: 06/21/2024] [Indexed: 07/10/2024] Open
Abstract
Recent studies have extensively used deep learning algorithms to analyze gene expression to predict disease diagnosis, treatment effectiveness, and survival outcomes. Survival analysis studies on diseases with high mortality rates, such as cancer, are indispensable. However, deep learning models are plagued by overfitting owing to the limited sample size relative to the large number of genes. Consequently, the latest style-transfer deep generative models have been implemented to generate gene expression data. However, these models are limited in their applicability for clinical purposes because they generate only transcriptomic data. Therefore, this study proposes ctGAN, which enables the combined transformation of gene expression and survival data using a generative adversarial network (GAN). ctGAN improves survival analysis by augmenting data through style transformations between breast cancer and 11 other cancer types. We evaluated the concordance index (C-index) enhancements compared with previous models to demonstrate its superiority. Performance improvements were observed in nine of the 11 cancer types. Moreover, ctGAN outperformed previous models in seven out of the 11 cancer types, with colon adenocarcinoma (COAD) exhibiting the most significant improvement (median C-index increase of ~15.70%). Furthermore, integrating the generated COAD enhanced the log-rank p-value (0.041) compared with using only the real COAD (p-value = 0.797). Based on the data distribution, we demonstrated that the model generated highly plausible data. In clustering evaluation, ctGAN exhibited the highest performance in most cases (89.62%). These findings suggest that ctGAN can be meaningfully utilized to predict disease progression and select personalized treatments in the medical field.
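For readers unfamiliar with the concordance index (C-index) used as the evaluation metric in this abstract, the sketch below computes Harrell's C-index on a tiny made-up cohort. It is a generic textbook formulation, not the authors' evaluation code; the function name and the toy values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before time j. It is concordant when the
    earlier-failing subject has the higher risk score; ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Made-up cohort: survival time, event indicator (0 = censored), predicted risk.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
```

In this toy cohort five pairs are comparable and four of them are ordered correctly, giving a C-index of 0.8; a value of 0.5 corresponds to random ordering and 1.0 to perfect ranking.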
Affiliation(s)
- Jaeyoon Kim
  - School of Electrical and Computer Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Korea
- Junhee Seok
  - School of Electrical and Computer Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Korea
|
18
|
Wang Q, Tao Z, Gao Q, Jiao L. Multi-View Subspace Clustering via Structured Multi-Pathway Network. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:7244-7250. [PMID: 36306291 DOI: 10.1109/tnnls.2022.3213374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Recently, deep multi-view clustering (MVC) has attracted increasing attention in multi-view learning owing to its promising performance. However, most existing deep multi-view methods use single-pathway neural networks to extract the features of each view, and so cannot explore comprehensive complementary information and multilevel features. To tackle this problem, we propose a deep structured multi-pathway network (SMpNet) for the multi-view subspace clustering task in this brief. The proposed SMpNet leverages structured multi-pathway convolutional neural networks to explicitly learn the subspace representations of each view in a layer-wise way. In this way, both low-level and high-level structured features are integrated through a common connection matrix to explore the comprehensive complementary structure among multiple views. Moreover, we impose a low-rank constraint on the connection matrix to decrease the impact of noise and further highlight the consensus information of all the views. Experimental results on five public datasets show the effectiveness of the proposed SMpNet compared with several state-of-the-art deep MVC methods.
|
19
|
Li W, Yang F, Wang F, Rong Y, Liu L, Wu B, Zhang H, Yao J. scPROTEIN: a versatile deep graph contrastive learning framework for single-cell proteomics embedding. Nat Methods 2024; 21:623-634. [PMID: 38504113 DOI: 10.1038/s41592-024-02214-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 02/16/2024] [Indexed: 03/21/2024]
Abstract
Single-cell proteomics sequencing technology sheds light on protein-protein interactions, posttranslational modifications and proteoform dynamics in the cell. However, the uncertainty estimation for peptide quantification, data missingness, batch effects and high noise hinder the analysis of single-cell proteomic data. It is important to solve this set of tangled problems together, but the existing methods tailored for single-cell transcriptomes cannot fully address this task. Here we propose a versatile framework designed for single-cell proteomics data analysis called scPROTEIN, which consists of peptide uncertainty estimation based on a multitask heteroscedastic regression model and cell embedding generation based on graph contrastive learning. scPROTEIN can estimate the uncertainty of peptide quantification, denoise protein data, remove batch effects and encode single-cell proteomic-specific embeddings in a unified framework. We demonstrate that scPROTEIN is efficient for cell clustering, batch correction, cell type annotation, clinical analysis and spatially resolved proteomic data exploration.
Affiliation(s)
- Wei Li
  - College of Artificial Intelligence, Nankai University, Tianjin, China
  - AI Lab, Tencent, Shenzhen, China
- Fan Yang
  - AI Lab, Tencent, Shenzhen, China
- Yu Rong
  - AI Lab, Tencent, Shenzhen, China
- Han Zhang
  - College of Artificial Intelligence, Nankai University, Tianjin, China
|
20
|
Ranek JS, Stallaert W, Milner JJ, Redick M, Wolff SC, Beltran AS, Stanley N, Purvis JE. DELVE: feature selection for preserving biological trajectories in single-cell data. Nat Commun 2024; 15:2765. [PMID: 38553455 PMCID: PMC10980758 DOI: 10.1038/s41467-024-46773-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/18/2023] [Accepted: 03/07/2024] [Indexed: 04/02/2024] Open
Abstract
Single-cell technologies can measure the expression of thousands of molecular features in individual cells undergoing dynamic biological processes. While examining cells along a computationally ordered pseudotime trajectory can reveal how changes in gene or protein expression impact cell fate, identifying such dynamic features is challenging due to the inherent noise in single-cell data. Here, we present DELVE, an unsupervised feature selection method for identifying a representative subset of molecular features that robustly recapitulate cellular trajectories. In contrast to previous work, DELVE uses a bottom-up approach to mitigate the effects of confounding sources of variation, and instead models cell states from dynamic gene or protein modules based on core regulatory complexes. Using simulations, single-cell RNA sequencing, and iterative immunofluorescence imaging data in the context of cell cycle and cellular differentiation, we demonstrate how DELVE selects features that better define cell types and cell-type transitions. DELVE is available as an open-source Python package: https://github.com/jranek/delve.
Affiliation(s)
- Jolene S Ranek
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Wayne Stallaert
  - Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
- J Justin Milner
  - Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC, USA
- Margaret Redick
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Samuel C Wolff
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Adriana S Beltran
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Human Pluripotent Cell Core, University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC, USA
- Natalie Stanley
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jeremy E Purvis
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
21
|
Kang Y, Zhang H, Guan J. scINRB: single-cell gene expression imputation with network regularization and bulk RNA-seq data. Brief Bioinform 2024; 25:bbae148. [PMID: 38600665 PMCID: PMC11006796 DOI: 10.1093/bib/bbae148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 02/26/2024] [Accepted: 03/18/2024] [Indexed: 04/12/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) facilitates the study of cell type heterogeneity and the construction of cell atlases. However, due to its limitations, many genes may be detected as having zero expression (dropout events), leading to bias in downstream analyses and hindering the identification and characterization of cell types and cell functions. Although many imputation methods have been developed, their performance is generally lower than expected across different kinds and dimensions of data and application scenarios. Therefore, developing an accurate and robust single-cell gene expression data imputation method is still essential. To maintain the original cell-cell and gene-gene correlations and to leverage information from bulk RNA sequencing (bulk RNA-seq) data, we propose scINRB, a single-cell gene expression imputation method with network regularization and bulk RNA-seq data. scINRB adopts network-regularized non-negative matrix factorization to ensure that the imputed data maintain the cell-cell and gene-gene similarities and also approach the average gene expression calculated from bulk RNA-seq data. To evaluate its performance, we test scINRB on simulated and experimental datasets and compare it with other commonly used imputation methods. The results show that scINRB recovers gene expression accurately even in the case of high dropout rates and dimensions, preserves cell-cell and gene-gene similarities, and improves various downstream analyses including visualization, clustering and trajectory inference.
Affiliation(s)
- Yue Kang
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
- Hongyu Zhang
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
- Jinting Guan
  - Department of Automation, Xiamen University, Xiamen, Fujian, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China
|
22
|
Li Y, Zhou RG, Xu R, Luo J, Hu W, Fan P. Implementing Graph-Theoretic Feature Selection by Quantum Approximate Optimization Algorithm. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:2364-2377. [PMID: 35862330 DOI: 10.1109/tnnls.2022.3190042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Feature selection plays a significant role in computer science; nevertheless, this task is intractable since its search space scales exponentially with the number of dimensions. Motivated by the potential advantages of near-term quantum computing, three graph-theoretic feature selection (GTFS) methods, namely minimum cut (MinCut)-based, densest k-subgraph (DkS)-based, and maximal-independent set/minimal vertex cover (MIS/MVC)-based, are investigated in this article, where the original graph-theoretic problems are naturally formulated as quadratic problems in binary variables and then solved using the quantum approximate optimization algorithm (QAOA). Specifically, three separate graphs are created from the raw feature set, where the vertex set consists of individual features and a pairwise measure describes each edge. The corresponding feature subset is generated by deriving a subgraph from the established graph using QAOA. For the above three GTFS approaches, the solving procedure and quantum circuit for the corresponding graph-theoretic problems are formulated within the framework of QAOA. In addition, those proposals could be employed as a local solver and integrated with the Tabu search algorithm for solving large-scale GTFS problems using limited quantum bit resources. Finally, extensive numerical experiments are conducted with 20 publicly available datasets, and the results demonstrate that each model is superior to its classical scheme. In addition, the complexity of each model is only O(pn^2) even in the worst cases, where p is the number of layers in QAOA and n is the number of features.
|
23
|
Cai Y, Wang S. Deeply integrating latent consistent representations in high-noise multi-omics data for cancer subtyping. Brief Bioinform 2024; 25:bbae061. [PMID: 38426322 PMCID: PMC10939425 DOI: 10.1093/bib/bbae061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 01/13/2024] [Accepted: 01/29/2024] [Indexed: 03/02/2024] Open
Abstract
Cancer is a complex, high-mortality disease regulated by multiple factors. Accurate cancer subtyping is crucial for formulating personalized treatment plans and improving patient survival rates. The underlying mechanisms that drive cancer progression can be comprehensively understood by analyzing multi-omics data. However, the high noise levels in omics data often pose challenges in capturing consistent representations and adequately integrating their information. This paper proposes a novel variational autoencoder-based deep learning model, named Deeply Integrating Latent Consistent Representations (DILCR). First, multiple independent variational autoencoders and contrastive loss functions are designed to separate noise from omics data and capture latent consistent representations. Subsequently, an Attention Deep Integration Network is proposed to integrate consistent representations across different omics levels effectively. Additionally, the Improved Deep Embedded Clustering algorithm is introduced to make the integrated variables clustering-friendly. The effectiveness of DILCR was evaluated using 10 typical cancer datasets from The Cancer Genome Atlas and compared with 14 state-of-the-art integration methods. The results demonstrated that DILCR effectively captures the consistent representations in omics data and outperforms other integration methods in cancer subtyping. In the Kidney Renal Clear Cell Carcinoma case study, DILCR identified cancer subtypes with clear biological significance and interpretability.
Affiliation(s)
- Yueyi Cai
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
|
24
|
Milosevic M, Jin Q, Singh A, Amal S. Applications of AI in multi-modal imaging for cardiovascular disease. FRONTIERS IN RADIOLOGY 2024; 3:1294068. [PMID: 38283302 PMCID: PMC10811170 DOI: 10.3389/fradi.2023.1294068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 12/22/2023] [Indexed: 01/30/2024]
Abstract
Healthcare data are diverse and span many different modalities. Traditional approaches to Artificial Intelligence for cardiovascular disease were typically limited to single modalities. With the proliferation of diverse datasets and new methods in AI, we are now able to integrate different modalities, such as magnetic resonance scans, computerized tomography scans, echocardiography, x-rays, and electronic health records. In this paper, we review research from the last 5 years in applications of AI to multi-modal imaging. There have been many promising results in registration, segmentation, and fusion of different magnetic resonance imaging modalities with each other and with computed tomography scans, but there are still many challenges that need to be addressed. Only a few papers have addressed modalities such as x-ray, echocardiography, or non-imaging modalities. As for prediction or classification tasks, only a couple of papers use multiple modalities in the cardiovascular domain. Furthermore, no models have been implemented or tested in real-world cardiovascular clinical settings.
Affiliation(s)
- Marko Milosevic
  - Roux Institute, Northeastern University, Portland, ME, United States
- Qingchu Jin
  - Roux Institute, Northeastern University, Portland, ME, United States
- Akarsh Singh
  - College of Engineering, Northeastern University, Boston, MA, United States
- Saeed Amal
  - Roux Institute, Northeastern University, Portland, ME, United States
|
25
|
Guldogan E, Yagin FH, Pinar A, Colak C, Kadry S, Kim J. A proposed tree-based explainable artificial intelligence approach for the prediction of angina pectoris. Sci Rep 2023; 13:22189. [PMID: 38092844 PMCID: PMC10719282 DOI: 10.1038/s41598-023-49673-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/12/2023] [Accepted: 12/11/2023] [Indexed: 12/17/2023] Open
Abstract
Cardiovascular diseases (CVDs) are a serious public health issue responsible for numerous fatalities and impairments. Ischemic heart disease (IHD) is one of the most prevalent and deadliest types of CVD and is responsible for 45% of all CVD-related fatalities. IHD occurs when the blood supply to the heart is reduced due to narrowed or blocked arteries, which causes the chest pain known as angina pectoris (AP). AP is a common symptom of IHD and can indicate a higher risk of heart attack or sudden cardiac death. Therefore, it is important to diagnose and treat AP promptly and effectively. To forecast AP in women, we constructed a novel artificial intelligence (AI) method employing the tree-based algorithm known as an Explainable Boosting Machine (EBM). EBM is a machine learning (ML) technique that combines the interpretability of linear models with the flexibility and accuracy of gradient boosting. We applied EBM to a dataset of 200 female patients, 100 with AP and 100 without AP, and extracted the most relevant features for AP prediction. We then evaluated the performance of EBM against other AI methods, such as Logistic Regression (LR), Categorical Boosting (CatBoost), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), and Light Gradient Boosting Machine (LightGBM). We found that EBM was the most accurate and well-balanced technique for forecasting AP, with an accuracy of 0.925 and a Youden's index of 0.960. We also examined the global and local explanations provided by EBM to better understand how each feature affected the prediction and how each patient was classified. Our research showed that EBM is a useful AI method for predicting AP in women and identifying the risk factors related to it. This can help clinicians to provide personalized and evidence-based care for female patients with AP.
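The two summary metrics quoted in this abstract, accuracy and Youden's index, are both simple functions of a binary confusion matrix. The sketch below uses their standard definitions; the confusion counts are purely hypothetical and are not taken from the study, and the function names are illustrative.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)

def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1 (ranges from -1 to 1)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0

# Hypothetical counts for a balanced cohort of 100 positives and 100 negatives.
tp, fn, tn, fp = 80, 20, 90, 10
acc = accuracy(tp, fn, tn, fp)
j = youden_index(tp, fn, tn, fp)
```

With these made-up counts, sensitivity is 0.80 and specificity is 0.90, so accuracy is 0.85 and Youden's J is 0.70.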
Collapse
Affiliation(s)
- Emek Guldogan
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey
| | - Fatma Hilal Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey.
| | - Abdulvahap Pinar
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey
| | - Cemil Colak
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey
| | - Seifedine Kadry
- Noroff University College, Kristiansand, Norway
- Artificial Intelligence Research Center (AIRC), Ajman University, 346, Ajman, United Arab Emirates
- Department of Electrical and Computer Engineering, Lebanese American University, Byblos, Lebanon
| | - Jungeun Kim
- Department of Software, Kongju National University, Cheonan, 31080, Korea.
|
26
|
Rasiya Koya S, Kar KK, Srivastava S, Tadesse T, Svoboda M, Roy T. An autoencoder-based snow drought index. Sci Rep 2023; 13:20664. [PMID: 38001144 PMCID: PMC10673943 DOI: 10.1038/s41598-023-47999-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 11/21/2023] [Indexed: 11/26/2023] Open
Abstract
In several regions across the globe, snow has a significant impact on hydrology. The amounts of water that infiltrate the ground and flow as runoff are driven by the melting of snow. Therefore, it is crucial to study the magnitude and effect of snowmelt. Snow droughts, resulting from reduced snow storage, can drastically impact the water supplies in basins where snow predominates, such as in the western United States. Hence, it is important to detect the time and severity of snow droughts efficiently. We propose the Snow Drought Response Index or SnoDRI, a novel indicator that could be used to identify and quantify snow drought occurrences. Our index is calculated using cutting-edge ML algorithms from various snow-related variables. The self-supervised learning of an autoencoder is combined with mutual information in the model. In this study, we use Random Forests for feature extraction for SnoDRI and assess the importance of each variable. We use reanalysis data (NLDAS-2) from 1981 to 2021 for the Pacific United States to study the efficacy of the new snow drought index. We evaluate the index by confirming the coincidence of its interpretation and the actual snow drought incidents.
Affiliation(s)
- Sinan Rasiya Koya
- Department of Civil and Environmental Engineering, University of Nebraska-Lincoln, Lincoln, USA
| | - Kanak Kanti Kar
- Department of Civil and Environmental Engineering, University of Nebraska-Lincoln, Lincoln, USA
| | - Shivendra Srivastava
- Department of Civil and Environmental Engineering, University of Nebraska-Lincoln, Lincoln, USA
| | - Tsegaye Tadesse
- National Drought Mitigation Center, University of Nebraska-Lincoln, Lincoln, USA
| | - Mark Svoboda
- National Drought Mitigation Center, University of Nebraska-Lincoln, Lincoln, USA
| | - Tirthankar Roy
- Department of Civil and Environmental Engineering, University of Nebraska-Lincoln, Lincoln, USA.
|
27
|
Nazari S, Garcia R. Automatic Skin Cancer Detection Using Clinical Images: A Comprehensive Review. Life (Basel) 2023; 13:2123. [PMID: 38004263 PMCID: PMC10672549 DOI: 10.3390/life13112123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 10/21/2023] [Accepted: 10/23/2023] [Indexed: 11/26/2023] Open
Abstract
Skin cancer has become increasingly common over the past decade, with melanoma being the most aggressive type. Hence, early detection of skin cancer and melanoma is essential in dermatology. Computational methods can be a valuable tool for assisting dermatologists in identifying skin cancer. Most research in machine learning for skin cancer detection has focused on dermoscopy images due to the existence of larger image datasets. However, general practitioners typically do not have access to a dermoscope and must rely on naked-eye examinations or standard clinical images. By using standard, off-the-shelf cameras to detect high-risk moles, machine learning has also proven to be an effective tool. The objective of this paper is to provide a comprehensive review of image-processing techniques for skin cancer detection using clinical images. In this study, we evaluate 51 state-of-the-art articles that have used machine learning methods to detect skin cancer over the past decade, focusing on clinical datasets. Even though several studies have been conducted in this field, there are still few publicly available clinical datasets with sufficient data that can be used as a benchmark, especially when compared to the existing dermoscopy databases. In addition, we observed that the available artifact removal approaches are not quite adequate in some cases and may also have a negative impact on the models. Moreover, the majority of the reviewed articles are working with single-lesion images and do not consider typical mole patterns and temporal changes in the lesions of each patient.
|
28
|
Pacheco J, Saiz O, Casado S, Ubillos S. A multistart tabu search-based method for feature selection in medical applications. Sci Rep 2023; 13:17140. [PMID: 37816874 PMCID: PMC10564765 DOI: 10.1038/s41598-023-44437-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 10/08/2023] [Indexed: 10/12/2023] Open
Abstract
In the design of classification models, irrelevant or noisy features are often generated. In some cases, there may even be negative interactions among features. These weaknesses can degrade the performance of the models. Feature selection is a task that searches for a small subset of relevant features from the original set that generates the most efficient models possible. In addition to improving the efficiency of the models, feature selection confers other advantages, such as greater ease in the generation of the necessary data as well as clearer and more interpretable models. In the case of medical applications, feature selection may help to distinguish which characteristics, habits, and factors have the greatest impact on the onset of diseases. However, feature selection is a complex task due to the large number of possible solutions. In the last few years, methods based on different metaheuristic strategies, mainly evolutionary algorithms, have been proposed. The motivation of this work is to develop a method that outperforms previous methods, with the benefits that this implies, especially in the medical field. More precisely, the present study proposes a simple method based on tabu search and multistart techniques. The proposed method was analyzed and compared to other methods by testing their performance on several medical databases. Specifically, eight databases from the well-known repository of the University of California, Irvine, and one of our own design were used. In these computational tests, the proposed method outperformed other recent methods as gauged by various metrics and classifiers. The analyses were accompanied by statistical tests, the results of which showed that the superiority of our method is significant and therefore strengthened these conclusions.
In short, the contribution of this work is the development of a method that, on the one hand, is based on different strategies than those used in recent methods, and on the other hand, improves the performance of these methods.
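The abstract names tabu search combined with multistart but gives no algorithmic detail. The sketch below shows the generic form of that combination for feature subsets and is not the authors' method: the single-feature flip neighborhood, the tabu tenure, the aspiration rule, and the toy scoring function are all assumptions chosen for illustration. In practice `score` would typically be a cross-validated classifier metric.

```python
import random


def tabu_search(score, n_features, start, iterations=50, tenure=3):
    """Generic tabu search over feature subsets using single-feature flips."""
    current = set(start)
    best, best_score = set(current), score(current)
    tabu_until = {}  # feature -> first iteration at which flipping it is allowed again
    for it in range(iterations):
        candidates = []
        for f in range(n_features):
            neighbor = current ^ {f}  # add f if absent, remove it if present
            s = score(neighbor)
            is_tabu = tabu_until.get(f, -1) > it
            # Aspiration rule: a tabu move is still allowed if it beats the best so far.
            if not is_tabu or s > best_score:
                candidates.append((s, f, neighbor))
        if not candidates:
            continue  # every move is tabu; wait for tenures to expire
        s, f, neighbor = max(candidates, key=lambda c: (c[0], -c[1]))
        current = neighbor  # move even if worse, which lets the search leave local optima
        tabu_until[f] = it + tenure + 1
        if s > best_score:
            best, best_score = set(current), s
    return best, best_score


def multistart_tabu(score, n_features, starts=5, seed=0, **kwargs):
    """Run tabu search from several random subsets and keep the best result."""
    rng = random.Random(seed)
    best, best_score = set(), float("-inf")
    for _ in range(starts):
        start = {f for f in range(n_features) if rng.random() < 0.5}
        subset, s = tabu_search(score, n_features, start, **kwargs)
        if s > best_score:
            best, best_score = subset, s
    return best, best_score


# Hypothetical objective: reward "relevant" features, penalize the rest.
RELEVANT = {0, 3, 5}


def toy_score(subset):
    return len(subset & RELEVANT) - 0.5 * len(subset - RELEVANT)


print(multistart_tabu(toy_score, n_features=8, starts=3, seed=1))
```

The tabu list prevents the search from immediately undoing a recent flip, while restarting from several random subsets reduces the dependence on any single starting point.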
|
29
|
Al Turkestani N, Cai L, Cevidanes L, Bianchi J, Zhang W, Gurgel M, Gillot M, Baquero B, Soroushmehr R. Osteoarthritis Diagnosis Integrating Whole Joint Radiomics and Clinical Features for Robust Learning Models Using Biological Privileged Information. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2023 WORKSHOPS : ISIC 2023, CARE-AI 2023, MEDAGI 2023, DECAF 2023, HELD IN CONJUNCTION WITH MICCAI 2023, VANCOUVER, BC, CANADA, OCTOBER 8-12, 2023, PROCEEDINGS 2023; 14394:193-204. [PMID: 38533395 PMCID: PMC10964798 DOI: 10.1007/978-3-031-47425-5_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2024]
Abstract
This paper proposes a machine learning model using privileged information (LUPI) and the normalized mutual information feature selection method (NMIFS) to build a robust and accurate framework to diagnose patients with Temporomandibular Joint Osteoarthritis (TMJ OA). To build such a model, we employ clinical, quantitative imaging and additional biological markers as privileged information. We show that clinical features play a leading role in the TMJ OA diagnosis and that quantitative imaging features, extracted from cone-beam computerized tomography (CBCT) scans, improve the model performance. Although the proposed LUPI model employs biological data in the training phase (which boosted the model performance), these data are unnecessary at the testing stage, indicating that the model can be widely used even when only clinical and imaging data are collected. The model was validated using 5-fold stratified cross-validation with hyperparameter tuning to avoid the bias of data splitting. Our method achieved an AUC, specificity and precision of 0.81, 0.79 and 0.77, respectively.
Affiliation(s)
- Najla Al Turkestani
- Department of Orthodontics and Pediatric Dentistry, University of Michigan, 1011 North University Avenue, Ann Arbor, MI 48109, USA
- Department of Restorative and Aesthetic Dentistry, Faculty of Dentistry, King Abdulaziz University, Jeddah 22252, Saudi Arabia
| | - Lingrui Cai
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
| | - Lucia Cevidanes
- Department of Orthodontics and Pediatric Dentistry, University of Michigan, 1011 North University Avenue, Ann Arbor, MI 48109, USA
| | - Jonas Bianchi
- Department of Orthodontics, University of the Pacific, Arthur A. Dugoni School of Dentistry, 155 5th Street, San Francisco, CA 94103, USA
| | - Winston Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
| | - Marcela Gurgel
- Department of Orthodontics and Pediatric Dentistry, University of Michigan, 1011 North University Avenue, Ann Arbor, MI 48109, USA
| | - Maxime Gillot
- Department of Orthodontics and Pediatric Dentistry, University of Michigan, 1011 North University Avenue, Ann Arbor, MI 48109, USA
| | - Baptiste Baquero
- Department of Orthodontics and Pediatric Dentistry, University of Michigan, 1011 North University Avenue, Ann Arbor, MI 48109, USA
| | - Reza Soroushmehr
- Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
|
30
|
Hu J, Chu C, Zhu P, Yuan M. Visibility graph-based segmentation of multivariate time series data and its application. CHAOS (WOODBURY, N.Y.) 2023; 33:093123. [PMID: 37712915 DOI: 10.1063/5.0152881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Accepted: 08/22/2023] [Indexed: 09/16/2023]
Abstract
In this paper, we propose an efficient segmentation approach for dividing a multivariate time series by integrating principal component analysis (PCA), visibility graph theory, and a community detection algorithm. Based on structural characteristics, we can automatically divide the high-dimensional time series into several stages. First, we adopt PCA to reduce the dimensions; thus, a low-dimensional time series can be obtained. This overcomes the curse of dimensionality incurred by multidimensional time sequences. Next, visibility graph theory is applied to handle these multivariate time series, and corresponding networks can be derived accordingly. Then, we propose a community detection algorithm (the obtained communities correspond to the desired segmentation), with modularity Q adopted as the objective function to find the optimal partition. As indicated, the segmentation determined by our method is of high accuracy. Compared with state-of-the-art models, we find that our proposed model has a lower time complexity (O(n^3)), while the performance of segmentation is much better. Finally, we applied this model not only to generated data with known multiple phases but also to a real dataset of oil futures. In both cases, we obtained excellent segmentation results.
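Two building blocks named in this abstract have compact standard definitions: the natural visibility graph of a time series and the modularity Q of a partition. The sketch below illustrates both on a univariate toy series; it is not the authors' pipeline (which first applies PCA and then runs their own community detection), and the function names and example data are assumptions. Two samples i < j are linked when every intermediate sample lies strictly below the straight line joining them.

```python
def visibility_graph(series):
    """Natural visibility graph: return the set of edges (i, j) with i < j."""
    n = len(series)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Height of the line from (i, y_i) to (j, y_j) at each intermediate k.
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.add((i, j))
    return edges


def modularity(edges, communities):
    """Newman modularity Q = sum over communities of L_c/m - (d_c / 2m)^2."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for community in communities:
        members = set(community)
        internal = sum(1 for u, v in edges if u in members and v in members)
        total_degree = sum(degree.get(node, 0) for node in members)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


# Toy series (hypothetical); a segmentation corresponds to a partition of the time indices.
series = [1, 3, 2, 4, 1]
edges = visibility_graph(series)
print(sorted(edges))
print(modularity(edges, [[0, 1, 2], [3, 4]]))
```

This naive construction checks every pair against every intermediate point, so it is cubic in the series length in the worst case; a community detection step would then search for the partition of time indices that maximizes Q.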
Affiliation(s)
- Jun Hu
- School of Economics and Management, Fuzhou University, Fuzhou 350108, China
| | - Chengbin Chu
- School of Economics and Management, Fuzhou University, Fuzhou 350108, China
| | - Peican Zhu
- School of Artificial Intelligence, Optics, and Electronics (iOPEN), Northwestern Polytechnical University, Xian 710072, China
| | - Manman Yuan
- School of Computer Science, Inner Mongolia University, Inner Mongolia 010021, China
|
31
|
Luo Y, Cha H, Zuo L, Cheng P, Zhao Q. General cross-modality registration framework for visible and infrared UAV target image registration. Sci Rep 2023; 13:12941. [PMID: 37558713 PMCID: PMC10412594 DOI: 10.1038/s41598-023-39863-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 08/01/2023] [Indexed: 08/11/2023] Open
Abstract
In all-day, all-weather tasks, well-aligned multi-modality image pairs can provide extensive complementary information for image-guided UAV target detection. However, multi-modality images in real scenarios are often misaligned, and image registration is extremely difficult due to spatial deformation and the difficulty of narrowing the cross-modality discrepancy. To overcome this obstacle, in this paper we construct a General Cross-Modality Registration (GCMR) Framework, which explores a generation-based registration pattern to simplify cross-modality image registration into an easier mono-modality image registration with an Image Cross-Modality Translation Network (ICMTN) module and a Multi-level Residual Dense Registration Network (MRDRN). Specifically, the ICMTN module is used to generate a pseudo-infrared image from a visible image and to correct the distortion of structural information during the translation of image modalities. Benefiting from the favorable geometry-correction ability of the ICMTN, we further employ the MRDRN module, which can fully extract and exploit the mutual information of misaligned images to better register visible and infrared images in a mono-modality setting. We evaluate five variants of our approach on the public Anti-UAV datasets. The extensive experimental results demonstrate that the proposed architecture achieves state-of-the-art performance.
Affiliation(s)
- Yu Luo
- College of Electronic Engineering, Naval University of Engineering, Wuhan, 4300000, China
| | - Hao Cha
- College of Electronic Engineering, Naval University of Engineering, Wuhan, 4300000, China
| | - Lei Zuo
- College of Electronic Engineering, Naval University of Engineering, Wuhan, 4300000, China.
| | - Peng Cheng
- College of Electronic Engineering, Naval University of Engineering, Wuhan, 4300000, China
| | - Qing Zhao
- College of Electronic Engineering, Naval University of Engineering, Wuhan, 4300000, China
|
32
|
Du H, Lu D, Wang Z, Ma C, Shi X, Wang X. Fast clustering algorithm based on MST of representative points. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:15830-15858. [PMID: 37919991 DOI: 10.3934/mbe.2023705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2023]
Abstract
Minimum spanning tree (MST)-based clustering algorithms are widely used to detect clusters with diverse densities and irregular shapes. However, most algorithms require the entire dataset to construct an MST, which leads to significant computational overhead. To alleviate this issue, our proposed algorithm R-MST utilizes representative points instead of all sample points for constructing MST. Additionally, based on the density and nearest neighbor distance, we improved the representative point selection strategy to enhance the uniform distribution of representative points in sparse areas, enabling the algorithm to perform well on datasets with varying densities. Furthermore, traditional methods for eliminating inconsistent edges generally require prior knowledge about the number of clusters, which is not always readily available in practical applications. Therefore, we propose an adaptive method that employs mutual neighbors to identify inconsistent edges and determine the optimal number of clusters automatically. The experimental results indicate that the R-MST algorithm not only improves the efficiency of clustering but also enhances its accuracy.
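For context, the classical MST clustering baseline that this abstract contrasts with can be sketched as follows: build an MST over all points, then remove the k-1 longest edges so that k connected components remain. This is the traditional approach that requires the full dataset and a known cluster count, not the R-MST algorithm itself, which works on representative points and determines the number of clusters adaptively. All names and the toy points are assumptions for illustration.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, u, v) tuples."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known cost to attach each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, k):
    """Cut the k-1 longest MST edges and label the resulting components."""
    edges = sorted(mst_edges(points))  # ascending by weight
    kept = edges[: len(edges) - (k - 1)]
    root = list(range(len(points)))  # union-find forest

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)
    return [find(i) for i in range(len(points))]


# Two well-separated toy groups (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mst_clusters(points, k=2))
```

The pairwise distance computations over every sample are the overhead the abstract refers to, and the explicit `k` argument is the prior knowledge that the adaptive edge-removal method is meant to avoid.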
Affiliation(s)
- Hui Du
- The School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
| | - Depeng Lu
- The School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
| | - Zhihe Wang
- The School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
| | - Cuntao Ma
- The School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
| | - Xinxin Shi
- The School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
| | - Xiaoli Wang
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
|
33
|
Ranek JS, Stallaert W, Milner J, Stanley N, Purvis JE. Feature selection for preserving biological trajectories in single-cell data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.09.540043. [PMID: 37214963 PMCID: PMC10197710 DOI: 10.1101/2023.05.09.540043] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/24/2023]
Abstract
Single-cell technologies can readily measure the expression of thousands of molecular features from individual cells undergoing dynamic biological processes, such as cellular differentiation, immune response, and disease progression. While examining cells along a computationally ordered pseudotime offers the potential to study how subtle changes in gene or protein expression impact cell fate decision-making, identifying characteristic features that drive continuous biological processes remains difficult with unenriched and noisy single-cell data. Given that all profiled sources of feature variation contribute to the cell-to-cell distances that define an inferred cellular trajectory, including confounding sources of biological variation (e.g. cell cycle or metabolic state) or noisy and irrelevant features (e.g. measurements with low signal-to-noise ratio) can mask the underlying trajectory of study and hinder inference. Here, we present DELVE (dynamic selection of locally covarying features), an unsupervised feature selection method for identifying a representative subset of dynamically-expressed molecular features that recapitulates cellular trajectories. In contrast to previous work, DELVE uses a bottom-up approach to mitigate the effect of unwanted sources of variation confounding inference, and instead models cell states from dynamic feature modules that constitute core regulatory complexes. Using simulations, single-cell RNA sequencing data, and iterative immunofluorescence imaging data in the context of the cell cycle and cellular differentiation, we demonstrate that DELVE selects features that more accurately characterize cell populations and improve the recovery of cell type transitions. This feature selection framework provides an alternative approach for improving trajectory inference and uncovering co-variation amongst features along a biological trajectory.
DELVE is implemented as an open-source python package and is publicly available at: https://github.com/jranek/delve.
Affiliation(s)
- Jolene S. Ranek
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Wayne Stallaert
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Justin Milner
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Natalie Stanley
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jeremy E. Purvis
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
34
|
Guan J, Li S, He X, Zhu J, Chen J, Si P. SMMP: A Stable-Membership-Based Auto-Tuning Multi-Peak Clustering Algorithm. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:6307-6319. [PMID: 36219667 DOI: 10.1109/tpami.2022.3213574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Since most existing single-prototype clustering algorithms are unsuitable for complex-shaped clusters, many multi-prototype clustering algorithms have been proposed. Nevertheless, the automatic estimation of the number of clusters and the detection of complex shapes are still challenging, and solving such problems usually relies on user-specified parameters and may be prohibitively time-consuming. Herein, a stable-membership-based auto-tuning multi-peak clustering algorithm (SMMP) is proposed, which can achieve fast, automatic, and effective multi-prototype clustering without iteration. A dynamic association-transfer method is designed to learn the representativeness of points to sub-cluster centers during the generation of sub-clusters by applying the density peak clustering technique. According to the learned representativeness, a border-link-based connectivity measure is used to achieve high-fidelity similarity evaluation of sub-clusters. Meanwhile, based on the assumption that a reasonable clustering should have a relatively stable membership state upon the change of clustering thresholds, SMMP can automatically identify the number of sub-clusters and clusters, respectively. SMMP is also designed for large datasets. Experimental results on both synthetic and real datasets demonstrated the effectiveness of SMMP.
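The density peak clustering technique mentioned in this abstract rests on two per-point quantities: a local density rho (here, the number of neighbors within a cutoff distance) and delta, the distance to the nearest point of higher density. Points where both are large are candidate centers. The sketch below computes only these two quantities and is not SMMP itself; the cutoff value, the tie-breaking by index, and the toy points are assumptions for illustration.

```python
import math


def density_peaks(points, cutoff):
    """Return (rho, delta, nearest_denser) as defined in density peak clustering."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    # Visit points from densest to sparsest; equal densities are ordered by index
    # (an assumed tie-break), so an earlier point counts as "denser".
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the globally densest point
            continue
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i], nearest_denser[i] = dist[i][j], j
    return rho, delta, nearest_denser


# Two toy squares with a point at each center (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta, nearest = density_peaks(points, cutoff=0.8)
gamma = [r * d for r, d in zip(rho, delta)]  # large values suggest cluster centers
print(sorted(range(len(points)), key=lambda i: -gamma[i])[:2])
```

Following each point's `nearest_denser` link up to a center assigns it to a sub-cluster in a single pass, which is consistent with the abstract's description of clustering without iteration.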
|
35
|
Wang Y, Gao X, Wang J. Functional Proteomic Profiling Analysis in Four Major Types of Gastrointestinal Cancers. Biomolecules 2023; 13:biom13040701. [PMID: 37189448 DOI: 10.3390/biom13040701] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/27/2023] [Revised: 04/05/2023] [Accepted: 04/18/2023] [Indexed: 05/17/2023] Open
Abstract
Gastrointestinal (GI) cancer accounts for one in four cancer cases and one in three cancer-related deaths globally. A deeper understanding of cancer development mechanisms can be applied to cancer medicine. Comprehensive sequencing applications have revealed the genomic landscapes of the common types of human cancer, and proteomics technology has identified protein targets and signalling pathways related to cancer growth and progression. This study aimed to explore the functional proteomic profiles of four major types of GI tract cancer based on The Cancer Proteome Atlas (TCPA). We provided an overview of functional proteomic heterogeneity by performing several approaches, including principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), t-stochastic neighbour embedding (t-SNE) analysis, and hierarchical clustering analysis in oesophageal carcinoma (ESCA), stomach adenocarcinoma (STAD), colon adenocarcinoma (COAD), and rectum adenocarcinoma (READ) tumours, to gain a system-wide understanding of the four types of GI cancer. The feature selection approach, mutual information feature selection (MIFS) method, was conducted to screen candidate protein signature subsets to better distinguish different cancer types. The potential clinical implications of candidate proteins in terms of tumour progression and prognosis were also evaluated based on TCPA and The Cancer Genome Atlas (TCGA) databases. The results suggested that functional proteomic profiling can identify different patterns among the four types of GI cancers and provide candidate proteins for clinical diagnosis and prognosis evaluation. We also highlighted the application of feature selection approaches in high-dimensional biological data analysis. Overall, this study could improve the understanding of the complexity of cancer phenotypes and genotypes and thus be applied to cancer medicine.
Affiliation(s)
- Yangyang Wang
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
| | - Xiaoguang Gao
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
| | - Jihan Wang
- Institute of Medical Research, Northwestern Polytechnical University, Xi'an 710072, China
|
36
|
Liu T, Fang ZY, Li X, Zhang LN, Cao DS, Yin MZ. Graph deep learning enabled spatial domains identification for spatial transcriptomics. Brief Bioinform 2023; 24:7130976. [PMID: 37080761 DOI: 10.1093/bib/bbad146] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/25/2023] [Revised: 03/02/2023] [Accepted: 03/27/2023] [Indexed: 04/22/2023] Open
Abstract
Advances in spatially resolved transcriptomics (ST) technologies help biologists comprehensively understand organ function and the tissue microenvironment. Accurate spatial domain identification is the foundation for delineating genome heterogeneity and cellular interaction. Motivated by this perspective, a graph deep learning (GDL) based spatial clustering approach is constructed in this paper. First, the deep graph infomax module embedded with a residual gated graph convolutional neural network is leveraged to process the gene expression profiles and spatial positions in ST. Then, the Bayesian Gaussian mixture model is applied to the latent embeddings to generate spatial domains. Experiments show that the presented method is superior to other state-of-the-art GDL-enabled techniques on multiple ST datasets. The codes and dataset used in this manuscript are summarized at https://github.com/narutoten520/SCGDL.
Affiliation(s)
- Teng Liu
- Clinical Research Center (CRC), Clinical Pathology Center (CPC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, P.R. China
| | - Zhao-Yu Fang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering at Central South University, Hunan, P.R. China
| | - Xin Li
- Clinical Research Center (CRC), Clinical Pathology Center (CPC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, P.R. China
| | - Li-Ning Zhang
- Clinical Research Center (CRC), Clinical Pathology Center (CPC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, P.R. China
| | - Dong-Sheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410003, P.R. China
| | - Ming-Zhu Yin
- Clinical Research Center (CRC), Clinical Pathology Center (CPC), Chongqing University Three Gorges Hospital, Chongqing University, Wanzhou, Chongqing, P.R. China
- Translational Medicine Research Center (TMRC), School of Medicine, Chongqing University, Shapingba, Chongqing, P.R. China
|
37
|
Chen D, Wang W, Wang S, Tan M, Su S, Wu J, Yang J, Li Q, Tang Y, Cao J. Predicting postoperative delirium after hip arthroplasty for elderly patients using machine learning. Aging Clin Exp Res 2023; 35:1241-1251. [PMID: 37052817 DOI: 10.1007/s40520-023-02399-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/06/2023] [Accepted: 03/20/2023] [Indexed: 04/14/2023]
Abstract
BACKGROUND Postoperative delirium (POD) is a common and severe complication in elderly hip-arthroplasty patients. AIM This study aims to develop and validate a machine learning (ML) model that determines essential features related to POD and predicts POD for elderly hip-arthroplasty patients. METHODS The electronic record data of elderly patients who received hip-arthroplasty surgery between January 2017 and April 2021 were used as the dataset. The Confusion Assessment Method (CAM) was administered to the patients during their perioperative period. A feature selection method was employed as a filter to determine the leading features. Classical machine learning algorithms were trained with cross-validation, and the best-performing model was used to predict POD. The area under the curve (AUC), accuracy (ACC), sensitivity, specificity, and F1-score were calculated to evaluate the predictive performance. RESULTS A total of 476 elderly arthroplasty patients under general anesthesia were included in this study, and the final model, which combined mutual information (MI) feature selection with a logistic regression (LR) binary classifier, achieved an encouraging performance (AUC = 0.94, ACC = 0.88, sensitivity = 0.85, specificity = 0.90, F1-score = 0.87) on a balanced test dataset. CONCLUSION The model could predict POD with satisfying accuracy and reveal important features associated with POD, such as age, Cystatin C, GFR, CHE, CRP, LDH, monocyte count, history of mental illness or psychotropic drug use, and intraoperative blood loss. Proper preoperative interventions for these factors could reduce the incidence of POD among elderly patients.
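Mutual information used as a filter, as described in the methods above, scores each feature by how much knowing it reduces uncertainty about the outcome, then keeps the top-ranked features for the classifier. The following is a minimal sketch for discrete features only, using the plug-in estimate I(X;Y) = sum over (x, y) of p(x,y) log[p(x,y) / (p(x) p(y))]. It is not the study's code: the feature names and values are hypothetical, and continuous clinical variables would need binning or a dedicated estimator first.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum(
        (c / n) * math.log((c * n) / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def rank_features(columns, target):
    """Return feature names ordered from most to least informative about target."""
    scores = {name: mutual_information(values, target) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical binary outcome and three made-up discrete features.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],    # independent of the outcome
    "partial": [0, 0, 0, 1, 0, 1, 1, 1],  # agrees with the outcome 6 times out of 8
    "copy": list(target),                 # identical to the outcome
}
print(rank_features(columns, target))
```

A filter of this kind is computed once before model fitting, so it is independent of the downstream classifier, unlike embedded or wrapper selection methods.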
Affiliation(s)
- Daiyu Chen
  - Department of Anesthesiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Weijia Wang
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Siqi Wang
  - Department of Anesthesiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Minghe Tan
  - Department of Anesthesiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Song Su
  - Center for Artificial Intelligence in Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, China
  - Department of General Surgery (Hepatobiliary Surgery), The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Jiali Wu
  - Center for Artificial Intelligence in Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, China
  - Department of Anesthesiology, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Jun Yang
  - Department of Anesthesiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Qingshu Li
  - Department of Pathology, School of Basic Medicine, Chongqing Medical University, Chongqing, China
- Yong Tang
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Jun Cao
  - Department of Anesthesiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
|
38
|
Bao Q, Xie W, Otikovs M, Xia L, Xie H, Liu X, Liu K, Zhang Z, Chen F, Zhou X, Liu C. Unsupervised cycle-consistent network using restricted subspace field map for removing susceptibility artifacts in EPI. Magn Reson Med 2023; 90:458-472. [PMID: 37052369 DOI: 10.1002/mrm.29653] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/11/2022] [Revised: 02/19/2023] [Accepted: 03/14/2023] [Indexed: 04/14/2023]
Abstract
PURPOSE To design an unsupervised deep neural model for correcting susceptibility artifacts in single-shot Echo Planar Imaging (EPI) and evaluate the model for preclinical and clinical applications. METHODS This work proposes an unsupervised cycle-consistent model based on the restricted subspace field map to take advantage of both deep learning (DL) and the reverse polarity-gradient (RPG) method for single-shot EPI. The proposed model consists of three main components: (1) a DLRPG neural network (DLRPG-net) that obtains field maps from a pair of images acquired with reversed phase encoding; (2) spin physical model-based modules that obtain the corrected, undistorted images from the learned field map; and (3) a cycle-consistency loss between the input images and the images back-calculated in each cycle, which is used for network training. In addition, the field maps generated by DLRPG-net belong to a restricted subspace, which is the span of predefined cubic splines, to ensure the smoothness of the field maps and avoid blurring in the corrected images. The new method is trained and validated on both preclinical and clinical datasets for diffusion MRI. RESULTS The proposed network could effectively generate smooth field maps and correct susceptibility artifacts in single-shot EPI. Simulated and in vivo preclinical/clinical experiments demonstrated that our method outperforms the state-of-the-art susceptibility artifact correction methods. Furthermore, ablation experiments on the cycle-consistent network and on the restricted subspace used to generate field maps showed the advantages of DLRPG-net. CONCLUSION The proposed method (DLRPG-net) can effectively correct susceptibility artifacts for preclinical and clinical single-shot EPI sequences.
Affiliation(s)
- Qingjia Bao
  - Key Laboratory of Magnetic Resonance in Biological Systems, Innovation Academy for Precision Measurement Science and Technology, Wuhan, 430071, People's Republic of China
- Weida Xie
  - School of Information Engineering, Wuhan University of Technology, Wuhan, People's Republic of China
- Liyang Xia
  - School of Information Engineering, Wuhan University of Technology, Wuhan, People's Republic of China
- Han Xie
  - Key Laboratory of Magnetic Resonance in Biological Systems, Innovation Academy for Precision Measurement Science and Technology, Wuhan, 430071, People's Republic of China
- Xinjie Liu
  - Key Laboratory of Magnetic Resonance in Biological Systems, Innovation Academy for Precision Measurement Science and Technology, Wuhan, 430071, People's Republic of China
- Kewen Liu
  - School of Information Engineering, Wuhan University of Technology, Wuhan, People's Republic of China
- Zhi Zhang
  - Key Laboratory of Magnetic Resonance in Biological Systems, Innovation Academy for Precision Measurement Science and Technology, Wuhan, 430071, People's Republic of China
- Fang Chen
  - Key Laboratory of Magnetic Resonance in Biological Systems, Innovation Academy for Precision Measurement Science and Technology, Wuhan, 430071, People's Republic of China
- Xin Zhou
  - Key Laboratory of Magnetic Resonance in Biological Systems, Innovation Academy for Precision Measurement Science and Technology, Wuhan, 430071, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
  - Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology-Optics Valley Laboratory, Hubei, 430074, People's Republic of China
- Chaoyang Liu
  - Key Laboratory of Magnetic Resonance in Biological Systems, Innovation Academy for Precision Measurement Science and Technology, Wuhan, 430071, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
  - Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology-Optics Valley Laboratory, Hubei, 430074, People's Republic of China
|
39
|
Multi-modality data-driven analysis of diagnosis and treatment of psoriatic arthritis. NPJ Digit Med 2023; 6:13. [PMID: 36732611 PMCID: PMC9895430 DOI: 10.1038/s41746-023-00757-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Accepted: 01/16/2023] [Indexed: 02/04/2023] Open
Abstract
Psoriatic arthritis (PsA) is associated with psoriasis and is characterized by irreversible joint symptoms. Despite its significant impact on the healthcare system, it remains challenging to leverage machine learning or statistical models to predict PsA and its progression or to analyze drug efficacy. Using the clinical records of 3961 patients, we developed machine learning models for PsA diagnosis and for the analysis of PsA progression risk. Furthermore, generalized additive models (GAMs) and the Kaplan-Meier (KM) method were applied to analyze the efficacy of various drugs in treating psoriasis and inhibiting PsA progression. The independent experiment on the PsA prediction model demonstrates outstanding prediction performance, with an AUC score of 0.87 and an AUPR score of 0.89, and the Jackknife validation test on the PsA progression prediction model also suggests strong performance, with an AUC score of 0.80 and an AUPR score of 0.83. We also found that interleukin-17 inhibitors were more effective for severe psoriasis than other drugs, and that methotrexate was less effective in inhibiting PsA progression. The results demonstrate that machine learning and statistical approaches enable accurate early prediction of PsA and its progression, as well as analysis of drug efficacy.
|
40
|
A multi-objective evolutionary algorithm with decomposition and the information feedback for high-dimensional medical data. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
|
41
|
An in-depth and contrasting survey of meta-heuristic approaches with classical feature selection techniques specific to cervical cancer. Knowl Inf Syst 2023. [DOI: 10.1007/s10115-022-01825-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2023]
|
42
|
Peng H, Zhang R, Li S, Cao Y, Pan S, Yu PS. Reinforced, Incremental and Cross-Lingual Event Detection From Social Messages. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:980-998. [PMID: 35077355 DOI: 10.1109/tpami.2022.3144993] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/14/2023]
Abstract
Detecting hot social events (e.g., political scandals, momentous meetings, natural hazards) from social messages is crucial, as it highlights significant happenings to help people understand the real world. On account of the streaming nature of social messages, incremental social event detection models that acquire, preserve, and update messages over time have attracted great attention. However, existing event detection methods for streaming social messages are generally confronted with ambiguous event features, dispersive text contents, and multiple languages, and hence achieve low accuracy and generalization ability. In this paper, we present a novel reinForced, incremental and cross-lingual social Event detection architecture, namely FinEvent, for streaming social messages. Concretely, we first model social messages as heterogeneous graphs integrating both rich meta-semantics and diverse meta-relations, and convert them to weighted multi-relational message graphs. Second, we propose a new reinforced weighted multi-relational graph neural network framework that uses a Multi-agent Reinforcement Learning algorithm to select optimal aggregation thresholds across different relations/edges to learn social message embeddings. To solve the long-tail problem in social event detection, a Contrastive Learning mechanism guided by a balanced sampling strategy is designed for incremental social message representation learning. Third, a new density-based spatial clustering model guided by Deep Reinforcement Learning is designed to select the optimal minimum number of samples required to form a cluster and the optimal minimum distance between two clusters in social event detection tasks. Finally, we implement incremental social message representation learning based on knowledge preservation on the graph neural network and achieve cross-lingual transfer for social event detection.
We conduct extensive experiments to evaluate FinEvent on Twitter streams, demonstrating a significant and consistent improvement in model quality, with 14%-118%, 8%-170%, and 2%-21% increases in performance on offline, online, and cross-lingual social event detection tasks, respectively.
|
43
|
Rehioui H, Cuissart B, Ouali A, Lepailleur A, Lamotte JL, Bureau R, Zimmermann A. New Pharmacophore Fingerprints and Weight-matrix Learning for Virtual Screening. Application to Bcr-Abl Data. Mol Inform 2023; 42:e2200210. [PMID: 36221998 DOI: 10.1002/minf.202200210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 10/11/2022] [Indexed: 01/20/2023]
Abstract
In this work, we analyze the potential of a new type of pharmacophoric descriptor coupled with a novel feature transformation technique called Weight-Matrix Learning (WML), which is based on a feed-forward neural network. The application concerns virtual screening on a tyrosine kinase named BCR-ABL. First, the compounds were described using three different families of descriptors: our new pharmacophoric descriptors and two circular fingerprints, ECFP4 and FCFP4. Afterwards, each of these original molecular representations was transformed using either an unsupervised WML method or a supervised one. Finally, using these transformed representations, the K-Means clustering algorithm was applied to automatically partition the molecules. Combining our pharmacophoric descriptors with supervised Weight-Matrix Learning (SWMLR) leads to clearly superior results in terms of several quality measures.
Affiliation(s)
- Hajar Rehioui
  - GREYC, Normandie Univ., UNICAEN, CNRS - UMR 6072, 14000, Caen, France
- Bertrand Cuissart
  - GREYC, Normandie Univ., UNICAEN, CNRS - UMR 6072, 14000, Caen, France
- Abdelkader Ouali
  - GREYC, Normandie Univ., UNICAEN, CNRS - UMR 6072, 14000, Caen, France
- Alban Lepailleur
  - Centre d'Etudes et de Recherche sur le Médicament de Normandie, Normandie Univ, UNICAEN, CERMN, 14000, Caen, France
- Jean-Luc Lamotte
  - GREYC, Normandie Univ., UNICAEN, CNRS - UMR 6072, 14000, Caen, France
  - Centre d'Etudes et de Recherche sur le Médicament de Normandie, Normandie Univ, UNICAEN, CERMN, 14000, Caen, France
- Ronan Bureau
  - Centre d'Etudes et de Recherche sur le Médicament de Normandie, Normandie Univ, UNICAEN, CERMN, 14000, Caen, France
|
44
|
Physiological Status Prediction Based on a Novel Hybrid Intelligent Scheme. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:4610747. [PMID: 36567813 PMCID: PMC9780012 DOI: 10.1155/2022/4610747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 10/16/2022] [Accepted: 11/14/2022] [Indexed: 12/23/2022]
Abstract
Physiological status plays an important role in clinical diagnosis. However, the temporal physiological data change dynamically with time, and the amount of data is large; furthermore, obtaining a complete history of data has become difficult. We propose a hybrid intelligent scheme for physiological status prediction, which can be effectively utilized to predict the physiological status of patients and provide a reference for clinical diagnosis. Our proposed scheme initially extracted the attribute information of nonlinear dynamic changes in physiological signals. The maximum discriminant feature subset was selected by employing conditional relevance mutual information feature selection. An optimal subset of features was fed into the particle swarm optimization-support vector machine classifier to perform classification. For the prediction task, the proposed hybrid intelligent scheme was tested on the Sleep Heart Health Study dataset for sleep status prediction. Experimental results demonstrate that our proposed intelligent scheme outperforms the conventional machine learning classification methods.
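As a rough illustration of mutual-information-based feature ranking, the sketch below scores discrete features by their plain mutual information with the class label. This is a simplification chosen for illustration: it is not the conditional relevance variant named in the abstract, it assumes discrete-valued features, and the feature names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        mi += p_xy * log2(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi


def rank_features(features, labels):
    """Return (name, score) pairs sorted by decreasing MI with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: one feature copies the label, the other is independent of it.
labels = [0, 0, 1, 1]
features = {
    "informative": [0, 0, 1, 1],
    "noise": [0, 1, 0, 1],
}
ranking = rank_features(features, labels)
print(ranking)
```

A filter of this kind scores each feature independently; conditional variants additionally account for redundancy with features that have already been selected.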
|
45
|
Wang Y, Wang D, Zhou Y, Zhang X, Quek C. VDPC: Variational Density Peak Clustering Algorithm. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
|
46
|
Interaction-based clustering algorithm for feature selection: a multivariate filter approach. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01726-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
47
|
Chen T, Zeng Y, Yuan H, Zhong G, Lai LL, Tang YY. Multi-level regularization-based unsupervised multi-view feature selection with adaptive graph learning. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01721-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
48
|
Liang Y, Krotov D, Zaki MJ. Modern Hopfield Networks for graph embedding. Front Big Data 2022; 5:1044709. [PMID: 36466714 PMCID: PMC9713410 DOI: 10.3389/fdata.2022.1044709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 10/31/2022] [Indexed: 09/19/2023] Open
Abstract
The network embedding task is to represent a node in a network as a low-dimensional vector while incorporating the topological and structural information. Most existing approaches solve this problem by factorizing a proximity matrix, either directly or implicitly. In this work, we introduce a network embedding method from a new perspective, which leverages Modern Hopfield Networks (MHN) for associative learning. Our network learns associations between the content of each node and that node's neighbors. These associations serve as memories in the MHN. The recurrent dynamics of the network make it possible to recover the masked node, given that node's neighbors. Our proposed method is evaluated on different benchmark datasets for downstream tasks such as node classification, link prediction, and graph coarsening. The results show competitive performance compared to the common matrix factorization techniques and deep learning based methods.
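The abstract does not state the update rule, so the following is only a sketch of the softmax-based retrieval step commonly associated with continuous modern Hopfield networks, in which the state is replaced by a similarity-weighted average of the stored patterns. The stored patterns, the masked query, and the inverse temperature `beta` are hypothetical; how the paper encodes nodes and neighbors as memories is not reproduced here.

```python
from math import exp


def softmax(zs):
    """Numerically stable softmax over a list of floats."""
    m = max(zs)
    es = [exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]


def hopfield_retrieve(memories, query, beta=8.0, steps=1):
    """Move the query toward the stored pattern it is most similar to."""
    state = list(query)
    for _ in range(steps):
        # Scaled dot-product similarity between the state and each memory.
        sims = [beta * sum(a * b for a, b in zip(mem, state)) for mem in memories]
        weights = softmax(sims)
        # New state: similarity-weighted average of the stored patterns.
        state = [
            sum(w * mem[j] for w, mem in zip(weights, memories))
            for j in range(len(state))
        ]
    return state


# Hypothetical stored patterns and a query with its last two entries masked to 0.
memories = [
    [1, 1, -1, -1],
    [-1, 1, 1, -1],
    [1, -1, 1, 1],
]
query = [1, 1, 0, 0]
recovered = hopfield_retrieve(memories, query)
print([round(v, 3) for v in recovered])
```

With a large `beta` the softmax concentrates on the closest memory, so a single step approximately fills in the masked entries from the best-matching stored pattern.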
Affiliation(s)
- Yuchen Liang
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, United States
- Dmitry Krotov
  - MIT-IBM Watson AI Lab, IBM Research, Cambridge, MA, United States
- Mohammed J. Zaki
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, United States
|
49
|
Category tree distance: a taxonomy-based transaction distance for web user analysis. Data Min Knowl Discov 2022. [DOI: 10.1007/s10618-022-00874-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
50
|
Budimir I, Giampieri E, Saccenti E, Suarez-Diez M, Tarozzi M, Dall'Olio D, Merlotti A, Curti N, Remondini D, Castellani G, Sala C. Intraspecies characterization of bacteria via evolutionary modeling of protein domains. Sci Rep 2022; 12:16595. [PMID: 36198716 PMCID: PMC9534902 DOI: 10.1038/s41598-022-21036-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 09/22/2022] [Indexed: 12/04/2022] Open
Abstract
The ability to detect and characterize bacteria within a biological sample is crucial for the monitoring of infections and epidemics, as well as for the study of human health and its relationship with commensal microorganisms. To this end, a commonly used technique is targeted sequencing of the 16S rRNA gene. PCR-amplified 16S sequences derived from the sample of interest are usually clustered into so-called Operational Taxonomic Units (OTUs) based on pairwise similarities. Then, representative OTU sequences are compared with reference (human-made) databases to derive their phylogeny and taxonomic classification. Here, we propose a new reference-free approach to define the phylogenetic distance between bacteria based on protein domains, which are the evolving units of proteins. We extract the protein domain profiles of 3368 bacterial genomes and use an ecological approach to model their Relative Species Abundance distribution. Based on the model parameters, we then derive a new measure of phylogenetic distance. Finally, we show that this model-based distance is capable of detecting differences between bacteria in cases in which the 16S rRNA-based method fails, providing a possibly complementary approach that is particularly promising for the analysis of bacterial populations measured by shotgun sequencing.
Affiliation(s)
- Iva Budimir
  - Department of Physics and Astronomy 'Augusto Righi', University of Bologna, 40127, Bologna, Italy
- Enrico Giampieri
  - Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, Italy
- Edoardo Saccenti
  - Laboratory of Systems and Synthetic Biology, Wageningen University and Research, 6708 WE, Wageningen, The Netherlands
- Maria Suarez-Diez
  - Laboratory of Systems and Synthetic Biology, Wageningen University and Research, 6708 WE, Wageningen, The Netherlands
- Martina Tarozzi
  - Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, Italy
- Daniele Dall'Olio
  - Department of Physics and Astronomy 'Augusto Righi', University of Bologna, 40127, Bologna, Italy
- Alessandra Merlotti
  - Department of Physics and Astronomy 'Augusto Righi', University of Bologna, 40127, Bologna, Italy
- Nico Curti
  - Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, Italy
- Daniel Remondini
  - Department of Physics and Astronomy 'Augusto Righi', University of Bologna, 40127, Bologna, Italy
- Gastone Castellani
  - Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, Italy
- Claudia Sala
  - Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, Italy
|