1
Wang Y, Hanford A, Boroumand M, Kalonia C, Leissa J, Shah M, Pham T, Randolph T, Prajapati I. Assessing subvisible particle risks in monoclonal antibodies: insights from quartz crystal microbalance with dissipation, machine learning, and in silico analysis. MAbs 2025;17:2501629. PMID: 40350687. DOI: 10.1080/19420862.2025.2501629. Received 03/04/2025; revised 04/28/2025; accepted 04/29/2025; indexed 05/14/2025. Open access.
Abstract
Throughout the lifecycle of biopharmaceutical development and manufacturing, monoclonal antibodies (mAbs) are subjected to diverse interfacial stresses and encounter various container surfaces. These interactions can cause the formation of subvisible particles (SVPs) that complicate developability and stability assessments of the drug products. This study leverages quartz crystal microbalance with dissipation (QCM-D), an interfacial characterization technique, as well as both in silico and experimentally measured physicochemical properties, to investigate the significant differences in SVP formation among different mAbs due to interfacial stresses. We conducted forced degradation experiments in borosilicate glass and high-density polyethylene containers, using agitation and stirring to rank 15 mAbs on SVP risks. Our data indicate that the kinetics of antibody adsorption to solid-liquid interfaces correlate strongly with SVP propensity in the stirring study yet show a weaker correlation with agitation-induced SVPs. In addition, SVP morphology was analyzed using self-supervised machine learning on flow imaging microscopy images. Despite the differing surface chemistry of the two container types, stirring resulted in similar SVP morphologies, in contrast to the unique morphologies produced by agitation. Collectively, our research demonstrates the utility of QCM-D and in silico models in evaluating mAb developability and their tendency to form interface-mediated SVPs, providing a strategy to mitigate risks associated with SVP formation in biotherapeutic development.
Affiliation(s)
- Yibo Wang: Dosage Form Design and Development, BioPharmaceuticals Development, R&D, AstraZeneca, Gaithersburg, MD, USA
- Alexis Hanford: Dosage Form Design and Development, BioPharmaceuticals Development, R&D, AstraZeneca, Gaithersburg, MD, USA
- Mehdi Boroumand: Data Science and Modeling, BioPharmaceuticals Development, R&D, AstraZeneca, Gaithersburg, MD, USA
- Cavan Kalonia: Dosage Form Design and Development, BioPharmaceuticals Development, R&D, AstraZeneca, Gaithersburg, MD, USA
- Jesse Leissa: Dosage Form Design and Development, BioPharmaceuticals Development, R&D, AstraZeneca, Gaithersburg, MD, USA
- Mitali Shah: Dosage Form Design and Development, BioPharmaceuticals Development, R&D, AstraZeneca, Gaithersburg, MD, USA
- Tony Pham: Biologics Engineering, Oncology R&D, AstraZeneca, Gaithersburg, MD, USA
- Theodore Randolph: Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO, USA
- Indira Prajapati: Dosage Form Design and Development, BioPharmaceuticals Development, R&D, AstraZeneca, Gaithersburg, MD, USA
2
Costa MVL, de Aguiar EJ, Rodrigues LS, Traina C, Traina AJM. DEELE-Rad: exploiting deep radiomics features in deep learning models using COVID-19 chest X-ray images. Health Inf Sci Syst 2025;13:11. PMID: 39741501. PMCID: PMC11683036. DOI: 10.1007/s13755-024-00330-6. Received 01/15/2024; accepted 12/17/2024; indexed 01/03/2025. Open access.
Abstract
Purpose: Deep learning-based radiomics techniques have the potential to aid specialists and physicians in decision-making in COVID-19 scenarios. Specifically, a deep learning (DL) ensemble model is employed to classify chest X-ray images for COVID-19 diagnosis, while also providing feasible and reliable visual explainability to support decision-making. Methods: Our DEELE-Rad approach integrates DL and machine learning (ML) techniques. We use DL models to extract deep radiomics features and evaluate their performance as end-to-end classifiers. We avoid the successive steps of the conventional radiomics pipeline by employing architectures pretrained on ImageNet (VGG16, ResNet50V2, and DenseNet201) via transfer learning. We extract 100 and 500 deep radiomics features from each DL model, feed these features into well-established ML classifiers, and apply automatic parameter tuning with a cross-validation strategy. We also probe the models' decision-making behavior with a visual explanation method. Results: Our proposed approach achieved 89.97% AUC when using 500 deep radiomics features from the DenseNet201 end-to-end classifier. Our ensemble DEELE-Rad method improved the results to 96.19% AUC for the 500-dimensional setting, and the ML variant of DEELE-Rad reached the best results, with an accuracy of 98.39% and an AUC of 99.19% for the same setup. The visual assessment offers specialists and physicians additional support for decision-making. Conclusion: The results show that the DEELE-Rad approach provides robust and trustworthy image analysis and can benefit healthcare specialists when employed in clinical routines and the corresponding decision-making procedures. For reproducibility, our code is available at https://github.com/usmarcv/deele-rad.
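As an illustration of the feature-extraction step this abstract describes, the sketch below shows how a pooled CNN feature map becomes a fixed-length "deep radiomics" vector. It is a minimal NumPy stand-in, not the authors' code: the 7 × 7 × 1920 map matches DenseNet201's final convolutional output for 224 × 224 inputs, but the dense projection uses random placeholder weights and the downstream ML classifiers are omitted.

```python
import numpy as np

def global_average_pool(feature_map):
    """Collapse an (H, W, C) convolutional feature map to a C-dim vector."""
    return feature_map.mean(axis=(0, 1))

def project(features, rng, out_dim):
    """Reduce pooled features to a target radiomics dimension (e.g. 100 or 500)
    with a dense layer; the weights here are random placeholders."""
    w = rng.standard_normal((features.shape[0], out_dim)) / np.sqrt(features.shape[0])
    return features @ w

rng = np.random.default_rng(0)
fmap = rng.standard_normal((7, 7, 1920))   # shape of DenseNet201's final feature map
pooled = global_average_pool(fmap)          # -> (1920,)
radiomics_100 = project(pooled, rng, 100)   # -> (100,) feature vector for an ML classifier
```

In the real pipeline the resulting vectors would be fed to tuned, cross-validated ML classifiers as described above.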
Affiliation(s)
- Márcus V. L. Costa: Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
- Erikson J. de Aguiar: Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
- Lucas S. Rodrigues: Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
- Caetano Traina: Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
- Agma J. M. Traina: Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
3
Hariri M, Aydın A, Sıbıç O, Somuncu E, Yılmaz S, Sönmez S, Avşar E. LesionScanNet: dual-path convolutional neural network for acute appendicitis diagnosis. Health Inf Sci Syst 2025;13:3. PMID: 39654693. PMCID: PMC11625030. DOI: 10.1007/s13755-024-00321-7. Received 08/23/2024; accepted 11/19/2024; indexed 12/12/2024. Open access.
Abstract
Acute appendicitis is an abrupt inflammation of the appendix, causing symptoms such as abdominal pain, vomiting, and fever. Computed tomography (CT) is a useful tool for the accurate diagnosis of acute appendicitis; however, interpretation is challenging due to factors such as the anatomical structure of the colon and the localization of the appendix in CT images. In this paper, a novel convolutional neural network model, LesionScanNet, is proposed for the computer-aided detection of acute appendicitis. For this purpose, a dataset of 2400 CT scan images was collected by the Department of General Surgery at Kanuni Sultan Süleyman Research and Training Hospital, Istanbul, Turkey. LesionScanNet is a lightweight model with 765K parameters and includes multiple DualKernel blocks, each containing convolution, expansion, and separable convolution layers together with skip connections. A DualKernel block processes its input along two paths: one uses 3 × 3 filters and the other 1 × 1 filters. The LesionScanNet model achieved an accuracy score of 99% on the test set, exceeding the performance of benchmark deep learning models. In addition, the generalization ability of LesionScanNet was demonstrated on a chest X-ray dataset for pneumonia and COVID-19 detection. In conclusion, LesionScanNet is a lightweight and robust network that achieves superior performance with a smaller number of parameters, and its usage can be extended to other medical application domains.
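A minimal NumPy sketch of the dual-path idea described above, assuming single-channel inputs: one path applies a 3 × 3 filter, the other a 1 × 1 filter, and the outputs are merged with an identity skip connection. The expansion and separable-convolution layers of the real DualKernel block are omitted, and the filter values are illustrative.

```python
import numpy as np

def conv2d_same(x, k):
    """2D convolution with zero 'same' padding (single channel, single filter)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def dual_kernel_block(x, k3, k1):
    """Two parallel paths (3x3 and 1x1 filters) merged with a skip connection."""
    path_a = np.maximum(conv2d_same(x, k3), 0)  # 3x3 path + ReLU
    path_b = np.maximum(conv2d_same(x, k1), 0)  # 1x1 path + ReLU
    return path_a + path_b + x                  # merge + identity skip

x = np.arange(16, dtype=float).reshape(4, 4)
k3 = np.full((3, 3), 1 / 9)   # 3x3 averaging filter (illustrative)
k1 = np.array([[0.5]])        # 1x1 scaling filter (illustrative)
y = dual_kernel_block(x, k3, k1)   # same spatial shape as the input
```

The 1 × 1 path mixes information cheaply per pixel while the 3 × 3 path captures local context, which is the usual motivation for such dual-kernel designs.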
Affiliation(s)
- Muhab Hariri: Electrical and Electronics Engineering Department, Çukurova University, 01330 Adana, Turkey
- Ahmet Aydın: Biomedical Engineering Department, Çukurova University, 01330 Adana, Turkey
- Osman Sıbıç: General Surgery Department, Derik State Hospital, 47800 Mardin, Turkey
- Erkan Somuncu: General Surgery Department, Kanuni Sultan Suleyman Research and Training Hospital, 34303 Istanbul, Turkey
- Serhan Yılmaz: General Surgery Department, Bilkent City Hospital, 06800 Ankara, Turkey
- Süleyman Sönmez: Interventional Radiology Department, Kanuni Sultan Suleyman Research and Training Hospital, 34303 Istanbul, Turkey
- Ercan Avşar: Section for Fisheries Technology, Institute of Aquatic Resources, DTU Aqua, Technical University of Denmark, 9850 Hirtshals, Denmark
4
Zhang Z, Qiao Z, Han L, Yang H, Qian Z, Wu J. Hyperbolic vision language representation learning on chest radiology images. Health Inf Sci Syst 2025;13:27. PMID: 40070450. PMCID: PMC11891115. DOI: 10.1007/s13755-025-00341-x. Received 09/28/2024; accepted 01/25/2025; indexed 03/14/2025. Open access.
Abstract
Given the visual-semantic hierarchy between images and texts, hyperbolic embeddings have been employed for visual-semantic representation learning, leveraging the advantages of hierarchy modeling in hyperbolic space. This approach demonstrates notable advantages in zero-shot learning tasks. However, unlike general image-text alignment tasks, textual data in the medical domain often comprises complex sentences describing various conditions or diseases, posing challenges for vision language models to comprehend free-text medical reports. Consequently, we propose a novel pretraining method specifically for medical image-text data in hyperbolic space. This method uses structured radiology reports, which consist of a set of triplets, and then converts these triplets into sentences through prompt engineering. To address the challenge that diseases or symptoms generally occur in local regions, we introduce a global + local image feature extraction module. By leveraging the hierarchy modeling advantages of hyperbolic space, we employ entailment loss to model the partial order relationship between images and texts. Experimental results show that our method exhibits better generalization and superior performance compared to baseline methods in various zero-shot tasks and different datasets.
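Hyperbolic representation learning of the kind described above typically replaces cosine similarity with a hyperbolic distance. The sketch below computes the standard Poincaré-ball geodesic distance; whether this paper uses this exact model of hyperbolic space is an assumption, and the example embeddings are invented for illustration.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball."""
    uu = np.dot(u, u)
    vv = np.dot(v, v)
    duv = np.dot(u - v, u - v)
    arg = 1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv) + eps)
    return np.arccosh(arg)

# Distances grow rapidly toward the boundary of the ball, which is what lets
# generic concepts sit near the center and specific ones near the rim --
# the visual-semantic hierarchy the paper exploits.
origin = np.zeros(2)
generic = np.array([0.1, 0.0])   # e.g. a broad text concept, near the center
specific = np.array([0.9, 0.0])  # e.g. one particular image, near the rim
d_near = poincare_distance(origin, generic)
d_far = poincare_distance(origin, specific)
```

For a point at Euclidean norm r, the distance from the origin is 2 artanh(r), so `d_far` greatly exceeds `d_near` even though the Euclidean norms differ only by a factor of nine.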
Affiliation(s)
- Zuojing Zhang: Department of Anesthesiology, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China
- Zhi Qiao: Institute of Intelligent Diagnostics, Beijing United-Imaging Research Institute of Intelligent Imaging, Beijing, 100080, China
- Linbin Han: School of Physics, Peking University, Beijing, 100080, China
- Hong Yang: Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006, China
- Zhen Qian: Institute of Intelligent Diagnostics, Beijing United-Imaging Research Institute of Intelligent Imaging, Beijing, 100080, China
- Jingxiang Wu: Department of Anesthesiology, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China
5
Poyraz M, Poyraz AK, Dogan Y, Gunes S, Mir HS, Paul JK, Barua PD, Baygin M, Dogan S, Tuncer T, Molinari F, Acharya R. BrainNeXt: novel lightweight CNN model for the automated detection of brain disorders using MRI images. Cogn Neurodyn 2025;19:53. PMID: 40124704. PMCID: PMC11929658. DOI: 10.1007/s11571-025-10235-z. Received 09/14/2023; revised 12/19/2024; accepted 02/28/2025; indexed 03/25/2025. Open access.
Abstract
The main aim of this study is to propose a novel convolutional neural network, named BrainNeXt, for automated brain disorder detection using magnetic resonance imaging (MRI), and to investigate the performance of the proposed network on various medical applications. To achieve robust image classification performance, we gathered a new MRI dataset spanning four classes: (1) Alzheimer's disease, (2) chronic ischemia, (3) multiple sclerosis, and (4) control. Inspired by ConvNeXt, we designed BrainNeXt as a lightweight classification model that incorporates structural elements of the Swin Transformer Tiny model. By training our model on the collected dataset, a pretrained BrainNeXt model was obtained. Additionally, we propose a feature engineering (FE) approach based on the pretrained BrainNeXt, which extracts features from fixed-size patches. To select the most discriminative features, we employed the neighborhood component analysis selector in the feature selection phase, and used a support vector machine as the classifier for the patch-based FE approach. The BrainNeXt model achieved accuracies of 100% and 91.35% on training and validation, respectively, and a test classification accuracy of 94.21%. To further improve classification performance, the patch-based FE approach was applied, achieving a test accuracy of 99.73%. These results, surpassing 90% accuracy on the test dataset, demonstrate the effectiveness and high classification performance of the proposed models.
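The patch-based FE approach starts by cutting each image into fixed-size patches. Below is a minimal sketch, assuming non-overlapping 56 × 56 patches on a 224 × 224 slice (the patch size is an arbitrary choice for illustration); the pretrained BrainNeXt feature extractor, the NCA selection step, and the SVM classifier are not reproduced here.

```python
import numpy as np

def extract_patches(image, patch):
    """Split a 2D image into non-overlapping fixed-size patches."""
    h, w = image.shape
    patches = [image[i:i + patch, j:j + patch]
               for i in range(0, h - patch + 1, patch)
               for j in range(0, w - patch + 1, patch)]
    return np.stack(patches)

img = np.arange(224 * 224, dtype=float).reshape(224, 224)  # one MRI slice (toy values)
patches = extract_patches(img, 56)  # a 4x4 grid of patches, shape (16, 56, 56)
```

Each patch would then be passed through the pretrained network, and the concatenated features filtered by NCA before classification.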
Affiliation(s)
- Melahat Poyraz: Department of Radiology, Elazig Fethi Sekin City Hospital, Elazig, Turkey
- Ahmet Kursad Poyraz: Department of Radiology, School of Medicine, Firat University, 23119 Elazig, Turkey
- Yusuf Dogan: Department of Radiology, School of Medicine, Firat University, 23119 Elazig, Turkey
- Selva Gunes: Department of Radiology, School of Medicine, Firat University, 23119 Elazig, Turkey
- Hasan S. Mir: Department of Electrical Engineering, American University of Sharjah, Sharjah, UAE
- Jose Kunnel Paul: Department of Neurology, Government Medical College, Thiruvananthapuram, Kerala, India
- Prabal Datta Barua: School of Business (Information System), University of Southern Queensland, Springfield, Australia
- Mehmet Baygin: Department of Computer Engineering, Engineering Faculty, Erzurum Technical University, Erzurum, Turkey
- Sengul Dogan: Department of Digital Forensics Engineering, Technology Faculty, Firat University, Elazig, Turkey
- Turker Tuncer: Department of Digital Forensics Engineering, Technology Faculty, Firat University, Elazig, Turkey
- Filippo Molinari: Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
- Rajendra Acharya: School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, Australia
6
Zhang Z, Liu A, Gao Y, Qian R, Chen X. Cross-patient seizure prediction via continuous domain adaptation and similar sample replay. Cogn Neurodyn 2025;19:26. PMID: 39830598. PMCID: PMC11735696. DOI: 10.1007/s11571-024-10216-8. Received 07/06/2024; revised 10/20/2024; accepted 12/29/2024; indexed 01/22/2025. Open access.
Abstract
Seizure prediction based on electroencephalogram (EEG) signals for people with epilepsy, a common brain disorder worldwide, has great potential for improving quality of life. To alleviate the high degree of heterogeneity among patients, several works have attempted to learn common seizure feature distributions based on the idea of domain adaptation to enhance the generalization ability of the model. However, existing methods ignore the inherent inter-patient discrepancy within the source patients, resulting in disjointed distributions that impede effective domain alignment. To eliminate this effect, we introduce the concept of multi-source domain adaptation (MSDA), considering each source patient as a separate domain. To avoid the additional model complexity of MSDA, we propose a continuous domain adaptation approach for seizure prediction based on a convolutional neural network (CNN), which performs sequential training on multiple source domains. To relieve catastrophic forgetting during sequential training, we replay similar samples from each source domain while learning common feature representations based on subdomain alignment. Evaluated on a publicly available epilepsy dataset, our proposed method attains a sensitivity of 85.0% and a false prediction rate (FPR) of 0.224/h. Compared to the prevailing domain adaptation paradigm and existing domain adaptation works in the field, the proposed method efficiently captures the knowledge of different patients, extracts better common seizure representations, and achieves state-of-the-art performance.
Affiliation(s)
- Ziye Zhang: School of Information Science and Technology, University of Science and Technology of China, Hefei, 230027, China
- Aiping Liu: School of Information Science and Technology, University of Science and Technology of China, Hefei, 230027, China
- Yikai Gao: School of Information Science and Technology, University of Science and Technology of China, Hefei, 230027, China
- Ruobing Qian: Department of Neurosurgery, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230001, China
- Xun Chen: Department of Neurosurgery, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230001, China
7
Peng D, Sun L, Zhou Q, Zhang Y. AI-driven approaches for automatic detection of sleep apnea/hypopnea based on human physiological signals: a review. Health Inf Sci Syst 2025;13:7. PMID: 39712669. PMCID: PMC11659556. DOI: 10.1007/s13755-024-00320-8. Received 06/30/2024; accepted 11/20/2024; indexed 12/24/2024. Open access.
Abstract
Sleep apnea/hypopnea is a sleep disorder characterized by repeated pauses in breathing, which can induce a series of health problems such as cardiovascular disease (CVD) and even sudden death. Polysomnography (PSG) is the most common way to diagnose sleep apnea/hypopnea; however, PSG data acquisition is complex and scoring must be performed manually, making diagnosis time-consuming and dependent on trained professionals. With the development of wearable devices and AI techniques, a growing body of work has focused on building machine and deep learning models that use single- or multi-modal physiological signals to detect sleep apnea/hypopnea automatically. This paper provides a comprehensive review of AI-based automatic sleep apnea/hypopnea detection methods from recent years. We summarize the general process used by existing works with a flow chart covering data acquisition, raw signal pre-processing, model construction, event classification, and evaluation, steps that few papers present in full. The commonly used public databases and pre-processing methods are also reviewed. We then separately summarize existing methods for each signal modality, including nasal airflow, pulse oxygen saturation (SpO2), electrocardiogram (ECG), electroencephalogram (EEG), and snoring sound, along with signal-specific pre-processing methods based on the characteristics of each physiological signal. Finally, we discuss remaining challenges, such as limited data availability, class imbalance, and the need for multi-center studies, as well as future AI-related research directions.
Affiliation(s)
- Dandan Peng: The Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, 510006, China
- Le Sun: The Department of Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing, 210044, China
- Qian Zhou: The School of Modern Posts, Nanjing University of Posts and Telecommunications, Nanjing, 210003, China
- Yanchun Zhang: School of Computer Science, Zhejiang Normal University, Jinhua, 321000, China; The Department of New Networks, Peng Cheng Laboratory, Shenzhen, 695571, China
8
Chen PS, Wong J, Chen EE, Chen ALP. Detecting autism in children through drawing characteristics using the visual-motor integration test. Health Inf Sci Syst 2025;13:18. PMID: 39877430. PMCID: PMC11769875. DOI: 10.1007/s13755-025-00338-6. Received 10/08/2024; accepted 01/10/2025; indexed 01/31/2025. Open access.
Abstract
This study introduces a novel classification method to distinguish children with autism from typically developing children. We recruited 50 school-age children in Taiwan, including 44 boys and 6 girls aged 6 to 12 years, and asked them to draw patterns from a visual-motor integration test to collect data and train deep learning classification models. Ensemble learning was adopted to significantly improve the classification accuracy to 0.934. Moreover, we identified the five patterns that most effectively differentiate drawing performance between children with and without ASD. From these five patterns, we found that children with ASD had difficulty producing patterns that include circles and spatial relationships. These results align with previous findings on the visual-motor perception of individuals with autism. Our results offer a potential cross-cultural tool to detect autism, which can further promote early detection and intervention.
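The abstract does not specify the ensemble rule, so the sketch below assumes simple majority voting across base classifiers, one common way ensemble learning is realized; the model names and predictions are invented for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model class predictions for one drawing sample."""
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]

# three hypothetical base classifiers voting on five children's drawings
model_a = ["ASD", "TD", "ASD", "TD", "ASD"]
model_b = ["ASD", "TD", "TD", "TD", "ASD"]
model_c = ["TD", "TD", "ASD", "ASD", "ASD"]
ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
# ensemble -> ["ASD", "TD", "ASD", "TD", "ASD"]
```

Majority voting lets correlated but individually noisy models cancel out each other's errors, which is the usual source of the accuracy gain reported above.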
Affiliation(s)
- Po Sheng Chen: Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
- Jasin Wong: Department of Special Education, National Tsing Hua University, Hsinchu, Taiwan
- Eva E. Chen: Interdisciplinary Program of Education, National Tsing Hua University, Hsinchu, Taiwan; Educational Psychology and Counselling, College of Education, National Tsing Hua University, Hsinchu, Taiwan
- Arbee L. P. Chen: Department of Computer Science and Information Engineering, Asia University, Taichung, Taiwan
9
Wei S, Yang W, Wang E, Wang S, Li Y. A 3D decoupling Alzheimer's disease prediction network based on structural MRI. Health Inf Sci Syst 2025;13:17. PMID: 39846055. PMCID: PMC11748674. DOI: 10.1007/s13755-024-00333-3. Received 02/27/2024; accepted 12/23/2024; indexed 01/24/2025. Open access.
Abstract
Purpose: This paper aims to develop a three-dimensional (3D) Alzheimer's disease (AD) prediction method that improves on current predictive methods, which struggle to fully harness the potential of structural magnetic resonance imaging (sMRI) data. Methods: Traditional convolutional neural networks struggle to focus accurately on AD lesion structures. To address this issue, a 3D decoupling, self-attention network for AD prediction is proposed. First, a multi-scale decoupling block is designed to enhance the network's ability to extract fine-grained features by segregating convolutional channels. Next, a self-attention block is constructed to extract and adaptively fuse features from three directions (sagittal, coronal, and axial), so that more attention is directed toward brain lesion areas. Finally, a clustering loss function is introduced and combined with the cross-entropy loss to form a joint loss function that enhances the network's ability to discriminate between sample types. Results: The accuracy of our model is 0.985 on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and 0.963 on the Australian Imaging, Biomarker & Lifestyle (AIBL) dataset, both higher than the classification accuracies reported for similar tasks. This demonstrates that our model can accurately distinguish between normal control (NC) and Alzheimer's disease (AD) subjects, as well as between stable mild cognitive impairment (sMCI) and progressive mild cognitive impairment (pMCI). Conclusion: The proposed AD prediction network exhibits competitive performance compared with state-of-the-art methods. It successfully addresses the challenges of handling 3D sMRI data and the limitations stemming from the inadequate information in 2D sections, advancing the utility of predictive methods for AD diagnosis and treatment.
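The joint objective described above combines cross-entropy with a clustering term. Below is a minimal NumPy sketch, assuming the clustering loss is a squared distance to the class centroid (a common center-loss style formulation; the paper's exact term and weighting may differ).

```python
import numpy as np

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class under the softmax output."""
    return -np.log(probs[label] + 1e-12)

def clustering_loss(feature, centroids, label):
    """Pull each feature toward its own class centroid (assumed formulation)."""
    return 0.5 * np.sum((feature - centroids[label]) ** 2)

def joint_loss(probs, feature, centroids, label, lam=0.1):
    """Weighted sum of the classification and clustering objectives."""
    return cross_entropy(probs, label) + lam * clustering_loss(feature, centroids, label)

probs = np.array([0.7, 0.3])          # softmax output for classes (NC, AD)
feature = np.array([1.0, 0.0])
centroids = np.array([[1.0, 0.0],     # NC centroid
                      [0.0, 1.0]])    # AD centroid
loss = joint_loss(probs, feature, centroids, label=0)
# the feature sits exactly on its centroid, so only the CE term contributes
```

The clustering term tightens intra-class feature distributions, which is what improves discrimination between classes like sMCI and pMCI whose softmax outputs alone are hard to separate.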
Affiliation(s)
- Shicheng Wei: School of Mathematics and Computing, University of Southern Queensland, 487-535 West Street, Toowoomba, QLD 4350, Australia
- Wencheng Yang: School of Mathematics and Computing, University of Southern Queensland, 487-535 West Street, Toowoomba, QLD 4350, Australia
- Eugene Wang: Personalised Oncology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia; Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC, Australia
- Song Wang: Department of Engineering, La Trobe University, Bundoora, VIC 3086, Australia
- Yan Li: School of Mathematics and Computing, University of Southern Queensland, 487-535 West Street, Toowoomba, QLD 4350, Australia
10
Tsang CC, Zhao C, Liu Y, Lin KPK, Tang JYM, Cheng KO, Chow FWN, Yao W, Chan KF, Poon SNL, Wong KYC, Zhou L, Mak OTN, Lee JCY, Zhao S, Ngan AHY, Wu AKL, Fung KSC, Que TL, Teng JLL, Schnieders D, Yiu SM, Lau SKP, Woo PCY. Automatic identification of clinically important Aspergillus species by artificial intelligence-based image recognition: proof-of-concept study. Emerg Microbes Infect 2025;14:2434573. PMID: 39585232. PMCID: PMC11632928. DOI: 10.1080/22221751.2024.2434573. Received 03/17/2024; revised 11/06/2024; accepted 11/21/2024; indexed 11/26/2024.
Abstract
While morphological examination is the most widely used method for Aspergillus identification in clinical laboratories, PCR sequencing and MALDI-TOF MS are emerging technologies in better-funded laboratories. However, these require mycological expertise, molecular biologists, and/or expensive equipment. Recently, artificial intelligence (AI), especially image recognition, has been increasingly employed in medicine for fast and automated disease diagnosis. We explored the potential utility of AI in identifying Aspergillus species. In this proof-of-concept study, using 2813, 2814, and 1240 images from four clinically important Aspergillus species for training, validation, and testing, respectively, we evaluated the performance and accuracy of automatic Aspergillus identification from colonial images by three different convolutional neural networks. Results demonstrated that ResNet-18 outperformed Inception-v3 and DenseNet-121 and was the algorithm of choice, as it made the fewest misidentifications (n = 8) and achieved the highest testing accuracy (99.35%). Images showing more distinctive morphological features were more accurately identified. AI-based image recognition using colonial images is a promising technology for Aspergillus identification. Given its short turnaround time, minimal expertise requirements, low reagent/equipment costs, and user-friendliness, it has the potential to serve as a routine laboratory diagnostic tool once the database is further expanded.
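As a quick consistency check on the reported figures, the testing accuracy follows directly from the misidentification count:

```python
# 8 misidentifications out of 1240 test images reproduces ResNet-18's
# reported 99.35% testing accuracy.
test_images = 1240
misidentified = 8
accuracy = 100 * (test_images - misidentified) / test_images
# accuracy -> 99.354...  (reported as 99.35%)
```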
Collapse
Affiliation(s)
- Chi-Ching Tsang
- School of Medical and Health Sciences, Tung Wah College, Homantin, Hong Kong
- Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
| | - Chenyang Zhao
- Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
| | - Yueh Liu
- Doctoral Program in Translational Medicine and Department of Life Sciences, National Chung Hsing University, Taichung, Taiwan
| | - Ken P. K. Lin
- Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
| | - James Y. M. Tang
- Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Kar-On Cheng
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Franklin W. N. Chow
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hunghom, Hong Kong
- Weiming Yao
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Ka-Fai Chan
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Sharon N. L. Poon
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Kelly Y. C. Wong
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Lianyi Zhou
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Oscar T. N. Mak
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Jeremy C. Y. Lee
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Suhui Zhao
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Antonio H. Y. Ngan
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Alan K. L. Wu
  - Department of Clinical Pathology, Pamela Youde Nethersole Eastern Hospital, Chai Wan, Hong Kong
- Kitty S. C. Fung
  - Department of Pathology, United Christian Hospital, Kwun Tong, Hong Kong
- Tak-Lun Que
  - Department of Clinical Pathology, Tuen Mun Hospital, Tuen Mun, Hong Kong
- Jade L. L. Teng
  - Faculty of Dentistry, The University of Hong Kong, Sai Ying Pun, Hong Kong
- Dirk Schnieders
  - Department of Computer Science, Faculty of Engineering, The University of Hong Kong, Pokfulam, Hong Kong
- Siu-Ming Yiu
  - Department of Computer Science, Faculty of Engineering, The University of Hong Kong, Pokfulam, Hong Kong
- Susanna K. P. Lau
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Patrick C. Y. Woo
  - Department of Microbiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
  - Doctoral Program in Translational Medicine and Department of Life Sciences, National Chung Hsing University, Taichung, Taiwan
  - The iEGG and Animal Biotechnology Research Center, National Chung Hsing University, Taichung, Taiwan
11
Ren H, Li D, Jing F, Zhang X, Tian X, Xie S, Zhang E, Wang R, He H, He Y, Xue Y, Liu C, Sun Y, Cheng W. LASF: a local adaptive segmentation framework for coronary angiogram segments. Health Inf Sci Syst 2025; 13:19. [PMID: 39881813 PMCID: PMC11772642 DOI: 10.1007/s13755-025-00339-5] [Received: 09/17/2024] [Accepted: 01/10/2025] [Indexed: 01/31/2025] Open
Abstract
Coronary artery disease (CAD) remains the leading cause of death globally, highlighting the critical need for accurate diagnostic tools in medical imaging. Traditional segmentation methods for coronary angiograms often struggle with vessel discontinuity and inaccuracies, impeding effective diagnosis and treatment planning. To address these challenges, we developed the Local Adaptive Segmentation Framework (LASF), enhancing the YOLOv8 architecture with dilation and erosion algorithms to improve the continuity and precision of vascular image segmentation. We further enriched the ARCADE dataset by meticulously annotating both proximal and distal vascular segments, thus broadening the dataset's applicability for training robust segmentation models. Our comparative analyses reveal that LASF outperforms well-known models such as UNet and DeepLabV3Plus, demonstrating superior metrics in precision, recall, and F1-score across various testing scenarios. These enhancements ensure more reliable and accurate segmentation, critical for clinical applications. LASF represents a significant advancement in the segmentation of vascular images within coronary angiograms. By effectively addressing the common issues of vessel discontinuity and segmentation accuracy, LASF stands to improve the clinical management of CAD, offering a promising tool for enhancing diagnostic accuracy and patient outcomes in medical settings.
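The abstract does not spell out LASF's exact dilation-and-erosion algorithms. As a generic illustration of how such morphological operators restore vessel continuity, a plain-NumPy morphological closing (dilation followed by erosion) over a binary segmentation mask can be sketched as follows; the function names and the toy mask are ours, not the paper's:

```python
import numpy as np

def binary_dilate(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Dilate a boolean mask with a k x k square structuring element."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def binary_erode(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Erode a boolean mask with a k x k square structuring element."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.ones_like(mask)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def close_gaps(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Morphological closing bridges small breaks in a vessel mask."""
    return binary_erode(binary_dilate(mask, k), k)

# A 'vessel' with a one-pixel discontinuity that closing repairs.
vessel = np.zeros((5, 9), dtype=bool)
vessel[2, :] = True
vessel[2, 4] = False   # the break
closed = close_gaps(vessel)
```

After closing, the broken pixel at column 4 is filled in, which is the effect the framework exploits to reduce vessel discontinuity.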
Affiliation(s)
- Hao Ren
  - Faculty of Data Science, City University of Macau, Taipa, 999078 Macao Special Administrative Region China
  - Institute for Healthcare Artificial Intelligence Application, The Affiliated Guangdong Second Provincial General Hospital of Jinan University, Guangzhou, 510317 China
  - Guangzhou Key Laboratory of Smart Home Ward and Health Sensing, Guangzhou, 510317 China
- Dongxiao Li
  - Hainan International College, Minzu University of China, Hainan, 572423 China
- Fengshi Jing
  - Faculty of Data Science, City University of Macau, Taipa, 999078 Macao Special Administrative Region China
  - School of Medicine, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Xinyue Zhang
  - Hainan International College, Minzu University of China, Hainan, 572423 China
- Xingyuan Tian
  - Hainan International College, Minzu University of China, Hainan, 572423 China
- Songlin Xie
  - Faculty of Data Science, City University of Macau, Taipa, 999078 Macao Special Administrative Region China
- Erfu Zhang
  - Faculty of Data Science, City University of Macau, Taipa, 999078 Macao Special Administrative Region China
- Ruining Wang
  - Faculty of Data Science, City University of Macau, Taipa, 999078 Macao Special Administrative Region China
- Han He
  - Faculty of Data Science, City University of Macau, Taipa, 999078 Macao Special Administrative Region China
- Yinpan He
  - Faculty of Data Science, City University of Macau, Taipa, 999078 Macao Special Administrative Region China
- Yake Xue
  - Faculty of Data Science, City University of Macau, Taipa, 999078 Macao Special Administrative Region China
- Chi Liu
  - Faculty of Data Science, City University of Macau, Taipa, 999078 Macao Special Administrative Region China
- Yu Sun
  - Department of Cardiac Intensive Care Unit, Cardiovascular Hospital, The Affiliated Guangdong Second Provincial General Hospital of Jinan University, Guangzhou, 510317 China
- Weibin Cheng
  - Institute for Healthcare Artificial Intelligence Application, The Affiliated Guangdong Second Provincial General Hospital of Jinan University, Guangzhou, 510317 China
  - Guangzhou Key Laboratory of Smart Home Ward and Health Sensing, Guangzhou, 510317 China
  - Department of Data Science, College of Computing, City University of Hong Kong, Kowloon, Hong Kong Special Administrative Region China
  - GD2H-CityUM Joint Research Centre, City University of Macau, Taipa, 999078 Macao Special Administrative Region China
12
Zhang L, Wu J, Wang L, Wang L, Steffens DC, Qiu S, Potter GG, Liu M. Brain Anatomy Prior Modeling to Forecast Clinical Progression of Cognitive Impairment with Structural MRI. PATTERN RECOGNITION 2025; 165:111603. [PMID: 40290575 PMCID: PMC12021437 DOI: 10.1016/j.patcog.2025.111603] [Indexed: 04/30/2025]
Abstract
Brain structural MRI has been widely used to assess the future progression of cognitive impairment (CI). Previous learning-based studies usually suffer from the issue of small-sized labeled training data, while a huge amount of structural MRIs exist in large-scale public databases. Intuitively, brain anatomical structures derived from these public MRIs (even without task-specific label information) can boost CI progression trajectory prediction. However, previous studies seldom use such brain anatomy structure information as priors. To this end, this paper proposes a brain anatomy prior modeling (BAPM) framework to forecast the clinical progression of cognitive impairment with small-sized target MRIs by exploring anatomical brain structures. Specifically, the BAPM consists of a pretext model and a downstream model, with a shared brain anatomy-guided encoder to model brain anatomy prior using auxiliary tasks explicitly. Besides the encoder, the pretext model also contains two decoders for two auxiliary tasks (i.e., MRI reconstruction and brain tissue segmentation), while the downstream model relies on a predictor for classification. The brain anatomy-guided encoder is pre-trained with the pretext model on 9,344 auxiliary MRIs without diagnostic labels for anatomy prior modeling. With this encoder frozen, the downstream model is then fine-tuned on limited target MRIs for prediction. We validate BAPM on two CI-related studies with T1-weighted MRIs from 448 subjects. Experimental results suggest the effectiveness of BAPM in (1) four CI progression prediction tasks, (2) MR image reconstruction, and (3) brain tissue segmentation, compared with several state-of-the-art methods.
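The core recipe here (pre-train a shared encoder on unlabeled auxiliary data, freeze it, then fit only a small downstream head on scarce labeled target data) can be illustrated in miniature. This NumPy toy is not BAPM itself: the encoder, data, and least-squares head are all invented stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# A linear 'encoder' assumed pre-trained on auxiliary data stays frozen;
# only the downstream predictor head is fitted on the small target set.
W_enc = rng.normal(size=(16, 8))          # frozen encoder weights

def encode(x: np.ndarray) -> np.ndarray:
    """Shared encoder; its weights are never updated downstream."""
    return np.tanh(x @ W_enc)

X = rng.normal(size=(40, 16))             # limited labeled target data
y = (X.sum(axis=1) > 0).astype(float)     # synthetic binary labels

Z = encode(X)                             # features from frozen encoder
# Fit only the predictor head by least squares; W_enc stays untouched.
w_head, *_ = np.linalg.lstsq(Z, 2 * y - 1, rcond=None)
pred = (Z @ w_head > 0).astype(float)
```

The design choice being sketched: because the encoder is frozen, the few target labels only have to constrain the small head, which is what makes training on limited MRIs feasible.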
Affiliation(s)
- Lintao Zhang
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jinjian Wu
  - The First School of Clinical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510031, China
- Lihong Wang
  - Department of Psychiatry, University of Connecticut School of Medicine, University of Connecticut, Farmington, CT 06030, USA
- Li Wang
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- David C. Steffens
  - Department of Psychiatry, University of Connecticut School of Medicine, University of Connecticut, Farmington, CT 06030, USA
- Shijun Qiu
  - The First School of Clinical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510031, China
- Guy G. Potter
  - Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Durham, NC 27710, USA
- Mingxia Liu
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
13
Pang S, Chen Y, Shi X, Wang R, Dai M, Zhu X, Song B, Li K. Interpretable 2.5D network by hierarchical attention and consistency learning for 3D MRI classification. PATTERN RECOGNITION 2025; 164:111539. [DOI: 10.1016/j.patcog.2025.111539] [Indexed: 05/03/2025]
14
Gudge S, Tiwari A, Ratnaparkhe M, Jha P. On construction of data preprocessing for real-life SoyLeaf dataset & disease identification using Deep Learning Models. Comput Biol Chem 2025; 117:108417. [PMID: 40086344 DOI: 10.1016/j.compbiolchem.2025.108417] [Received: 11/09/2024] [Revised: 02/20/2025] [Accepted: 02/25/2025] [Indexed: 03/16/2025]
Abstract
Vast volumes of data are needed to train deep learning models from scratch to identify diseases in soybean leaves, yet sufficient high-quality samples remain scarce. To overcome this problem, we developed the real-life SoyLeaf dataset and used pre-trained deep learning models to identify leaf diseases. The SoyLeaf dataset was collected from the ICAR-Indian Institute of Soybean Research (IISR) Center, Indore field, and contains 9786 high-quality soybean leaf images, including healthy and diseased leaves. We then applied data preprocessing techniques to enhance image quality. In addition, we evaluated fourteen Keras transfer learning models to determine which best fits the SoyLeaf disease dataset. The accuracies of the proposed fine-tuned models using the Adam optimizer are as follows: ResNet50V2 achieves 99.79%, ResNet101V2 achieves 99.89%, ResNet152V2 achieves 99.59%, InceptionV3 achieves 99.83%, InceptionResNetV2 achieves 99.79%, MobileNet achieves 99.82%, MobileNetV2 achieves 99.89%, DenseNet121 achieves 99.87%, and DenseNet169 achieves 99.87%. Similarly, with the RMSprop optimizer: ResNet50V2 achieves 99.49%, ResNet101V2 achieves 99.45%, ResNet152V2 achieves 99.45%, InceptionV3 achieves 99.58%, InceptionResNetV2 achieves 99.88%, MobileNet achieves 99.73%, MobileNetV2 achieves 99.83%, DenseNet121 achieves 99.89%, and DenseNet169 achieves 99.77%. The experimental results show that only ResNet50V2, ResNet101V2, InceptionV3, InceptionResNetV2, MobileNet, MobileNetV2, DenseNet121, and DenseNet169 outperform other state-of-the-art models in training, validation, and testing accuracy.
Affiliation(s)
- Sujata Gudge
  - Indian Institute of Technology Indore, Indore, 453552, Madhya Pradesh, India
- Aruna Tiwari
  - Indian Institute of Technology Indore, Indore, 453552, Madhya Pradesh, India
- Milind Ratnaparkhe
  - ICAR-Indian Institute of Soybean Research, Indore, 452001, Madhya Pradesh, India
- Preeti Jha
  - Koneru Lakshmaiah Education Foundation, Hyderabad, 500043, Telangana, India
15
Sendra T, Belanger P. On the use of a Transformer Neural Network to deconvolve ultrasonic signals. ULTRASONICS 2025; 152:107639. [PMID: 40157136 DOI: 10.1016/j.ultras.2025.107639] [Received: 11/15/2024] [Revised: 03/03/2025] [Accepted: 03/12/2025] [Indexed: 04/01/2025]
Abstract
Pulse-echo ultrasonic techniques play a crucial role in assessing wall thickness deterioration in safety-critical industries. Current approaches face limitations with low signal-to-noise ratios, weak echoes, or vague echo patterns typical of heavily corroded profiles. This study proposes a novel combination of Convolutional Neural Networks (CNN) and Transformer Neural Networks (TNN) to improve thickness gauging accuracy for complex geometries and echo patterns. Recognizing the strength of TNNs in language processing and speech recognition, the proposed network comprises three modules: (1) a pre-processing CNN, (2) a Transformer model, and (3) a post-processing CNN. Two datasets, one simulation-generated and the other experimentally gathered from a corroded carbon steel staircase specimen, support the training and testing processes. Results indicate that the proposed model outperforms other AI architectures and traditional methods, providing a 5.45% improvement over CNN architectures from the NDE literature, a 1.81% improvement over ResNet-50, and a 17.5% improvement over conventional thresholding techniques in accurately detecting depths with a precision under 0.5λ.
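For context on what the network is improving upon: the classical pulse-echo baseline locates the front-wall and back-wall echoes and converts their time separation to thickness via d = c·Δt/2. A minimal NumPy sketch of that conventional approach (the wave speed, sampling rate, and synthetic A-scan below are illustrative assumptions, not values from the paper):

```python
import numpy as np

C_STEEL = 5900.0   # assumed longitudinal wave speed in steel, m/s
FS = 100e6         # assumed sampling rate, samples/s

def thickness_from_echoes(signal: np.ndarray, pulse: np.ndarray) -> float:
    """Find the two strongest pulse matches by cross-correlation and
    convert their separation to thickness: d = c * dt / 2."""
    corr = np.correlate(signal, pulse, mode="valid")
    first = int(np.argmax(corr))
    masked = corr.copy()
    masked[max(0, first - len(pulse)):first + len(pulse)] = -np.inf
    second = int(np.argmax(masked))
    dt = abs(second - first) / FS
    return C_STEEL * dt / 2.0

# Synthetic A-scan: front-wall echo at sample 100, back-wall at 300.
n = np.arange(64)
pulse = np.sin(2 * np.pi * 5e6 * n / FS) * np.hanning(64)
signal = np.zeros(1024)
signal[100:164] += pulse
signal[300:364] += 0.6 * pulse   # attenuated back-wall echo
d = thickness_from_echoes(signal, pulse)
```

Here the 200-sample separation corresponds to 2 µs, i.e., a 5.9 mm wall. This simple picture breaks down for the overlapping, distorted echoes of corroded profiles, which is the regime the Transformer model targets.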
Affiliation(s)
- T Sendra
  - Department of Mechanics, Ecole de Technologie Superieure, 1100 Notre-Dame Street West, Montreal, H3C 1K3, QC, Canada
- P Belanger
  - Department of Mechanics, Ecole de Technologie Superieure, 1100 Notre-Dame Street West, Montreal, H3C 1K3, QC, Canada
16
Liao Q, Gardner B, Barlow R, McMillan K, Moore S, Fitzgerald A, Arzhaeva Y, Botwright N, Wang D, Nelis JL. Improving traceability and quality control in the red-meat industry through computer vision-driven physical meat feature tracking. Food Chem 2025; 480:143830. [PMID: 40121878 DOI: 10.1016/j.foodchem.2025.143830] [Received: 11/18/2024] [Revised: 02/27/2025] [Accepted: 03/08/2025] [Indexed: 03/25/2025]
Abstract
Current traceability systems rely heavily on external markers which can be altered or tampered with. We hypothesized that the unique intramuscular fat patterns in beef cuts could serve as natural physical identifiers for traceability, while simultaneously providing information about quality attributes. To test our hypothesis, we developed a comprehensive dataset of 38,528 high-resolution beef images from 602 steaks with annotations from human grading and ingredient analysis. Using this dataset, we developed a quality prediction module based on the EfficientNet model, achieving high accuracy in marbling score prediction (96.24% top-1±1, 99.57% top-1±2), breed identification (91.23%), and diet determination (90.90%). Additionally, we demonstrated that internal meat features can be used for traceability, attaining F-1 scores of 0.9942 in sample-to-sample tracing and 0.9479 in sample-to-database tracing. This approach significantly enhances fraud resistance and enables objective quality assessment in the red meat supply chain.
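Sample-to-database tracing of the kind reported above reduces, at retrieval time, to nearest-neighbor matching of feature embeddings. The paper's features come from a trained CNN; the sketch below substitutes random stand-in embeddings and a plain cosine-similarity matcher purely to illustrate the retrieval step:

```python
import numpy as np

def trace_to_database(query: np.ndarray, db: np.ndarray) -> int:
    """Return the index of the enrolled feature vector most similar to
    the query under cosine similarity (generic matcher, not the paper's)."""
    q = query / np.linalg.norm(query)
    d = db / np.linalg.norm(db, axis=1, keepdims=True)
    return int(np.argmax(d @ q))

rng = np.random.default_rng(0)
db = rng.normal(size=(5, 128))                 # 5 enrolled 'fingerprints'
query = db[3] + 0.01 * rng.normal(size=128)    # noisy re-capture of item 3
match = trace_to_database(query, db)
```

Because intramuscular fat patterns are effectively unique per cut, a well-trained embedding makes this nearest-neighbor lookup highly discriminative, which is what the reported F1 scores quantify.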
Affiliation(s)
- Qiyu Liao
  - Data61, CSIRO, Corner Vimiera & Pembroke Rd, Marsfield NSW 2122, Australia
- Brint Gardner
  - Scientific Computing, CSIRO, Research Way, Clayton VIC 3168, Australia
- Robert Barlow
  - Agriculture and Food, CSIRO, St Lucia, QLD 4067, Australia
- Kate McMillan
  - Agriculture and Food, CSIRO, St Lucia, QLD 4067, Australia
- Sean Moore
  - Agriculture and Food, CSIRO, St Lucia, QLD 4067, Australia
- Yulia Arzhaeva
  - Data61, CSIRO, Corner Vimiera & Pembroke Rd, Marsfield NSW 2122, Australia
- Dadong Wang
  - Data61, CSIRO, Corner Vimiera & Pembroke Rd, Marsfield NSW 2122, Australia
- Joost Ld Nelis
  - Agriculture and Food, CSIRO, St Lucia, QLD 4067, Australia
17
Tao D, Deng S, Qiu G, Fu X. Model updating strategy study about sex identification of silkworm pupae using transfer learning and NIR spectroscopy. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2025; 335:125999. [PMID: 40058088 DOI: 10.1016/j.saa.2025.125999] [Received: 12/20/2024] [Revised: 02/11/2025] [Accepted: 03/04/2025] [Indexed: 03/24/2025]
Abstract
This paper proposes, for the first time, a novel model updating strategy named SilkwormNet to address the sex discrimination problem of silkworm pupae of new species. SilkwormNet integrates a ResNet block, a multi-head attention mechanism, and a Schedule-Free optimization strategy. Initially, the preprocessed spectra from one species were input into SilkwormNet to establish an optimal primary model. Then, with the feature extraction layers and classification head unfrozen, the optimal weight parameters from the basic model were applied for model updating to identify the new species. Finally, SilkwormNet used only 20% of the data to update the model. Uniform Manifold Approximation and Projection (UMAP) and confusion matrices were employed to comprehensively evaluate the results. When the basic model was built using variety 221B_403, accuracy improved markedly after model updating: for variety 871B_463 it increased from 50% to 99.22%; for variety 9312_ShanheB, from 74.22% to 99.22%; for variety FB_P71, from 69.53% to 98.44%; and for variety 7532_906, from 50% to 100%. When using just 10% of the data to update the model, accuracy ranged between 90.62% and 95.31%. The results of SilkwormNet were also compared with SVM, Random Forest, and 1D-CNN to further demonstrate its superiority.
Affiliation(s)
- Dan Tao
  - School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang 330013, China
- Suyuan Deng
  - School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang 330013, China
- Guangying Qiu
  - School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang 330013, China
- Xinglan Fu
  - College of Engineering and Technology, Southwest University, Chongqing 400700, China
18
Pang Z, Wang L, Yu F, Zhao K, Zeng B, Xu S. PrivCore: Multiplication-activation co-reduction for efficient private inference. Neural Netw 2025; 187:107307. [PMID: 40054024 DOI: 10.1016/j.neunet.2025.107307] [Received: 07/29/2024] [Revised: 01/13/2025] [Accepted: 02/20/2025] [Indexed: 03/09/2025]
Abstract
The marriage of deep neural network (DNN) and secure 2-party computation (2PC) enables private inference (PI) on the encrypted client-side data and server-side models with both privacy and accuracy guarantees, coming at the cost of orders of magnitude communication and latency penalties. Prior works on designing PI-friendly network architectures are confined to mitigating the overheads associated with non-linear (e.g., ReLU) operations, assuming other linear computations are free. Recent works have shown that linear convolutions can no longer be ignored and are responsible for the majority of communication in PI protocols. In this work, we present PrivCore, a framework that jointly optimizes the alternating linear and non-linear DNN operators via a careful co-design of sparse Winograd convolution and fine-grained activation reduction, to improve high-efficiency ciphertext computation without impacting the inference precision. Specifically, being aware of the incompatibility between the spatial pruning and Winograd convolution, we propose a two-tiered Winograd-aware structured pruning method that removes spatial filters and Winograd vectors from coarse to fine-grained for multiplication reduction, both of which are specifically optimized for Winograd convolution in a structured pattern. PrivCore further develops a novel sensitivity-based differentiable activation approximation to automate the selection of ineffectual ReLUs and polynomial options. PrivCore also supports the dynamic determination of coefficient-adaptive polynomial replacement to mitigate the accuracy degradation. Extensive experiments on various models and datasets consistently validate the effectiveness of PrivCore, achieving 2.2× communication reduction with 1.8% higher accuracy compared with SENet (ICLR 2023) on CIFAR-100, and 2.0× total communication reduction with iso-accuracy compared with CoPriv (NeurIPS 2023) on ImageNet.
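PrivCore's pruning operates on the vectors of the Winograd transform; for reference, the minimal 1-D Winograd algorithm F(2,3) that underlies Winograd convolution computes two correlation outputs with four multiplications instead of the direct method's six. This is the standard textbook form (not PrivCore's code); zeroing one of the m terms is what "removing a Winograd vector" amounts to:

```python
import numpy as np

def winograd_f23(d: np.ndarray, g: np.ndarray) -> np.ndarray:
    """F(2,3): two outputs of a 3-tap correlation from a 4-sample tile,
    using 4 multiplications (m1..m4) instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])        # input tile
g = np.array([0.5, 1.0, -1.0])            # filter taps
y = winograd_f23(d, g)
ref = np.array([d[0:3] @ g, d[1:4] @ g])  # direct correlation, 6 multiplies
```

Spatial pruning (zeroing filter taps g_i) does not automatically sparsify the transformed terms, which is the incompatibility the two-tiered Winograd-aware pruning in the paper is designed around.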
Affiliation(s)
- Zhi Pang
  - Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, China
- Lina Wang
  - Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, China
- Fangchao Yu
  - Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, China
- Kai Zhao
  - Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, China
- Bo Zeng
  - Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, China
- Shuwang Xu
  - Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, China
19
Zhang J, Li G, Su Q, Cao L, Tian Y, Xu B. Enabling scale and rotation invariance in convolutional neural networks with retina like transformation. Neural Netw 2025; 187:107395. [PMID: 40121784 DOI: 10.1016/j.neunet.2025.107395] [Received: 03/12/2024] [Revised: 10/28/2024] [Accepted: 03/10/2025] [Indexed: 03/25/2025]
Abstract
Traditional convolutional neural networks (CNNs) struggle with scale and rotation transformations, resulting in reduced performance on transformed images. Previous research focused on designing specific CNN modules to extract transformation-invariant features. However, these methods lack versatility and are not adaptable to a wide range of scenarios. Drawing inspiration from human visual invariance, we propose a novel brain-inspired approach to tackle the invariance problem in CNNs. If we consider a CNN as the visual cortex, we have the potential to design an "eye" that exhibits transformation invariance, allowing CNNs to perceive the world consistently. Therefore, we propose a retina module and then integrate it into CNNs to create transformation-invariant CNNs (TICNN), achieving scale and rotation invariance. The retina module comprises a retina-like transformation and a transformation-aware neural network (TANN). The retina-like transformation supports flexible image transformations, while the TANN regulates these transformations for scaling and rotation. Specifically, we propose a reference-based training method (RBTM) where the retina module learns to align input images with a reference scale and rotation, thereby achieving invariance. Furthermore, we provide mathematical substantiation for the retina module to confirm its feasibility. Experimental results also demonstrate that our method outperforms existing methods in recognizing images with scale and rotation variations. The code will be released at https://github.com/JiaHongZ/TICNN.
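The retina module itself is a learned, differentiable transformation regulated by the TANN; as a crude, non-learned illustration of the reference-alignment idea behind RBTM, one can pick, among candidate rotations of an input, the one closest to a reference orientation (toy example, restricted to 90-degree rotations; everything here is our simplification):

```python
import numpy as np

def align_to_reference(img: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Try the four 90-degree rotations of `img` and return the one that
    best matches the reference orientation (L1 error)."""
    candidates = [np.rot90(img, k) for k in range(4)]
    errs = [np.abs(c - ref).sum() for c in candidates]
    return candidates[int(np.argmin(errs))]

ref = np.arange(9.0).reshape(3, 3)   # reference orientation
rotated = np.rot90(ref, 2)           # input arrives upside-down
aligned = align_to_reference(rotated, ref)
```

In the paper the alignment is learned end-to-end and continuous in both scale and rotation, so the downstream CNN always sees inputs in a canonical pose; this toy only conveys the "align first, then classify" structure.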
Affiliation(s)
- Jiahong Zhang
  - Institute of Automation, Chinese Academy of Sciences, Beijing 100045, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- Guoqi Li
  - Institute of Automation, Chinese Academy of Sciences, Beijing 100045, China; Peng Cheng Laboratory, Shenzhen, Guangdong 518066, China
- Qiaoyi Su
  - Institute of Automation, Chinese Academy of Sciences, Beijing 100045, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- Lihong Cao
  - State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China
- Yonghong Tian
  - Peng Cheng Laboratory, Shenzhen, Guangdong 518066, China; Institute for Artificial Intelligence, Peking University, Beijing 100871, China
- Bo Xu
  - Institute of Automation, Chinese Academy of Sciences, Beijing 100045, China
20
Xu X, Chen Z, Hu Y, Wang G. More signals matter to detection: Integrating language knowledge and frequency representations for boosting fine-grained aircraft recognition. Neural Netw 2025; 187:107402. [PMID: 40132453 DOI: 10.1016/j.neunet.2025.107402] [Received: 10/28/2024] [Revised: 01/27/2025] [Accepted: 03/12/2025] [Indexed: 03/27/2025]
Abstract
As object detection tasks progress rapidly, fine-grained detection flourishes as a promising extension. Fine-grained recognition naturally demands high-quality detail signals; however, existing fine-grained detectors, built upon the mainstream detection paradigm, struggle to simultaneously address the challenges of insufficient original signals and the loss of critical signals, resulting in inferior performance. We argue that language signals with advanced semantic knowledge can provide valuable information for fine-grained objects, as well as the frequency domain exhibits greater flexibility in suppressing and enhancing signals; then, we propose a fine-grained aircraft detector by integrating language knowledge and frequency representations into the one-stage detection paradigm. Concretely, by considering both original signals and deep feature signals, we develop three components, including an adaptive frequency augmentation branch (AFAB), a content-aware global features intensifier (CGFI), and a fine-grained text-image interactive feeder (FTIF), to facilitate perceiving and retaining critical signals throughout pivotal detection stages. The AFAB adaptively processes image patches according to their frequency characteristics in the Fourier domain, thus thoroughly mining critical visual content in the data space; the CGFI employs content-aware frequency filtering to enhance global features, allowing for generating an information-rich feature space; the FTIF introduces text knowledge to describe visual differences among fine-grained categories, conveying robust semantic priors from language signals to visual spaces via multimodal interaction for information supplement. Extensive experiments conducted on optical and SAR images demonstrate the superior performance of the proposed fine-grained detector, especially the FTIF, which can be plugged into most existing one-stage detectors to boost their fine-grained recognition performance significantly.
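The AFAB's adaptive, learned filtering policy is not reproduced here; a fixed low-pass filter applied to an image patch in the Fourier domain illustrates the underlying mechanism of suppressing some frequency bands while retaining others (the cutoff, mask shape, and test patches below are arbitrary choices of ours):

```python
import numpy as np

def lowpass_patch(patch: np.ndarray, keep: float = 0.25) -> np.ndarray:
    """Zero all 2-D FFT coefficients outside a centred square whose side
    is `keep` times the patch size, then transform back."""
    f = np.fft.fftshift(np.fft.fft2(patch))
    h, w = patch.shape
    ch, cw = h // 2, w // 2
    rh = max(1, int(h * keep / 2))
    rw = max(1, int(w * keep / 2))
    mask = np.zeros((h, w))
    mask[ch - rh:ch + rh, cw - rw:cw + rw] = 1.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

flat = np.ones((8, 8))               # pure low frequency (DC) survives
i, j = np.indices((8, 8))
checker = (-1.0) ** (i + j)          # pure Nyquist frequency is removed
```

An adaptive variant would choose the mask per patch from the patch's own spectrum, which is the "content-aware" behavior the AFAB and CGFI learn.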
Affiliation(s)
- Xueru Xu
  - School of Artificial Intelligence and Automation, National Key Laboratory of Multispectral Information Intelligent Processing Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
- Zhong Chen
  - School of Artificial Intelligence and Automation, National Key Laboratory of Multispectral Information Intelligent Processing Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
- Yuxin Hu
  - School of Artificial Intelligence and Automation, National Key Laboratory of Multispectral Information Intelligent Processing Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
- Guoyou Wang
  - School of Artificial Intelligence and Automation, National Key Laboratory of Multispectral Information Intelligent Processing Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
21
Pu R, Yu L, Zhan S, Xu G, Zhou F, Ling CX, Wang B. FedELR: When federated learning meets learning with noisy labels. Neural Netw 2025; 187:107275. [PMID: 40081270 DOI: 10.1016/j.neunet.2025.107275] [Received: 10/15/2024] [Revised: 02/09/2025] [Accepted: 02/12/2025] [Indexed: 03/15/2025]
Abstract
Existing research on federated learning (FL) usually assumes that training labels are of high quality for each client, which is impractical in many real-world scenarios (e.g., noisy labels by crowd-sourced annotations), leading to dramatic performance degradation. In this work, we investigate noisy FL through the lens of the early-time training phenomenon (ETP). Specifically, a key finding of this paper is that the early training phase varies among different local clients due to the different noisy classes in each client. In addition, we show that such an inconsistency also exists between the local and global models. As a result, local clients would always begin to memorize noisy labels before the global model reaches its optimum, which inevitably leads to the degradation of the quality of service in real-world FL applications (e.g., tumor image classification among different hospitals). Our findings provide new insights into the learning dynamics and shed light on the root cause of this degradation in noisy FL. To address this problem, we reveal a new principle for noisy FL: it is necessary to align the early training phases across local models. To this end, we propose FedELR, a simple yet effective framework that aims to force local models to stick to their early training phase via an early learning regularization (ELR), so that the learning dynamics of local models can be kept at the same pace. Moreover, this also leverages the ETP in local clients, leading each client to take more training steps in learning a more robust local model for optimal global aggregation. Extensive experiments on various real-world datasets also validate the effectiveness of our proposed methods.
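The abstract does not give FedELR's local objective explicitly; the early-learning regularization it is named after (Liu et al., NeurIPS 2020) adds a term that anchors predictions to a momentum average of the model's own earlier outputs, so the loss resists drifting toward memorized noisy labels. A single-sample NumPy sketch (hyperparameters `lam` and `beta` are illustrative, not the paper's):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

class ELRLoss:
    """loss = CE(p, y) + lam * log(1 - <p, t>), where t is a running
    momentum average of past predictions. Maximizing <p, t> pulls the
    model back toward what it learned early, before noise memorization."""
    def __init__(self, n_classes: int, lam: float = 3.0, beta: float = 0.7):
        self.t = np.full(n_classes, 1.0 / n_classes)
        self.lam, self.beta = lam, beta

    def __call__(self, logits: np.ndarray, label: int) -> float:
        p = softmax(logits)
        self.t = self.beta * self.t + (1.0 - self.beta) * p  # update target
        ce = -np.log(p[label])
        reg = np.log(1.0 - min(float(p @ self.t), 1.0 - 1e-12))
        return float(ce + self.lam * reg)

loss_fn = ELRLoss(n_classes=3)
l = loss_fn(np.array([2.0, 0.0, 0.0]), label=0)
```

Repeating the same confident prediction lowers the loss further (the target t drifts toward p), which is exactly the "stick to the early phase" pressure FedELR applies on each client.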
Affiliation(s)
- Ruizhi Pu
  - Western University, Department of Computer Science, London, N6A 5B7, Ontario, Canada
- Lixing Yu
  - Yunnan University, School of Information Science and Engineering, Kunming, 650500, Yunnan Province, China
- Shaojie Zhan
  - Yunnan University, School of Information Science and Engineering, Kunming, 650500, Yunnan Province, China
- Gezheng Xu
  - Western University, Department of Computer Science, London, N6A 5B7, Ontario, Canada
- Fan Zhou
  - Beihang University, Department of Automotive Engineering, Beijing, 100191, China
- Charles X Ling
  - Western University, Department of Computer Science, London, N6A 5B7, Ontario, Canada
- Boyu Wang
  - Western University, Department of Computer Science, London, N6A 5B7, Ontario, Canada
Collapse
|
22
|
Jiang S, Zhang D, Cheng F, Lu X, Liu Q. DuPt: Rehearsal-based continual learning with dual prompts. Neural Netw 2025; 187:107306. [PMID: 40043489 DOI: 10.1016/j.neunet.2025.107306] [Received: 05/13/2024] [Revised: 01/01/2025] [Accepted: 02/19/2025] [Indexed: 04/29/2025]
Abstract
Rehearsal-based continual learning methods typically replay a small number of representative samples so that the network can learn new content while retaining old knowledge. However, existing works overlook two crucial factors: (1) While the network prioritizes learning new data at incremental stages, it exhibits weaker generalization when trained on limited samples from specific categories than when trained on large-scale samples across multiple categories simultaneously. (2) Knowledge distillation from a limited set of old samples can transfer some existing knowledge, but imposing strong constraints may hinder knowledge transfer and restrict the current-stage network's ability to capture fresh knowledge. To alleviate these issues, we propose a rehearsal-based continual learning method with dual prompts, termed DuPt. First, we propose an input-aware prompt, an input-level cue that utilizes an input prior to query valid cue information. These hints serve as an additional complement that helps the input samples generate more rational and diverse distributions. Second, we introduce a proxy feature prompt, a feature-level hint that bridges the knowledge gap between the teacher and student models to maintain consistency in the feature transfer process, reinforcing feature plasticity and stability. This matters because differences between network features at the new and old incremental stages could harm the generalization of new models if the features were strictly aligned. Our proposed prompt acts as a consistency regularizer that avoids feature conflicts caused by these differences. Extensive experiments validate the effectiveness of our method, which integrates seamlessly with existing methods and yields consistent performance improvements.
Affiliation(s)
- Shengqin Jiang: School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China; Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Daolong Zhang: School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Fengna Cheng: College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
- Xiaobo Lu: School of Automation, Southeast University, Nanjing 210096, China
- Qingshan Liu: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

23
Zhang H, Chen Q, Lui LM. Deformation-invariant neural network and its applications in distorted image restoration and analysis. Neural Netw 2025; 187:107378. [PMID: 40121786 DOI: 10.1016/j.neunet.2025.107378] [Received: 10/04/2023] [Revised: 10/17/2024] [Accepted: 03/07/2025] [Indexed: 03/25/2025]
Abstract
Images degraded by geometric distortions pose a significant challenge to imaging and computer vision tasks such as object recognition. Deep learning-based imaging models usually fail to perform accurately on geometrically distorted images. In this paper, we propose the deformation-invariant neural network (DINN), a framework that addresses imaging tasks on geometrically distorted images. The DINN outputs consistent latent features for images that are geometrically distorted but represent the same underlying object or scene. The idea of DINN is to incorporate a simple component, called the quasiconformal transformer network (QCTN), into existing deep networks for imaging tasks. The QCTN is a deep neural network that outputs a quasiconformal map, which can be used to transform a geometrically distorted image into an improved version closer to the distribution of natural images. It first outputs a Beltrami coefficient, which measures the quasiconformality of the output deformation map; by controlling the Beltrami coefficient, the local geometric distortion under the quasiconformal mapping can be controlled. The QCTN is lightweight and simple and can be readily integrated into existing deep neural networks to enhance their performance. Leveraging our framework, we have developed an image classification network that achieves accurate classification of distorted images. Our framework has also been applied to restore images geometrically distorted by atmospheric and water turbulence, where DINN outperforms existing GAN-based restoration methods, demonstrating its effectiveness. Additionally, we apply our framework to 1:1 verification of human face images under atmospheric turbulence and achieve satisfactory performance, further demonstrating its efficacy.
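For intuition on the Beltrami coefficient the QCTN outputs: for a planar map f = u + iv, μ = f_z̄ / f_z measures local conformal distortion (μ = 0 means conformal, |μ| < 1 means orientation-preserving quasiconformal). A finite-difference numpy sketch, not the paper's implementation:

```python
import numpy as np

def beltrami_coefficient(u_map, v_map):
    """Estimate mu = f_zbar / f_z of a planar map f = (u, v) sampled on a
    grid, via central finite differences and Wirtinger derivatives."""
    f = u_map + 1j * v_map                 # complex-valued map f = u + iv
    f_x = np.gradient(f, axis=1)           # df/dx
    f_y = np.gradient(f, axis=0)           # df/dy
    f_z = 0.5 * (f_x - 1j * f_y)           # df/dz
    f_zbar = 0.5 * (f_x + 1j * f_y)        # df/dzbar
    return f_zbar / (f_z + 1e-12)          # small eps guards degenerate maps
```

For the identity map μ vanishes, and for the anisotropic stretch f(x, y) = (2x, y) one gets the constant μ = 1/3, matching the closed-form value for affine maps.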
Affiliation(s)
- Han Zhang: City University of Hong Kong, Kowloon, Hong Kong, China; Hong Kong Center for Cerebro-Cardiovascular Health Engineering, Sha Tin, Hong Kong, China
- Qiguang Chen: Chinese University of Hong Kong, Sha Tin, Hong Kong, China
- Lok Ming Lui: Chinese University of Hong Kong, Sha Tin, Hong Kong, China

24
Zhou W, Lin K, Zheng Z, Chen D, Su T, Hu H. DRTN: Dual Relation Transformer Network with feature erasure and contrastive learning for multi-label image classification. Neural Netw 2025; 187:107309. [PMID: 40048756 DOI: 10.1016/j.neunet.2025.107309] [Received: 09/01/2024] [Revised: 12/14/2024] [Accepted: 02/21/2025] [Indexed: 04/29/2025]
Abstract
The objective of the multi-label image classification (MLIC) task is to simultaneously identify multiple objects present in an image. Several researchers directly flatten 2D feature maps into 1D grid feature sequences and utilize a Transformer encoder to capture correlations among grid features and thereby learn object relationships. Although these Transformer-based methods obtain promising results, they lose spatial information. In addition, current attention-based models often focus only on salient feature regions and ignore other potentially useful features that contribute to the MLIC task. To tackle these problems, we present a novel Dual Relation Transformer Network (DRTN) for the MLIC task, which can be trained in an end-to-end manner. Concretely, to compensate for the loss of spatial information in grid features caused by the flattening operation, we adopt a grid aggregation scheme to generate pseudo-region features, which does not require additional expensive annotations to train an object detector. Then, a new dual relation enhancement (DRE) module is proposed to capture correlations between objects using two different visual features, complementing the advantages of both grid and pseudo-region features. After that, we design a new feature enhancement and erasure (FEE) module to learn discriminative features and mine additional potentially valuable ones. By using an attention mechanism to discover the most salient feature regions and removing them with a region-level erasure strategy, the FEE module is able to mine other potentially useful features from the remaining parts. Further, we devise a novel contrastive learning (CL) module to encourage the foregrounds of salient and potential features to be closer while pushing both further away from background features. This compels our model to learn discriminative and valuable features more comprehensively. Extensive experiments demonstrate that the DRTN method surpasses current MLIC models on three challenging benchmarks: the MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE datasets.
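The erase-then-mine step of an FEE-style module can be sketched as follows; feature-norm saliency stands in for the learned attention map, and `keep_ratio` is an illustrative hyperparameter, not a value from the paper:

```python
import numpy as np

def erase_and_remine(features, keep_ratio=0.7):
    """Region-level erasure sketch: score each spatial cell by feature
    magnitude (a stand-in for attention), zero out the most salient cells,
    and re-score the remainder so secondary features can be mined."""
    h, w, c = features.shape
    saliency = np.linalg.norm(features, axis=-1)         # (h, w) scores
    thresh = np.quantile(saliency, keep_ratio)
    salient_mask = saliency >= thresh                     # most salient cells
    erased = features * (~salient_mask)[..., None]        # remove them
    residual_saliency = np.linalg.norm(erased, axis=-1)   # mine the rest
    return salient_mask, residual_saliency
```

A second attention pass over `residual_saliency` then surfaces regions that the first, saliency-dominated pass would have ignored.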
Affiliation(s)
- Wei Zhou: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- Kang Lin: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- Zhijie Zheng: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- Dihu Chen: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- Tao Su: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China
- Haifeng Hu: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, Guangdong, China

25
Pan X, Jiao C, Yang B, Zhu H, Wu J. Attribute-guided feature fusion network with knowledge-inspired attention mechanism for multi-source remote sensing classification. Neural Netw 2025; 187:107332. [PMID: 40088832 DOI: 10.1016/j.neunet.2025.107332] [Received: 02/24/2024] [Revised: 10/21/2024] [Accepted: 02/27/2025] [Indexed: 03/17/2025]
Abstract
Land use and land cover (LULC) classification is a popular research area in remote sensing. The information in single-modal data is insufficient for accurate classification, especially in complex scenes, whereas the complementarity of multi-modal data such as hyperspectral images (HSIs) and light detection and ranging (LiDAR) data can effectively improve classification performance. Attention mechanisms have recently been widely used in multi-modal LULC classification methods to achieve better feature representation. However, these methods insufficiently consider the knowledge inherent in the data, such as spectral mixture in HSIs and the inconsistent spatial scales of different categories in LiDAR data. Moreover, multi-modal features carry different physical attributes: HSI features represent spectral information across many channels, while LiDAR features capture elevation information in the spatial dimension. If these attributes are ignored, feature fusion may introduce redundant information and degrade classification. In this paper, we propose an attribute-guided feature fusion network with knowledge-inspired attention mechanisms, named AFNKA. Focusing on the spectral characteristics of HSI and the elevation information of LiDAR data, we design knowledge-inspired attention mechanisms to explore enhanced features. In particular, a novel attention module based on the adaptive cosine estimator (ACE) is presented to learn more discriminative features, adequately utilizing the spatial-spectral correlation of mixed HSI pixels. In the fusion stage, two novel attribute-guided fusion modules are developed to selectively aggregate multi-modal features, sufficiently exploiting the correlations between the spatial-spectral property of HSI features and the spatial-elevation property of LiDAR features. Experimental results on several multi-source datasets quantitatively indicate that the proposed AFNKA significantly outperforms state-of-the-art methods.
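The ACE statistic that the proposed attention module builds on is the classical adaptive cosine estimator from hyperspectral target detection: the squared cosine between a pixel and a target spectrum after whitening by the background covariance. A numpy sketch (the background estimate and diagonal regularization are illustrative, not from the paper):

```python
import numpy as np

def ace_score(pixels, target, background):
    """Adaptive cosine estimator: scores lie in [0, 1]; a score of 1 means
    the (whitened) pixel is exactly aligned with the target direction."""
    mu = background.mean(axis=0)
    cov = np.cov(background, rowvar=False) + 1e-6 * np.eye(background.shape[1])
    cov_inv = np.linalg.inv(cov)
    x = pixels - mu                                  # (n, d) centered pixels
    s = target - mu                                  # (d,) centered target
    num = (x @ cov_inv @ s) ** 2
    den = (s @ cov_inv @ s) * np.einsum('ij,jk,ik->i', x, cov_inv, x)
    return num / den
```

By the Cauchy-Schwarz inequality in the Σ⁻¹ inner product, the score is bounded by 1, which makes it a natural attention weight for mixed pixels.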
Affiliation(s)
- Xiao Pan: School of Artificial Intelligence, Xidian University, Xi'an 710119, China
- Changzhe Jiao: School of Artificial Intelligence, Xidian University, Xi'an 710119, China
- Bo Yang: School of Artificial Intelligence, Xidian University, Xi'an 710119, China
- Hao Zhu: School of Artificial Intelligence, Xidian University, Xi'an 710119, China
- Jinjian Wu: School of Artificial Intelligence, Xidian University, Xi'an 710119, China

26
Cheng X, Jiao Y, Meiring RM, Sheng B, Zhang Y. Reliability and validity of current computer vision based motion capture systems in gait analysis: A systematic review. Gait Posture 2025; 120:150-160. [PMID: 40250127 DOI: 10.1016/j.gaitpost.2025.04.016] [Received: 10/04/2024] [Revised: 03/06/2025] [Accepted: 04/15/2025] [Indexed: 04/20/2025]
Abstract
BACKGROUND Traditional instrumented gait analysis (IGA) objectively quantifies gait deviations, but its clinical use is hindered by high cost, lab environment, and complex protocols. Pose estimation algorithm (PEA)-based gait analysis, which infers joint positions from videos, offers an accessible method to detect gait abnormalities and tailor rehabilitation strategies. However, its reliability and validity in gait analysis and algorithmic factors affecting accuracy have not been reviewed. RESEARCH QUESTION This systematic review aims to evaluate the accuracy of PEA-based gait analysis systems and to identify the algorithmic factors impacting their accuracy. METHOD A total of 644 articles were initially identified through Scopus, PubMed, and IEEE, with 20 meeting the inclusion and exclusion criteria. Reliability, validity, and algorithmic parameters were extracted for detailed review. RESULTS AND SIGNIFICANCE Most included articles focus on validity against the gold standard, while limited evidence makes it challenging to determine reliability. OpenCap demonstrated an MAE of 4.1° for 3D joint angles, but higher errors in rotational angles require further validation. OpenPose demonstrated ICCs of 0.89-0.994 for spatiotemporal parameters and MAE < 5.2° for 2D hip and knee joint angles in the sagittal plane (ICCs = 0.67-0.92, CCCs = 0.83-0.979), but ankle kinematics exhibited poor accuracy (ICCs = 0.37-0.57, MAEs = 3.1°-9.77°, CCCs = 0.51-0.936). PEA accuracy depends on camera settings, backbone architecture, and training datasets. This study reviews the accuracy of PEA-based gait analysis systems, supporting future research in gait-related clinical applications of PEA.
Affiliation(s)
- Xingye Cheng: Department of Exercise Sciences, Faculty of Science, University of Auckland, Auckland 1023, New Zealand
- Yiran Jiao: Department of Exercise Sciences, Faculty of Science, University of Auckland, Auckland 1023, New Zealand
- Rebecca M Meiring: Department of Exercise Sciences, Faculty of Science, University of Auckland, Auckland 1023, New Zealand; School of Physiology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg 2050, South Africa
- Bo Sheng: School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
- Yanxin Zhang: Department of Exercise Sciences, Faculty of Science, University of Auckland, Auckland 1023, New Zealand

27
Tian X, Li HD, Lin H, Li C, Wang YP, Bai HX, Lan W, Liu J. Inspired by pathogenic mechanisms: A novel gradual multi-modal fusion framework for mild cognitive impairment diagnosis. Neural Netw 2025; 187:107343. [PMID: 40081274 DOI: 10.1016/j.neunet.2025.107343] [Received: 07/01/2024] [Revised: 01/06/2025] [Accepted: 03/02/2025] [Indexed: 03/15/2025]
Abstract
Mild cognitive impairment (MCI) is a precursor to Alzheimer's disease (AD), and its progression involves complex pathogenic mechanisms. Specifically, disturbed by gene variants, the regulation of gene expression ultimately changes brain structure, resulting in the progression of brain diseases. However, existing works rarely take these mechanisms into account when designing diagnosis methods. Therefore, we propose a novel gradual multi-modal fusion framework to fuse representative data from each stage of disease progression in a hybrid feature space, including single nucleotide polymorphism (SNP), gene expression (GE), and magnetic resonance imaging (MRI) data. Specifically, to integrate genetic sequence and expression data, we design a SNP-GE fusion module, which performs multi-modal fusion to obtain a genetic embedding by considering the relation between SNP and GE. Compared with SNP and GE, the genetic embedding and MRI representations exhibit more pronounced heterogeneity, especially in their correlation with disease. Therefore, we propose to align the manifolds of the genetic and imaging representations, which can explore the high-order relationship between imaging and genetic data in the presence of modal heterogeneity. Our framework was validated on the Alzheimer's Disease Neuroimaging Initiative dataset, achieving diagnostic accuracies of 76.88%, 72.84%, 87.72%, and 95.00% for distinguishing MCI from normal controls, late MCI from early MCI, MCI from AD, and AD from normal controls, respectively. Additionally, our framework helps to identify some multi-modal biomarkers related to MCI progression. In summary, our framework is effective not only for MCI diagnosis but also for guiding the further development of genetic and imaging-based brain studies. Our code is published at https://github.com/tianxu8822/workflow_MCI/tree/main/.
Affiliation(s)
- Xu Tian: Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Hong-Dong Li: Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Hanhe Lin: School of Science and Engineering, School of Medicine, University of Dundee, Dundee, DD1 4HN, United Kingdom
- Chao Li: School of Science and Engineering, School of Medicine, University of Dundee, Dundee, DD1 4HN, United Kingdom; Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, CB3 0WA, United Kingdom
- Yu-Ping Wang: Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118, USA
- Harrison X Bai: Department of Radiology and Radiological Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Wei Lan: School of Computer, Electronics and Information, Guangxi University, Nanning, 530004, China
- Jin Liu: Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi, 830008, China

28
Luo S, Yu J. A semantic enhancement-based multimodal network model for extracting information from evidence lists. Neural Netw 2025; 187:107387. [PMID: 40147163 DOI: 10.1016/j.neunet.2025.107387] [Received: 09/05/2024] [Revised: 12/27/2024] [Accepted: 03/10/2025] [Indexed: 03/29/2025]
Abstract
Courts require the extraction of crucial information about various cases from heterogeneous evidence lists for knowledge-driven decision-making. However, traditional manual screening is complex and inaccurate when confronted with massive evidence lists and cannot meet the demands of legal judgment. Therefore, we propose a semantic enhancement-based multimodal network model (SEBM) to accurately extract critical information from evidence lists. First, we construct an entity semantic graph based on the differences among entity categories in the text content. Subsequently, we extract the features of multiple modalities within the document using distinct methods, and guide the fusion of features within each modality to enhance their semantic association based on the constructed entity semantic graphs. Furthermore, an improved multimodal self-attention mechanism is employed to enhance the interactions between the various modal features, and a loss function combining Taylor polynomials and supervised contrastive learning is utilized to reduce information loss. Finally, SEBM is evaluated on an authentic Chinese evidence list dataset, which includes extensive entity details from diverse case types across multiple law firms. Experimental results on this dataset demonstrate that our model outperforms other high-performing models.
Affiliation(s)
- Shun Luo: School of Economics and Management, Fuzhou University, No. 2, Wulongjiang North Avenue, Fuzhou, 350108, China
- Juan Yu: School of Economics and Management, Fuzhou University, No. 2, Wulongjiang North Avenue, Fuzhou, 350108, China

29
Fang S, Xu W, Feng Z, Yuan S, Wang Y, Yang Y, Ding W, Zhou S. Arch-Net: Model conversion and quantization for architecture agnostic model deployment. Neural Netw 2025; 187:107384. [PMID: 40120552 DOI: 10.1016/j.neunet.2025.107384] [Received: 06/02/2024] [Revised: 11/24/2024] [Accepted: 03/08/2025] [Indexed: 03/25/2025]
Abstract
The significant computational demands of Deep Neural Networks (DNNs) present a major challenge for their practical application. Recently, many Application-Specific Integrated Circuit (ASIC) chips have incorporated dedicated hardware support for neural network acceleration. However, the lengthy development cycle of ASIC chips means they often lag behind the latest advances in neural architecture research. For instance, Layer Normalization is not well supported on many popular chips, and a 7 × 7 convolution runs significantly less efficiently than the equivalent stack of three 3 × 3 convolutions. Therefore, in this paper, we introduce Arch-Net, a neural network framework composed exclusively of a few common operators, namely 3 × 3 Convolution, 2 × 2 Max-pooling, Batch Normalization, Fully Connected layers, and Concatenation, which are efficiently supported across the majority of ASIC architectures. To facilitate the conversion of disparate network architectures into Arch-Net, we propose the Arch-Distillation methodology, which incorporates strategies such as Residual Feature Adaptation and a Teacher Attention Mechanism. These mechanisms enable effective conversion between different network structures alongside efficient model quantization. The resulting Arch-Net eliminates unconventional network constructs while maintaining robust performance even under sub-8-bit quantization, thereby enhancing compatibility and deployment efficiency. Empirical results on image classification and machine translation tasks demonstrate that the few operator types used in Arch-Net can achieve results comparable to those obtained with complex architectures. This provides new insight into deploying structure-agnostic neural networks on various ASIC chips.
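The 7 × 7 versus three 3 × 3 trade-off mentioned in the abstract is easy to check: the stacked form covers the same receptive field with fewer weights per channel pair. A quick plain-Python illustration (not the Arch-Net code):

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1, dilation-1 convolutions:
    rf = 1 + sum(k - 1) over the stacked kernel sizes k."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

def weights_per_channel_pair(kernel_sizes):
    """Spatial weights needed per (input-channel, output-channel) pair."""
    return sum(k * k for k in kernel_sizes)

# One 7x7 conv vs. three stacked 3x3 convs: identical receptive field (7),
# but 49 vs. 27 weights per channel pair, plus extra depth/nonlinearity.
```

The equal receptive field is what makes the substitution "equivalent" at the level of input coverage; the exact function computed still changes, which is why a distillation step is needed to recover accuracy.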
Affiliation(s)
- Shuangkang Fang: School of Electrical and Information Engineering, Beihang University, Beijing, 100191, China
- Weixin Xu: Megvii Research, Megvii Inc., Beijing, 100096, China
- Zipeng Feng: Megvii Research, Megvii Inc., Beijing, 100096, China
- Song Yuan: Megvii Research, Megvii Inc., Beijing, 100096, China
- Yufeng Wang: Institute of Unmanned System, Beihang University, Beijing, 100191, China
- Yi Yang: Megvii Research, Megvii Inc., Beijing, 100096, China
- Wenrui Ding: Institute of Unmanned System, Beihang University, Beijing, 100191, China
- Shuchang Zhou: Megvii Research, Megvii Inc., Beijing, 100096, China

30
Erden MB, Cansiz S, Caki O, Khattak H, Etiz D, Yakar MC, Duruer K, Barut B, Gunduz-Demir C. FourierLoss: Shape-aware loss function with Fourier descriptors. Neurocomputing 2025; 638:130155. [DOI: 10.1016/j.neucom.2025.130155] [Indexed: 05/03/2025]
31
Zhang H, Xie Y, Zhang H, Xu C, Luo X, Chen D, Xu X, Zhang H, Heng PA, He S. Unambiguous granularity distillation for asymmetric image retrieval. Neural Netw 2025; 187:107303. [PMID: 40106931 DOI: 10.1016/j.neunet.2025.107303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2024] [Revised: 01/18/2025] [Accepted: 02/18/2025] [Indexed: 03/22/2025]
Abstract
Previous asymmetric image retrieval methods based on knowledge distillation have primarily focused on aligning the global features of two networks to transfer global semantic information from the gallery network to the query network. However, these methods often fail to effectively transfer local semantic information, limiting the fine-grained alignment of feature representation spaces between the two networks. To overcome this limitation, we propose a novel approach called Layered-Granularity Localized Distillation (GranDist). GranDist constructs layered feature representations that balance the richness of contextual information with the granularity of local features. As we progress through the layers, the contextual information becomes more detailed, but the semantic gap between networks can widen, complicating the transfer process. To address this challenge, GranDist decouples the feature maps at each layer to capture local features at different granularities and establishes distillation pipelines focused on effectively transferring these contextualized local features. In addition, we introduce an Unambiguous Localized Feature Selection (UnamSel) method, which leverages a well-trained fully connected layer to classify these contextual features as either ambiguous or unambiguous. By discarding the ambiguous features, we prevent the transfer of irrelevant or misleading information, such as background elements that are not pertinent to the retrieval task. Extensive experiments on various benchmark datasets demonstrate that our method outperforms state-of-the-art techniques and significantly enhances the performance of previous asymmetric retrieval approaches.
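The idea of distilling only unambiguous local features can be sketched as a masked alignment loss; the function below is an illustrative stand-in, not the paper's GranDist objective, and `keep_mask` plays the role of the UnamSel ambiguity decision:

```python
import numpy as np

def granular_distill_loss(query_feats, gallery_feats, keep_mask):
    """Align query-network local features with gallery-network ones, but
    only on regions flagged as unambiguous (keep_mask); ambiguous regions
    are dropped so misleading context (e.g., background) is not transferred."""
    diff = query_feats - gallery_feats            # (n_regions, d)
    per_region = (diff ** 2).sum(axis=1)          # squared error per region
    kept = per_region[keep_mask]
    return kept.mean() if kept.size else 0.0
```

Because masked-out regions contribute nothing, a mismatch confined to an ambiguous region leaves the loss unchanged.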
Affiliation(s)
- Hongrui Zhang: School of Future Technology, South China University of Technology, Guangzhou, China
- Yi Xie: School of Future Technology, South China University of Technology, Guangzhou, China
- Haoquan Zhang: School of Future Technology, South China University of Technology, Guangzhou, China
- Cheng Xu: Center of Smart Health, Hong Kong Polytechnic University, Hong Kong, China
- Xuandi Luo: School of Future Technology, South China University of Technology, Guangzhou, China
- Donglei Chen: School of Future Technology, South China University of Technology, Guangzhou, China
- Xuemiao Xu: School of Future Technology, South China University of Technology, Guangzhou, China
- Huaidong Zhang: School of Future Technology, South China University of Technology, Guangzhou, China

32
Jia Y, Dong L, Jiao Y. Medical image classification based on contour processing attention mechanism. Comput Biol Med 2025; 191:110102. [PMID: 40203738 DOI: 10.1016/j.compbiomed.2025.110102] [Received: 09/19/2024] [Revised: 03/11/2025] [Accepted: 03/26/2025] [Indexed: 04/11/2025]
Abstract
Medical diagnosis, often constrained by a doctor's experience and capabilities, remains a critical challenge. In recent years, intelligent algorithms have emerged as promising tools to assist in improving diagnostic accuracy. Among these, medical image classification plays a pivotal role in enhancing diagnostic precision. This paper proposes a flexible and concise medical image classification method based on a contour processing attention mechanism, designed to improve accuracy by emphasizing target regions during image processing. First, the training images undergo sequential grayscale conversion and binarization, after which the binary images are subjected to opening and closing operations to generate two distinct contour maps. These contour maps are then concatenated with the grayscale image along the channel dimension to produce a feature map, which is subsequently convolved. Next, pixel-wise multiplication is performed between the resulting image containing contour information and the original training image, thereby enhancing the contour and positional information of the target regions. The enhanced image is then fed into a residual network for classification training, forming a model based on the contour processing attention mechanism. Finally, classification experiments were conducted on three different types of medical image datasets. The experimental results demonstrate that the contour processing attention mechanism significantly improves the performance of residual networks in medical image classification, achieving a 0.0368 increase in classification accuracy, a 0.0413 improvement in F1 score, and a 0.0821 improvement in Kappa score. Furthermore, the proposed model demonstrates versatility, with potential applications not only in medical image classification but also in other domains such as remote sensing, urban landscape analysis, and transportation vehicle image classification.
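The binarize → open/close → concatenate → multiply pipeline described above can be sketched with scipy's binary morphology; the global mean threshold, default structuring element, and the channel-mean stand-in for the learned convolution are our illustrative choices, not the paper's:

```python
import numpy as np
from scipy import ndimage

def contour_attention_input(img_gray):
    """Sketch of the preprocessing: binarize a grayscale image, derive two
    contour maps from morphological opening and closing, stack them with
    the grayscale image, and use the result to re-weight the original."""
    binary = img_gray > img_gray.mean()            # simple global threshold
    opened = ndimage.binary_opening(binary)
    closed = ndimage.binary_closing(binary)
    contour_open = binary ^ opened                 # pixels opening removed
    contour_close = closed ^ binary                # pixels closing filled
    feat = np.stack([img_gray,
                     contour_open.astype(float),
                     contour_close.astype(float)], axis=0)  # channel stack
    # stand-in for the learned conv that maps `feat` to one attention map
    attn = feat.mean(axis=0)
    return feat, attn * img_gray                   # contour-enhanced image
```

In the actual model the enhanced image, not a hand-crafted map, would be produced by a trained convolution before entering the residual network.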
Affiliation(s)
- Yongnan Jia: School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, 100083, PR China; Key Laboratory of Knowledge Automation for Industrial Processes of Ministry of Education, University of Science and Technology Beijing, Beijing, 100083, PR China
- Linjie Dong: School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, 100083, PR China
- Yuhang Jiao: School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, 100083, PR China

33
Chen T, Ma Y, Pan Z, Wang W, Yu J. Fusion of multi-scale feature extraction and adaptive multi-channel graph neural network for 12-lead ECG classification. Comput Methods Programs Biomed 2025; 265:108725. [PMID: 40184850 DOI: 10.1016/j.cmpb.2025.108725] [Received: 08/01/2024] [Revised: 03/14/2025] [Accepted: 03/14/2025] [Indexed: 04/07/2025]
Abstract
BACKGROUND AND OBJECTIVE The 12-lead electrocardiography (ECG) is a widely used diagnostic method in clinical practice for cardiovascular diseases. The potential correlation between interlead signals is an important reference for clinical diagnosis but is often overlooked by most deep learning methods. Although graph neural networks can capture the associations between leads through edge topology, the complex correlations inherent in 12-lead ECG may involve edge topology, node features, or their combination. METHODS In this study, we propose a multi-scale adaptive graph fusion network (MSAGFN) model, which fuses multi-scale feature extraction and adaptive multi-channel graph neural network (AMGNN) for 12-lead ECG classification. The proposed MSAGFN model first extracts multi-scale features individually from 12 leads and then utilizes these features as nodes to construct feature graphs and topology graphs. To efficiently capture the most correlated information from the feature graphs and topology graphs, AMGNN iteratively performs a series of graph operations to learn the final graph-level representations for prediction. Moreover, we incorporate consistency and disparity constraints into our model to further refine the learned features. RESULTS Our model was validated on the PTB-XL dataset, achieving an area under the receiver operating characteristic curve score of 0.937, mean accuracy of 0.894, and maximum F1 score of 0.815. These results surpass the corresponding metrics of state-of-the-art methods. Additionally, we conducted ablation studies to further demonstrate the effectiveness of our model. CONCLUSIONS Our study demonstrates that, in 12-lead ECG classification, by constructing topology graphs based on physiological relationships and feature graphs based on lead feature relationships, and effectively integrating them, we can fully explore and utilize the complementary characteristics of the two graph structures. 
By combining these structures, we construct a comprehensive data view, significantly enhancing the feature representation and classification accuracy.
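The fusion of a physiology-based topology graph with a similarity-based feature graph can be illustrated with a minimal sketch (the cosine threshold, the fixed fusion weight, and the toy wiring are assumptions for illustration, not the MSAGFN implementation):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def propagate(adj, feats):
    """One mean-aggregation message-passing step over an adjacency list."""
    out = []
    for i, f in enumerate(feats):
        nbrs = [feats[j] for j in adj[i]] + [f]  # include self-loop
        out.append([sum(col) / len(nbrs) for col in zip(*nbrs)])
    return out

def fuse(h_topo, h_feat, alpha=0.5):
    """Convex combination of topology-graph and feature-graph embeddings."""
    return [[alpha * a + (1 - alpha) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(h_topo, h_feat)]

# toy example: 4 "leads" with 3-dim features
feats = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.2]]
topo_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # assumed physiological wiring
# feature graph: connect leads whose cosine similarity exceeds a threshold
feat_adj = {i: [j for j in range(len(feats))
                if j != i and cosine(feats[i], feats[j]) > 0.8]
            for i in range(len(feats))}
fused = fuse(propagate(topo_adj, feats), propagate(feat_adj, feats))
```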
Affiliation(s)
- Teng Chen, College of Computer Science & Technology, Qingdao University, Qingdao 266071, PR China
- Yumei Ma, College of Computer Science & Technology, Qingdao University, Qingdao 266071, PR China
- Zhenkuan Pan, College of Computer Science & Technology, Qingdao University, Qingdao 266071, PR China
- Weining Wang, College of Computer Science & Technology, Qingdao University, Qingdao 266071, PR China
- Jinpeng Yu, School of Automation, Qingdao University, Qingdao 266071, PR China

34
Priego-Torres B, Sanchez-Morillo D, Khalili E, Conde-Sánchez MÁ, García-Gámez A, León-Jiménez A. Automated engineered-stone silicosis screening and staging using Deep Learning with X-rays. Comput Biol Med 2025; 191:110153. [PMID: 40252290 DOI: 10.1016/j.compbiomed.2025.110153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2024] [Revised: 03/09/2025] [Accepted: 04/04/2025] [Indexed: 04/21/2025]
Abstract
Silicosis, a debilitating occupational lung disease caused by inhaling crystalline silica, continues to be a significant global health issue, especially with the increasing use of engineered stone (ES) surfaces containing high silica content. Traditional diagnostic methods, dependent on radiological interpretation, have low sensitivity, especially in the early stages of the disease, and present variability between evaluators. This study explores the efficacy of deep learning techniques in automating the screening and staging of silicosis using chest X-ray images. Utilizing a comprehensive dataset obtained from the medical records of a cohort of workers exposed to artificial quartz conglomerates, we implemented a preprocessing stage for rib-cage segmentation, followed by classification using state-of-the-art deep learning models. The segmentation model exhibited high precision, ensuring accurate identification of thoracic structures. In the screening phase, our models achieved near-perfect accuracy, with ROC AUC values reaching 1.0, effectively distinguishing between healthy individuals and those with silicosis. The models demonstrated remarkable precision in the staging of the disease. Nevertheless, differentiating between simple silicosis and progressive massive fibrosis, the evolved and complicated form of the disease, presented certain difficulties, especially during the transitional period, when assessment can be significantly subjective. Notwithstanding these difficulties, the models achieved an accuracy of around 81% and ROC AUC scores nearing 0.93. This study highlights the potential of deep learning to generate clinical decision support tools that increase accuracy and effectiveness in the diagnosis and staging of silicosis, whose early detection would allow patients to be moved away from all sources of occupational exposure, therefore constituting a substantial advancement in occupational health diagnostics.
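Screening performance here is reported as ROC AUC; its rank-based definition can be computed directly (a generic evaluation sketch, not the authors' code):

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count as 1/2) -- equivalent to the area under the ROC curve."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, labels `[0, 0, 1, 1]` with scores `[0.1, 0.4, 0.35, 0.8]` give an AUC of 0.75.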
Affiliation(s)
- Blanca Priego-Torres, Bioengineering, Automation and Robotics Research Group, Department of Automation Engineering, Electronics and Computer Architecture and Networks, School of Engineering, University of Cadiz, Puerto Real, 11519, Cádiz, Spain; Biomedical Research and Innovation Institute of Cadiz (INiBICA), Puerta del Mar University Hospital, Cádiz, 11009, Spain
- Daniel Sanchez-Morillo, Bioengineering, Automation and Robotics Research Group, Department of Automation Engineering, Electronics and Computer Architecture and Networks, School of Engineering, University of Cadiz, Puerto Real, 11519, Cádiz, Spain; Biomedical Research and Innovation Institute of Cadiz (INiBICA), Puerta del Mar University Hospital, Cádiz, 11009, Spain
- Ebrahim Khalili, Bioengineering, Automation and Robotics Research Group, Department of Automation Engineering, Electronics and Computer Architecture and Networks, School of Engineering, University of Cadiz, Puerto Real, 11519, Cádiz, Spain; Biomedical Research and Innovation Institute of Cadiz (INiBICA), Puerta del Mar University Hospital, Cádiz, 11009, Spain
- Andrés García-Gámez, Radiology Department, Puerta del Mar University Hospital, Cádiz, 11009, Spain
- Antonio León-Jiménez, Biomedical Research and Innovation Institute of Cadiz (INiBICA), Puerta del Mar University Hospital, Cádiz, 11009, Spain; Pulmonology Department, Puerta del Mar University Hospital, Cádiz, 11009, Spain

35
Wang Q, Huang T, Luo X, Luo X, Li X, Cao K, Li D, Shen L. An Efficient Acute Lymphoblastic Leukemia Screen Framework Based on Multi-Modal Deep Neural Network. Int J Lab Hematol 2025; 47:454-462. [PMID: 39810306 DOI: 10.1111/ijlh.14424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2024] [Revised: 11/18/2024] [Accepted: 12/19/2024] [Indexed: 01/16/2025]
Abstract
BACKGROUND Acute lymphoblastic leukemia (ALL) is a leading cause of death among pediatric malignancies. Early diagnosis of ALL is crucial for minimizing misdiagnosis, improving survival rates, and ensuring the implementation of precise treatment plans for patients. METHODS In this study, we propose a multi-modal deep neural network-based framework for early and efficient screening of ALL. Both white blood cell (WBC) scattergrams and complete blood count (CBC) are employed for ALL detection. The dataset comprises medical data from 233 patients with ALL, 283 patients with infectious mononucleosis (IM), and 183 healthy controls (HCs). RESULTS The combination of CBC data with WBC scattergrams achieved an accuracy of 98.43% in fivefold cross-validation and a sensitivity of 96.67% in external validation, demonstrating the efficacy of our method. Additionally, the area under the curve (AUC) of this model surpasses 0.99, outperforming well-trained medical technicians. CONCLUSIONS To the best of our knowledge, this framework is the first to incorporate WBC scattergrams with CBC data for ALL screening, proving to be an efficient method with enhanced sensitivity and specificity. Integrating this framework into the screening procedure shows promise for improving the early diagnosis of ALL and reducing the burden on medical technicians. The code and dataset are available at https://github.com/cvi-szu/ALL-Screening.
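The fivefold cross-validation and two-modality setup described above can be sketched generically (the concatenation-based fusion is an assumption for illustration; the paper's network fuses the modalities internally):

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def fuse_features(cbc, scattergram):
    """Early fusion by concatenating the two modality feature vectors."""
    return list(cbc) + list(scattergram)
```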
Affiliation(s)
- Qiuming Wang, Computer Vision Institute, College of Computer Science and Software, Shenzhen University, China
- Tao Huang, Department of Laboratory Medicine, Shenzhen Children's Hospital, Shenzhen, China
- Xiaojuan Luo, Department of Laboratory Medicine, Shenzhen Children's Hospital, Shenzhen, China
- Xiaoling Luo, Computer Vision Institute, College of Computer Science and Software, Shenzhen University, China
- Xuechen Li, School of Electronic and Information Engineering, Wuyi University, China
- Ke Cao, Department of Laboratory Medicine, Shenzhen Children's Hospital, Shenzhen, China
- Defa Li, Department of Laboratory Medicine, Shenzhen Children's Hospital, Shenzhen, China
- Linlin Shen, Computer Vision Institute, College of Computer Science and Software, Shenzhen University, China; Guangdong Provincial Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, China

36
Gao G, Lv Z, Zhang Y, Qin AK. Advertising or adversarial? AdvSign: Artistic advertising sign camouflage for target physical attacking to object detector. Neural Netw 2025; 186:107271. [PMID: 40010291 DOI: 10.1016/j.neunet.2025.107271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2024] [Revised: 01/07/2025] [Accepted: 02/11/2025] [Indexed: 02/28/2025]
Abstract
Deep learning models are often vulnerable to adversarial attacks in both digital and physical environments. Particularly challenging are physical attacks that involve subtle, unobtrusive modifications to objects, such as patch-sticking or light-shooting, designed to maliciously alter the model's output when the scene is captured and fed into the model. Developing physical adversarial attacks that are robust, flexible, inconspicuous, and difficult to trace remains a significant challenge. To address this issue, we propose an artistic camouflage named Adversarial Advertising Sign (AdvSign) for the object detection task, especially in autonomous driving scenarios. Artistic patterns, such as brand logos and advertisement signs, generally have a high tolerance for visual incongruity and are ubiquitous, which makes them strongly unobtrusive. We design these patterns into advertising signs that can be attached to various mobile carriers, such as carry-bags and vehicle stickers, to create adversarial camouflage with strong untraceability. This method is particularly effective at misleading self-driving cars, for instance causing them to misidentify these signs as 'stop' signs. Our approach combines a trainable adversarial patch with various signs of artistic patterns to create advertising patches. By leveraging the diversity and flexibility of these patterns, we draw attention away from the conspicuous adversarial elements, enhancing the effectiveness and subtlety of our attacks. We then use the CARLA autonomous-driving simulator to place these synthesized patches onto 3D flat surfaces in different traffic scenes, rendering 2D composite scene images from various perspectives. These varied scene images are then input into the target detector for adversarial training, resulting in the final trained adversarial patch.
In particular, we introduce a novel loss with artistic pattern constraints, designed to differentially adjust pixels within and outside the advertising sign during training. Extensive experiments in both simulated (composite scene images with AdvSign) and real-world (printed AdvSign images) environments demonstrate the effectiveness of AdvSign in executing physical attacks on state-of-the-art object detectors, such as YOLOv5. Our training strategy, leveraging diverse scene images and varied artistic transformations to adversarial patches, enables seamless integration with multiple patterns. This enhances attack effectiveness across various physical settings and allows easy adaptation to new environments and artistic patterns.
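A loss that adjusts pixels differently inside and outside the sign region can be sketched as a mask-weighted objective (the weight values and the squared-error form are illustrative assumptions, not the paper's exact loss):

```python
def masked_patch_loss(patch, target, mask, w_in=0.1, w_out=10.0):
    """Weighted squared deviation from the artistic target pattern: pixels
    outside the sign (mask == 0) are pulled hard toward the pattern, while
    pixels inside (mask == 1) are left freer to carry the adversarial signal.
    Weights w_in / w_out are assumed values for illustration."""
    loss = 0.0
    for p, t, m in zip(patch, target, mask):
        w = w_in if m else w_out
        loss += w * (p - t) ** 2
    return loss
```

For a flattened patch `[1.0, 0.0]`, target `[0.0, 0.0]`, and mask `[1, 0]`, the in-sign deviation contributes only 0.1 while an equal out-of-sign deviation would contribute 10.0.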
Affiliation(s)
- Guangyu Gao, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Zhuocheng Lv, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yan Zhang, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- A K Qin, Department of Computing Technologies, Swinburne University of Technology, Hawthorn, VIC 3122, Australia

37
Ning Z, Zhang Y, Zhang S, Lin X, Kang L, Duan N, Wang Z, Wu S. Deep learning-assisted cellular imaging for evaluating acrylamide toxicity through phenotypic changes. Food Chem Toxicol 2025; 200:115401. [PMID: 40118138 DOI: 10.1016/j.fct.2025.115401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2024] [Revised: 02/27/2025] [Accepted: 03/18/2025] [Indexed: 03/23/2025]
Abstract
Acrylamide (AA), a food hazard generated during thermal processing, poses significant safety risks due to its toxicity. Conventional methods for AA toxicology are time-consuming and inadequate for analyzing cellular morphology. This study developed a novel approach combining deep learning models (U-Net and ResNet34) with cell fluorescence imaging. U-Net was used for cell segmentation, generating a single-cell dataset, while ResNet34 was trained on the dataset for 200 epochs, achieving 80% validation accuracy. This method predicts AA concentration ranges by matching cell fluorescence features against the dataset and analyzes cellular phenotypic changes under AA exposure using k-means clustering and CellProfiler. The approach overcomes the limitations of traditional toxicological methods, offering a direct link between cell phenotypes and hazard toxicology. It provides a high-throughput, accurate solution for evaluating AA toxicology and refines the understanding of its cellular impacts.
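The k-means step used for phenotype analysis can be illustrated on scalar features (a generic sketch; CellProfiler features are multi-dimensional in practice):

```python
def kmeans_1d(xs, k=2, iters=20):
    """Plain k-means on scalar features (e.g. one morphology descriptor
    per cell): assign each point to the nearest centroid, then recompute
    centroids as group means."""
    srt = sorted(xs)
    cents = [srt[i * len(xs) // k] for i in range(k)]  # spread-out seeds
    for _ in range(iters):
        groups = [[] for _ in cents]
        for x in xs:
            groups[min(range(k), key=lambda i: abs(x - cents[i]))].append(x)
        cents = [sum(g) / len(g) if g else c for g, c in zip(groups, cents)]
    return sorted(cents)
```

On two well-separated groups, e.g. `[1.0, 1.1, 0.9, 10.0, 10.2, 9.8]`, the centroids converge to roughly 1.0 and 10.0.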
Affiliation(s)
- Zhiyuan Ning, State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi, 214122, China
- Yingming Zhang, State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi, 214122, China
- Shikun Zhang, State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi, 214122, China
- Xianfeng Lin, State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi, 214122, China
- Lixin Kang, State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi, 214122, China
- Nuo Duan, State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi, 214122, China; International Joint Laboratory on Food Safety, Jiangnan University, Wuxi, 214122, China
- Zhouping Wang, State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi, 214122, China; International Joint Laboratory on Food Safety, Jiangnan University, Wuxi, 214122, China
- Shijia Wu, State Key Laboratory of Food Science and Resources, Jiangnan University, Wuxi, 214122, China; School of Food Science and Technology, Jiangnan University, Wuxi, 214122, China; International Joint Laboratory on Food Safety, Jiangnan University, Wuxi, 214122, China

38
Yang Y, Pan H, Jiang QY, Xu Y, Tang J. Learning to Rebalance Multi-Modal Optimization by Adaptively Masking Subnetworks. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:4553-4566. [PMID: 40184294 DOI: 10.1109/tpami.2025.3547417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/06/2025]
Abstract
Multi-modal learning aims to enhance performance by unifying models from various modalities but often faces the "modality imbalance" problem in real data, leading to a bias towards dominant modalities and neglecting others, thereby limiting its overall effectiveness. To address this challenge, the core idea is to balance the optimization of each modality to achieve a joint optimum. Existing approaches often employ a modal-level control mechanism for adjusting the update of each modal parameter. However, such a global-wise updating mechanism ignores the different importance of each parameter. Inspired by subnetwork optimization, we explore a uniform sampling-based optimization strategy and find it more effective than global-wise updating. According to the findings, we further propose a novel importance sampling-based, element-wise joint optimization method, called Adaptively Mask Subnetworks Considering Modal Significance (AMSS). Specifically, we incorporate mutual information rates to determine the modal significance and employ non-uniform adaptive sampling to select foreground subnetworks from each modality for parameter updates, thereby rebalancing multi-modal learning. Additionally, we demonstrate the reliability of the AMSS strategy through convergence analysis. Building upon theoretical insights, we further enhance the multi-modal mask subnetwork strategy using unbiased estimation, referred to as AMSS+. Extensive experiments reveal the superiority of our approach over comparison methods.
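The core AMSS idea, sampling a subnetwork non-uniformly by parameter importance and updating only that subnetwork, can be sketched as follows (the importance values and plain SGD step are illustrative; the paper derives modal significance from mutual information rates):

```python
import random

def sample_subnetwork(importance, budget, seed=0):
    """Sample `budget` distinct parameter indices, with probability
    proportional to importance (non-uniform adaptive sampling)."""
    rng = random.Random(seed)
    remaining = list(range(len(importance)))
    chosen = []
    for _ in range(budget):
        weights = [importance[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return set(chosen)

def masked_sgd_step(params, grads, mask, lr=0.1):
    """Update only the sampled subnetwork; frozen parameters pass through."""
    return [p - lr * g if i in mask else p
            for i, (p, g) in enumerate(zip(params, grads))]
```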
39
Irfan M, Haq IU, Malik KM, Muhammad K. One-shot learning for generalization in medical image classification across modalities. Comput Med Imaging Graph 2025; 122:102507. [PMID: 40049026 DOI: 10.1016/j.compmedimag.2025.102507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2024] [Revised: 01/22/2025] [Accepted: 01/29/2025] [Indexed: 03/24/2025]
Abstract
Generalizability is one of the biggest challenges hindering the advancement of medical sensing technologies across multiple imaging modalities. This issue is further compounded when the imaging data is limited in scope or of poor quality. To tackle this, we propose a generalized, robust, and lightweight one-shot learning method for medical image classification across various imaging modalities, including X-ray, microscopic, and CT scans. Our model introduces a collaborative one-shot training (COST) approach, incorporating both meta-learning and metric-learning. This approach allows for effective training on only one image per class. To ensure generalization with fewer epochs, we employ gradient generalization at dense and fully connected layers, utilizing a lightweight Siamese network with triplet loss and shared parameters. The proposed model was evaluated on 12 medical image datasets from MedMNIST2D, achieving an average accuracy of 91.5% and area under the curve (AUC) of 0.89, outperforming state-of-the-art models such as ResNet-50 and AutoML by over 10% on certain datasets. Further, on the OCTMNIST dataset, our model achieved an AUC of 0.91 compared to ResNet-50's 0.77. Ablation studies further validate the superiority of our approach, with the COST method showing significant improvement in convergence speed and accuracy when compared to traditional one-shot learning setups. Additionally, our model's lightweight architecture requires only 0.15 million parameters, making it well-suited for deployment on resource-constrained devices.
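The triplet loss used by the Siamese network has a standard form that can be written down directly (the margin value here is illustrative):

```python
import math

def euclid(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(a,p) - d(a,n) + margin: pull same-class pairs together,
    push different-class pairs at least `margin` apart."""
    return max(0.0, euclid(anchor, positive) - euclid(anchor, negative) + margin)
```

With the negative far away (distance 5) and the positive at distance 0, the loss vanishes; when positive and negative sit at equal distance, the loss equals the margin.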
Affiliation(s)
- Muhammad Irfan, SMILES LAB, College of Innovation & Technology, University of Michigan-Flint, Flint, MI 48502, USA
- Ijaz Ul Haq, SMILES LAB, College of Innovation & Technology, University of Michigan-Flint, Flint, MI 48502, USA
- Khalid Mahmood Malik, SMILES LAB, College of Innovation & Technology, University of Michigan-Flint, Flint, MI 48502, USA
- Khan Muhammad, VIS2KNOW Lab, Department of Applied Artificial Intelligence, School of Convergence, College of Computing and Informatics, Sungkyunkwan University, Seoul 03063, South Korea

40
Jiang Y, Sun N, Xie X, Yang F, Li T. ADFQ-ViT: Activation-Distribution-Friendly post-training Quantization for Vision Transformers. Neural Netw 2025; 186:107289. [PMID: 40010296 DOI: 10.1016/j.neunet.2025.107289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2024] [Revised: 12/16/2024] [Accepted: 02/13/2025] [Indexed: 02/28/2025]
Abstract
Vision Transformers (ViTs) have exhibited exceptional performance across diverse computer vision tasks, while their substantial parameter size incurs significantly increased memory and computational demands, impeding effective inference on resource-constrained devices. Quantization has emerged as a promising solution to mitigate these challenges, yet existing methods still suffer from significant accuracy loss at low bit-widths. We attribute this issue to the distinctive distributions of post-LayerNorm and post-GELU activations within ViTs, rendering conventional hardware-friendly quantizers ineffective, particularly in low-bit scenarios. To address this issue, we propose a novel framework called Activation-Distribution-Friendly post-training Quantization for Vision Transformers, ADFQ-ViT. Concretely, we introduce the Per-Patch Outlier-aware Quantizer to tackle irregular outliers in post-LayerNorm activations. This quantizer refines the granularity of the uniform quantizer to a per-patch level while retaining a minimal subset of values exceeding a threshold at full precision. To handle the non-uniform distributions of post-GELU activations between positive and negative regions, we design the Shift-Log2 Quantizer, which shifts all elements to the positive region and then applies log2 quantization. Moreover, we present the Attention-score enhanced Module-wise Optimization, which adjusts the parameters of each quantizer by reconstructing errors to further mitigate quantization error. Extensive experiments demonstrate that ADFQ-ViT provides significant improvements over various baselines in image classification, object detection, and instance segmentation tasks at 4-bit. Specifically, when quantizing the ViT-B model to 4-bit, we achieve a 5.17% improvement in Top-1 accuracy on the ImageNet dataset. Our code is available at: https://github.com/llwx593/adfq-vit.git.
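The Shift-Log2 idea, shift all activations into the positive region and then quantize on a log2 grid, can be sketched as follows (the scaling and clamping details are assumptions; the actual quantizer also involves calibration):

```python
import math

def shift_log2_quant(xs, bits=4):
    """Shift activations so the minimum maps to zero, then snap each shifted
    value to the nearest power-of-two level 2**(-q), q in [0, 2**bits - 1],
    and shift back. A sketch of the Shift-Log2 idea, not the paper's code."""
    shift = min(xs)
    levels = 2 ** bits - 1
    out = []
    for x in xs:
        v = x - shift
        if v <= 0:
            out.append(shift)          # the minimum maps to the shifted origin
            continue
        q = round(-math.log2(v))       # nearest power-of-two exponent
        q = max(0, min(levels, q))     # clamp to the representable range
        out.append(2.0 ** (-q) + shift)
    return out
```

For example, `[-0.5, 0.0, 0.25]` shifts to `[0.0, 0.5, 0.75]`, snaps to `[0, 0.5, 1.0]`, and shifts back to `[-0.5, 0.0, 0.5]`.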
Affiliation(s)
- Yanfeng Jiang, College of Computer Science, Nankai University, Tianjin, China; Tianjin Key Laboratory of Network and Data Security Technology, Tianjin, China
- Ning Sun, Zhejiang Lab, Hangzhou, Zhejiang, China
- Fei Yang, Zhejiang Lab, Hangzhou, Zhejiang, China
- Tao Li, College of Computer Science, Nankai University, Tianjin, China; Haihe Lab of ITAI, Tianjin, China

41
Du P, Zhao S, Tan P, Sheng Z, Gan Z, Chen H, Li C. Towards a Theoretical Understanding of Semi-Supervised Learning Under Class Distribution Mismatch. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:4853-4868. [PMID: 40031570 DOI: 10.1109/tpami.2025.3545930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Semi-supervised learning (SSL) confronts a formidable challenge under class distribution mismatch, wherein unlabeled data contain numerous categories absent in the labeled dataset. Traditional SSL methods undergo performance deterioration in such mismatch scenarios due to the invasion of instances from unknown categories. Despite some technical efforts to enhance SSL by mitigating the invasion, a thorough theoretical analysis of SSL under class distribution mismatch is still lacking. Accordingly, in this work, we propose the Bi-Objective Optimization Mechanism (BOOM) to theoretically analyze the excess risk between the empirical optimal solution and the population-level optimal solution. Specifically, BOOM reveals that the SSL error is the essential contributor behind excess risk, resulting from both the pseudo-labeling error and the invasion error. Meanwhile, BOOM unveils that the optimization objectives of SSL under mismatch are binary: high-quality pseudo-labels and adaptive weights on the unlabeled instances, which contribute to alleviating the pseudo-labeling error and the invasion error, respectively. Moreover, BOOM explicitly discovers the fundamental factors crucial for optimizing the bi-objectives, guided by which an approach is then proposed as a strong baseline for SSL under mismatch. Extensive experiments on benchmark and real datasets confirm the effectiveness of our proposed algorithm.
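The second objective, adaptive weights on unlabeled instances, can be illustrated with a simple confidence-thresholded weighting (a stand-in for illustration, not BOOM's actual weighting rule):

```python
def weighted_unlabeled_risk(losses, confidences, tau=0.8):
    """Weight each unlabeled instance's loss by its confidence, zeroing out
    instances below threshold tau (likely invaders from unknown categories),
    and return the normalized weighted risk. tau is an assumed value."""
    ws = [c if c >= tau else 0.0 for c in confidences]
    total = sum(ws)
    return sum(w * l for w, l in zip(ws, losses)) / total if total else 0.0
```

A low-confidence instance with a large loss then contributes nothing: losses `[1.0, 10.0]` with confidences `[0.9, 0.5]` yield a risk of 1.0.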
42
Ghilea R, Rekik I. Replica tree-based federated learning using limited data. Neural Netw 2025; 186:107281. [PMID: 40015035 DOI: 10.1016/j.neunet.2025.107281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2023] [Revised: 11/07/2024] [Accepted: 02/12/2025] [Indexed: 03/01/2025]
Abstract
Learning from limited data has been extensively studied in machine learning, considering that deep neural networks achieve optimal performance when trained using a large amount of samples. Although various strategies have been proposed for centralized training, the topic of federated learning with small datasets remains largely unexplored. Moreover, in realistic scenarios, such as settings where medical institutions are involved, the number of participating clients is also constrained. In this work, we propose a novel federated learning framework, named RepTreeFL. At the core of the solution is the concept of a replica, where we replicate each participating client by copying its model architecture and perturbing its local data distribution. Our approach enables learning from limited data and a small number of clients by aggregating a larger number of models with diverse data distributions. Furthermore, we leverage the hierarchical structure of the client network (both original and virtual), alongside the model diversity across replicas, and introduce a diversity-based tree aggregation, where replicas are combined in a tree-like manner and the aggregation weights are dynamically updated based on the model discrepancy. We evaluated our method on two tasks and two types of data, graph generation and image classification (binary and multi-class), with both homogeneous and heterogeneous model architectures. Experimental results demonstrate the effectiveness and superior performance of RepTreeFL in settings where both data and clients are limited.
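Discrepancy-based aggregation weights can be sketched in flat (non-tree) form: models farther from their peers receive smaller weights (the inverse-discrepancy rule is an illustrative assumption, not the paper's exact update):

```python
def discrepancy(m1, m2):
    """Mean absolute difference between two flattened parameter vectors."""
    return sum(abs(a - b) for a, b in zip(m1, m2)) / len(m1)

def aggregate(models, eps=1e-8):
    """Weight each model inversely to its mean discrepancy from the others,
    then return the weighted average of parameters."""
    n = len(models)
    ws = []
    for i, m in enumerate(models):
        d = sum(discrepancy(m, o) for j, o in enumerate(models) if j != i) / (n - 1)
        ws.append(1.0 / (d + eps))
    total = sum(ws)
    ws = [w / total for w in ws]
    return [sum(w * m[k] for w, m in zip(ws, models))
            for k in range(len(models[0]))]
```

Two agreeing models and one outlier, `[[1.0], [1.0], [5.0]]`, aggregate to roughly 1.8 rather than the plain mean of about 2.33.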
Affiliation(s)
- Ramona Ghilea, BASIRA Lab, Imperial-X (I-X) and Department of Computing, Imperial College London, London, UK
- Islem Rekik, BASIRA Lab, Imperial-X (I-X) and Department of Computing, Imperial College London, London, UK

43
Zhou J, He Z, Zhang D, Liu S, Fu X, Li X. Spatial Residual for Underwater Object Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:4996-5013. [PMID: 40048345 DOI: 10.1109/tpami.2025.3548652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2025]
Abstract
Feature drift is caused by the dynamic coupling of target features and degradation factors, which reduces underwater detector performance. We redefine feature drift as the instability of target features within boundary constraints while solving partial differential equations (PDEs). From this insight, we propose the Spatial Residual (SR) block, which uses SkipCut to establish effective constraints across the network width for solving PDEs and optimizes the solution space. It is implemented as a general-purpose backbone with 5 Spatial Residuals (BSR5) for complex feature scenarios. Specifically, BSR5 extracts discrete channel slices through SkipCut, where each sliced feature is parsed within the appropriate data capacity. In gradient backpropagation, SkipCut functions as a ShortCut, optimizing information flow and gradient allocation to enhance performance and accelerate training. Experiments on the RUOD dataset show that BSR5-integrated DETRs and YOLOs achieve state-of-the-art results among conventional and end-to-end detectors. Specifically, our BSR5-DETR improves AP by 1.3% and 2.7% over RT-DETR with ResNet-101, while reducing parameters by 41.6% and 6.6%, respectively. Further validation highlights BSR5's strong convergence and robustness, especially in training-from-scratch scenarios, making it well-suited for data-scarce, resource-constrained, and real-time tasks.
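The SkipCut idea, slicing channels, processing each slice, and keeping a shortcut across the width, can be caricatured on plain lists (the doubling "branch" is a stand-in for a real convolutional branch; this is only a structural sketch):

```python
def skipcut(x, k=2):
    """Split the channel vector into k slices, process each slice
    independently (stand-in branch: doubling), then concatenate the
    processed slices with the raw input as a width-wise shortcut."""
    n = len(x)
    slices = [x[i * n // k:(i + 1) * n // k] for i in range(k)]
    processed = [[2 * v for v in s] for s in slices]  # per-slice branch
    return [v for s in processed for v in s] + x      # concat + shortcut
```

For `[1, 2, 3, 4]` with `k=2`, the two slices `[1, 2]` and `[3, 4]` are processed and concatenated ahead of the untouched input.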
44
Xu X, Wang C, Yi Q, Ye J, Kong X, Ashraf SQ, Dearn KD, Hajiyavand AM. MedBin: A lightweight End-to-End model-based method for medical waste management. WASTE MANAGEMENT (NEW YORK, N.Y.) 2025; 200:114742. [PMID: 40088805 DOI: 10.1016/j.wasman.2025.114742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/05/2024] [Revised: 03/04/2025] [Accepted: 03/07/2025] [Indexed: 03/17/2025]
Abstract
The surge in medical waste has highlighted the urgent need for cost-effective and advanced management solutions. In this paper, a novel medical waste management approach, "MedBin," is proposed for automated sorting, reusing, and recycling. A comprehensive medical waste dataset, "MedBin-Dataset," is established, comprising 2,119 original images spanning 36 categories, with samples captured in various backgrounds. The lightweight "MedBin-Net" model is introduced to enable detection and instance segmentation of medical waste, enhancing waste recognition capabilities. Experimental results demonstrate the effectiveness of the proposed approach, achieving an average precision of 0.91, recall of 0.97, and F1-score of 0.94 across all categories with just 2.51 million parameters, 5.20 billion FLOPs, and 0.60 ms inference time. Additionally, the proposed method includes a World Health Organization (WHO) Guideline-Based Classifier that categorizes detected waste into 5 types, each with a corresponding disposal method, following WHO medical waste classification standards. The proposed method, along with the dedicated dataset, offers a promising solution that supports sustainable medical waste management and other related applications. To access the MedBin-Dataset samples, please visit https://universe.roboflow.com/uob-ylti8/medbin_dataset. The source code for MedBin-Net can be found at https://github.com/Wayne3918/MedbinNet.
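The reported precision, recall, and F1-score follow the standard definitions and can be computed from true-positive, false-positive, and false-negative counts:

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from detection counts:
    precision = tp / (tp + fp), recall = tp / (tp + fn),
    F1 = harmonic mean of the two."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For instance, 8 true positives with 2 false positives and 2 false negatives give precision, recall, and F1 all equal to 0.8.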
Affiliation(s)
- Xiazhen Xu, Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Chenyang Wang, Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Qiufeng Yi, Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Jiaqi Ye, Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Xiangfei Kong, Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Shazad Q Ashraf, Queen Elizabeth Hospital, Mindelsohn Way, Birmingham B15 2GW, UK
- Karl D Dearn, Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK
- Amir M Hajiyavand, Department of Mechanical Engineering, School of Engineering, University of Birmingham, Birmingham B15 2TT, UK

45
Xu J, Cao R, Luo P, Mu D. Break Adhesion: Triple adaptive-parsing for weakly supervised instance segmentation. Neural Netw 2025; 186:107215. [PMID: 39951880 DOI: 10.1016/j.neunet.2025.107215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2024] [Revised: 01/22/2025] [Accepted: 01/23/2025] [Indexed: 02/17/2025]
Abstract
Weakly supervised instance segmentation (WSIS) aims to precisely identify individual instances from weakly supervised semantic segmentation. Existing WSIS techniques primarily employ a unified, fixed threshold to identify all peaks in semantic maps, which may lead to missed or false detections when instances of the same category have diverse visual characteristics. Moreover, previous methods apply a fixed augmentation strategy to broadly propagate peak cues to contributing regions, resulting in instance adhesion. To eliminate these manually fixed parsing patterns, we propose a triple adaptive-parsing network. Specifically, an adaptive Peak Perception Module (PPM) uses the average degree of the feature as a learning basis to infer the optimal threshold. Simultaneously, we propose the Shrinkage Loss function (SL) to minimize outlier responses that deviate from the mean. Finally, by eliminating uncertain adhesion, our method effectively obtains Reliable Inter-instance Relationships (RIR), enhancing the representation of instances. Extensive experiments on the Pascal VOC and COCO datasets show that the proposed method improves accuracy by 2.1% and 4.3%, respectively, achieving the latest performance standard and significantly advancing the instance segmentation task. The code is available at https://github.com/Elaineok/TAP.
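The adaptive threshold (the map's own mean) and the outlier-suppressing shrinkage objective can be sketched in one dimension (illustrative simplifications of PPM and SL, not the paper's implementations):

```python
def adaptive_peaks(heat):
    """Peaks are local maxima above the map's own mean: an adaptive,
    data-derived threshold rather than one fixed global cutoff."""
    thr = sum(heat) / len(heat)
    return [i for i in range(1, len(heat) - 1)
            if heat[i] > thr and heat[i] >= heat[i - 1] and heat[i] >= heat[i + 1]]

def shrinkage_loss(vals):
    """Penalize responses that deviate from the mean (outlier suppression)."""
    mu = sum(vals) / len(vals)
    return sum((v - mu) ** 2 for v in vals) / len(vals)
```

On the map `[0, 1, 0, 5, 0]` (mean 1.2), only index 3 qualifies as a peak; the weak bump at index 1 is suppressed by the adaptive threshold.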
Affiliation(s)
- Jingting Xu
- School of Automation, Northwestern Polytechnical University, Xi'an, 710129, China.
- Rui Cao
- School of Computer Science and Technology, Northwest University, Xi'an, 710127, China.
- Peng Luo
- School of Automation, Northwestern Polytechnical University, Xi'an, 710129, China.
- Dejun Mu
- School of Automation, Northwestern Polytechnical University, Xi'an, 710129, China; Research & Development Institute of Northwestern Polytechnical University, Shenzhen, 518057, China.
46
Wang Q, Zhang S, Zeng D, Xie Z, Guo H, Zeng T, Fan FL. Don't fear peculiar activation functions: EUAF and beyond. Neural Netw 2025; 186:107258. PMID: 39987712; DOI: 10.1016/j.neunet.2025.107258.
Abstract
In this paper, we propose a new super-expressive activation function called the Parametric Elementary Universal Activation Function (PEUAF). We demonstrate its effectiveness through systematic and comprehensive experiments on various industrial and image datasets, including CIFAR-10, Tiny-ImageNet, and ImageNet. Models using PEUAF achieve the best performance on several baseline industrial datasets. On image datasets, models that incorporate mixed activation functions (with PEUAF) exhibit competitive test accuracy, despite the low accuracy of models using PEUAF alone. Moreover, we significantly generalize the family of super-expressive activation functions, whose existence has been demonstrated in several recent works, by showing that any continuous function can be approximated to any desired accuracy by a fixed-size network with a specific super-expressive activation function. Our work thereby addresses two major bottlenecks impeding the development of super-expressive activation functions: the limited number of identified super-expressive functions, which raises doubts about their broad applicability, and their often peculiar forms, which lead to skepticism about their scalability and practicality in real-world applications.
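To make the "peculiar form" concrete: super-expressive activations in the EUAF line typically combine a periodic branch with a bounded monotone branch. The sketch below is an illustrative variant only, assuming a triangle wave with a trainable frequency w on the positive axis and a soft-sign branch on the negative axis; the paper's exact PEUAF definition should be consulted for the true form:

```python
import math

def peuaf(x, w=1.0):
    """Illustrative parametric super-expressive activation (assumed form):
    a triangle wave of trainable frequency w for x >= 0, and a bounded
    soft-sign branch for x < 0. Not the paper's exact definition."""
    if x >= 0:
        z = w * x
        # Triangle wave with period 2 and range [0, 1].
        return abs(z - 2 * math.floor((z + 1) / 2))
    return x / (1 + abs(x))
```

The periodic branch is what grants "super-expressivity" (a fixed-size network can exploit its infinitely many oscillations), while the bounded negative branch keeps the function well behaved for gradient-based training.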
Affiliation(s)
- Qianchao Wang
- Center of Mathematical Artificial Intelligence, Department of Mathematics, The Chinese University of Hong Kong, Hong Kong Special Administrative Region of China
- Shijun Zhang
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China
- Dong Zeng
- Department of Biomedical Engineering, Southern Medical University, Guangzhou, China
- Zhaoheng Xie
- Institute of Medical Technology, Peking University Health Science Center, Peking University, Beijing, China
- Hengtao Guo
- Independent Researcher, 708 6th Ave N, Seattle, WA 98109, United States of America
- Tieyong Zeng
- Center of Mathematical Artificial Intelligence, Department of Mathematics, The Chinese University of Hong Kong, Hong Kong Special Administrative Region of China
- Feng-Lei Fan
- Center of Mathematical Artificial Intelligence, Department of Mathematics, The Chinese University of Hong Kong, Hong Kong Special Administrative Region of China.
47
Harris CE, Liu L, Almeida L, Kassick C, Makrogiannis S. Artificial intelligence in pediatric osteopenia diagnosis: evaluating deep network classification and model interpretability using wrist X-rays. Bone Rep 2025; 25:101845. PMID: 40343188; PMCID: PMC12059325; DOI: 10.1016/j.bonr.2025.101845.
Abstract
Osteopenia is a bone disorder characterized by low bone density that affects millions of people worldwide. It is commonly diagnosed through clinical assessment of bone mineral density (BMD). State-of-the-art machine learning (ML) techniques, such as convolutional neural networks (CNNs) and transformer models, have gained increasing popularity in medicine. In this work, we employ six deep networks for osteopenia vs. healthy bone classification using X-ray images from the pediatric wrist dataset GRAZPEDWRI-DX. We apply two explainable AI techniques to analyze and interpret visual explanations for network decisions. Experimental results show that deep networks can effectively learn osteopenic and healthy bone features, achieving high classification accuracy. Among the six evaluated networks, DenseNet201 with transfer learning yielded the top classification accuracy of 95.2%. Furthermore, visual explanations of CNN decisions provide valuable insight into the black-box inner workings and yield interpretable results. Our evaluation highlights the capability of deep networks to accurately differentiate between osteopenic and healthy bones in pediatric wrist X-rays. The combination of high classification accuracy and interpretable visual explanations underscores the promise of incorporating machine learning into clinical workflows for the early and accurate diagnosis of osteopenia.
Affiliation(s)
- Chelsea E. Harris
- Division of Physics, Engineering, Mathematics, and Computer Science, Delaware State University, 1200 N. Dupont Hwy., Dover, DE 19901, USA
- Lingling Liu
- Division of Physics, Engineering, Mathematics, and Computer Science, Delaware State University, 1200 N. Dupont Hwy., Dover, DE 19901, USA
- Luiz Almeida
- Department of Orthopaedic Surgery, Duke University, 2080 Duke University Road, Durham, NC 27710, USA
- Carolina Kassick
- Division of Physics, Engineering, Mathematics, and Computer Science, Delaware State University, 1200 N. Dupont Hwy., Dover, DE 19901, USA
- Sokratis Makrogiannis
- Division of Physics, Engineering, Mathematics, and Computer Science, Delaware State University, 1200 N. Dupont Hwy., Dover, DE 19901, USA
48
Morikawa T, Shingyouchi M, Ariizumi T, Watanabe A, Shibahara T, Katakura A. Performance of image processing analysis and a deep convolutional neural network for the classification of oral cancer in fluorescence visualization. Int J Oral Maxillofac Surg 2025; 54:511-518. PMID: 39672733; DOI: 10.1016/j.ijom.2024.11.010.
Abstract
The aim of this prospective study was to determine the effectiveness of screening using image processing analysis and a deep convolutional neural network (DCNN) to classify oral cancers using non-invasive fluorescence visualization. The study included 1076 patients with diseases of the oral mucosa (oral cancer, oral potentially malignant disorders (OPMDs), benign disease) or normal mucosa. For oral cancer, the rate of fluorescence visualization loss (FVL) was 96.9%. Regarding image processing, multivariate analysis identified FVL, the coefficient of variation of the G value (CV), and the G value ratio (VRatio) as factors significantly associated with oral cancer detection. The sensitivity and specificity for detecting oral cancer were 96.9% and 77.3% for FVL, 80.8% and 86.4% for CV, and 84.9% and 87.8% for VRatio, respectively. Regarding the performance of the DCNN for image classification, recall was 0.980 for oral cancer, 0.760 for OPMDs, 0.960 for benign disease, and 0.739 for normal mucosa. Precision was 0.803, 0.821, 0.842, and 0.941, respectively. The F-score was 0.883, 0.789, 0.897, and 0.828, respectively. Sensitivity and specificity for detecting oral cancer were 98.0% and 92.7%, respectively. The accuracy for all lesions was 0.851, average recall was 0.860, average precision was 0.852, and average F-score was 0.849.
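The per-class recall, precision, and F-scores quoted above follow the standard definitions from confusion counts. A self-contained sketch, using hypothetical counts rather than the study's data:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard per-class metrics from confusion counts:
    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one class (illustration only):
p, r, f = precision_recall_f1(tp=98, fp=24, fn=2)
```

Note the asymmetry such reports can hide: a class may have high recall (few missed cases) yet moderate precision (many false alarms), which is exactly why the abstract reports recall, precision, and F-score separately for each lesion class.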
Affiliation(s)
- T Morikawa
- Department of Oral and Maxillofacial Surgery, Tokyo Dental College, Tokyo, Japan; Oral and Maxillofacial Surgery, Mitsuwadai General Hospital, Chiba, Japan.
- M Shingyouchi
- Department of Oral and Maxillofacial Surgery, Tokyo Dental College, Tokyo, Japan
- T Ariizumi
- Department of Oral and Maxillofacial Surgery, Tokyo Dental College, Tokyo, Japan
- A Watanabe
- Department of Oral and Maxillofacial Surgery, Tokyo Dental College, Tokyo, Japan
- T Shibahara
- Department of Oral and Maxillofacial Surgery, Tokyo Dental College, Tokyo, Japan
- A Katakura
- Department of Oral Pathobiological Science and Surgery, Tokyo Dental College, Tokyo, Japan
49
Lv J, Wu L, Hong C, Wang H, Wu Z, Chen H, Liu Z. Multi-class brain malignant tumor diagnosis in magnetic resonance imaging using convolutional neural networks. Brain Res Bull 2025; 225:111329. PMID: 40180191; DOI: 10.1016/j.brainresbull.2025.111329.
Abstract
Glioblastoma (GBM), primary central nervous system lymphoma (PCNSL), and brain metastases (BM) are common malignant brain tumors with similar radiological features, and accurate, non-invasive diagnosis is essential for selecting appropriate treatment plans. This study develops a deep learning model, FoTNet, to improve automatic diagnosis accuracy for these tumors, particularly the relatively rare PCNSL. The model integrates a frequency-based channel attention layer and the focal loss to address the class imbalance caused by the limited number of PCNSL samples. A multi-center MRI dataset was constructed by collecting and integrating data from Sir Run Run Shaw Hospital along with public datasets from UPENN and TCGA. The dataset includes T1-weighted contrast-enhanced (T1-CE) MRI images from 58 GBM, 82 PCNSL, and 269 BM cases, which were divided into training and testing sets at a 5:2 ratio. FoTNet achieved a classification accuracy of 92.5% and an average AUC of 0.9754 on the test set, significantly outperforming existing machine learning and deep learning methods in distinguishing among GBM, PCNSL, and BM. Through multiple validations, FoTNet has proven to be an effective and robust tool for accurately classifying these brain tumors, providing strong support for preoperative diagnosis and assisting clinicians in making more informed treatment decisions.
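The focal loss mentioned above is a standard remedy for class imbalance: it down-weights well-classified (mostly majority-class) examples so that rare classes such as PCNSL contribute more to the gradient. A minimal binary sketch of the standard formula (generic, not the authors' code):

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).
    p: predicted probability of the positive class; y: true label (0/1).
    The (1 - p_t)^gamma factor shrinks the loss of easy examples."""
    p_t = p if y == 1 else 1 - p
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction contributes almost nothing,
# while a misclassified rare-class example dominates:
easy = focal_loss(0.95, 1)
hard = focal_loss(0.1, 1)
```

With gamma = 0 and alpha = 0.5 this reduces (up to a constant) to ordinary cross-entropy; increasing gamma sharpens the focus on hard examples.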
Affiliation(s)
- Junhui Lv
- Department of Neurosurgery, Sir Run Run Shaw Hospital, College of Medicine, Zhejiang University, Qingchun Road, No. 3, Hangzhou, Zhejiang 310016, China.
- Liyang Wu
- School of Life and Environmental Sciences, Guilin University of Electronic Technology, Jinji Road No. 1, Guilin, Guangxi 541004, China.
- Chenyi Hong
- Zhejiang University - University of Illinois Urbana-Champaign Institute, Zhejiang University, Haizhou East Road No. 718, Haining, Zhejiang 314400, China.
- Hualiang Wang
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, Hong Kong SAR.
- Zhuoxuan Wu
- Department of Medical Oncology, Sir Run Run Shaw Hospital, College of Medicine, Zhejiang University, Qingchun Road, No. 3, Hangzhou, Zhejiang 310016, China.
- Hongbo Chen
- School of Life and Environmental Sciences, Guilin University of Electronic Technology, Jinji Road No. 1, Guilin, Guangxi 541004, China.
- Zuozhu Liu
- Zhejiang University - University of Illinois Urbana-Champaign Institute, Zhejiang University, Haizhou East Road No. 718, Haining, Zhejiang 314400, China.
50
Sobhi N, Sadeghi-Bazargani Y, Mirzaei M, Abdollahi M, Jafarizadeh A, Pedrammehr S, Alizadehsani R, Tan RS, Islam SMS, Acharya UR. Artificial intelligence for early detection of diabetes mellitus complications via retinal imaging. J Diabetes Metab Disord 2025; 24:104. PMID: 40224528; PMCID: PMC11993533; DOI: 10.1007/s40200-025-01596-7.
Abstract
Background: Diabetes mellitus (DM) increases the risk of vascular complications, and retinal vasculature imaging serves as a valuable indicator of both microvascular and macrovascular health. Moreover, artificial intelligence (AI)-enabled systems developed for high-throughput detection of diabetic retinopathy (DR) from digitized retinal images have been clinically adopted. This study reviews AI applications that use retinal images for DM-related complications, highlighting advances beyond DR screening, diagnosis, and prognosis, and addresses implementation challenges such as ethics, data privacy, equitable access, and explainability.
Methods: We conducted a thorough literature search across several databases, including PubMed, Scopus, and Web of Science, focusing on studies involving diabetes, the retina, and artificial intelligence. We reviewed the original research with respect to methodology, AI algorithms, data processing techniques, and validation procedures to ensure a detailed analysis of AI applications in diabetic retinal imaging.
Results: Retinal images can be used to diagnose DM complications, including DR, neuropathy, nephropathy, and atherosclerotic cardiovascular disease, and to predict the risk of cardiovascular events. Beyond DR screening, AI integration offers significant potential to address challenges in the comprehensive care of patients with DM.
Conclusion: With the ability to evaluate a patient's health status with respect to DM complications and to predict the risk of future cardiovascular events, AI-assisted retinal image analysis has the potential to become a central tool of modern personalized medicine for patients with DM.
Affiliation(s)
- Navid Sobhi
- Nikookari Eye Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Majid Mirzaei
- Student Research Committee, Tabriz University of Medical Sciences, Tabriz, Iran
- Mirsaeed Abdollahi
- Nikookari Eye Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Ali Jafarizadeh
- Nikookari Eye Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Siamak Pedrammehr
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, 75 Pigdons Rd, Waurn Ponds, VIC 3216, Australia
- Faculty of Design, Tabriz Islamic Art University, Tabriz, Iran
- Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, 75 Pigdons Rd, Waurn Ponds, VIC 3216, Australia
- Ru-San Tan
- National Heart Centre Singapore, Singapore
- Duke-NUS Medical School, Singapore
- Sheikh Mohammed Shariful Islam
- Institute for Physical Activity and Nutrition, School of Exercise and Nutrition Sciences, Deakin University, Melbourne, VIC, Australia
- Cardiovascular Division, The George Institute for Global Health, Newtown, Australia
- Sydney Medical School, University of Sydney, Camperdown, Australia
- U. Rajendra Acharya
- School of Mathematics, Physics, and Computing, University of Southern Queensland, Springfield, QLD 4300, Australia
- Centre for Health Research, University of Southern Queensland, Springfield, Australia